From audiobook to audio novel with SubPlease

In the past few years, visual novels have become a very popular way to immerse in Japanese. They are arguably an excellent medium: they combine audio and text at a much higher density than anime, often with “richer” vocabulary (口語 vs 文語) by virtue of being a literary medium, which calls for more description than anime does. All the while, they feature supporting artwork that helps the reader imagine the scenes (a big plus for those of us with aphantasia).

However, people can also feel alienated by the medium and refuse to use it, be it because of prejudice, because of the often sexual nature of the titles, because they can’t find a title that interests them, or for many other reasons.

The compromise I offer you today, in no small part thanks to the excellent work of KanjiEater with SubPlease, will bring you a lot of the benefits of visual novels without many of the drawbacks and qualms people might have with them. The big plus is that nowadays a lot of popular anime draw their source material from light novels, and you’ll often find that the narrator of the audiobook is one of the voice actors from the anime (the highest-budget ones sometimes even have an ensemble cast, with different actors voicing multiple characters, or use SFX)! And of course, it goes without saying, but this method also works for more traditional novels, your コンビニ人間s, your こころs or your 同志少女よ、敵を撃てs; there are even some translated English books that got Japanese audiobook counterparts. It works for anything that the market decided should have an audiobook (which might sometimes surprise you one way or another: it’s possible that one specific, very niche anime you like has audiobooks for its novels, while something like Re:Zero, one of the biggest sellers of a generation, does not).

Concretely, SubPlease automagically creates very accurate subtitles that sync a novel’s text to its audiobook. It produces a .srt file that you can use with the audio file in a video player, much like subbed anime/movies. As it stands, most people use mpv with keyboard shortcuts (to go from one sub to the next), as well as extensions that feed a clipboard page for lookups. For all intents and purposes, it’s probably good enough to read as is.

Unfortunately, I’m an idiot, I’m a perfectionist, and my eureka moments bring out curses to this world. As you guessed from the title of the article and the foreshadowing in the first paragraph, we’re going to use the battle-tested Ren’Py engine instead. This gives us a few nice features easily:

A user interface that looks much more like a normal epub reader such as ッツ or Readwok
Furigana rendering
Personalization (font, colors, background, etc)
(TODO Test this more) vertical text
Save/Load features
Easy back and forth/skip features without tweaking around config files (just use the mouse wheel)
Out of the box websocket/clipboard copying for texthooker pages
It’s not as thoroughly tested so far, but the whole setup should generally work with languages other than Japanese as well.

So, what does it look like in practice?

Cool, huh? It’s become my preferred way to read books. The line pauses at the end, so it leaves you time to look stuff up; it forces a slower pace on you (by following the speech) if you’re prone to whitenoising; it saves you the time spent second-guessing yourself and yomitan-ing a word just for its reading; it gives you non-verbal cues about what’s going on (emotion in the voice, knowing which character is speaking); and it breaks the monotony of scrolling while reading a book or just listening to the audiobook on autoplay. It can be considered a crutch in some respects, but it’s not really detrimental to learning in the long run. That said, because there are no visuals here, I’ve come to call them Audio Novels (ANs) instead. Please help me coin the term if you like this setup!

From Audiobook to Subtitle

Our first step will be to turn the audiobook into subtitles. The good news is that if you’re not interested in the whole audio novel shtick, you can stop after this step and just use this blog post as a handy step-by-step guide for SubPlease.

We will be using Google Colab to create the subtitle files here. It’s possible to create them locally, but you’d need a beefy Nvidia GPU as well as a development environment, so the setup is way more cumbersome. Google Colab, meanwhile, lends us one of its GPUs for free for a couple of hours at a time, with the environment already set up, so it’s much easier to use and explain. If you do want to run it locally, however, you can use both this page and the GitHub documentation as a guide.

I also recommend reading each sub-chapter before starting the steps (not only for this part, but for all of the guide), it could clear confusion if a question in an earlier step is answered in a later step.

Preparing the .txt file

From Calibre, right click the epub(s) and convert (or bulk convert), from epub to txt
In search and replace, make it so
- <rt>(.*?)<\/rt> is replaced with nothing
- 《(.+?)》 is replaced with nothing as well
This’ll keep furigana out of the .txt file, which makes audiobooktextsync more accurate and renders less ugly subtitles
If done properly, it should look like this:

Note: you can save this configuration to a file, and load it later, to make it faster in the future.

Open the text file (and view the epub in Calibre), and make sure there is indeed no furigana. I do this once per series, because if it works for vol. 1 it’s very unlikely to fail for the succeeding volumes.
Remove the bloat at the beginning and end of the text file. Typically, at the top, I keep the title and first chapter name and remove the ToC and publisher info. At the bottom, I delete everything after the あとがき (I also delete the あとがき itself if it’s not present in the audiobook; sometimes it is, sometimes it isn’t).
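
If you’d rather double-check the result with a script than by eye, here’s a minimal sketch in Python that applies the same two patterns as the search-and-replace rules above. It’s only an illustration and not part of Calibre or SubPlease; the function names are hypothetical and the sample strings are made up.

```python
import re

# Same two patterns as the Calibre search-and-replace rules above.
RT_TAG = re.compile(r"<rt>(.*?)</rt>")   # HTML ruby readings
BRACKET_RUBY = re.compile(r"《(.+?)》")    # readings written between 《 and 》


def strip_furigana(text: str) -> str:
    """Remove <rt>...</rt> and 《...》 readings from the text."""
    text = RT_TAG.sub("", text)
    return BRACKET_RUBY.sub("", text)


def has_furigana(text: str) -> bool:
    """Return True if either kind of reading is still present."""
    return bool(RT_TAG.search(text) or BRACKET_RUBY.search(text))


if __name__ == "__main__":
    sample = "吾輩《わがはい》は猫である"  # made-up sample line
    print(has_furigana(sample))
    print(strip_furigana(sample))
```

Running has_furigana on the contents of your exported .txt should give False if the conversion went well.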

Uploading to Google Drive

This should be mostly intuitive, you just need to make sure of a few things:

The audiobook file has to be in .m4b format. That’s almost always the case (especially from Audible), so it should be fine. Some stores give you a zip with each chapter as its own audio file, which needs a bit more tweaking, but that’s out of the scope of this post, sorry.
The folder name, audiobook name and script file name should be the same. Personally, I just name them 5, 6, 7 and so on, according to their volume number, or give them a very short name in Latin characters, like “yamai”. It removes one point of failure: the simplest naming is the least error-prone.

So, you essentially want something like this:

folder1/
       folder1.txt
       folder1.m4b
folder2/
       folder2.txt
       folder2.m4b
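
Before uploading, you could sanity-check that layout with a few lines of Python. This is just a sketch under the naming rule described above (folder, .txt and .m4b all share one name); check_layout and the audiobooks folder are hypothetical names, so adapt them to wherever your files live.

```python
from pathlib import Path


def check_layout(folder: Path) -> list[str]:
    """Return the problems found in one audiobook folder (empty list = OK)."""
    problems = []
    for ext in (".txt", ".m4b"):
        # The script and the audiobook must be named after the folder itself.
        expected = folder / (folder.name + ext)
        if not expected.is_file():
            problems.append(f"missing {expected.name}")
    return problems


if __name__ == "__main__":
    root = Path("audiobooks")  # hypothetical local folder holding folder1/, folder2/, ...
    if root.is_dir():
        for sub in sorted(p for p in root.iterdir() if p.is_dir()):
            print(sub.name, check_layout(sub) or "OK")
```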

TODO test calling them just `script.txt` or `anything.txt`, in theory, it should work anyway

Google Colab time!

Get to the notebook here
The top left should show something about not being able to edit, click it and save a copy in drive. TODO screenshot for this (just link)
Change the runtime type to T4 (GPU) if it’s not showing that already. It normally should; if it is, just click connect and you should get the same output as in the last screenshot.

If it changed correctly, you should be seeing this:

Edit the last line of code so the paths match the location of your files, for instance "/content/drive/MyDrive/p3v4". Note that the root of Google Drive is always /content/drive/MyDrive/; from there, it’s up to you to match your file structure. You can run multiple volumes in a row by putting paths one after the other, for instance: "/content/drive/MyDrive/1" "/content/drive/MyDrive/2" "/content/drive/MyDrive/3" "/content/drive/MyDrive/4". Be wary of how long it might take, however: Colab has a time limit before it kicks you out, so aim for roughly 30 hours of audiobooks combined at most.
Run the code blocks one after the other, pressing the play button and waiting for each one to complete before starting the next. At one point you’ll get a prompt asking to allow access to Google Drive; accept it and follow the steps to connect to the drive you uploaded the files to.
Wait a pretty long time (expect close to 30 minutes to 1 hour for a 10-hour audiobook). You can keep tabs on the progress by the color of the favicon (grey when it’s running, orange when done). Check the tab itself somewhat frequently, because Google likes asking you to click a captcha to see if you’re still around. I generally pop the tab out into its own window to keep an eye on it easily.
Grab your .srt file from Google Drive and download it as you would any other file. Note that if you get a .vtt, it means that the process ran into an error somewhere, possibly because you made a mistake somewhere.
Disconnect from the runtime as soon as it’s done. Google enforces usage limits of sorts, so if you leave it running longer than needed you might not be able to use it at all for the next couple of days. I typically do my runs once every other day, which is way faster than you could possibly read anyway.
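
If you have many volumes, typing the quoted paths by hand gets error-prone. Here’s a small sketch that builds the space-separated list from folder names, assuming your folders sit directly at the root of your drive as in the examples above. colab_paths is a hypothetical helper for your own machine, not something the notebook provides; you’d still paste its output into the last line yourself.

```python
DRIVE_ROOT = "/content/drive/MyDrive"  # root of Google Drive inside Colab


def colab_paths(folders, root=DRIVE_ROOT):
    """Build the quoted, space-separated path list from folder names."""
    quoted = []
    for name in folders:
        clean = name.strip("/")  # tolerate stray leading/trailing slashes
        quoted.append(f'"{root}/{clean}"')
    return " ".join(quoted)


if __name__ == "__main__":
    print(colab_paths(["1", "2", "3", "4"]))
```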

Misc notes:

You can use the folder icon on the left to have a view of the folders and make sure your paths are correct.
You might get an error like “pip’s dependency resolver does not currently take into account all the packages that are installed…”, despite it being all red, this is not a problem for us so you can proceed safely.
The steps have to be taken from 0 every time you reconnect to the notebook, but if you’re still connected you can just relaunch the final command (with a new audiobook) one more time without problem. Think of it as reinstalling Windows every time you boot up your PC: you have to reinstall all your programs every time, but once they’re there, you can use them as much as you want so long as the PC stays turned on.
If your PC crashes or you disconnect from the internet, you can return to https://colab.research.google.com/ and you should be able to reconnect to the instance while it’s still running.
In general, once you saved the notebook to your local copy, you can head to that link instead of the one in the guide, and use the notebook from your recent files.

From Subtitle to Audio Novel

First, let’s download everything we’ll need:

For windows
On linux: TODO
On mac: TODO

The software itself is intuitive to use, but there’s a few preparatory steps (that you’ll only go through once) to take care of beforehand.

Customizing our ren’py

The template folder next to the executable will be copied every time you make a new AN, so let’s start by making it look good from the get-go. For that purpose, the template is a very basic excerpt from 吾輩は猫である. That way we have an example to test our customizations on.

Feel free to skip any step here, the template is based on my configuration so it’s perfectly usable as is.

Let’s head to the game folder, as it is where everything will take place.

TODO Vertical writing ?

Copy to Clipboard / Websockets

A classic for VN readers, both are enabled by default (websocket on port 6677 by default), but of course you might have reasons to disable either.

To do that, simply delete texthooking.rpy and texthooking.rpyc to disable the clipboard, or websocket_server.rpy to disable the websockets.

If you want to change the port, open websocket_server.rpy and change the 6677 in server = WebSocketServer('', 6677, SimpleEcho) to the number of your choice.
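
Before settling on a new port, it can help to check that nothing else is already listening on it. Here’s a rough sketch using only Python’s standard library. It’s a standalone helper and not part of the template; port_in_use is a hypothetical name, and the check assumes the server is reachable on 127.0.0.1.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 6677 is the template's default websocket port.
    print(port_in_use(6677))
```

Run it while an AN is open to confirm the websocket is up, or with your candidate port while nothing is running to confirm it’s free.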

Font

The default is Noto Sans CJK. If you want to change it, find a .ttf file of your font and replace font.ttf with it. Make sure it is named exactly that way, and beware of uppercase letters.

Background

The background is in gui/nvl.png. Feel free to decorate it with any color or pattern you please.

`gui.rpy` changes

Everything has a description, but here’s a few recommendations of things to change:

define gui.text_size = 33 Self explanatory, I like my font pretty big, but you might like it smaller.
define gui.nvl_list_length = 2 This controls the number of lines displayed at once, be wary of setting it too high or it’ll overflow and some text might become hidden (this is my biggest gripe that I might look into improving in the future).

`top.txt` changes

This one isn’t in the game folder but back in the executable folder. I recommend trying your changes out on script.rpy first, though.

Replace #1d1f20 with the text color (the first occurrence is for normal text, the second for ruby/furigana; you can use different colors if you’d like)
window_background is the color of the rectangles behind the text, I recommend setting it close or equal to the background color to make it transparent, unless you have a more complex background.

Don’t forget to mirror your changes in top.txt once you’re done experimenting!

TODO list good color combinations from ttsu

It’s audiobookin' time

Phew, that was a doozy, but that’s all the hard stuff out of the way now. You can finally run audiobooktorenpy and you’ll be greeted by a pretty clear UI.

Name: Will be the name of the folder, just use a new one every time, nothing crazy here.
Path to the epub: used to work out the furigana and put them back in the text. It’s not mandatory and a bit shaky, but pretty neat.
Offsets: this’ll offset the audio splitting a little, because sometimes the subtitle starts a bit too late, which causes a weird half-syllable at the start of the line. Typically anywhere from -120 to -200 sounds good to capture a bit of silence beforehand. There’s no magic number here, it varies from audiobook to audiobook, but the process only takes a couple of minutes, so trial and error is the key.
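
To make the idea of an offset concrete, here’s a sketch of shifting a subtitle’s start time earlier. It assumes the offset is expressed in milliseconds and applied to each line’s start timestamp; that is my reading of the values above, not a confirmed detail of how audiobooktorenpy works internally, and the function names are hypothetical.

```python
def parse_srt_time(stamp: str) -> int:
    """Convert an SRT timestamp 'HH:MM:SS,mmm' to milliseconds."""
    clock, millis = stamp.strip().split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def format_srt_time(ms: int) -> str:
    """Convert milliseconds back to 'HH:MM:SS,mmm'."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def apply_offset(stamp: str, offset_ms: int) -> str:
    """Shift a timestamp by offset_ms (assumed milliseconds), never below zero."""
    return format_srt_time(max(0, parse_srt_time(stamp) + offset_ms))


if __name__ == "__main__":
    # A negative offset starts the clip slightly earlier.
    print(apply_offset("00:00:05,100", -150))
```

The clamp at zero keeps the very first line from ending up with a negative start time.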

After filling in the fields, we’re done, just have to press the button and wait a few minutes, all that’s left is to enjoy and read!

From Subtitle to SRS

We can also transform our subtitle into an Anki deck. We already could have done that with subs2srs, but it has a couple of drawbacks. Firstly, like for Ren’Py, I modify the subtitles slightly so they’re generally more accurate. Secondly, this way is several times (approximately 10 to 50 times) faster than subs2srs, since there’s no video to take screenshots of, as audiobooks obviously don’t have video.

Running it is very simple, download it here:

Windows
TODO Linux
TODO mac

Then extract and run audiobook2srs, give the deck a unique name (otherwise your media collection might start having duplicates and trouble importing; I didn’t check this directly, but better safe than sorry), select the .srt and .m4b, and set the offset. See the Ren’Py part above for a clearer explanation of the offset.

After a couple of minutes, you’ll get an .apkg file, which you can import into Anki, tada!
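
If you’re curious what a subtitle-to-card conversion starts from, here’s a minimal sketch of reading a .srt into a list of cues (index, start, end, text), which is the kind of data each card would be built from. It is not audiobook2srs’s actual code, just an illustration of the SRT layout; parse_srt is a hypothetical name and the sample text is made up.

```python
import re

# One SRT cue: an index line, a "start --> end" line, then the text,
# ending at a blank line or at the end of the file.
CUE = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
    r"(.*?)(?=\n\s*\n|\Z)",
    re.DOTALL,
)


def parse_srt(content: str) -> list[dict]:
    """Split SRT content into cues with index, start, end and text."""
    cues = []
    normalized = content.replace("\r\n", "\n")  # tolerate Windows line endings
    for index, start, end, text in CUE.findall(normalized):
        cues.append(
            {"index": int(index), "start": start, "end": end, "text": text.strip()}
        )
    return cues


if __name__ == "__main__":
    sample = (
        "1\n00:00:00,000 --> 00:00:02,500\n吾輩は猫である。\n\n"
        "2\n00:00:02,500 --> 00:00:05,000\n名前はまだ無い。\n"
    )
    for cue in parse_srt(sample):
        print(cue["start"], cue["end"], cue["text"])
```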

TODO Addendum: Sentence Banking and automatically bringing the audio to your mined cards

To be expanded in a future update!