Novel and Other Stuff Updates

I’m using AI for my audiobook

Health and logistics issues have kept me from getting the audiobook done the way I wanted. Mostly it’s because professional voice work is a lot harder than I thought. I’ve done radio, I’ve done stand-up, I’ve emceed karaoke. But focusing on pacing, delivery, and accuracy while fighting sinus issues and using my Blue Yeti mic at my desk… it was just too much for me. You do not want to know how many times I recorded the first chapter and then hated it.

I tried various TTS (text-to-speech) options, but none of them ever really got it… until two did. One was Google Gemini 2.5, the other InWorld. Both were expressive enough not only to feel natural, but to feel like they were getting the context and emoting appropriately. The problem was that this came at the cost of consistency. While the voices were very capable of delivering good performances, they could sound so different from one take to the next that it sounded like different readers or different studio set-ups. And once in each of two four-generation tests, InWorld actually used a different voice (not simply sounding different, but another voice model altogether).

I made sure to provide my feedback to InWorld. They’re small enough to want customer feedback. Google doesn’t give a shit. They, like Meta, put up SO many walls between you and reaching an actual human, it’s nigh impossible to overcome.

So, I went back and tried ElevenLabs. While Google or InWorld would have been cheaper, having to do so many generations just to get two takes that sounded close enough wasn’t going to work and could have blown the budget out of the water.

I tried both ElevenLabs and Hume. Both were good. Both had studios for building long-form content. But Hume’s studio was very hard to use, while ElevenLabs’ studio was much more intuitive. Not only could I regenerate specific lines or paragraphs whose line reading I didn’t like, but if the AI just wasn’t getting it, I could highlight the section, record my own reading of just that bit, and have the AI reproduce my reading in the AI voice.

Yes, I can dictate line readings

The first time I used it was on a sentence of dialogue in the first chapter: “We’re showing you to him.” It’s a correcting statement in response to “why are you showing him to us?” In that context, “you” and “him” needed special emphasis to point out the reversal and the AI just wasn’t getting the context. So I read it to the AI and the AI got it right from then on.

Where I’m at

Right now I have 26 of the 40ish chapters in the ElevenLabs studio. There’s nothing secret; the book is already published. I’m making small corrections and phrasing improvements as I go and will release a “First Revised Edition” alongside the audiobook, but nothing major is changing.

Once it’s all in the studio comes the expensive bit, in terms of both time and effort. I’ll have to have it generate a chapter at a time, then go through that first reading and do regenerations or guide recordings where I feel the AI didn’t capture the right tone/feel. This will cost me about $400-$500 altogether, plus two weeks of my time. That said, doing it in a recording studio with a professional engineer, professional VO artist, and professional director would cost double to triple. If I value a week of my time at $2,800 (the lower end of what I made in tech over the last 10 years), here are comparative costs:

Studio
  • 30 hrs Studio & Engineer @ $250/hr: $7,500
  • 30 hrs VO Artist @ $150/hr: $4,500
  • 30 hrs VO Director @ $150/hr: $4,500
  • 30 hrs my time @ $70/hr: $2,100
  Total: $18,600

Home + AI
  • 1 month ElevenLabs Pro: $99
  • Up to 1.5 million extra credits: $360
  • 2 weeks my time @ $2,800 ea: $5,600
  Total: $6,059
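
If you want to check my math or plug in your own rates, here is a rough sketch of the same comparison in Python. The figures are my estimates from the lists above; the variable names are just made up for illustration, and the $70/hr for my time assumes a 40-hour week at $2,800.

```python
# Rough cost comparison using the estimates from the post.
# All names here are illustrative; swap in your own numbers.

STUDIO_HOURS = 30        # estimated studio time
WEEKLY_VALUE = 2800      # what I value a week of my time at
MY_HOURLY = 70           # assumption: $2,800 spread over a 40-hour week

studio = {
    "Studio & engineer": STUDIO_HOURS * 250,
    "VO artist": STUDIO_HOURS * 150,
    "VO director": STUDIO_HOURS * 150,
    "My time": STUDIO_HOURS * MY_HOURLY,
}

home_ai = {
    "ElevenLabs Pro (1 month)": 99,
    "Extra credits (up to 1.5 million)": 360,
    "My time (2 weeks)": 2 * WEEKLY_VALUE,
}

studio_total = sum(studio.values())
home_ai_total = sum(home_ai.values())
ratio = studio_total / home_ai_total

print(f"Studio total:    ${studio_total:,}")
print(f"Home + AI total: ${home_ai_total:,}")
print(f"Studio costs about {ratio:.1f}x as much")
```

With my numbers, the studio route comes out at roughly three times the home + AI route, even after counting my own time on both sides.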

I may come in under on both, but neither is an unreasonable estimate. Plus I don’t have to coordinate my own schedule with those of the studio, engineer, director, and actor. The ElevenLabs “studio” and my narrator voice model are available 24/7 or close to it, and I have A LOT more creative control. I can suggest a line reading to a human. I can force a line reading on ElevenLabs. These AI solutions are getting better every quarter, even the open source models you can run locally. By the time I’m ready to record the audiobook of Sodom All Over Again, either Google or InWorld will have solved their consistency issues or there will be an open source model that delivers both a good reading and consistency.

A logo!

I mentioned The New Heroes of Old™ in a prior post and you may have noticed it’s now a category on this site. Over the summer, I turned my oldest into a “nepo-baby” by hiring him to design a logo. The logo is the text and the image in the O of “Heroes.” The sword behind it is a public domain image.

Not at all bad, I think. He’s currently in his 3rd year at art school, studying illustration and graphic design, so it wasn’t like I was hiring someone totally unqualified just because they were related to me. I also spent time taking him to professional networking events like the Seattle Independent Game Developers Association social in Northgate. It’s all well and good to be talented, but if you’re going to be a professional artist, you need to learn the art of schmoozing.

Here’s a bigger version of the O.

Bulmash Music is on a brief hiatus

Some of you may have found this blog through my musical creations. If not, you can check out the Bulmash Music playlist on YouTube or in the Bulmash Music category on this site. Many songs are also available via Spotify, Deezer, SoundCloud, Apple, and Amazon.

There are two reasons for it going dormant for a while… 1) I’ve run out of lyrics that I like from my files and need to write more. 2) While I’m focused on finishing the audiobook and the first draft of Sodom All Over Again, I’m not very focused on writing lyrics. I expect to put out at least a couple of new songs in December.

Thanks for reading. You can subscribe to be notified of new posts using the form in the side navigation on desktop, below if you’re on mobile. Or follow me on LinkedIn or BlueSky or YouTube.

Yes, no Twitter, no Facebook, no Instagram, no Threads. No Twitter because it’s a cesspool of hate speech and radicalization; no Meta properties because they thought I was a hacker trying to break into my own account, and now it’s permanently locked. I’m not going to lie and cheat to get back on Meta with different credentials, so I just have to do without.
