Apple today quietly unveiled digital narration technology, which uses artificial intelligence to generate human-sounding narration for books. While it sounds like a dangerously bad idea at first (how would an AI know what to emphasize, where to get excited, and where to slow down?), the small sample Apple shared sounded surprisingly human.
The initial target: long-tail books for which paying a human narrator would never be worthwhile.
“A growing number of book lovers are listening to audiobooks, but only a small fraction of books are converted to audio — leaving millions of books unheard,” Apple said. “Many authors — especially independent authors and those associated with small publishers — are unable to create audiobooks due to the cost and complexity of production.”
Apple is launching with four voices, two female and two male. The voices are optimized for specific types of books: Jackson, designed for fiction or romance, has a deep, somewhat husky voice, while Helen, a soprano, is designed for non-fiction and self-development.
“Mitchell” and “Madison” round out Apple’s original four voices.
This is yet another example of generative AI, which is exploding today thanks to OpenAI’s ChatGPT and many other startups and projects, including DALL-E, Midjourney, and others. New York schools have banned ChatGPT over concerns about cheating, but the industry as a whole is expected to grow from almost nothing to more than $110 billion in revenue by 2030.
The open questions, of course, concern the future of human work in art and design, the copyright status of training images and drawings, and now, with AI narrators, human work in audiobook creation.
But AI also creates some jobs.
“Apple Books digital narration combines advanced speech synthesis technology with the critical work of a team of linguists, quality control specialists, and audio engineers to produce high-quality audiobooks from eBook files,” Apple said. “Apple has long been at the forefront of innovative voice technology, and is now using it for long-form reading, working with publishers, authors and narrators.”
All four of Apple’s initial voices are default American voices with no marked regional accent, differing slightly in intonation in ways that suggest subtle differences in ethnic background. While Apple isn’t saying anything about future voices, if it finds success the company may expand the program to include accents from other countries, such as English or Australian, and possibly regional or ethnic voices, such as Southern American or Black, or even a traditional Boston or New York accent.
Of course, English is just the beginning: Spanish, French, German, and others are all waiting for similar capabilities.
Apple won’t just apply AI voice to every title in its library. There’s actually a lengthy process: authors sign up with a preferred partner who manages the process, pick a title, pick a voice, pick cover art, and then wait a month or two for the book to be processed and pass a quality check.
Apple says there is no guarantee of publication: narrated books must meet Apple’s quality and content standards.
However, Apple will foot the bill, according to a report in The Guardian.
Just a few months ago, Spotify, which has a significant presence in audiobooks and podcasts in addition to its core music product, complained that Apple was engaging in “anticompetitive behavior” by making it difficult to purchase audiobooks in the Spotify app on the iPhone. Spotify will be watching these developments closely, as will Audible, the audiobook market giant owned by Amazon.
Early returns sound good, but it’s important to remember that Apple is only sharing small snippets. It remains to be seen how a whole book turns out.