It's always great to hear from people working at the cutting edge of technology - TTS [Text To Speech] has been around for years, and Matthew Aylett from Cereproc has seen first-hand the changes not only in the technology, but in the way people think about use cases. Featured on the BBC last month for bringing to life the speech JFK was going to give the day he was assassinated, Cereproc have had a string of high-profile projects.
Oli: Tell us a bit about the team and where the idea came from.
Matthew: I used to work for another startup called Rhetorical Systems, with Chris [Pidcock] and Graham [Leary], which was bought by an American company and disappeared without trace. It was clear that things were changing in the world of TTS - the market was focused mostly around IVR [Interactive Voice Response]. IVR was voted the most hated technology of the 20th century in Wired magazine in 2011. People were trying to replace call centres with machines, and the neutral, dull TTS available undermined the user experience.
We wanted to create a service that people volunteer to use, not one they have to, and to make voices with character available for digital services, robots and avatars. We didn’t think this was weird - it’s not all going to be like Black Mirror. Computers can talk to you and it’s fine, it doesn’t need to feel like they’re tricking you into thinking they’re human. Speech then became really relevant across the board, with expressive behaviour, regional accents, and character all becoming more valuable. You need to use these features carefully: for example, you probably don’t want your bank using a sad voice to tell you you’re out of money. But for games, PAs etc. it's very important.
O: What are some of the applications of Cereproc?
M: There are a great many use cases. For instance, if you’re losing your voice (to motor neuron disease, say) then you’ll still be able to communicate using your own voice. We did this for Roger Ebert, as well as Steve Gleason of the New Orleans Saints. We have an online service that can set up your own voice for £500. Sophia, the social robot who was interviewed at the UN, was another of our projects, and all schools in Scotland use Cereproc to help children and young people with dyslexia sit exams.
It’s pretty wide ranging - we can effectively clone any voice, which is interesting in a world where the amount of fake news is increasing. Years ago I created The Bush-o-matic, which let you say anything in Bush’s voice, and similar projects exist for Trump, Obama, etc. However, we won’t generally sell voices of people who haven’t agreed to it, not even Donald Trump.
O: What about audio books?
M: For most books a human actor is still the best choice. Voice talents are incredibly skilled in the way they use their voices. Synthesis is better for text which changes fast or which isn't worth recording. For example, for something a bit drier, say Bloomberg Financial Reports 2001 Vol. 1, or the weather forecast, synthesis works out to be much more efficient.
O: Why pick Edinburgh as a place to grow Cereproc?
M: Two answers - first, I like living here. When I left university I got the job at Rhetorical, then started my own business. Second, it’s a great place to start a business - Edinburgh’s got great communications, and the pool of technical people coming out of the universities is really strong. It's great to have engineers who are committed and passionate, and who stick with you.
O: What are some of the biggest challenges you're facing?
M: We’re right at the edge of what the technology is doing - it’s the same stuff people are doing in universities and research centres. It’s all changing really, really quickly. We’ve got a small, extremely experienced team, and we have to keep on top of this. Also, getting enough visibility - when a company like Facebook does something in TTS it's all over the news. It's harder for us, and we need to work much harder to get visibility on the international stage.
O: What's the strangest project you've worked on?
M: Probably the work we did with KFC. It’s a common misconception: people assume the point of TTS is to sound natural. It’s not, it’s to sound appropriate. I love the way the voice and the robot work so well together, it makes me smile.
O: Speech is very much one of the focal points of tech right now - with Alexa, Siri, Google Home all battling for space on our shelves. What do you find exciting in TTS at the moment?
M: Google’s DeepMind/Wavenet, certainly. They took a huge amount of audio and put it through the network and created something that sounded really good - a classic brute force approach. There have always been two approaches to speech synthesis. One takes lots of little bits of recorded speech and reorders them, like cutting up a cassette tape, which maintains the quality, but the problem here is data sparsity. The other approach is parametric speech synthesis, where you build a model of the speaker. The problem here is final voice quality - it sounds unnatural.
DeepMind recently made significant progress with this. The only problem is it’s still quite slow - the initial system took days to generate a sentence. We’re building new systems to take advantage of this brute force approach, but making it much faster.
There’s a war going on in the tech world and TTS is in the front line - when someone at Google took Siri out of the box for the first time, and asked it “How high is Everest?”, what was important to Google was that there were none of their adverts. Siri bypasses the entire business model of Google, i.e. ads. The war is over user control, and the big four want to gain a monopoly on their relationship with the user.
O: In the future, do you see yourselves focusing more on specific use cases (either B2B or B2C), or becoming a tool that other people build solutions on? Or both?
M: To a certain extent what we build is a component technology, but we also sell voices to the public. This makes money, and boosts visibility. With the technology changing so fast, we have to move really fast to work out where we fit in, both commercially and technically. Our technology is always improving, and we are happy to change the way we do business as the market changes.