
The Year of AI

Writer: UK Creative

Alex Lodge & Toby Slade-Baker, Founders of Thirty Two TV & Virtual Sound

Monday 3rd March 2025


 

There’s a name for people who hear disembodied voices.


At Virtual Sound, however, his name is Sam, our ‘Head of Robots’, and over the last few months he’s been developing a unique workflow to revolutionise the way we deliver A.I. voices to our clients.



Let’s rewind a bit, though, and ask ‘why?’. The A.I. conversation is changing all the time. In what seems like weeks (but is actually years by this point), we’ve gone from scare stories about the looming spectre of the singularity to finding out Gen Z are using ChatGPT to help craft WhatsApp messages to their difficult mothers. In that time the uptake of tools like ChatGPT, and now DeepSeek, has become all but ubiquitous, and the conversation in the creative industries has become less about who they’re going to replace and more about how we can make the most of these tools, where their capabilities end and, most frequently, what the legal implications are (are we going to get sued for this?).


Alongside persistent questions both practical and ethical, our audio team at Virtual Sound are being asked more and more frequently by our clients how they can use A.I. It reminds us of six years ago, when clients asking us to do sound work alongside our music work became frequent enough that we saw a gap in the market and created Virtual Sound in the first place.


Virtual Sound has always sought new ways of working and new ways of thinking about the application and delivery of audio for brand communications. Before COVID hit, we developed a remote voiceover workflow that, when we took it to our agency clients, was met with a notable lack of interest. Then COVID hit and it became the norm. Now, at the start of 2025, we have to think about what the norm will be in two, three, even ten years’ time, and how we can develop our interactions with A.I. to deliver it now.


For us, it’s A.I. voices. Our clients have all used tools like ElevenLabs and quickly realised that the utopian promise of ‘unlimited realistic speech for your projects’ isn’t quite as fertile ground as it was made out to be. The tools just aren’t that versatile. Yes, you have access to a lot of realistic-sounding voices; it’s the direction that’s the issue. The great thing about a human voiceover is the dialogue you have in the session. You can experiment, you can iterate, you can come in with a predefined brief and have the direction change based on the performance of the human behind the microphone. It’s not just choosing something off the shelf; it’s a creative exchange. A couple of years ago, when we started interacting with the various A.I. voice generation tools, we asked ourselves the same question we did when we were developing remote VO sessions: ‘How can we make this as similar to a normal VO session as possible?’


And there’s the rub with A.I.: the questions ‘how can we do what we’re already doing in an indistinguishable way, but cheaper?’ and ‘what can we do that we haven’t imagined before?’ present two distinct paths to tread. We are currently striding confidently down the first while making eyes at the second.


As mentioned above, the current A.I. voice platforms fall down when it comes to directing ‘voice actor’ performances. “Yes”, we hear you say, “it’s great that we can get a movie-trailer-style voice, or a friendly northern female voice that, if she were human, would be voicing supermarket ads all day long. It’s great that we don’t have to pay usage, but can we just have another read, and can you put a little more ‘bounce’ into it?”


What we have done at VS is develop a workflow involving a combination of A.I.s that allows us both to direct a read and to get closer to the kind of mood, inflection and accent changes that are impossible within the closed systems of the big A.I. voice tools. We can then feed those reads into these tools and choose the voice type. Importantly, it allows us to offer creatives the chance to participate in those sessions, bringing us closer to a world where we can emulate the creativity and flexibility of a VO session.


Despite all our learning, our innovation and the successful application of a unique workflow, the open questions remain the same. What changes will we see as a result of the evolution and wider adoption of A.I. voice tools? Will they put hard-working VO artists out of work, or will they open up more opportunities for them within the A.I. workflow (as models for cloning, for example)? Will the widespread adoption of A.I. change the way agency and brand clients approach pre-production budgeting in the same way it has with music? The availability of cheap or free options reduces the need for a hard figure in the budget; will it become the norm that whatever is left over once the other service providers are considered is what’s available to spend on a voice? How can we use these tools ethically, sustainably and with an eye on a non-dystopian future? And finally, what aren’t we seeing? What new developments haven’t we even begun to imagine? We don’t pretend to have all the answers. In the short term, though, we think we have a solution.




 




