THE BEST THING ABOUT AI IS ITS ERRORS.

AN INTERVIEW WITH TOM BATOY.

Interview: 2030 / Paul Wagner
Photos: Leopold Fiala

Tom Batoy is a composer and sound producer through and through. Not only has he been using AI at his mixing desk for years, he is also passionate about driving its development, particularly in the field of voice recording. Sometimes perhaps in a different way than intended. But more about that later. As managing director of Giesing Team, one of Germany's leading audio production companies, and of the music production company Mona Davis Music, he regularly commutes between Los Angeles, Berlin and Munich. Many of his clients, especially in the US, can be traced back to a legendary project that made Tom and his partner Franco Tortora famous overnight: the development of McDonald's global audio logo. Ba da ba ba da... I met Tom in his lab studio in the old Gasteig theater. Live and in the flesh. After all, his voice could have been AI-generated online and no one would have known.


Tom Batoy / Photo: Leopold Fiala

Tom, before we get into the AI topic: please tell us how the Giesing Team came about.

That was a long time ago... In the early eighties, I started mixing music at Konstantin Wecker's Kaffee Giesing and also recorded my first radio commercials on the side. Wecker's studio then ran into financial difficulties, so we founded the Giesing Team, grew quickly and finally moved to Wolfratshauser Straße.


That was a great place. Advertising history was made there, wasn’t it?

Oh yes. Many very well-known sound productions were created there. We now have four studios in Munich. The main studio is on Türkenstraße at ARRI.

 
People outside the industry might not even be aware of all the places they've heard your work. What exactly do you do? Who do you work for?

Our client list impresses even us. We've worked for practically every major brand you can think of. We do everything related to sound, in all formats: voice recordings, soundtracks, mixing, movie trailers. The only exception: we don't do dubbing. And everything to do with music is done by our music production company, Mona Davis Music, which I founded years ago with Franco Tortora. We produce in all genres, from rock, jazz and techno to classical. We have mastered the most diverse styles, and we know which stops to pull to create certain feelings. This is of course extremely important, especially for advertising productions. You can feel our passion for sound in everything we do, and especially our passion for voices. Every speaker has their own fingerprint, every voice is unique and conveys a very specific vibe.

 
The topic of voice is a good starting point for discussing AI. Before we talk about the opportunities and challenges of AI for speakers, let's ask a very general question: Does AI reinvent something or does it just spit out what is already there?

More the latter. The counter-question is: what do humans do differently? They read, they acquire knowledge, including musical knowledge, and use it to create something new. What's the big difference? In the field of advertising, the hundred-thousandth toothpaste commercial looks just as uninspired as all the ones before. It's always the same, everything is copied. So why reproach AI for this? But when we talk about music and look at tools like Suno or Udio, I immediately think of a completely different topic than the question of originality and quality: both apps stand for copyright infringement on a grand scale. Everything is copied without the consent of the labels, publishers or artists. They just throw in every song and piece of music they can find. That's why the results are so amazing. I like to think I have a good ear for music, but I can't tell which songs are AI-generated and which aren't. Universal is suing, Warner is suing, everyone is suing. I'm sure there will be big deals in the near future.

When were you first confronted with the topic of AI?

Ten years ago.


Surely not that long ago?

Yes, AI tools have been playing a role in audio for a long time. Say you have a voice recording where, unfortunately, a tractor passes by in the background. Then we use AI tools to remove the noise. These tools do a good job, especially in the mastering area when we have to create a mix. But the importance of AI in audio production has completely changed since generative AI came along. It actually learns and creates new things by recognizing patterns in existing data and combining them creatively. It's like a playground. With generative AI, you can let off steam, experiment, and be a kid again. It's amazing, just great! Precisely because the results of generative AI are not really predictable at the beginning. The more you research and try things out, the more controllable they become. But many, many days and nights go into it.


What fascinates you most about AI?

The coolest thing is the hallucinations. The big bloopers. Yes, I think the best thing about AI is its mistakes. They're ingenious. That's when AI gets totally creative, unconsciously and unplanned. For example, I recently had Midjourney generate 56 images, always related to musical genres. I entered the prompt: “80s rock singer in a leather outfit standing on the wing of a parked airplane.” The picture was exactly as I had imagined it, except that the singer had three arms, which I didn't even notice at first. He made a rock singer gesture with one, held the microphone with the second, and casually leaned against the fuselage with the third. Brilliant. Hallucinations like that inspire me. That's when it gets exciting.

 
If you ask the AI for ten reasons for climate change, it will give you ten reasons – even if there are only five.

Exactly. I played a game with the AI and asked it a fundamental question that it was only allowed to answer with a single word. The question was: Is AI creative? The answer: No! Then I asked it: Can AI be creative? The answer was: Yes! Then I prompted it: Create a creative video. The result was disastrous, totally irrelevant and anything but creative. Nobody wants to see that. So I wondered how we could get AI to hallucinate in order to come up with crazy, interesting results.

 
And?

To be honest, I have no idea. I still haven't found a method that can be planned in the sense of an instruction manual. It remains wild... But I am sure that we will make progress, for example, with information overload when prompting. We are doing everything we can to challenge the AI so that the unexpected can happen.



That's how artists generally proceed. Right?

Exactly. And that makes it difficult to use generative AI in the context of a sound production, which has to deliver predictable results on a tight schedule. What is needed then is a creative framework on which the AI can build. For example, I record a sentence that could be, “The new summer special. Now on ARD.” Then I lay an AI-generated voice over this framework. My voice then becomes, for example, the voice of the well-known speaker XY …

 
Well, he’ll be pleased.

This works really well because, unlike when I just prompt the text, I have already defined the tonality of what is being said through my audio specification. Incidentally, we have already digitized many first-class speaker voices and developed one of the first AI contracts with speakers. So Speaker XY knows that the use of his voice is completely transparent to him and only takes place with his consent. It is already a milestone that today, for the first time ever, you can preserve a voice. It will remain sellable and usable even when you are over ninety. This will change a lot of things. For example, in the area of children's voices. Some have become very well known, until they break, that is, and then that's it. Now you can sell them forever and even bequeath them. Or your voice can be used even if you have vocal cord surgery. Most people in the industry have never even thought about advantages like that.


And what is the reaction of the voice actors?

It's a mixed bag. It's the number one topic, of course. Many are afraid. What will happen to my voice? Am I still in control? The German Voice Actors Association (VDS) is critical of the digitization of voices, e.g. via elevenlabs.io, and advises its members against it because these voices could in turn be used to train the AI. I see it a little differently. The train has been rolling for a long time. I believe that no voice actor will be able to avoid digitization. At Giesing Team, we make the following offer to voice actors: you can have us digitize you, and you get your own account that only you have access to. You give us the access data to your account so that we can request you whenever we have something. We agree on the price and off we go, with full transparency. The voice actors can see every prompt we have created for their voice in their account. And they always know what content their voice is used for. In the unlikely event that an actor decides they no longer want to work with us, they simply change their password and we are out. With us, the voice actors have everything under their control.


Is that the future?

Yes. But in general, AI and sound are in the wild west at the moment. The app providers are taking the easy way out and shifting the entire responsibility onto the users. The terms and conditions always say: you can generate something here, you can use it, but it's your responsibility!


That's actually a showstopper for the commercial sector, isn't it?

Almost. What is clear, however, is that the legal departments will have a lot of work to do on the subject of copyright. The majors like Sony, Universal or Disney often say: no AI! But that's not so easy. Try to find an operator who works in the 3D area without AI tools... Ultimately, the enticing savings potential will drive the whole thing. I fear that dubbing actors in the film industry will be hit hard because all the bread-and-butter jobs will be lost. It is still too expensive today, but it is technically feasible to create the language variants of entire feature films automatically. And in sync with the lips.


Is it conceivable that only the most distinctive, outstanding voices will survive – as AI clones?

I'm afraid so. There are already masses of run-of-the-mill voices in digital form. Exceptional voices will always be interesting.


Thanks for the chat, Tom!
