Interview: How Songs & Music Catalogues Get a New Lease of Life with AudioShake

From mixing & mastering to music sync & lyrics localisation: Talking to AudioShake.

Photo by James Owen / Unsplash

The media has been buzzing endlessly about generative AI since it spread through the music industry, while technologies like stem separation often remain behind the scenes, even though their value for the industry is a lot more tangible. One company trying to change that is AudioShake, which sees music source separation as something bigger than just extracting stems from a track. It promotes the technology as a business asset that various industry players can benefit from.

Besides, it's arguably the only music source separation company that caters specifically to businesses. AudioShake's know-how is now used by major labels and indies, music publishers, sync agencies, artists, and gaming studios alike.

For this story, we sat down with Jessica Powell, CEO and Co-Founder of AudioShake, to talk about the diverse use cases of stem separation, AI, trust, and the projects they're most proud of.

We thought stem separation brought lots of opportunities for developers, labels & artists.

Music source separation isn't exactly the simplest technology but its value is huge. We asked Jessica what made them focus AudioShake specifically on B2B and if they saw this potential right from the start.

"My co-founder and I lived in Japan and we did a lot of karaoke. And we thought, 'Why can't you just rip the vocals off everything and then karaoke to the original song?' We were into really old punk and hip-hop and couldn't find those in karaoke catalogue.

"It really started off as something we were just interested in. It wasn't like we were trying to build a business. There are probably much smarter ways to try and go and build a business. Don't do music, first of all, but it was just something that really interested us for our own creative purposes.

"And then, in terms of why to make it B2B, I think there are two reasons. First, we thought there were a lot of opportunities for businesses like developers, labels, artists—there are a lot of opportunities for people to build these kinds of experiences. But that means you need to be able to cater to them and their needs. And the needs of consumers are different; it's different from the needs of businesses. Businesses have lots of extra questions around security and reliability and getting access to things that other people don't have access to. They have questions around copyright and tons and tons of questions. So that's the first part—it's hard to be one-size-fits-all and cater to both those crowds.

"And we went on the B2B side because we were excited about the power to deliver really fast, really high-quality stem separation in a reliable way to developers and others. And then, we're all from tech backgrounds and we're all musicians.

"There are a lot of examples of where tech people basically just impose things onto the music industry. And I think we just wanted to be a little bit more cautious in terms of how we rolled out these models, that some artists are going to be super cool with it and excited about being able to split their tracks and other artists might not want that. And we wanted to be respectful of that.

"In all the debate around music tech now, with generative AI across media generally, there's something different when you can do something at scale."

I think the questions around stem separation should be different from what they are for generative AI.

Many people in the industry don't fully grasp the difference between gen AI and the technology at AudioShake's core. Labels and artists are now suspicious when a new AI music startup emerges. But Jessica says the question of trust was never really an issue for the company.

"Most of alternative solutions are built on two and really now one main open source model that was developed by Facebook a few years ago. Maybe people do some fine-tuning on top of that, but it's essentially the same underlying model. We have our own technology, our own proprietary models are patented, which is state of the art. We have the highest quality separation. We do both stem separation and lyric transcription, and we're state-of-the-art on both of those tasks.

"We also can have it run on devices, like DJ Pro, and make it really fast. It can be compatible with streaming, for example. We aim for quality and speed. The third thing is that this is a very different approach to working with the industry, we want to see all the use cases. We think it's awesome to allow people to engage with their favourite content, recreate, remix and mashup it. But we wanted to be a positive contributor to the ecosystem and help encourage that artists get paid for all of that.  

"That's why we started working with the labels, publishers, artists, and managers directly.

"Today, we're used by all the major label groups, large indies, large publishers, tons of smaller indies, and emerging artists as well.

"I think that also encourages a lot of trust, and over time helps you expand what can be done for people. From day one, we had an ethical approach to AI development. I think the questions around stem separation are or should be different from what they are for generative AI—you're not actually generating anything when you're separating.

"If you want to clone Taylor Swift's voice or you want to train a music model that can process something realistic when someone types in a text prompt or a voice prompt, you need to have a concept of Taylor Swift. Or you want to design the Eiffel Tower, but in the style of Monet—you need to know what Monet is, which means you need to have it trained on Monet or on Taylor Swift, or a sound alike or look alike to then be able to do that.

"Stem separation doesn't work that way. We don't need to train on Elvis's vocals to be able to separate Elvis. It's fine for us to train on production library music that you license. It's pretty uncontroversially beneficial to the artists and the rights owners because you're enabling them to take this track, split it apart and then do things with it that either are new creative opportunities or it's actual monetisation.

"There are lots of generative AI music companies that are working with artists and rightsholders, but in those cases, they are typically working with a smaller data set of tracks, loops, or samples created by composers. Then there are companies that are just training on everything."

Futuristic tech that can also be super practical.

When tech people come and try to "fix" the music industry, it may raise some eyebrows. AudioShake didn't try to fix it, though; they've offered a solution that is beneficial for all.

"We started off in sync licensing and going to sync departments, we had learned that generally 30 to 50% of the sync licensing requests that came through couldn't fulfil because they didn't have an instrumental. If you're just going to them with something super practical and saying, 'We can help you unlock sync for catalogue tracks or newer tracks that for whatever reason don't have the instrumental,' that's a big win for them.

"So while there are a bunch of music experiences and workflows that we think this tech applies to in a few years' time, we went to these teams with something super practical. We didn't try to sell them on some vision that won't come to life for five years' time. It's not speculative, it's just saying you have this business problem today and here's how we can help you solve it.

"We're essentially an audio infrastructure that's helping different kinds of use cases, different kinds of developers, different kinds of rights owners, whoever it might be, further their products and their businesses using the technology."

Music aside, you can use stem separation for things like gaming or other environments.

Stem separation, as it turns out, can be useful for different domains and industries. In some of them, it's even hard to figure out how exactly this tech can be of actual use. We asked Jessica how stem separation is used to make audio immersive and where else we can see that apart from music itself.

"If you think about the Dolby Atmos or Sony 360 formats, which are supported by Apple, Tidal, Amazon, that's taking different sound objects and placing them in different perceptual fields. So the guitar is here and the drums or the bass is back behind me—more like-real life audio, right? In order to put those sound objects in different places, you have to have those sound objects.

"What do you do if you only have a full mix of a track? You need to be able to take the stems and put them in different places. So that's how stem separation is used here. For example, for Nina Simone's first album, BMG used our software to split them because those are old tracks and they don't have stems. They use that to create the stems to then create the immersive track. Same thing with De La Soul: They used our tech for remastering, but they also used it to create the Dolby Atmos mix. They came back onto streaming last year.

"But you could also see it for things like gaming or other environments, basically any environment where you need to separate the audio into its different components and you weren't given those components.

"We work with some gaming companies because they don't always have the components that they need. They've licensed tracks, but those tracks don't have stems. If you think about audio at scale, what happens in gaming today is that a developer platform will go and license, say, 20 tracks for the music in the game. And if they can get the stems, they have to take them and do all kinds of things to them to standardise and prepare them for the game.

"When you're delivered stems normally, you might send me your track and its stems, and there are, say, twelve stems. My track might have eight stems, and you labelled your guitar as guitar, and I labelled my guitar as wah-wah. Imagine, you're working with that in a programming environment and you've given me eight of something and labelled it one thing, and I've given you twelve of something and labelled it something different.

"If I want make it that every time the hero enters the cave, all the audio drops except for the bass. How do I do that when you called the bass one thing and I called the bass another thing? And when I want to do that with a thousand tracks or a million tracks? So what becomes interesting at scale is that you can not only create all these assets, but you can deal with them in a programming environment because they're consistently labelled, organised sonically in a consistent way."

The reason we do one thing & do it really well is that we don't want to contribute to watering down experiences.

Since Deezer introduced its open-source stem separation model Spleeter back in 2019, many similar solutions have emerged. Some also use open-source tech, and some have developed their own. Now, the market for music source separation is very diverse, but AudioShake doesn't mind.

"We're a company that's very, very focused on stem separation and creating the highest-quality stem separation. As long as people are looking for quality, we have a business.

"The more people doing stems, the better. The more it's normalised and that people can work with these things, the more it moves the entire ecosystem forward. All of it's good because it also forces the industry to get better at licensing different use cases and at tracking them. I think it's a good thing.

"The proliferation shows that there's a need for it. And the more this need is, the more people focus on getting good stuff, which serves artists better, right? One thing that I think a lot about with AI across the board, not related to music specifically, is that being able to do stuff at scale really changes everything.

"And I think AI has this tremendous opportunity to help us a lot in our daily lives. I also think it absolutely will lead to a flood of mediocre content. The reason we essentially do one thing and do it really, really well is that we don't want to contribute to that kind of watering down of experiences."

It's rewarding to see both the practical and emotional "uses" for sound separation.

The list of AudioShake's clients is just as varied as the use cases of the source separation tech. We asked Jessica about partnerships and projects she's most proud of.

"When we worked on the Nina Simone album, my parents used to play that for me when I was a kid, and getting to isolate her piano playing and which I've never heard separated from her voice, it's a moving experience.

"Or there's a family estate that we're working with right now where the singer died very young and left a bunch of children who barely have ever heard his voice and never heard his voice outside of the full mix of the song. And for them, when they heard his voice isolated the first time, it was also a very moving experience.

"Two other things are Ryan Tedder and OneRepublic using Audioshake to isolate his vocals, and you could just hear how great his vocals were, or Green Day using it to let their fans become the guitarist.

"It really runs the gamut from these very practical uses that have helped artists make new revenue. That could be one of the labels we worked with recently that landed a seven-figure sync for a 1970s track for an institution because they were able to create the instrumental, that otherwise they wouldn't have gotten. That's life-changing money. And then there's stuff that's much more emotional, that's hard to put a value on. But it's super rewarding to feel that you're doing that."