Mati Staniszewski, co-founder and CEO of ElevenLabs, says voice is becoming the next major interface for AI: models are moving beyond text and screens, and speech is increasingly how people interact with machines.
Speaking at Web Summit in Doha, Staniszewski told TechCrunch that speech models like the one developed by ElevenLabs are moving beyond simply imitating human speech, including its emotion and intonation, to working in conjunction with the reasoning capabilities of large language models. As a result, he argued, the way people interact with technology will change.
In the next few years, he said, “hopefully all mobile phones will be back in our pockets and we will be able to immerse ourselves in the real world around us, using our voice as a mechanism to control the technology.”
That vision was the driving force behind ElevenLabs raising $500 million this week at an $11 billion valuation, and it is one increasingly shared across the AI industry. OpenAI and Google are both putting voice at the center of their next-generation models, while Apple appears to be quietly building always-on, voice-adjacent technology through acquisitions like Q.ai. As AI spreads into wearables, cars, and other new hardware, control is becoming less about tapping a screen and more about speaking, making voice a key battleground for the next stage of AI development.
Seth Pierrepont, general partner at Iconiq Capital, echoed that view on stage at Web Summit, arguing that while screens will continue to be important for gaming and entertainment, traditional input methods like keyboards are starting to feel “outdated.”
And as AI systems become more agentic, the interactions themselves will change, Pierrepont said, as models gain guardrails, integrations, and the context they need to respond to less explicit prompts from users.
Staniszewski pointed to the shift toward agents as one of the biggest changes underway. Future voice systems, he said, will increasingly rely on persistent memory and context built up over time rather than requiring users to spell out every instruction, making interactions feel more natural and less effortful.
That evolution will shape how voice models are deployed, he added. While high-quality audio models have so far lived primarily in the cloud, Staniszewski said ElevenLabs is working on a hybrid approach that blends cloud and on-device processing, a move aimed at supporting new hardware such as headphones and other wearables, where audio is an ever-present companion rather than a feature that dictates when you use the technology.
ElevenLabs has already partnered with Meta to bring its voice technology to products such as Instagram and Horizon Worlds, Meta's virtual reality platform. Staniszewski said he is open to collaborating on Meta's Ray-Ban smart glasses as voice-driven interfaces expand to new form factors.
But as voice becomes more persistent and embedded in everyday hardware, it opens the door to serious concerns about privacy, surveillance, and the amount of personal data these systems collect and store as they move deeper into users' daily lives, data that companies like Google have already been accused of exploiting.
