DeepL, a translation company best known for its text tools, today released a speech-to-speech translation suite that covers use cases such as meetings, mobile and web conversations, and frontline worker group conversations through custom apps. The company is also releasing an API that allows external developers and companies to build on DeepL’s technology for customized use cases such as call centers.
“Voice was a natural step for us after spending years working on text translation,” DeepL CEO Jarek Kutylowski told TechCrunch in an interview. “We’ve come a long way when it comes to text and document translation, but we thought there was a great product for real-time speech translation.”
Kutylowski said the main challenge in building real-time translation products is striking a balance between low latency (the delay between someone speaking and the translated audio being played) and accurate results.
DeepL is releasing add-ons for platforms such as Zoom and Microsoft Teams. Listeners can hear real-time translations as other participants speak in their native language, or they can follow real-time translated text on screen. The program is currently in early access, and the company is inviting organizations to join its waiting list. The company also offers products for mobile and web-based conversations that take place in person or remotely.
DeepL also supports group conversations in settings such as training sessions and workshops, where participants can join by scanning a QR code.
DeepL said its speech technology can also learn and adapt to industry-specific terms and custom vocabulary such as company and personal names.
Kutylowski said AI is reimagining what customer service will look like in the years to come. He noted that a translation layer can help companies provide support in languages where qualified staff is scarce and expensive to hire.
The company said it controls the entire voice-to-voice stack. Its current system, however, converts speech to text, translates the text, and then converts the result back to speech. Because DeepL has worked on text translation for many years, the company believes it has an advantage in translation quality. In the future, it hopes to develop an end-to-end speech translation model that skips the text step entirely.
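For readers who want to picture the speech-to-text, translate, text-to-speech flow described above, here is a minimal Python sketch of a cascaded pipeline. It is an illustration only and not DeepL's implementation or API: the class name `CascadedTranslator`, the three stage functions, and the toy dictionary are all hypothetical stand-ins so the example runs without any speech models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadedTranslator:
    """Hypothetical three-stage pipeline: speech -> text -> translated text -> speech."""

    transcribe: Callable[[bytes], str]   # speech recognition stage (assumed)
    translate: Callable[[str], str]      # text translation stage (assumed)
    synthesize: Callable[[str], bytes]   # speech synthesis stage (assumed)

    def run(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)          # 1. speech to source-language text
        translated = self.translate(text)      # 2. source text to target text
        return self.synthesize(translated)     # 3. target text back to speech


# Toy stages: "audio" is just UTF-8 bytes here, purely for illustration.
TOY_DICTIONARY = {"hello": "hallo", "world": "welt"}


def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.lower().split())


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadedTranslator(fake_transcribe, fake_translate, fake_synthesize)
output_audio = pipeline.run(b"Hello world")
```

Each stage adds its own delay, which is one reason a cascaded design makes the latency-versus-accuracy balance hard, and why an end-to-end model that skips the text step is attractive.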
DeepL faces competition from several well-funded startups operating in adjacent corners of the space. Sanas, which raised $65 million from Quadrille Capital and Teleperformance last year, uses AI to correct a speaker’s accent in real time, a tool aimed primarily at call center agents.
Dubai-based Camb.AI focuses on speech synthesis and translation for media and entertainment companies such as Amazon Web Services, helping them dub and localize video content at scale.
Palabra, backed by Reddit co-founder Alexis Ohanian’s company Seven Seven Six, is building a real-time speech translation engine designed to preserve both the meaning and the speaker’s original voice, putting it in more direct competition with what DeepL is currently building.
