Openai launches next-generation audio AI models: smarter speeches and texts and expressive AI voices

Openai has dropped a fresh batch of audio models, shaking how voice AI works. The new lineup of audio models is designed to push audio AI forward. The new release includes text-to-speech and speech-to-text models that move things forward with speech recognition and production.

This release includes GPT-4O-MINI-TTS, a text-to-speech model that accurately controls tone and timing, and two advanced speech-to-text models, GPT-4O transcription and GPT-4O-MINI trans access.

These models are now available through Openai’s API and Agents SDK, making it easier for developers to build sophisticated, voice-powered applications. Openai also launched Openai FM, a platform for testing speech models from texts, and introduced a contest to encourage creative use of technology. The announcement creates great interest from the developers and the tech community, highlighting the potential to rebuild voice-driven software.

Speech and transcription upgraded from text

The latest models include GPT-4O-MINI-TTS for text-to-speech, built to handle subtle speeches with better control over tone and timing. Developers tweak how the words are spoken, opening up the possibilities for a more expressive, AI-driven voice.

For speech and text, Openai introduced GPT-4O transcription and GPT-4O-MINI-Transcribe. Both models outperform previous versions such as whispers by improving transfer accuracy across noisy settings and various accents. It handles real-world conversations more effectively and helps you with customer service, content creation and accessibility tools.

Bring AI speech to more developers

Openai involves these models in its APIs, allowing developers to connect to their applications. Pricing is competitive:

GPT-4O Transcription: $6 Audio Input Token per Million ($0.006 per Minute)

These updates streamline the process of integrating high-quality audio processing into your app, such as live customer support, automatic note-taking, or interactive voice assistants.

Openai FM and Community Engagement

To show what these models can do, Openai launched Openai.FM, a platform that allows users to test the functionality of text-to-speech. In addition to this, they have launched a contest to encourage creative applications of the latest technology. From personalized assistants to generating audio content, expect to see developers experimenting with new ways to use AI voices.

An interactive demo for developers to try out speech models from new text in the OpenAI API. (Credit: Openai)

According to Openai, three new cutting edge audio models of the API include “two voice-to-text models – out-performed whispers, new TTS models – allowing you to *tell how you * speak it, and the Agents SDK supports audio and allows you to easily build voice agents.”

Three new cutting edge audio models for the API:

🗣Chasing two speech to text model – perform a whisper
💬New TTS Model – You can tell it *How*

🤖Agent SDK now supports audio, making it easy to build voice agents.

Try TTS now at https://t.co/mbtolnyyca.

– Openai Developer (@openaidevs) March 20, 2025

Early reactions and industry impact

The launch has been well received, especially among developers looking for better transcription and speech synthesis options. Some early adopters, like Eliseai, have already integrated speech models from Openai’s text into their property management platform, reporting more natural and expressive speech interactions.

Openai has not stopped here. The company is working to expand its voice technology with more voice options, ultimately bringing AI-driven conversations closer to human-like exchanges. The huge movement of audio generated by AI has intensified the competition for voice technology.

Source link

What's Hot

How to find AI chatbots on AdultFriendFinder

The fastest-growing jobs in the creator economy aren’t in front of the camera.

Lee Suk-Quin explores the truth with new album “72RHR”

Openai launches next-generation audio AI models: smarter speeches and texts and expressive AI voices

Castilla-La Mancha Ignites Innovation: fiveclmsummit Redefines Tech Future

Local Power, Health Innovation: Alcolea de Calatrava Boosts FiveCLM PoC with Community Engagement

The Future of Digital Twins in Healthcare: From Virtual Replicas to Personalized Medical Models

How to find AI chatbots on AdultFriendFinder

The fastest-growing jobs in the creator economy aren’t in front of the camera.

Lee Suk-Quin explores the truth with new album “72RHR”

Vote for Sombre, Phoebe Bridgers and more

Vote for Sombre, Phoebe Bridgers and more

Bettina Anderson reveals the designer of her wedding dress

Queen Letizia of Madrid Sports Sleeveless Hugo Boss Dress

Castilla-La Mancha Ignites Innovation: fiveclmsummit Redefines Tech Future

Local Power, Health Innovation: Alcolea de Calatrava Boosts FiveCLM PoC with Community Engagement

The Future of Digital Twins in Healthcare: From Virtual Replicas to Personalized Medical Models

Human Digital Twins: The Next Tech Frontier Set to Transform Healthcare and Beyond

What's Hot

Openai launches next-generation audio AI models: smarter speeches and texts and expressive AI voices

Speech and transcription upgraded from text

Bring AI speech to more developers

Openai FM and Community Engagement

Early reactions and industry impact

Related Posts