Particle News: OpenAI Launches Advanced Audio Models for Transcription and Voice Customization

Overview

OpenAI's new gpt-4o-transcribe models outperform the previous Whisper model, offering enhanced transcription accuracy in challenging conditions like background noise and varied accents.
The gpt-4o-mini-tts model introduces 'steerability,' allowing developers to control tone, emotion, and delivery style in generated speech for more natural interactions.
An updated Agents SDK enables developers to integrate voice capabilities into existing AI systems with minimal code, simplifying the transition from text-based to voice-based agents.
The models are priced competitively, with transcription costs ranging from $0.003 to $0.006 per minute and text-to-speech at $0.015 per minute, making them accessible to developers.
Despite improvements, challenges remain in supporting certain languages like Tamil and Telugu, and OpenAI has opted to keep the models proprietary due to their size and complexity.