Particle.news

Download on the App Store

OpenAI Launches Advanced Audio Models for Transcription and Voice Customization

The new models, gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts, offer improved accuracy, steerable voice generation, and seamless integration with existing AI systems.

  • OpenAI's new gpt-4o-transcribe models outperform the previous Whisper model, offering enhanced transcription accuracy in challenging conditions like background noise and varied accents.
  • The gpt-4o-mini-tts model introduces 'steerability,' allowing developers to control tone, emotion, and delivery style in generated speech for more natural interactions.
  • An updated Agents SDK enables developers to integrate voice capabilities into existing AI systems with minimal code, simplifying the transition from text-based to voice-based agents.
  • The models are priced competitively, with transcription costs ranging from $0.003 to $0.006 per minute and text-to-speech at $0.015 per minute, making them accessible to developers.
  • Despite improvements, challenges remain in supporting certain languages like Tamil and Telugu, and OpenAI has opted to keep the models proprietary due to their size and complexity.
Hero image