Speech 2.6 – A speech generation model launched by MiniMax

What is Speech 2.6?

Speech 2.6 is a new speech generation model from MiniMax, designed for the next generation of voice agents. Its latency is under 250 ms, which keeps real-time conversations smooth, and it directly reads non-standard text such as URLs, email addresses, and phone numbers in multiple languages, with no complex preprocessing.
With Fluent LoRA technology, the model further enhances speech naturalness and voice cloning fluency, allowing it to generate high-quality speech even from imperfect or accented source material.
Speech 2.6 suits applications such as intelligent customer service and smart devices, and it supports more than 40 languages. The model is available through the MiniMax Open Platform and the MiniMax Audio official website.
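
To make the "no preprocessing" point concrete, here is a minimal sketch of the kind of text normalization a pipeline might otherwise run before sending text to a speech model. It is not how Speech 2.6 works internally. The function names, regular expressions, and spoken forms are illustrative assumptions, and the sketch handles English only.

```python
import re

# Hypothetical spoken forms for symbols; a real system would localize these.
SYMBOL_WORDS = {"@": " at ", ".": " dot ", "/": " slash ", "-": " dash ", "_": " underscore "}
DIGIT_WORDS = "zero one two three four five six seven eight nine".split()

# Deliberately simple patterns, good enough for the examples below only.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"\bhttps?://[^\s]+[^\s.,;:!?]")  # last char is not sentence punctuation
PHONE_RE = re.compile(r"\+?\d[\d\s-]{6,}\d")          # caveat: also matches ISO-style dates


def speak_symbols(token: str) -> str:
    """Replace each known symbol with its word and collapse extra spaces."""
    spoken = "".join(SYMBOL_WORDS.get(ch, ch) for ch in token)
    return " ".join(spoken.split())


def speak_url(match: re.Match) -> str:
    """Drop the scheme and any trailing slash, then spell out the symbols."""
    url = re.sub(r"^https?://", "", match.group(0)).rstrip("/")
    return speak_symbols(url)


def speak_phone(match: re.Match) -> str:
    """Read a phone number digit by digit, keeping a leading plus sign."""
    raw = match.group(0)
    digits = " ".join(DIGIT_WORDS[int(ch)] for ch in raw if ch.isdigit())
    return ("plus " if raw.startswith("+") else "") + digits


def normalize(text: str) -> str:
    """Expand emails, then URLs, then phone numbers into speakable words."""
    text = EMAIL_RE.sub(lambda m: speak_symbols(m.group(0)), text)
    text = URL_RE.sub(speak_url, text)
    return PHONE_RE.sub(speak_phone, text)


if __name__ == "__main__":
    print(normalize("Email support@example.com or call +1 415-555-0100."))
    print(normalize("See https://example.com/docs."))
```

Even this small sketch needs per-language word tables and careful pattern ordering, which is the work the model is described as removing.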

Key Features of Speech 2.6

Ultra-Low Latency:
End-to-end latency is below 250 milliseconds, ensuring rapid and smooth audio generation in real-time scenarios like live conversations.
Professional Format Compatibility:
Converts URLs, email addresses, phone numbers, dates, and amounts directly in multiple languages, with no separate text preprocessing step.
Enhanced Naturalness with Fluent LoRA:
Improves the naturalness of rhythm and tone, and supports voice cloning that retains distinctive voice traits such as accent and speech habits.
The Fluent LoRA technique also makes synthesis more fluent, producing high-quality output even from imperfect recordings.
Multilingual Support:
Supports 40+ languages, making it adaptable for global speech interaction scenarios.
Efficient Voice Interaction:
Ideal for intelligent customer service, smart devices, and other real-time voice interaction environments, delivering smooth and human-like communication.
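
The latency figure above can be checked against any streaming source by timing how long the first audio chunk takes to arrive. The sketch below is a generic measurement helper, not a MiniMax client: `fake_stream` is a stand-in generator that simulates a delayed response, and `time_to_first_chunk` and `is_within_budget` are hypothetical names. The 250 ms budget is the figure quoted in the text.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple

LATENCY_BUDGET_MS = 250.0  # figure quoted in the text above


def time_to_first_chunk(make_stream: Callable[[], Iterable[bytes]]) -> Tuple[float, bytes]:
    """Start a stream and return (milliseconds until first chunk, first chunk)."""
    start = time.perf_counter()
    stream = iter(make_stream())
    first = next(stream, None)  # blocks until the first chunk is produced
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if first is None:
        raise ValueError("stream produced no audio chunks")
    return elapsed_ms, first


def is_within_budget(elapsed_ms: float, budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """True when the measured latency is strictly below the budget."""
    return elapsed_ms < budget_ms


def fake_stream(delay_s: float = 0.05, chunks: int = 3) -> Iterator[bytes]:
    """Stand-in for a real audio stream: waits, then yields small byte chunks."""
    time.sleep(delay_s)
    for i in range(chunks):
        yield bytes([i]) * 4


if __name__ == "__main__":
    elapsed_ms, chunk = time_to_first_chunk(lambda: fake_stream(0.05))
    print(f"first chunk after {elapsed_ms:.1f} ms, within budget: {is_within_budget(elapsed_ms)}")
```

In practice the callable passed to `time_to_first_chunk` would open a real streaming request, so the measurement would include network time as well as model time.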

How to Use Speech 2.6

Register and Log In:
Visit the MiniMax Audio official website, create an account, and log in.
Select Speech Synthesis:
In the left navigation panel, click on “Speech Synthesis” to enter the speech generation page.
Enter Text:
Enter the text you want to convert to speech in the text box.
Choose Voice and Model:
Below the text box, select your preferred voice tone (e.g., “Calm Executive”) and model (e.g., “speech-2.6-hd”).
Select Application Scenario:
Choose the application type according to your needs, such as “News Broadcasting”, “Storytelling”, or “Film Dubbing”.
Generate Audio:
Click the “Generate Audio” button. The platform will create the voice output based on your text and selected parameters.
Play or Download Audio:
You can play the generated voice online or download it locally for further use.
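
The choices made in the steps above (text, voice, model, and scenario) can be captured in a small record, which is handy when the same settings need to be logged or reused. This is only a sketch: `SynthesisJob` and its field names are hypothetical and do not reflect the MiniMax API schema. The default values are the examples named in the steps.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SynthesisJob:
    """One set of speech synthesis choices (hypothetical field names)."""
    text: str
    voice: str = "Calm Executive"        # example voice from the steps above
    model: str = "speech-2.6-hd"         # example model from the steps above
    scenario: str = "News Broadcasting"  # example scenario from the steps above

    def __post_init__(self) -> None:
        # Reject empty input early, as there is nothing to synthesize.
        if not self.text.strip():
            raise ValueError("text must not be empty")

    def to_json(self) -> str:
        """Serialize the settings for logging or reuse."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    job = SynthesisJob(text="Welcome to the evening news.", scenario="Storytelling")
    print(job.to_json())
```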

Application Scenarios of Speech 2.6

Customer Service:
Provides smooth and natural voice interactions in call centers or online customer support systems, enhancing user experience.
Audiobooks:
Generates high-quality speech for e-books, online articles, or educational content.
Voice Assistants:
Powers voice assistants in smart home devices, mobile phones, and automotive systems.
Broadcasting and Podcasts:
Produces professional-grade voice content for radio programs, news, and podcasts.
Language Learning:
Offers accurate pronunciation and speaking examples for language learning applications.