EVI 3 – A voice and language model launched by Hume AI

What is EVI 3?

EVI 3 is a next-generation voice-language model from Hume AI. It processes text and voice tokens together, enabling natural, expressive spoken interaction. The model is highly customizable. Users can create any voice or personality from a prompt and adjust its emotional tone and speaking style in real time. According to Hume AI’s own comparisons, EVI 3 outperforms models such as OpenAI’s GPT-4o on emotional understanding, expressiveness, naturalness, and response speed. Its latency is very low, with spoken responses arriving in as little as 300 milliseconds.

EVI 3 Key Features

Multimodal Interaction:
EVI 3 processes both text and voice inputs to produce rich, expressive voice and language responses, seamlessly merging spoken and written communication.
High Personalization:
Users can generate any voice or persona from prompts. EVI 3 can dynamically create over 100,000 distinct voice variations in real time.
Emotion and Style Control:
The model supports real-time modulation of emotional tone (from “excited” to “sad”) and speaking style (such as “pirate” or “whisper”), based on user instructions.
Real-Time Interaction:
EVI 3 produces speech and language output at the latency of normal conversation, which keeps real-time exchanges smooth.

Technical Foundations

Autoregressive Model:
EVI 3 uses a single autoregressive model that processes text (T) and voice (V) tokens in the same sequence. Handling both kinds of input in one model enables fluid, coherent voice synthesis.
System Prompts:
System prompts can contain both text and voice tokens. They act as instructions that shape the assistant’s speaking style and emotional tone.
Reinforcement Learning:
Using reinforcement learning, EVI 3 learns to identify and optimize the qualities people prefer in human speech, which lets it generate highly personalized voices.
Streaming Architecture:
The model streams its output, delivering voice responses at normal conversational latency for responsive, fluid real-time conversation.
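The article does not describe EVI 3’s internals beyond the points above. The toy sketch below only illustrates the general idea of one autoregressive loop over a single mixed sequence of text (T) and voice (V) tokens, seeded by a prompt containing both, with voice tokens streamed out in small chunks as soon as they are generated. It is not Hume AI’s implementation or API. The `Token` type, the `next_token` function (a stand-in for the neural network), the token ids, the "TVV" interleaving pattern, and `chunk_size` are all hypothetical. A real system would also decode each voice-token chunk into audio.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence


@dataclass(frozen=True)
class Token:
    kind: str   # "T" = text token, "V" = voice token (hypothetical encoding)
    value: int  # hypothetical token id


# Hypothetical stand-in for the model: maps the whole context to the next token.
NextToken = Callable[[Sequence[Token]], Token]


def generate_stream(prompt: List[Token], next_token: NextToken,
                    max_new: int, chunk_size: int) -> Iterator[List[Token]]:
    """Extend one mixed T/V sequence autoregressively, yielding voice
    tokens in chunks as soon as each chunk is full (streaming)."""
    context = list(prompt)  # the prompt may mix text and voice tokens
    pending: List[Token] = []
    for _ in range(max_new):
        tok = next_token(context)  # conditioned on ALL earlier T and V tokens
        context.append(tok)
        if tok.kind == "V":
            pending.append(tok)
            if len(pending) == chunk_size:
                yield pending  # hand off audio early instead of waiting for the end
                pending = []
    if pending:
        yield pending  # flush any remaining voice tokens


def make_toy_model(prompt_len: int, pattern: str = "TVV") -> NextToken:
    """Hypothetical deterministic 'model' that interleaves token kinds by a fixed pattern."""
    def next_token(context: Sequence[Token]) -> Token:
        step = len(context) - prompt_len
        return Token(pattern[step % len(pattern)], step)
    return next_token


prompt = [Token("T", 0), Token("V", 0)]  # a system prompt with one text and one voice token
chunks = list(generate_stream(prompt, make_toy_model(len(prompt)), max_new=9, chunk_size=2))
print([[t.value for t in c] for c in chunks])  # [[1, 2], [4, 5], [7, 8]]
```

Because the first chunk is yielded after only three model steps, playback could begin long before the full response exists. This is the general reason streaming keeps perceived latency low.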

Project Links

Official Site: https://www.hume.ai/blog/introducing-evi-3
Live Demo: https://demo.hume.ai/

Application Scenarios

Smart Customer Support:
Delivers smooth, natural voice conversations to efficiently resolve customer inquiries.
Voice Assistants:
Integrates into devices to offer highly personalized voice-based services.
Educational Tutoring:
Simulates conversations to assist with language learning and social skills training.
Emotional Support:
Provides comforting, emotionally aware responses tailored to users’ feelings.
Content Creation:
Generates emotionally rich voice content for use in audiobooks, storytelling, and more.