Stable Audio Open Small – A text-to-audio generation model launched by Stability AI and Arm

What is Stable Audio Open Small?

Stable Audio Open Small is a lightweight text-to-audio generation model jointly developed by Stability AI and Arm. It is based on the original Stable Audio Open model but cuts the parameter count from 1.1 billion to 341 million (roughly 69% fewer), which enables faster generation and efficient deployment on mobile and edge devices such as smartphones.

The model uses Arm’s KleidiAI technology to optimize performance on edge devices and reduce computational cost, so it can run on Arm CPUs without specialized accelerator hardware. It is particularly suited to fast, on-device generation of short clips such as drum loops and sound effects on resource-constrained devices.
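As a rough illustration of what the size reduction means in practice, the sketch below computes the relative drop in parameter count and the approximate storage needed for the weights alone. The parameter counts come from the text above. The bytes-per-weight figures are the standard sizes of each numeric format; which format a given deployment actually uses is an assumption here, and activations and runtime buffers are not counted.

```python
# Parameter counts quoted in the text above.
ORIGINAL_PARAMS = 1_100_000_000  # Stable Audio Open
SMALL_PARAMS = 341_000_000       # Stable Audio Open Small

# Storage per weight for common numeric formats. Which format is used
# on a given device is an assumption, not something the text specifies.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def reduction_percent(before: int, after: int) -> float:
    """Relative reduction in parameter count, as a percentage."""
    return 100.0 * (before - after) / before


def weight_megabytes(params: int, dtype: str) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return params * BYTES_PER_WEIGHT[dtype] / 1_000_000


print(f"Parameter reduction: {reduction_percent(ORIGINAL_PARAMS, SMALL_PARAMS):.1f}%")
for dtype in BYTES_PER_WEIGHT:
    before_mb = weight_megabytes(ORIGINAL_PARAMS, dtype)
    after_mb = weight_megabytes(SMALL_PARAMS, dtype)
    print(f"{dtype}: {before_mb:.0f} MB -> {after_mb:.0f} MB")
```

Under these assumptions the smaller model has 69% fewer parameters, and its weights occupy about 1.36 GB in float32 or about 0.34 GB in int8.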

Key Features of Stable Audio Open Small

Text-to-Audio Generation: Converts text prompts into audio such as instrument sounds, ambient effects, or simple music clips.
Fast Audio Generation: Generates a short clip (up to about 11 seconds of audio, according to Stability AI's announcement) on a smartphone in under 8 seconds, fast enough for near-real-time applications.
Lightweight Design: Parameter count reduced from 1.1B to 341M, making the model suitable for devices with limited computing resources.
Efficient On-Device Performance: Optimized to run efficiently on edge devices, minimizing computation and power usage.
Versatile Audio Outputs: Generates short audio samples, sound effects, instrument riffs, and ambient textures, which are useful for creative audio production and interactive applications.
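The text-to-audio workflow described in the list above can be sketched with the stable-audio-tools package linked under Project Links. This is a minimal sketch, not the official recipe: the helper names build_conditioning and generate_clip are hypothetical, and the step count, sampler type, and 11-second default are assumptions that should be checked against the model's Hugging Face page. The heavy imports sit inside generate_clip so that the small prompt helper can be used without PyTorch installed.

```python
def build_conditioning(prompt: str, seconds_total: float) -> list[dict]:
    """Package a text prompt and a target duration for the generator."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if seconds_total <= 0:
        raise ValueError("seconds_total must be positive")
    return [{"prompt": prompt, "seconds_total": seconds_total}]


def generate_clip(prompt: str, seconds_total: float = 11,
                  out_path: str = "output.wav") -> str:
    """Generate one clip from a text prompt and save it as a WAV file.

    Requires torch, torchaudio, einops, and stable-audio-tools.
    """
    import torch
    import torchaudio
    from einops import rearrange
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Downloads the weights on first use.
    model, model_config = get_pretrained_model("stabilityai/stable-audio-open-small")
    model = model.to(device)

    output = generate_diffusion_cond(
        model,
        steps=8,                  # assumed value; verify against the model page
        conditioning=build_conditioning(prompt, seconds_total),
        sample_size=model_config["sample_size"],
        sampler_type="pingpong",  # assumed value; verify against the model page
        device=device,
    )

    # (batch, channels, samples) -> (channels, batch * samples)
    output = rearrange(output, "b d n -> d (b n)")

    # Peak-normalize, clip to [-1, 1], and convert to 16-bit PCM.
    output = (
        output.to(torch.float32)
        .div(torch.max(torch.abs(output)))
        .clamp(-1, 1)
        .mul(32767)
        .to(torch.int16)
        .cpu()
    )
    torchaudio.save(out_path, output, model_config["sample_rate"])
    return out_path
```

A call such as generate_clip("128 BPM tech house drum loop") would write output.wav to the current directory. Access to the weights on Hugging Face may require accepting the model license first.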

Technical Foundations of Stable Audio Open Small

Deep Learning-Based Generation: Trained on large-scale audio datasets, the model uses a Transformer-based neural network architecture. It encodes the text prompt, generates a compressed audio representation conditioned on that prompt, and decodes the representation into a waveform.
Parameter Optimization: The model is compressed from 1.1B to 341M parameters, which reduces complexity and computational load while keeping output quality high. Techniques such as quantization and pruning can further improve efficiency.
Edge Computing Optimization: Built with Arm’s KleidiAI library to optimize model execution on Arm CPUs, enabling high-performance audio generation on mobile and edge hardware.
Efficient Inference Engine: The inference pipeline is tuned for low-latency generation on mobile platforms, combining algorithmic improvements with hardware-specific optimizations to keep response times short.
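Quantization, named above as one of the efficiency techniques, stores weights as low-precision integers plus a scale factor. The following is a generic, minimal sketch of symmetric int8 quantization in plain Python. It is not the scheme used by Stable Audio Open Small or KleidiAI, whose details are not given here, and the function names are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization to integers in [-127, 127].

    Returns the integer values and the scale needed to recover floats.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Fall back to a scale of 1.0 so an all-zero tensor does not divide by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: list[int], scale: float) -> list[float]:
    """Map the integers back to approximate float weights."""
    return [q * scale for q in quantized]


# Illustrative values only, not real model weights.
example_weights = [0.5, -1.0, 0.25, 0.0]
q_values, q_scale = quantize_int8(example_weights)
restored = dequantize_int8(q_values, q_scale)
worst_error = max(abs(w - r) for w, r in zip(example_weights, restored))
print("int8 values:", q_values)
print("largest round-trip error:", worst_error)
```

Each weight now needs one byte instead of four, at the cost of a rounding error of at most half the scale per weight. That trade of a little precision for a smaller, faster model is what makes quantization attractive on memory-limited devices.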

Project Links

Official Website: https://stability.ai/news/stability-ai-and-arm-release-stable-audio-open-small
GitHub Repository: https://github.com/Stability-AI/stable-audio-tools
Hugging Face Model Hub: https://huggingface.co/stabilityai/stable-audio-open-small
arXiv Technical Paper: https://arxiv.org/pdf/2505.08175

Application Scenarios for Stable Audio Open Small

Mobile Music Creation: Generates music clips and sound effects on smartphones in seconds, enabling on-the-go music production.
Game Sound Effects: Generates background music and sound effects for games on the fly, enhancing immersion.
Video Scoring: Helps video creators quickly generate suitable background music and audio cues, improving production efficiency.
Smart Device Audio: Enables smart speakers and other devices to generate customized sound effects on the device itself for a more personalized user experience.
Educational Tools: Generates instructional sounds and background music to make educational content more engaging.