Qwen3Guard – A safety and protection model launched by Alibaba Tongyi

What is Qwen3Guard?

Qwen3Guard is the first safety guardrail model in the Qwen family, released by Alibaba’s Tongyi team. Built on the Qwen3 architecture and fine-tuned specifically for safety classification, it identifies potential risks in both user prompts and model-generated responses, and outputs a fine-grained risk level together with category labels.

Qwen3Guard comes in two specialized versions:

Qwen3Guard-Gen (Generative Version): Designed for safety annotation of offline datasets.
Qwen3Guard-Stream (Streaming Detection Version): Designed for real-time safety detection in online services.

Qwen3Guard supports 119 languages and dialects, so the same guardrail can be applied to multilingual AI interactions.

Main Features

Efficient Risk Identification: Accurately detects potential risks in user inputs and model outputs, providing fine-grained risk levels (Safe, Controversial, Unsafe) and classification labels (e.g., violence, illegal activity, sexual content).
Real-time Streaming Detection: Moderates content token by token as a response is being generated, so safety checks do not have to wait for the complete response and add little delay.
Multilingual Support: Covers 119 languages and dialects, enabling global deployment and cross-lingual applications with stable and high-quality safety detection.
Flexible Safety Policies: Introduces a “Controversial” label, allowing safety strategies to be flexibly adjusted. Depending on the scenario, controversial content can be dynamically reclassified as “Safe” or “Unsafe.”
Reinforcement Learning & Dynamic Intervention: Can serve as a reward signal in reinforcement learning to improve a model’s intrinsic safety, or intercept risky content during generation to keep outputs controllable.
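
The “Controversial” handling described under Flexible Safety Policies can be pictured as a thin policy layer on top of the classifier's verdict. The following is a minimal sketch, not Qwen3Guard's own API: only the three risk levels come from the text above, while the names `RiskLevel`, `resolve_level`, `should_block`, and the `strict` switch are hypothetical and chosen for illustration.

```python
from enum import Enum


class RiskLevel(str, Enum):
    """The three risk levels named in the feature list."""
    SAFE = "Safe"
    CONTROVERSIAL = "Controversial"
    UNSAFE = "Unsafe"


def resolve_level(level: RiskLevel, strict: bool) -> RiskLevel:
    """Collapse the three-way verdict into Safe/Unsafe for one deployment.

    `strict` is a hypothetical per-scenario switch: a strict deployment
    treats Controversial content as Unsafe, a lenient one treats it as Safe.
    """
    if level is RiskLevel.CONTROVERSIAL:
        return RiskLevel.UNSAFE if strict else RiskLevel.SAFE
    return level


def should_block(level: RiskLevel, strict: bool) -> bool:
    """Return True when the resolved verdict for this deployment is Unsafe."""
    return resolve_level(level, strict) is RiskLevel.UNSAFE
```

Under this sketch, a children's education product might call `should_block(level, strict=True)`, while a general-purpose assistant might pass `strict=False`; the classifier itself stays the same and only the policy changes.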

Technical Principles of Qwen3Guard

Architecture Design:
- Qwen3Guard-Gen: Built on the Qwen3 architecture and trained with Supervised Fine-Tuning (SFT). Safety classification is recast as an instruction-following task, so the model generates a structured safety assessment as its output.
- Qwen3Guard-Stream: Adds two lightweight classification heads on top of the final Transformer layer. The heads read the response token by token as it is generated and return a classification at each step, which enables streaming detection.
Data Collection & Annotation: Uses the Self-Instruct framework to synthesize diverse prompts, combined with human-written and model-generated responses. An automatic multi-model voting mechanism ensures data quality and consistent annotation.
Training Methods: Constructs the “Controversial” label through data rebalancing, adjusting the Safe/Unsafe ratio of the training data to approximate the decision boundary between the two classes. Knowledge distillation is then used to filter label noise and improve classification accuracy.
Real-time Detection Mechanism: With token-level classification heads, the system continuously monitors generation. Once risky content is detected, intervention mechanisms are triggered immediately to ensure safe generation.
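
The real-time detection mechanism above amounts to a loop that classifies each new token and stops generation as soon as a risky one appears. The sketch below illustrates only that control flow, under stated assumptions: `guarded_stream`, the `classify` callback, the `block_on` set, and the refusal text are all hypothetical, and `toy_classifier` is a keyword stand-in for the model's token-level classification heads, not the real model.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical per-token classifier: receives all tokens generated so far and
# returns "Safe", "Controversial", or "Unsafe" for the newest one.
TokenClassifier = Callable[[List[str]], str]


def guarded_stream(
    tokens: Iterable[str],
    classify: TokenClassifier,
    block_on: frozenset = frozenset({"Unsafe"}),
    refusal: str = "[response stopped by safety filter]",
) -> Iterator[str]:
    """Yield tokens one by one, cutting the stream at the first risky token."""
    history: List[str] = []
    for token in tokens:
        history.append(token)
        if classify(history) in block_on:
            # Intervention: withhold the risky token and end the stream.
            yield refusal
            return
        yield token


def toy_classifier(history: List[str]) -> str:
    """Stand-in for the token-level heads: flags a single placeholder word."""
    return "Unsafe" if history[-1] == "FORBIDDEN" else "Safe"


if __name__ == "__main__":
    for piece in guarded_stream(["hello", "FORBIDDEN", "world"], toy_classifier):
        print(piece)
```

Passing `block_on=frozenset({"Unsafe", "Controversial"})` would make the same loop stricter, which is one way the Controversial level could feed into streaming intervention.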

Project Resources

Official Website: https://qwen.ai/blog?id=f0bbad0677edf58ba93d80a1e12ce458f7a80548&from=research.research-list
GitHub Repository: https://github.com/QwenLM/Qwen3Guard
HuggingFace Model Hub: https://huggingface.co/collections/Qwen/qwen3guard-68d2729abbfae4716f3343a1
Technical Paper: https://github.com/QwenLM/Qwen3Guard/blob/main/Qwen3Guard_Technical_Report.pdf

Application Scenarios

Content Moderation: Detects and filters harmful information on social media and online forums in real time to keep content safe.
Intelligent Customer Service: Ensures that customer service systems do not generate inappropriate responses, improving user experience and protecting privacy.
Education: Prevents online education platforms and tutoring systems from producing misleading or inappropriate content, ensuring a safe and healthy learning environment.
Healthcare: Ensures that medical consultation and mental health support systems generate content aligned with medical ethics, avoiding potential harm to users.
Government & Public Safety: Detects and flags potential security threats in public information in real time, ensuring government communications comply with laws and regulations.