KaLM-Embedding – A Text Embedding Model Series Launched by Tencent
What is KaLM-Embedding?
KaLM-Embedding is a series of high-performance text embedding models developed by Tencent. It improves text representation quality through advanced training techniques and high-quality data. KaLM-Embedding-V2 introduced several architectural and training innovations, such as removing the causal attention mask to enable bidirectional representation learning and adopting a multi-stage training process (pre-training, fine-tuning, and contrastive distillation), which significantly improved the model's generalization and semantic understanding.
The newest release, KaLM-Embedding-Gemma3-12B-2511, is a major milestone in the series. Built at a 12B-parameter scale (on a Gemma 3 backbone, as the name indicates), it delivers higher precision and performance, making it well suited to complex tasks that require advanced semantic understanding.

Key Features of KaLM-Embedding
- Efficient Text Embedding Generation: Converts text into fixed-length embedding vectors suitable for a wide range of NLP tasks such as retrieval, classification, and semantic matching (a usage sketch follows this list).
- Multilingual and Cross-Lingual Capability: Supports multilingual text embeddings, enabling semantic alignment and cross-lingual retrieval across languages and improving performance in multilingual applications.
- Flexible Embedding Dimensions: Uses Matryoshka representation learning to support flexible embedding dimensions, maintaining strong performance across dimensional settings to suit diverse application needs.
- Strong Adaptability for Downstream Tasks: Performs well across text classification, semantic matching, information retrieval, and clustering, providing comprehensive NLP support.
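A minimal usage sketch, assuming the model loads through the sentence-transformers library under the Hugging Face model id listed in Project Links below; the loading options and the 512-dimension truncation are illustrative assumptions, not documented defaults:

```python
# Hypothetical usage sketch: encode texts and compare them, assuming the
# model id below (from the Project Links section) loads via sentence-transformers.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("tencent/KaLM-Embedding-Gemma3-12B-2511")

queries = ["What is Matryoshka representation learning?"]
docs = [
    "Matryoshka representation learning trains nested embeddings of several sizes.",
    "KaLM-Embedding is a series of high-performance text embedding models.",
]

q_emb = model.encode(queries, normalize_embeddings=True)  # shape: (1, d)
d_emb = model.encode(docs, normalize_embeddings=True)     # shape: (2, d)

# On unit-normalized vectors, cosine similarity reduces to a dot product.
print(q_emb @ d_emb.T)

# Matryoshka-style truncation: keep the first k dimensions, then renormalize.
k = 512  # illustrative target dimension
q_small = q_emb[:, :k] / np.linalg.norm(q_emb[:, :k], axis=1, keepdims=True)
d_small = d_emb[:, :k] / np.linalg.norm(d_emb[:, :k], axis=1, keepdims=True)
print(q_small @ d_small.T)
```

Because the embeddings are unit-normalized, similarity scoring stays a simple dot product even after truncation, which is what makes the flexible dimensions practical for storage-constrained deployments.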
Technical Principles
- Bidirectional Attention Mechanism: Removes the traditional causal attention mask and adopts bidirectional attention, letting the model weigh both left and right context and improving semantic accuracy (see the first sketch after this list).
- Mean Pooling: Converts the token sequence into a fixed-length embedding with simple mean pooling, ensuring compatibility across downstream applications (also sketched below).
- Multi-Stage Training Process: Combines three stages to progressively enhance embedding quality:
  - Pre-training on large-scale weakly supervised data.
  - Fine-tuning on high-quality labeled datasets.
  - Contrastive distillation, transferring fine-grained knowledge from stronger teacher models.
- Focal Reweighting Mechanism: Applies focal-style reweighting so training concentrates on difficult samples, improving learning efficiency on complex cases (see the loss sketch below).
- Online Hard Negative Mixing: Dynamically generates hard negative samples during training to keep contrasts challenging, sharpening the model's discriminative power.
- Matryoshka Representation Learning: Enables flexible embedding dimensions while maintaining robust performance across sizes, making the model adaptable to varied deployment environments (see the final sketch below).
- High-Quality Data Foundation: Trained on diverse, high-quality datasets incorporating instruction tuning, hard negative mining, and multi-label tasks to ensure embedding robustness.
- Contrastive Learning and Distillation: Uses the InfoNCE loss for contrastive learning and contrastive distillation to capture fine-grained soft signals from teacher models, further improving performance.
- Temperature Scaling: Introduces temperature coefficients in contrastive learning and distillation to shape the distribution of learning signals and improve learning efficiency (see the distillation sketch below).
- Flexible Model Architecture: Built on compact yet efficient backbones (e.g., 0.5B parameters in earlier versions), offering high performance with resource efficiency.
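To make the first two principles concrete, here is a minimal PyTorch sketch; shapes and names are illustrative, and a real encoder applies per-head projections around the attention call. Bidirectional attention is scaled dot-product attention without the causal mask, and mean pooling averages only the token states the attention mask marks as real:

```python
import torch
import torch.nn.functional as F

batch, seq, dim = 2, 8, 16
hidden = torch.randn(batch, seq, dim)    # token states from the encoder
attention_mask = torch.ones(batch, seq)  # 1 = real token, 0 = padding
attention_mask[1, 5:] = 0                # second sequence is padded

# Bidirectional attention: every token may attend to every other token.
# A causal (decoder-style) model would pass is_causal=True instead.
q = k = v = hidden
out = F.scaled_dot_product_attention(q, k, v, is_causal=False)

# Mean pooling: average only over real tokens, ignoring padding positions.
mask = attention_mask.unsqueeze(-1)        # (batch, seq, 1)
summed = (out * mask).sum(dim=1)           # (batch, dim)
counts = mask.sum(dim=1).clamp(min=1e-9)   # number of real tokens per sequence
embedding = summed / counts                # fixed-length sentence embedding
embedding = F.normalize(embedding, dim=-1) # unit-normalize for cosine scoring
```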
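The contrastive objective can be sketched as InfoNCE over in-batch negatives with a temperature, combined with a focal-style weight that up-weights hard pairs. The gamma exponent and the interpolation-based negative mixing below are generic instantiations chosen for illustration; the paper specifies the exact formulation:

```python
import torch
import torch.nn.functional as F

def focal_infonce(q, d, tau=0.05, gamma=2.0):
    """InfoNCE with in-batch negatives and a focal-style reweighting.

    q, d: (batch, dim) unit-normalized query/document embeddings,
          where d[i] is the positive for q[i] and the rest are negatives.
    """
    logits = (q @ d.T) / tau  # (batch, batch) temperature-scaled similarities
    labels = torch.arange(q.size(0), device=q.device)
    p_correct = F.softmax(logits, dim=1)[labels, labels]  # prob. of the positive
    loss = F.cross_entropy(logits, labels, reduction="none")
    # Focal-style weight: easy pairs (p near 1) contribute little,
    # hard pairs (p near 0) dominate the gradient.
    return ((1.0 - p_correct) ** gamma * loss).mean()

def mix_hard_negatives(d, alpha=0.5):
    """One generic form of online hard-negative mixing: interpolate each
    document embedding with a shuffled one to synthesize extra negatives."""
    perm = torch.randperm(d.size(0), device=d.device)
    mixed = alpha * d + (1.0 - alpha) * d[perm]
    return F.normalize(mixed, dim=-1)
```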
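Contrastive distillation with temperature scaling is commonly formulated as matching the student's similarity distribution to the teacher's soft distribution via a KL divergence; the sketch below follows that common formulation and is not taken verbatim from the paper:

```python
import torch
import torch.nn.functional as F

def contrastive_distillation(q_s, d_s, q_t, d_t, tau=0.05):
    """KL divergence between teacher and student similarity distributions.

    q_s, d_s: student query/document embeddings, (batch, dim_s), unit-normalized.
    q_t, d_t: teacher embeddings, (batch, dim_t), unit-normalized.
    tau: temperature; smaller values sharpen the soft labels.
    """
    student_logits = (q_s @ d_s.T) / tau
    teacher_logits = (q_t @ d_t.T) / tau
    # The teacher provides soft targets; the student matches them row by row.
    teacher_probs = F.softmax(teacher_logits, dim=1)
    student_logp = F.log_softmax(student_logits, dim=1)
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean")
```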
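Finally, Matryoshka representation learning amounts to applying the same contrastive loss at several nested prefix lengths of the embedding, so every truncated prefix remains a usable embedding on its own. The prefix sizes below are illustrative:

```python
import torch
import torch.nn.functional as F

def matryoshka_infonce(q, d, dims=(64, 256, 1024), tau=0.05):
    """Average an InfoNCE loss over nested prefixes of the embedding.

    q, d: (batch, full_dim) query/document embeddings; d[i] is q[i]'s positive.
    dims: nested prefix sizes (each must not exceed full_dim); each prefix
          is renormalized before scoring.
    """
    labels = torch.arange(q.size(0), device=q.device)
    total = 0.0
    for k in dims:
        q_k = F.normalize(q[:, :k], dim=-1)  # truncate, then renormalize
        d_k = F.normalize(d[:, :k], dim=-1)
        logits = (q_k @ d_k.T) / tau
        total = total + F.cross_entropy(logits, labels)
    return total / len(dims)
```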
Model Versions
- KaLM-Embedding-V1: The initial version, with a compact architecture and a causal attention mask, designed for foundational embedding tasks.
- KaLM-Embedding-V2: Removes the causal mask to enable bidirectional representation learning and introduces the multi-stage training pipeline (pre-training, fine-tuning, contrastive distillation), bringing major performance improvements.
- KaLM-Embedding-V2.5: Further refines V2 through enhanced contrastive distillation from stronger teacher models, boosting embedding quality and generalization.
- KaLM-Embedding-Gemma3-12B-2511: The latest version, at 12B parameters, delivering superior accuracy and performance on complex, high-precision tasks.
Project Links
- Official Website: https://kalm-embedding.github.io/
- Hugging Face Model Hub: https://huggingface.co/tencent/KaLM-Embedding-Gemma3-12B-2511
- arXiv Paper: https://arxiv.org/pdf/2506.20923
Application Scenarios
- Text Classification: Classifies text efficiently to identify topics or categories.
- Semantic Matching: Measures semantic similarity between texts, widely applicable in search engines and recommendation systems.
- Information Clustering: Automatically groups semantically similar texts, facilitating large-scale data management and analysis.
- Search and Recommendation: Improves search relevance and recommendation precision through deeper semantic understanding, enabling more personalized user experiences.
- Multilingual Understanding: Excels at cross-lingual semantic alignment, improving retrieval and translation accuracy across languages.