KaLM-Embedding – A Text Embedding Model Series Launched by Tencent
What is KaLM-Embedding?
KaLM-Embedding is a series of high-performance text embedding models developed by Tencent. It improves text representation quality through advanced training techniques and high-quality data. KaLM-Embedding-V2 introduced several architectural and training innovations, such as removing the causal attention mask to enable bidirectional representation learning and adopting a multi-stage training process (pre-training, fine-tuning, and contrastive distillation), which significantly improved the model's generalization and semantic understanding.
The newest release, KaLM-Embedding-Gemma3-12B-2511, is a major milestone in the series. Built at a 12B-parameter scale (on a Gemma 3 backbone, as the name indicates), it delivers higher precision and performance, making it well suited to complex tasks that require advanced semantic understanding.

Key Features of KaLM-Embedding
- Efficient Text Embedding Generation: Converts text into fixed-length embedding vectors suitable for a wide range of NLP tasks such as retrieval, classification, and semantic matching (a usage sketch follows this list).
- Multilingual and Cross-Lingual Capability: Supports multilingual text embeddings, enabling semantic alignment and cross-lingual retrieval across languages and improving performance in multilingual applications.
- Flexible Embedding Dimensions: Uses Matryoshka representation learning to support flexible embedding dimensions, maintaining strong performance across dimensional settings to suit diverse application needs.
- Strong Adaptability for Downstream Tasks: Performs well across text classification, semantic matching, information retrieval, and clustering, providing comprehensive NLP support.
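A minimal usage sketch, assuming the model loads through the sentence-transformers library under the Hugging Face model id listed in Project Links below; the loading options and the 512-dimension truncation are illustrative assumptions, not documented defaults:

```python
# Hypothetical usage sketch: encode texts and compare them, assuming the
# model id below (from the Project Links section) loads via sentence-transformers.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("tencent/KaLM-Embedding-Gemma3-12B-2511")

queries = ["What is Matryoshka representation learning?"]
docs = [
    "Matryoshka representation learning trains nested embeddings of several sizes.",
    "KaLM-Embedding is a series of high-performance text embedding models.",
]

q_emb = model.encode(queries, normalize_embeddings=True)  # shape: (1, d)
d_emb = model.encode(docs, normalize_embeddings=True)     # shape: (2, d)

# On unit-normalized vectors, cosine similarity reduces to a dot product.
print(q_emb @ d_emb.T)

# Matryoshka-style truncation: keep the first k dimensions, then renormalize.
k = 512  # illustrative target dimension
q_small = q_emb[:, :k] / np.linalg.norm(q_emb[:, :k], axis=1, keepdims=True)
d_small = d_emb[:, :k] / np.linalg.norm(d_emb[:, :k], axis=1, keepdims=True)
print(q_small @ d_small.T)
```

Because the embeddings are unit-normalized, similarity scoring stays a simple dot product even after truncation, which is what makes the flexible dimensions practical for storage-constrained deployments.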
Technical Principles
- Bidirectional Attention Mechanism: Removes the traditional causal attention mask and adopts bidirectional attention, letting the model weigh both left and right context and improving semantic accuracy (see the first sketch after this list).
- Mean Pooling: Converts the token sequence into a fixed-length embedding with simple mean pooling, ensuring compatibility across downstream applications (also sketched below).
- Multi-Stage Training Process: Combines three stages to progressively enhance embedding quality:
  - Pre-training on large-scale weakly supervised data.
  - Fine-tuning on high-quality labeled datasets.
  - Contrastive distillation, transferring fine-grained knowledge from stronger teacher models.
- Focal Reweighting Mechanism: Applies focal-style reweighting so training concentrates on difficult samples, improving learning efficiency on complex cases (see the loss sketch below).
- Online Hard Negative Mixing: Dynamically generates hard negative samples during training to keep contrasts challenging, sharpening the model's discriminative power.
- Matryoshka Representation Learning: Enables flexible embedding dimensions while maintaining robust performance across sizes, making the model adaptable to varied deployment environments (see the final sketch below).
- High-Quality Data Foundation: Trained on diverse, high-quality datasets incorporating instruction tuning, hard negative mining, and multi-label tasks to ensure embedding robustness.
- Contrastive Learning and Distillation: Uses the InfoNCE loss for contrastive learning and contrastive distillation to capture fine-grained soft signals from teacher models, further improving performance.
- Temperature Scaling: Introduces temperature coefficients in contrastive learning and distillation to shape the distribution of learning signals and improve learning efficiency (see the distillation sketch below).
- Flexible Model Architecture: Built on compact yet efficient backbones (e.g., 0.5B parameters in earlier versions), offering high performance with resource efficiency.
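To make the first two principles concrete, here is a minimal PyTorch sketch; shapes and names are illustrative, and a real encoder applies per-head projections around the attention call. Bidirectional attention is scaled dot-product attention without the causal mask, and mean pooling averages only the token states the attention mask marks as real:

```python
import torch
import torch.nn.functional as F

batch, seq, dim = 2, 8, 16
hidden = torch.randn(batch, seq, dim)    # token states from the encoder
attention_mask = torch.ones(batch, seq)  # 1 = real token, 0 = padding
attention_mask[1, 5:] = 0                # second sequence is padded

# Bidirectional attention: every token may attend to every other token.
# A causal (decoder-style) model would pass is_causal=True instead.
q = k = v = hidden
out = F.scaled_dot_product_attention(q, k, v, is_causal=False)

# Mean pooling: average only over real tokens, ignoring padding positions.
mask = attention_mask.unsqueeze(-1)        # (batch, seq, 1)
summed = (out * mask).sum(dim=1)           # (batch, dim)
counts = mask.sum(dim=1).clamp(min=1e-9)   # number of real tokens per sequence
embedding = summed / counts                # fixed-length sentence embedding
embedding = F.normalize(embedding, dim=-1) # unit-normalize for cosine scoring
```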
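The contrastive objective can be sketched as InfoNCE over in-batch negatives with a temperature, combined with a focal-style weight that up-weights hard pairs. The gamma exponent and the interpolation-based negative mixing below are generic instantiations chosen for illustration; the paper specifies the exact formulation:

```python
import torch
import torch.nn.functional as F

def focal_infonce(q, d, tau=0.05, gamma=2.0):
    """InfoNCE with in-batch negatives and a focal-style reweighting.

    q, d: (batch, dim) unit-normalized query/document embeddings,
          where d[i] is the positive for q[i] and the rest are negatives.
    """
    logits = (q @ d.T) / tau  # (batch, batch) temperature-scaled similarities
    labels = torch.arange(q.size(0), device=q.device)
    p_correct = F.softmax(logits, dim=1)[labels, labels]  # prob. of the positive
    loss = F.cross_entropy(logits, labels, reduction="none")
    # Focal-style weight: easy pairs (p near 1) contribute little,
    # hard pairs (p near 0) dominate the gradient.
    return ((1.0 - p_correct) ** gamma * loss).mean()

def mix_hard_negatives(d, alpha=0.5):
    """One generic form of online hard-negative mixing: interpolate each
    document embedding with a shuffled one to synthesize extra negatives."""
    perm = torch.randperm(d.size(0), device=d.device)
    mixed = alpha * d + (1.0 - alpha) * d[perm]
    return F.normalize(mixed, dim=-1)
```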
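Contrastive distillation with temperature scaling is commonly formulated as matching the student's similarity distribution to the teacher's soft distribution via a KL divergence; the sketch below follows that common formulation and is not taken verbatim from the paper:

```python
import torch
import torch.nn.functional as F

def contrastive_distillation(q_s, d_s, q_t, d_t, tau=0.05):
    """KL divergence between teacher and student similarity distributions.

    q_s, d_s: student query/document embeddings, (batch, dim_s), unit-normalized.
    q_t, d_t: teacher embeddings, (batch, dim_t), unit-normalized.
    tau: temperature; smaller values sharpen the soft labels.
    """
    student_logits = (q_s @ d_s.T) / tau
    teacher_logits = (q_t @ d_t.T) / tau
    # The teacher provides soft targets; the student matches them row by row.
    teacher_probs = F.softmax(teacher_logits, dim=1)
    student_logp = F.log_softmax(student_logits, dim=1)
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean")
```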
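Finally, Matryoshka representation learning amounts to applying the same contrastive loss at several nested prefix lengths of the embedding, so every truncated prefix remains a usable embedding on its own. The prefix sizes below are illustrative:

```python
import torch
import torch.nn.functional as F

def matryoshka_infonce(q, d, dims=(64, 256, 1024), tau=0.05):
    """Average an InfoNCE loss over nested prefixes of the embedding.

    q, d: (batch, full_dim) query/document embeddings; d[i] is q[i]'s positive.
    dims: nested prefix sizes (each must not exceed full_dim); each prefix
          is renormalized before scoring.
    """
    labels = torch.arange(q.size(0), device=q.device)
    total = 0.0
    for k in dims:
        q_k = F.normalize(q[:, :k], dim=-1)  # truncate, then renormalize
        d_k = F.normalize(d[:, :k], dim=-1)
        logits = (q_k @ d_k.T) / tau
        total = total + F.cross_entropy(logits, labels)
    return total / len(dims)
```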
Model Versions
- KaLM-Embedding-V1: The initial version, with a compact architecture and a causal attention mask, designed for foundational embedding tasks.
- KaLM-Embedding-V2: Removes the causal mask to enable bidirectional representation learning and introduces the multi-stage training pipeline (pre-training, fine-tuning, contrastive distillation), bringing major performance improvements.
- KaLM-Embedding-V2.5: Further refines V2 through enhanced contrastive distillation from stronger teacher models, boosting embedding quality and generalization.
- KaLM-Embedding-Gemma3-12B-2511: The latest version, at 12B parameters, delivering superior accuracy and performance on complex, high-precision tasks.
Project Links
- Official Website: https://kalm-embedding.github.io/
- Hugging Face Model Hub: https://huggingface.co/tencent/KaLM-Embedding-Gemma3-12B-2511
- arXiv Paper: https://arxiv.org/pdf/2506.20923
Application Scenarios
- Text Classification: Classifies text efficiently to identify topics or categories.
- Semantic Matching: Measures semantic similarity between texts, widely applicable in search engines and recommendation systems.
- Information Clustering: Automatically groups semantically similar texts, facilitating large-scale data management and analysis.
- Search and Recommendation: Improves search relevance and recommendation precision through deeper semantic understanding, enabling more personalized user experiences.
- Multilingual Understanding: Excels at cross-lingual semantic alignment, improving retrieval and translation accuracy across languages.