QwenLong-L1: An innovative large model for long-text reasoning

QwenLong-L1 is the first large model for long-text reasoning trained through reinforcement learning. On long-text DocQA benchmarks it outperforms multiple comparable models, demonstrating strong reasoning capabilities.

The model’s innovation lies in a new reinforcement learning framework that facilitates the transition from short-text to long-text reasoning. Specifically, the framework comprises a warm-up supervised fine-tuning stage, a curriculum-guided RL stage, and a difficulty-aware retrospective sampling mechanism, which together improve the model’s adaptability to long-text reasoning.
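
As a rough illustration of how the curriculum-guided stage and difficulty-aware retrospective sampling could fit together, the following is a minimal sketch, not the released implementation. Everything in it is an assumption made for illustration: the function names, the `length` and `avg_reward` fields, the length caps, and the difficulty score (taken here as one minus an example's average reward). The warm-up supervised fine-tuning stage and the RL update itself are not shown.

```python
import random


def difficulty(avg_reward):
    # Assumed definition: a lower average reward means a harder example.
    return 1.0 - avg_reward


def retrospective_sample(seen, k, rng):
    """Draw k examples (with replacement) from earlier phases, favouring hard ones."""
    weights = [difficulty(e["avg_reward"]) for e in seen]
    if not seen or sum(weights) <= 0:
        return []  # nothing to replay yet, or every earlier example is already solved
    return rng.choices(seen, weights=weights, k=k)


def curriculum_phases(examples, length_caps, retro_k, seed=0):
    """Group examples into phases of growing input length.

    Each phase holds the examples that newly fit under its length cap,
    plus retro_k hard examples replayed from earlier phases.
    """
    rng = random.Random(seed)
    phases, seen, prev_cap = [], [], 0
    for cap in length_caps:
        fresh = [e for e in examples if prev_cap < e["length"] <= cap]
        replay = retrospective_sample(seen, retro_k, rng)
        phases.append(fresh + replay)
        seen.extend(fresh)
        prev_cap = cap
    return phases


# Hypothetical toy data: lengths in tokens, avg_reward in [0, 1].
examples = [
    {"id": "a", "length": 100, "avg_reward": 1.0},
    {"id": "b", "length": 200, "avg_reward": 0.2},
    {"id": "c", "length": 5000, "avg_reward": 0.5},
]
phases = curriculum_phases(examples, length_caps=[1000, 10000], retro_k=3)
```

In this toy run the first phase contains only the two short examples, while the second phase contains the long example plus three replayed copies of the hard short example; the short example with a perfect average reward has zero weight and is never replayed.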

In addition, a dataset built specifically for RL training, DocQA-RL-1.6K, has been released. It covers mathematical, logical, and multi-hop reasoning problems, giving the model varied training data.