GPT-4.1 – OpenAI has launched a new generation of language models supporting a context window of up to one million tokens.


What is GPT-4.1?

GPT-4.1 is the latest generation of language models introduced by OpenAI, comprising three versions: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. The series shows significant improvements in coding ability, instruction following, and long-text processing, and supports a context window of up to 1 million tokens. GPT-4.1 performs strongly across multiple benchmarks, scoring 54.6% on SWE-bench Verified, 21.4 percentage points higher than GPT-4o. The series is also more cost-effective, with GPT-4.1 nano being OpenAI's fastest and most affordable model to date. The GPT-4.1 series is currently accessible only via the API and is available to all developers.
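
Since the series is API-only, a request looks roughly like the minimal sketch below, which uses the official openai Python SDK; the prompt and the assumption that an API key is set in the environment are illustrative, not part of OpenAI's announcement.

```python
# Minimal sketch: calling GPT-4.1 via the official openai Python SDK.
# Assumes the OPENAI_API_KEY environment variable is set.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1",  # the series also exposes "gpt-4.1-mini" and "gpt-4.1-nano"
    messages=[
        {"role": "user", "content": "Summarize the key changes in GPT-4.1 in three bullet points."}
    ],
)

print(response.choices[0].message.content)
```

Swapping the model name for "gpt-4.1-mini" or "gpt-4.1-nano" trades some capability for lower latency and cost.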


The main features of GPT-4.1

  • Long-context Processing Capability: The GPT-4.1 series supports a context window of up to 1 million tokens, enabling the models to handle much longer inputs, such as entire books or large codebases.
  • Multimodal Processing: The GPT-4.1 series of models have also been optimized for multimodal processing. The visual encoder and text encoder are separate, with cross-attention mechanisms in place. This design enables the model to better handle content that mixes images and text.
  • Code Generation and Optimization: GPT-4.1 significantly outperforms its predecessors in programming tasks. In the SWE-bench Verified test, it achieves an accuracy rate of 54.6%, a 21.4 percentage point improvement over GPT-4o. It can also explore codebases, write code, and generate test cases more efficiently.
  • Multi-language Support: In multilingual coding ability tests, GPT-4.1 demonstrates twice the performance of its predecessor, making it more efficient in handling multilingual programming tasks, code optimization, and version management.
  • Tool Invocation Efficiency: In real-world applications such as Windsurf’s internal coding benchmark, GPT-4.1 scores 60% higher than GPT-4o, with a 30% improvement in tool invocation efficiency (see the tool-calling sketch after this list).
  • Complex Instruction Handling: GPT-4.1 excels in instruction following, reliably adhering to complex instructions. In Scale’s MultiChallenge benchmark test, it scores 10.5 percentage points higher than GPT-4o.
  • Multi-turn Conversation Ability: In multi-turn conversations, GPT-4.1 better tracks contextual information and maintains conversational coherence. It performs particularly well in handling challenging prompts, as evidenced by OpenAI’s internal instruction-following evaluations.
  • Ultra-large Context Window: The GPT-4.1 series supports a context processing capacity of up to 1 million tokens, eight times that of GPT-4o. This enables the model to handle ultra-long texts, such as 8 complete React source code repositories or hundreds of pages of documents.
  • Long-text Understanding: GPT-4.1 demonstrates enhanced accuracy in locating and extracting key information from long texts. In OpenAI’s long-context evaluation, it successfully identifies target text within a context of up to 1 million tokens.
  • Image Understanding: The GPT-4.1 series performs exceptionally well in image understanding. For instance, GPT-4.1 mini often surpasses GPT-4o in image benchmark tests.
  • Video Content Understanding: In the Video-MME test, GPT-4.1 interprets and answers multiple-choice questions about 30- to 60-minute unscripted videos, achieving a score of 72%, the current state-of-the-art performance.
  • Cost-effectiveness: The GPT-4.1 series offers stronger performance at a lower cost. For median queries, GPT-4.1 is 26% cheaper than GPT-4o, while GPT-4.1 nano is the most affordable and fastest model currently available from OpenAI.
  • Low Latency and High Efficiency: GPT-4.1 mini significantly reduces latency by nearly half and cuts costs by 83%, making it ideal for tasks requiring low-latency responses.
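
To make the tool-invocation point above concrete, here is a hedged sketch of function calling with the Chat Completions API; the get_weather tool is a hypothetical example defined only for this illustration, not something shipped with GPT-4.1.

```python
# Sketch of tool (function) calling with GPT-4.1 via the Chat Completions API.
# "get_weather" is a hypothetical tool defined only for this example.
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather in Berlin right now?"}],
    tools=tools,
)

# If the model decides a tool is needed, the call arrives as structured JSON
# that your code executes before returning the result in a follow-up message.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```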

The Technical Principles of GPT-4.1

  • Optimization of the Transformer Architecture: GPT-4.1 is still based on the Transformer architecture, with further optimizations that let the model capture a broader range of contextual information during training. By learning from a vast corpus, the GPT-4.1 models learn to maintain attention across large amounts of text, accurately locate relevant information, and better solve complex tasks.
  • Mixture of Experts (MoE): To maintain high performance while reducing computational and storage costs, GPT-4.1 reportedly adopts a Mixture of Experts design consisting of 16 expert sub-models of about 111 billion parameters each, with two experts routed to on each forward pass, making the model more flexible and efficient across different data and task distributions (a toy routing sketch follows this list).
  • Training Dataset: GPT-4.1 is reportedly trained on roughly 13 trillion tokens, counted across training passes rather than as unique tokens. The massive dataset lets the model acquire broad linguistic knowledge and contextual understanding, improving accuracy on natural language processing tasks.
  • Inference Optimization: GPT-4.1 employs optimization techniques during inference, such as variable batch sizes and continuous batching, which substantially reduce latency and inference cost.
  • Cost Control: By adopting the Mixture of Experts model and optimized training and inference strategies, GPT-4.1 significantly reduces computational costs and storage requirements while maintaining high performance. This makes the model more cost-effective in practical applications.
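
OpenAI has not published GPT-4.1's internals, so the MoE figures above are reports rather than specifications. Purely to illustrate the top-2 routing idea mentioned in the Mixture of Experts item, here is a toy NumPy sketch with made-up dimensions:

```python
# Illustrative top-2 expert routing with NumPy: a gating network scores each
# expert, the two highest-scoring experts process the token, and their outputs
# are combined with normalized gate weights. A toy sketch of the general MoE
# idea, not OpenAI's implementation.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 16, 2

# Toy parameters: one gating matrix and one small weight matrix per "expert".
gate_w = rng.normal(size=(d_model, n_experts))
experts = rng.normal(size=(n_experts, d_model, d_model)) * 0.02

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single token vector x (shape [d_model]) through the top-2 experts."""
    logits = x @ gate_w                       # score every expert
    top = np.argsort(logits)[-top_k:]         # indices of the 2 best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over the selected experts
    # Weighted sum of the selected experts' outputs.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
print(moe_forward(token).shape)  # (64,)
```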

Performance of the GPT-4.1 Model Series

  • GPT-4.1: In terms of coding capabilities, GPT-4.1 achieves a score of 54.6% on the SWE-bench Verified test, marking a 21.4 percentage point improvement over GPT-4o, making it the leading coding model currently available. In terms of instruction following, it scores 10.5 percentage points higher than GPT-4o on the Scale MultiChallenge benchmark. For long-text understanding, the Video-MME test shows that GPT-4.1 achieves a score of 72.0% in the long-video-without-subtitles category, a 6.7 percentage point improvement over GPT-4o.
  • GPT-4.1 mini: GPT-4.1 mini demonstrates significant advancements in small model performance, surpassing GPT-4o in many benchmark tests. It matches the intelligence evaluation of GPT-4o while reducing latency by nearly half and cutting costs by 83%.
  • GPT-4.1 nano: As OpenAI’s first nano model, GPT-4.1 nano is currently the fastest and most cost-effective model. It scores 80.1% on MMLU, 50.3% on GPQA, and 9.8% on Aider polyglot coding, outperforming GPT-4o mini.

Project address of GPT-4.1

Pricing of the GPT-4.1 model

  • GPT-4.1: $2.00 per million input tokens, $8.00 per million output tokens.
  • GPT-4.1 mini: $0.40 per million input tokens, $1.60 per million output tokens.
  • GPT-4.1 nano: $0.10 per million input tokens, $0.40 per million output tokens.
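
As a quick way to reason about these prices, the following sketch estimates per-request cost from token counts; the example token counts are arbitrary placeholders.

```python
# Estimate request cost from the per-million-token prices listed above.
PRICES_PER_MILLION = {            # (input, output) in USD per 1M tokens
    "gpt-4.1":      (2.00, 8.00),
    "gpt-4.1-mini": (0.40, 1.60),
    "gpt-4.1-nano": (0.10, 0.40),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    in_price, out_price = PRICES_PER_MILLION[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 200k-token prompt with a 2k-token answer on each model.
for model in PRICES_PER_MILLION:
    print(model, round(estimate_cost(model, 200_000, 2_000), 4))
```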

Application scenarios of GPT-4.1

  • Legal Field: In the area of legal document review, GPT-4.1’s multi-document review accuracy is 17% higher than that of GPT-4o, enabling more efficient processing of complex legal documents.
  • Financial Analysis: In financial data analysis, GPT-4.1 can extract key information from large documents more accurately, providing analysts with more comprehensive data support.
  • Front-End Development: In front-end programming, GPT-4.1 can create web applications that are not only more functional but also more aesthetically pleasing. The generated websites are preferred by paid human reviewers in 80% of cases.