What is OpenAI o4-mini?
OpenAI o4-mini is a compact reasoning model from OpenAI, optimized for fast, cost-efficient reasoning. It excels at mathematical, programming, and visual tasks, achieving the best performance among comparable models on the AIME 2024 and 2025 benchmarks. Built for high-volume, high-throughput inference, it is well suited to processing large numbers of questions quickly. o4-mini is multimodal: it can integrate images into its reasoning process, use tools, and rapidly generate detailed, well-reasoned answers. Compared with its predecessor, it delivers significant improvements in both performance and cost-effectiveness. ChatGPT Plus, Pro, and Team users can already find o4-mini and o4-mini-high in the model selector, where they replace o1, o3-mini, and o3-mini-high; ChatGPT Enterprise and Edu users will gain access within a week. Developers can access the model through the Chat Completions API and the Responses API.
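As a rough sketch of the API access mentioned above, the following uses the official OpenAI Python SDK to send a prompt to the model. It assumes the `openai` package is installed and that an `OPENAI_API_KEY` environment variable is set; the model name `"o4-mini"` follows the naming used in the announcement.

```python
# Minimal sketch: calling o4-mini through the Chat Completions API.
import os

def build_request(prompt: str) -> dict:
    """Assemble the Chat Completions payload for an o4-mini call."""
    return {
        "model": "o4-mini",
        "messages": [
            {"role": "user", "content": prompt},
        ],
    }

# The network call only runs when an API key is available.
if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    resp = client.chat.completions.create(**build_request("Solve: 12 * 13"))
    print(resp.choices[0].message.content)
```

The same prompt could equally be sent through the Responses API (`client.responses.create(model="o4-mini", input=...)`); the payload-building helper above is just one way to keep the request shape explicit.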
The main functions of OpenAI o4-mini
- Fast Reasoning: Handles mathematical, programming, and visual tasks quickly, making it well suited to high-throughput scenarios.
- Multimodal Capability: Reasons over images and text together, supporting image understanding as part of its answers.
- Tool Usage: Can invoke tools such as web search and a Python interpreter to assist in problem-solving.
- Cost-Effective: Outperforms the previous-generation o3-mini at the same price, making it a natural upgrade.
- Safe and Reliable: Trained with safety measures and able to refuse inappropriate requests.
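The multimodal capability above can be sketched with the Chat Completions API's multimodal message format, which mixes text and image parts in a single user message. The image URL below is a placeholder, and the helper name is illustrative, not from the article.

```python
# Sketch: passing an image alongside text to o4-mini.
import os

def build_vision_request(question: str, image_url: str) -> dict:
    """Assemble a multimodal payload mixing text and an image."""
    return {
        "model": "o4-mini",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

# The network call only runs when an API key is available.
if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI
    client = OpenAI()
    req = build_vision_request(
        "What function is plotted in this graph?",
        "https://example.com/plot.png",  # placeholder image URL
    )
    print(client.chat.completions.create(**req).choices[0].message.content)
```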
Performance of OpenAI o4-mini
- Mathematical Reasoning: On the AIME 2024 and 2025 benchmarks, OpenAI o4-mini achieved up to 93.4% accuracy without tools. With Python integration, its accuracy rose to 98.7%, nearly a perfect score. On complex mathematical problems, o4-mini outperforms its predecessor o3-mini and approaches the full o3 on certain tasks.
- Programming Ability:
◦ SWE-Lancer: OpenAI o4-mini performs strongly, completing complex programming tasks efficiently.
◦ SWE-Bench Verified (software-engineering benchmark): OpenAI o4-mini excels at tasks covering common algorithms, system design, and API calls, with higher accuracy and efficiency than o3-mini.
◦ Aider Polyglot Code Editing (multilingual code-editing benchmark): OpenAI o4-mini performs strongly on code-editing tasks, including full rewrites and patch-based modifications, outperforming o3-mini.
- Multimodal Capabilities:
◦ MMMU (Massive Multi-discipline Multimodal Understanding): OpenAI o4-mini solves problems that combine images and mathematical notation, achieving 87.5% accuracy, well above the previous-generation o1's 71.8%.
◦ MathVista (Visual Mathematics Reasoning): OpenAI o4-mini performs excellently in visual mathematics reasoning tasks such as geometric shapes and function curves, with an accuracy rate of up to 87.5%.
◦ CharXiv-Reasoning (scientific chart reasoning): OpenAI o4-mini can interpret charts and diagrams in scientific papers, achieving 75.4% accuracy, significantly better than o1's 55.1%.
- Tool Usage:
◦ Scale MultiChallenge: OpenAI o4-mini handles complex multi-turn instruction tasks, correctly understanding and executing instructions across turns.
◦ BrowseComp Agentic Browsing: Performs browser-based tasks such as searching, clicking, paging, and integrating information. Its performance is close to that of o3 and significantly surpasses traditional AI search capabilities.
◦ Tau-bench Function Calling: Demonstrates stable performance on function-calling tasks, accurately generating structured API calls, though complex scenarios still leave room for optimization.
- Comprehensive Tests:
◦ Expert-level Comprehensive Test (Humanity’s Last Exam): Achieved 14.3% accuracy without tools, improving to 17.7% with tool assistance. Although it falls short of o3’s 24.9%, it performs exceptionally well among smaller models.
◦ Interdisciplinary PhD-level Science Questions (GPQA Diamond): Demonstrated an accuracy rate of 81.4% on scientific questions, slightly lower than o3’s 83.3%, but still highly impressive for a smaller model.
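The structured API calls exercised by function-calling benchmarks like Tau-bench can be sketched with the Chat Completions `tools` parameter. The tool name, description, and parameters below are illustrative placeholders, not drawn from the article or any benchmark.

```python
# Sketch: a function-calling (tool) request of the kind such benchmarks test.
import json
import os

# Hypothetical tool schema, defined in the documented JSON Schema format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_tool_request(prompt: str) -> dict:
    """Payload asking o4-mini to call the tool when appropriate."""
    return {
        "model": "o4-mini",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
    }

# The network call only runs when an API key is available.
if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        **build_tool_request("What's the weather in Paris right now?")
    )
    call = resp.choices[0].message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```

When the model decides to call the tool, the response carries the function name and JSON-encoded arguments rather than free text; the caller executes the function and feeds the result back in a follow-up message.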
Project address of OpenAI o4-mini
- Official announcement: https://openai.com/index/introducing-o4-mini/
Application scenarios of OpenAI o4-mini
- Educational Tutoring: Assist students in solving math and programming problems.
- Data Analysis: Quickly generate data charts and analysis results.
- Software Development: Generate code snippets and assist in code debugging.
- Content Creation: Provide creative inspiration and generate descriptions combined with images.
- Daily Queries: Answer questions based on search and image analysis.