Grok 4 – the latest reasoning model released by xAI

What is Grok 4?

Grok 4 is the latest AI reasoning model developed by xAI. According to xAI, its reasoning capabilities are roughly ten times stronger than those of its predecessor. It reportedly achieves near-perfect scores on difficult exams such as the SAT and GRE and outperforms many cutting-edge models across a range of benchmarks. It supports multimodal input, understands subjective concepts, generates code and visual content, and brings significant improvements to voice interaction. Grok 4 comes in two versions: the single-agent Grok 4, and Grok 4 Heavy, a multi-agent version that runs up to four agents in parallel with a context window of up to 256k tokens.

Grok 4 – Key Features

Exceptional Reasoning Abilities: Achieves near-perfect scores in exams like SAT and GRE, demonstrating superhuman reasoning performance.
Multimodal Understanding: Capable of interpreting subjective concepts, conducting image analysis, and performing complex visual searches.
Information Aggregation & Summarization: Gathers information from social media and other sources, extracts key events, and presents them chronologically.
Code & Visual Generation: Can generate complex animations and scientific simulations (e.g., black hole collisions) based on textual prompts.
Enhanced Voice Interaction: Supports five new voice profiles with smoother dialogue and more natural emotional expression.
Complex Task Handling: Excels in simulation and strategy-based tasks, showcasing strong planning and execution capabilities.
Multi-Agent Collaboration: Grok 4 Heavy, available through the SuperGrok Heavy plan, runs multiple agents in parallel to solve complex problems.

Grok 4 – Test Performance

Official Benchmarks:

Humanity’s Last Exam: Features 2,500 interdisciplinary expert-level questions. Grok 4 Heavy scores 44.4% with tools, and a reported 50.7% on the text-only subset.
AIME25 (Math Competition): Grok 4 Heavy scores a perfect 100%, outperforming all competitors.
GPQA (Graduate-Level QA): Scores 88.9%, ahead of Gemini 2.5 Pro (86.4%) and Claude Opus 4 (79.6%).
HMMT25 (High School Math Competition): Scores 96.7%, far surpassing Gemini 2.5 Pro (82.5%).
USAMO25 (USA Math Olympiad): Scores 61.9%, significantly beating Gemini DeepThink (49.4%) and Gemini 2.5 Pro (34.5%).
ARC-AGI-2 (Abstract Reasoning): Scores 15.9%, nearly double the previous best score from a commercial model.
Vending-Bench (Simulation Business Task): Grok 4 generates a net profit of $4,694, far exceeding Claude Opus 4 ($2,077) and human players ($844).

Third-Party Evaluation (Artificial Analysis):

AI Index Score: Grok 4 scores 73, ahead of OpenAI o3 (70), Gemini 2.5 Pro (70), DeepSeek R1 0528 (68), and Claude Opus 4 (64).
Coding & Math Indices: Grok 4 ranks first in both categories.
GPQA Diamond Score: Achieves a record-high 88%, surpassing Gemini 2.5 Pro (84%).
Humanity’s Last Exam Score: Reaches 24%, topping Gemini 2.5 Pro (21%).
Speed: Generates output at about 75 tokens/second, slower than o3 (188 t/s) and Gemini 2.5 Pro (142 t/s) but faster than Claude Opus 4 Thinking (66 t/s).
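
To put the speed figures in practical terms, the short sketch below converts tokens per second into the approximate time needed to stream a response. The 1,500-token response length and the function name are assumptions chosen for illustration, and the calculation ignores time to first token and any reasoning delay.

```python
# Output speeds in tokens per second, as quoted in the evaluation above.
SPEEDS_TPS = {
    "Grok 4": 75,
    "o3": 188,
    "Gemini 2.5 Pro": 142,
}


def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Approximate time to stream `tokens` output tokens at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 1500  # hypothetical response length
    for model, tps in SPEEDS_TPS.items():
        print(f"{model}: {seconds_to_generate(response_tokens, tps):.1f} s")
```

At these rates, a response of that length would take Grok 4 about 20 seconds, compared with roughly 8 seconds for o3.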

Grok 4 – Pricing

Subscription Plans:

SuperGrok: $30/month or $300/year
SuperGrok Heavy: $300/month or $3,000/year
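
As a quick check on the two billing options, the sketch below compares twelve monthly payments with the yearly price for each plan. The prices are the ones listed above; the function name is only illustrative.

```python
# Prices in USD, copied from the plan list above.
PLANS = {
    "SuperGrok": {"monthly": 30, "yearly": 300},
    "SuperGrok Heavy": {"monthly": 300, "yearly": 3000},
}


def annual_savings(monthly: int, yearly: int) -> int:
    """USD saved over 12 months by paying yearly instead of monthly."""
    return monthly * 12 - yearly


if __name__ == "__main__":
    for name, price in PLANS.items():
        saved = annual_savings(price["monthly"], price["yearly"])
        print(f"{name}: save ${saved:,} per year with annual billing")
```

For both plans the yearly price equals ten monthly payments, so annual billing saves the cost of two months.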

API Pricing:

Input: $3 per million tokens
Output: $15 per million tokens
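
The per-token rates translate into a request cost as sketched below. This is a minimal estimate that assumes only the two flat rates listed above apply; the function name and the sample token counts are hypothetical, and any other billing rules are outside this sketch.

```python
# API rates in USD per one million tokens, as listed above.
INPUT_USD_PER_MILLION = 3.0
OUTPUT_USD_PER_MILLION = 15.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a request from its input and output token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens and 50k output tokens.
    print(f"${estimate_cost_usd(200_000, 50_000):.2f}")
```

Because output tokens cost five times as much as input tokens, long generated answers dominate the bill even when the prompts are much larger.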

Grok 4 – Official Website

Website: Grok

Grok 4 – Application Scenarios

Educational Tutoring: Offers personalized learning plans, answers complex academic questions, and enhances students’ understanding.
Scientific Research: Analyzes large datasets, predicts scientific trends, and aids in discovering new theories and technologies.
Business & Finance: Performs market analysis and forecasting to support strategic decisions and optimize operations.
Content Creation: Assists with idea generation and script writing in advertising, film, and gaming, enhancing creative productivity.
Intelligent Assistant: Functions as a multimodal voice assistant that helps users manage daily tasks.