What is Grok 4?
Grok 4 is the latest AI reasoning model developed by xAI, with a reported 10x improvement in reasoning capability over its predecessor. It achieves near-perfect scores on difficult exams such as the SAT and GRE and outperforms many cutting-edge models across various benchmarks. It supports multimodal input, understands subjective concepts, generates code and visual content, and introduces significant advances in voice interaction. Grok 4 comes in two versions: the single-agent Grok 4, and Grok 4 Heavy, a multi-agent version that supports up to four agents working in parallel with a context window of up to 256k tokens.
Grok 4 – Key Features
- Exceptional Reasoning Abilities: Achieves near-perfect scores in exams like SAT and GRE, demonstrating superhuman reasoning performance.
- Multimodal Understanding: Capable of interpreting subjective concepts, conducting image analysis, and performing complex visual searches.
- Information Aggregation & Summarization: Gathers information from social media and other sources, extracts key events, and presents them chronologically.
- Code & Visual Generation: Can generate complex animations and scientific simulations (e.g., black hole collisions) based on textual prompts.
- Enhanced Voice Interaction: Supports five new voice profiles with smoother dialogue and more natural emotional expression.
- Complex Task Handling: Excels in simulation and strategy-based tasks, showcasing strong planning and execution capabilities.
- Multi-Agent Collaboration: The Grok 4 Heavy version, available through the SuperGrok Heavy plan, runs multiple intelligent agents in parallel to solve complex problems.
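As a rough illustration of the multi-agent idea, the sketch below fans one prompt out to up to four agents in parallel and combines their answers. It is not xAI's implementation: the agent callables are stubs, and the majority-vote aggregation in `majority_answer` is an assumption chosen only to keep the example small.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

MAX_AGENTS = 4  # "up to four agents working in parallel", per the overview


def run_agents(agents, prompt):
    """Run each (hypothetical) agent callable on the prompt in parallel."""
    if len(agents) > MAX_AGENTS:
        raise ValueError(f"at most {MAX_AGENTS} agents are supported")
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map returns results in the same order as `agents`
        return list(pool.map(lambda agent: agent(prompt), agents))


def majority_answer(answers):
    """Pick the most common answer (assumed aggregation rule, not xAI's)."""
    return Counter(answers).most_common(1)[0][0]


# Stub agents standing in for independent model calls.
agents = [lambda p: "42", lambda p: "42", lambda p: "41", lambda p: "42"]
answers = run_agents(agents, "What is 6 * 7?")
print(majority_answer(answers))
```

In a real system each stub would be replaced by a separate model call, and the aggregation step could be anything from voting to a further model pass over the candidate answers.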
Grok 4 – Test Performance
Official Benchmarks:
- Humanity’s Last Exam: Features 2,500 interdisciplinary expert-level questions. Grok 4 Heavy scores 44.4% with tools, potentially up to 50.7% with optimization.
- AIME25 (Math Competition): Grok 4 Heavy scores a perfect 100%, outperforming all competitors.
- GPQA (Graduate-Level QA): Scores 88.9%, ahead of Gemini 2.5 Pro (86.4%) and Claude 4 Opus (79.6%).
- HMMT25 (High School Math Competition): Scores 96.7%, far surpassing Gemini 2.5 Pro (82.5%).
- USAMO25 (USA Math Olympiad): Scores 61.9%, significantly beating Gemini DeepThink (49.4%) and Gemini 2.5 Pro (34.5%).
- ARC-AGI-2 (Abstract Reasoning): Scores 15.9%, nearly doubling the previous commercial state of the art.
- Vending-Bench (Simulated Business Task): Grok 4 generates a net profit of $4,694, far exceeding Claude Opus 4 ($2,077) and human players ($844).
Third-Party Evaluation (Artificial Analysis):
- AI Index Score: Grok 4 scores 73, ahead of OpenAI o3 (70), Gemini 2.5 Pro (70), Claude 4 Opus (64), and DeepSeek R1 0528 (68).
- Coding & Math Indices: Grok 4 ranks first in both categories.
- GPQA Diamond Score: Achieves a record-high 88%, surpassing Gemini 2.5 Pro (84%).
- Humanity’s Last Exam Score: Reaches 24%, topping Gemini 2.5 Pro (21%).
- Speed: Generates output at 75 tokens/second, slower than o3 (188 t/s) and Gemini 2.5 Pro (142 t/s) but faster than Claude 4 Opus Thinking (66 t/s).
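To make the speed figures concrete, the following minimal sketch converts each quoted tokens-per-second rate into the time needed to generate a response of a given length. The 1,000-token response size and the function name `generation_seconds` are illustrative assumptions; the estimate ignores time to first token and any reasoning tokens produced before the answer.

```python
# Output speeds in tokens per second, as quoted in the evaluation above.
SPEEDS = {
    "Grok 4": 75,
    "o3": 188,
    "Gemini 2.5 Pro": 142,
    "Claude 4 Opus Thinking": 66,
}


def generation_seconds(output_tokens, tokens_per_second):
    """Estimate streaming time for a response at a steady output rate."""
    return output_tokens / tokens_per_second


# Assumed response length of 1,000 tokens, for illustration only.
for model, speed in SPEEDS.items():
    seconds = generation_seconds(1_000, speed)
    print(f"{model}: {seconds:.1f} s per 1,000 output tokens")
```

At these rates, a 1,000-token answer takes about 13 seconds on Grok 4 versus about 5 seconds on o3.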
Grok 4 – Pricing
Subscription Plans:
- SuperGrok: $30/month or $300/year
- SuperGrok Heavy: $300/month or $3,000/year
API Pricing:
- Input: $3 per million tokens
- Output: $15 per million tokens
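The prices above translate into per-request and per-year costs with simple arithmetic. The sketch below is a minimal calculator built on the listed rates; the function names and the example token counts are assumptions for illustration, and it does not model any pricing tiers or discounts beyond what is listed here.

```python
INPUT_USD_PER_MILLION = 3.0    # listed API input price
OUTPUT_USD_PER_MILLION = 15.0  # listed API output price


def api_cost_usd(input_tokens, output_tokens):
    """Cost of one API call at the listed per-million-token rates."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


def annual_saving_usd(monthly_price, yearly_price):
    """Saving from paying yearly instead of twelve monthly payments."""
    return 12 * monthly_price - yearly_price


# Example request: 200,000 input tokens and 10,000 output tokens (assumed sizes).
cost = api_cost_usd(200_000, 10_000)
print(f"API call: ${cost:.2f}")
print(f"SuperGrok yearly saving: ${annual_saving_usd(30, 300)}")
print(f"SuperGrok Heavy yearly saving: ${annual_saving_usd(300, 3_000)}")
```

Because output tokens cost five times as much as input tokens, long generated answers dominate the bill much faster than long prompts do.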
Grok 4 – Official Website
- Website: Grok
Grok 4 – Application Scenarios
- Educational Tutoring: Offers personalized learning plans, answers complex academic questions, and enhances students’ understanding.
- Scientific Research: Analyzes large datasets, predicts scientific trends, and aids in discovering new theories and technologies.
- Business & Finance: Performs market analysis and forecasting to support strategic decisions and optimize operations.
- Content Creation: Assists with idea generation and script writing in advertising, film, and gaming, enhancing creative productivity.
- Intelligent Assistant: Functions as a multimodal voice assistant that helps users manage daily tasks and makes everyday life more convenient.