Table of Contents
- 1. Release Overview — Date, Availability, Spec Sheet
- 2. What's New — 5 Key Improvements
- 3. Benchmarks — Head-to-Head With Claude and Gemini
- 4. GPT-5.5 vs GPT-5.5 Pro — Which Should You Use?
- 5. Pricing — Why the 2x Hike?
- 6. ChatGPT Plan Availability
- 7. API Specs and Developer Features
- 8. Codex Integration and the Super-app Strategy
- 9. GPT-5.5 vs Claude Opus 4.7 vs Gemini 3.1 Pro
- 10. The Catch — Watch Out for Overconfidence
- 11. When to Pick GPT-5.5 — By Use Case
- FAQ
On April 23, 2026, OpenAI released "ChatGPT 5.5 (GPT-5.5)." Billed by OpenAI itself as "a new class of intelligence for real work and AI agents," the model posted 82.7% on Terminal-Bench 2.0, pulling well ahead of Anthropic's Claude Opus 4.7 (69.4%) and Google's Gemini 3.1 Pro (68.5%) and reclaiming SOTA (state of the art) on 14 benchmarks in the process.
At the same time, the API list price doubled compared to GPT-5.4 ($5/$30 per MTok), and Claude Opus 4.7 still beats it on SWE-Bench Pro. The honest read is "the strongest, but not the universal best." There's also a documented tendency to answer confidently about things it doesn't actually know — something to be careful about in medical, legal, or regulated work.
Drawing on OpenAI's official release, the OpenAI Developer Docs, and several independent benchmark reports, this article gives you the full breakdown: what's new, the benchmarks, pricing, plan availability, how it stacks up against Claude and Gemini, and how to pick the right model for what you're doing.
1. Release Overview — Date, Availability, Spec Sheet
| Item | Details |
|---|---|
| Official name | GPT-5.5 (shown as "ChatGPT 5.5" in ChatGPT) |
| Release date | April 23, 2026 |
| Built by | OpenAI |
| Variants | GPT-5.5 (standard) / GPT-5.5 Pro (deep reasoning) |
| Context window | API: 1,050,000 tokens (~1M) / Codex: 400K tokens |
| Max output | 128,000 tokens |
| Knowledge cutoff | December 1, 2025 |
| API pricing (standard) | $5 / 1M tokens (input) / $0.50 (cached input) / $30 (output) |
| API pricing (Pro) | $30 (input) / $180 (output) |
| Long-prompt surcharge | Above 272K tokens: input 2x, output 1.5x |
| Modalities | Text in/out, image input (no audio or video) |
| ChatGPT plans | Plus / Pro / Business / Enterprise (not on Free) |
| Key features | Function calling, structured outputs, streaming, reasoning effort control, Computer Use, MCP support |
2. What's New — 5 Key Improvements
1. A Full Base Model Retrain (First Since GPT-4.5)
GPT-5.5 is the first complete base model retrain since GPT-4.5. GPT-5, 5.1, 5.2, and 5.4 were all fine-tuned variants of the same underlying base, but GPT-5.5 was rebuilt from the ground up. The result is improvements to reasoning efficiency and updated knowledge at the same time.
2. Major Token Efficiency Gains (~40% Reduction)
GPT-5.5 keeps GPT-5.4's per-token latency while cutting the output tokens needed to complete Codex tasks by roughly 40%. The list price doubled, but with 40% fewer output tokens, OpenAI says the total cost for the same work grows far less than the headline 2x suggests.
From OpenAI co-founder Greg Brockman:
"Compared to something like 5.4, it's a model that thinks faster and sharper with fewer tokens."
3. ~1M Context Window (API)
The API version expands to 1,050,000 tokens (~1M); the Codex integration gets 400K. ~1M tokens is roughly 1,400 pages of A4 text. Just remember the long-prompt surcharge: prompts above 272K tokens incur 2x input and 1.5x output pricing, so very long-context workloads need a cost model.
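To see why the cost model matters, here is a minimal sketch of a per-request cost estimate. The rates, the 272K threshold, and the 2x/1.5x multipliers are the figures quoted in this article's pricing table; the function itself is illustrative, not an official OpenAI calculator.

```python
# Standard GPT-5.5 list prices as quoted in this article (USD per 1M tokens).
INPUT_PER_MTOK = 5.00
OUTPUT_PER_MTOK = 30.00
SURCHARGE_THRESHOLD = 272_000  # long-prompt threshold from the pricing table

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost in USD, applying the long-prompt
    surcharge (input 2x, output 1.5x) when input exceeds 272K tokens."""
    in_rate, out_rate = INPUT_PER_MTOK, OUTPUT_PER_MTOK
    if input_tokens > SURCHARGE_THRESHOLD:
        in_rate *= 2.0
        out_rate *= 1.5
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Staying just under the threshold is much cheaper than going slightly over:
print(estimate_cost(270_000, 8_000))  # no surcharge
print(estimate_cost(300_000, 8_000))  # surcharged
```

The cliff at 272K tokens is the key design constraint: if your prompts hover near it, trimming context to stay under the line can roughly halve the bill.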
4. Five-Level Reasoning Effort Control
The API exposes reasoning.effort with five settings:
- none: no reasoning (fastest, cheapest)
- low: light reasoning
- medium: default (balanced)
- high: deep reasoning (complex tasks)
- xhigh: maximum reasoning (slowest and most expensive, highest accuracy)
This mirrors the output_config.effort knob on Claude Opus 4.7, and the industry as a whole is converging on "let the caller dial reasoning depth."
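One practical pattern is to route requests through a task-to-effort table so you only pay for deep reasoning where it helps. The categories and mapping below are my own illustration, not an OpenAI recommendation; only the five effort values come from the docs described above.

```python
# Hypothetical routing table: task categories (my own labels) mapped to the
# five documented reasoning.effort values.
EFFORT_BY_TASK = {
    "classification": "none",   # fastest, cheapest
    "summarization": "low",
    "general_chat": "medium",   # the documented default
    "coding": "high",
    "research_math": "xhigh",   # slowest, most expensive, highest accuracy
}

def pick_effort(task_type: str) -> str:
    """Return the effort level for a task, falling back to the default."""
    return EFFORT_BY_TASK.get(task_type, "medium")
```

The returned string can be passed straight into the `reasoning={"effort": ...}` parameter shown in the API example later in this article.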
5. Expert-SWE at 73.1% — Handles 20-Hour-Class Tasks
On OpenAI's newly published internal eval Expert-SWE (extremely complex coding tasks with a median 20-hour human completion time), GPT-5.5 hit 73.1% — up 4.6 points from GPT-5.4's 68.5%. That's a big jump for long-running autonomous coding agent reliability.
3. Benchmarks — Head-to-Head With Claude and Gemini
| Benchmark | GPT-5.5 | Claude Opus 4.7 | Gemini 3.1 Pro | Winner |
|---|---|---|---|---|
| Terminal-Bench 2.0 | 82.7% | 69.4% | 68.5% | GPT-5.5 |
| GDPval (44-occupation knowledge work) | 84.9% | — | — | GPT-5.5 |
| OSWorld-Verified (PC automation) | 78.7% | 78.0% | — | GPT-5.5 (narrowly) |
| BrowseComp | 84.4% (Pro: 90.1%) | — | — | GPT-5.5 Pro |
| FrontierMath Tier 4 | 35.4% (Pro: 39.6%) | 22.9% | 16.7% | GPT-5.5 |
| SWE-Bench Pro | 58.6% | 64.3% | — | Claude Opus 4.7 |
| Tau2-bench Telecom (customer support) | 98.0% | — | — | GPT-5.5 |
| GPQA Diamond | 93.6% | — | — | GPT-5.5 |
| Expert-SWE (OpenAI internal) | 73.1% | — | — | GPT-5.5 |
Bottom Line: GPT-5.5 Holds SOTA on 14 Benchmarks, Claude on 4, Gemini on 2
Across OpenAI's published benchmark set, GPT-5.5 holds SOTA on 14 benchmarks, Claude Opus 4.7 on 4, and Gemini 3.1 Pro on 2. The overall edge clearly belongs to GPT-5.5.
That said, on SWE-Bench Pro (production-grade coding tasks), Claude Opus 4.7 still wins at 64.3% vs GPT-5.5's 58.6%. For coding work, splitting models by task is still worth doing.
Third-Party Verification: CodeRabbit Code Review Eval
Independent code review service CodeRabbit reports the following GPT-5.5 improvements:
- Curated benchmark: expected issue detection 58.3% → 79.2%, precision 27.9% → 40.6%
- Real-world dataset: issue detection 55.0% → 65.0%, precision 11.6% → 13.2%
CodeRabbit's read: "the model prefers local changes, preserves behavior, and tends to focus on actual failure points." Translation — instead of sweeping rewrites, it leans toward targeted, accurate fixes.
4. GPT-5.5 vs GPT-5.5 Pro — Which Should You Use?
| Item | GPT-5.5 (standard) | GPT-5.5 Pro |
|---|---|---|
| API pricing (input) | $5 / 1M tokens | $30 / 1M tokens (6x) |
| API pricing (output) | $30 / 1M tokens | $180 / 1M tokens (6x) |
| BrowseComp | 84.4% | 90.1% |
| FrontierMath Tier 4 | 35.4% | 39.6% |
| ChatGPT plans | Plus / Pro / Business / Enterprise | Pro / Business / Enterprise only |
| Best for | Day-to-day tasks, coding, agents | Scientific research, complex math, deep reasoning |
How to Choose
- Pick standard GPT-5.5: general coding, writing, agent workloads, cost-conscious use
- Pick GPT-5.5 Pro: math and scientific research, paper drafting, complex decision-making — accuracy over cost
5. Pricing — Why the 2x Hike?
API Pricing (Standard GPT-5.5)
| Item | Price | Notes |
|---|---|---|
| Input | $5.00 / 1M tokens | 2x GPT-5.4 |
| Cached input | $0.50 / 1M tokens | 1/10 of regular input |
| Output | $30.00 / 1M tokens | 2x GPT-5.4 |
| Long prompts (>272K tokens) | Input 2x, output 1.5x | Applied to the whole session |
| Batch API / Flex | 50% discount | For asynchronous workloads |
| Priority processing | 2.5x | For low-latency requirements |
| Regional processing (data residency) | +10% | For compliance use cases |
Why the 2x Hike?
OpenAI hasn't directly explained the price increase, but the likely drivers are:
- Cost of a full base model retrain — the first ground-up rebuild since GPT-4.5
- Pricing in performance gains — significant improvements on Terminal-Bench and others
- Token efficiency offsets some of the pain — 40% fewer output tokens partially balances the higher unit price
For output-heavy workloads, the effective cost increase works out to roughly 2 × 0.6 = 1.2x. For input-heavy tasks (summarization, analysis), though, you take the full 2x hit head-on — keep that in mind.
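That arithmetic generalizes to anything in between the two extremes. The sketch below is my own framing (the `output_share` parameter is an assumption about your spend mix, not an OpenAI metric), using the 2x price ratio and ~40% output-token reduction from this article.

```python
def effective_cost_multiplier(price_ratio: float,
                              output_token_ratio: float,
                              output_share: float) -> float:
    """Blend a unit-price ratio with an output-token reduction.
    output_share: fraction of spend that goes to output tokens
    (1.0 = purely output-heavy, 0.0 = purely input-heavy)."""
    input_share = 1.0 - output_share
    return price_ratio * (input_share + output_share * output_token_ratio)

# The two extremes discussed above: 2x price, 40% fewer output tokens.
print(effective_cost_multiplier(2.0, 0.6, 1.0))  # output-heavy: ~1.2x
print(effective_cost_multiplier(2.0, 0.6, 0.0))  # input-heavy: full 2x
```

A half-and-half workload lands around 1.6x, which is a useful back-of-envelope number when budgeting a migration from GPT-5.4.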
6. ChatGPT Plan Availability
| Plan | Monthly | GPT-5.5 | GPT-5.5 Pro | Codex |
|---|---|---|---|---|
| Free | $0 | No | No | No |
| Plus | $20/mo | Yes | No | Yes |
| Pro | $200/mo | Yes | Yes | Yes (incl. Fast Mode) |
| Business | Usage-based | Yes | Yes | Yes |
| Enterprise | Custom | Yes | Yes | Yes |
Free Users Stay on GPT-5 (or 5.4)
GPT-5.5 isn't available on the Free plan — Free users continue on GPT-5 (or 5.4). Plus ($20/mo) is the entry point.
7. API Specs and Developer Features
Supported Features
- Function calling
- Structured outputs (JSON Schema)
- Streaming
- Reasoning effort control (none/low/medium/high/xhigh)
- Tools: web search, file search, image generation, Code Interpreter, Hosted Shell, Apply Patch, Skills, Computer Use, MCP, Tool Search
- Distillation (to smaller models)
Not Supported at Launch
- Fine-tuning
- Audio or video input or output
Rate Limits (Tier 5: highest)
- RPM (requests per minute): 15,000
- TPM (tokens per minute): 40,000,000
- Batch queue limit: 15,000,000,000
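Even at Tier 5 limits, bursty workloads can hit 429s, so some backoff strategy is worth having. The helper below is a generic exponential-backoff-with-jitter sketch, not an OpenAI SDK feature (the official SDK also does some retrying of its own); `call` stands in for any zero-argument function that raises on a rate-limit error.

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` with exponential backoff plus jitter.
    `call` is any zero-argument function that raises on a 429/limit error."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            # Double the wait each attempt; jitter avoids synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

In practice you would wrap the API call, e.g. `with_backoff(lambda: client.responses.create(...))`, and catch only the SDK's rate-limit exception rather than bare `Exception`.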
Reasoning Effort Example (Python)
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.5",
    reasoning={"effort": "high"},  # one of: none / low / medium / high / xhigh
    input="Solve this complex math problem step by step...",
)
print(response.output_text)
```
8. Codex Integration and the Super-app Strategy
Codex Fast Mode
Alongside the GPT-5.5 release, Codex gained a Fast Mode:
- 1.5x faster processing
- 2.5x the cost (in credits)
- Available on Pro / Business / Enterprise plans
The Super-app Strategy
OpenAI also unveiled a "Super-app" vision that bundles "ChatGPT + Codex + AI browser" into one offering. The plan is to deliver these as a single enterprise package — what OpenAI calls "a step toward more agentic, more intuitive computing."
Conceptually, this is the "all-in-one package to maximize developer experience" pattern from platforms like Vercel and its Next.js framework, brought into the AI agent space.
9. GPT-5.5 vs Claude Opus 4.7 vs Gemini 3.1 Pro
| Item | GPT-5.5 | Claude Opus 4.7 | Gemini 3.1 Pro |
|---|---|---|---|
| Released | April 23, 2026 | April 16, 2026 | Early 2026 |
| API input | $5/MTok | $5/MTok | Not disclosed |
| API output | $30/MTok | $25/MTok | Not disclosed |
| Context | 1,050K | 1,000K (200K standard) | 1,000K |
| Knowledge cutoff | Dec 1, 2025 | ~early 2025 | ~early 2025 |
| SOTA benchmarks | 14 | 4 | 2 |
| Terminal-Bench 2.0 | 82.7% | 69.4% | 68.5% |
| SWE-Bench Pro | 58.6% | 64.3% | — |
| FrontierMath T4 | 35.4% | 22.9% | 16.7% |
| Strengths | Agents, long-running tasks, PC operation | Long coding sessions, safety, long-form writing | Multimodal, Google Workspace integration |
How to Choose
- Best overall + cutting-edge agent performance — GPT-5.5 (if $30/MTok output pricing is acceptable)
- Long autonomous coding and safety-first work — Claude Opus 4.7 (wins on SWE-Bench Pro and has cheaper output pricing)
- Google Workspace integration and multimodal — Gemini 3.1 Pro
10. The Catch — Watch Out for Overconfidence
Independent analysis (Handy AI) flags a tendency in GPT-5.5 to "answer confidently about things it doesn't actually know."
"The model knows more, but it also answers more confidently about things it doesn't know."
High-Risk Use Cases
- Medical diagnosis or prescriptions — wrong information can be life-threatening
- Legal advice or case research — citing hallucinated cases is a professional ethics issue
- Financial advice or tax work — regulatory exposure
- Citations in academic writing — known cases of citing non-existent papers
Mitigations
- Mandatory fact-checking — never use AI output as-is; verify against primary sources
- Use the web search tool — make the model fetch real-time information
- Cross-check against Claude Opus 4.7 — for accuracy-critical work, run answers past multiple models
- Tell it to say "I don't know" — instruct via system prompt: "if uncertain, say so explicitly"
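The last mitigation can be baked into every request via the Responses API's system-level instructions. The wording below is illustrative, not an official prompt, and `build_request` is a hypothetical helper that only assembles the parameter dict; nothing is sent here.

```python
# Mitigation #4 as a reusable instruction string. The phrasing is my own
# example, not an OpenAI-recommended prompt.
UNCERTAINTY_INSTRUCTIONS = (
    "If you are not confident in an answer, say 'I don't know' explicitly "
    "rather than guessing, and cite a primary source for factual claims."
)

def build_request(user_input: str) -> dict:
    """Assemble Responses API parameters with the uncertainty instruction
    attached (returns a dict only; no API call is made)."""
    return {
        "model": "gpt-5.5",
        "instructions": UNCERTAINTY_INSTRUCTIONS,
        "input": user_input,
    }
```

The resulting dict can then be splatted into the call, e.g. `client.responses.create(**build_request("..."))`, so every request in an accuracy-critical pipeline carries the same guardrail.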
11. When to Pick GPT-5.5 — By Use Case
Pick GPT-5.5 When
- Long-running autonomous coding agents — Expert-SWE 73.1% is best-in-class
- PC automation / Computer Use — OSWorld 78.7% is on par with Opus 4.7
- Customer support automation — Tau2-bench 98.0% is essentially perfect
- Advanced math and scientific research — FrontierMath T4 35.4% (well above Opus 4.7's 22.9%)
- You're committed to the OpenAI ecosystem — integrates with ChatGPT, Codex, Operator
Skip GPT-5.5 When
- SWE-Bench Pro–level production coding — Claude Opus 4.7 still leads
- Accuracy-critical work (medical, legal, financial) — watch out for hallucinations
- Cost is the top priority — $30/MTok output is at the high end
- You want a free option — not available on the Free plan
- Audio or video processing — text + image input only
FAQ
Q1. When did GPT-5.5 become available in ChatGPT?
April 23, 2026 (US time), on the Plus, Pro, Business, and Enterprise plans. GPT-5.5 Pro is limited to Pro, Business, and Enterprise.
Q2. Can I use GPT-5.5 on the Free plan?
No. The Free plan stays on GPT-5 (or 5.4). To access GPT-5.5 you need at least the $20/mo Plus plan.
Q3. GPT-5.5 vs Claude Opus 4.7 — which is better?
Overall, GPT-5.5 (SOTA on 14 benchmarks vs Claude's 4). But on SWE-Bench Pro, Claude Opus 4.7 wins 64.3% to 58.6% — so for production-grade coding, Claude has the edge. Claude is also cheaper on output ($25/MTok vs GPT-5.5's $30/MTok).
Q4. The API got more expensive — how do I keep costs under control?
Yes, $5/$30 per MTok is 2x GPT-5.4. But output token usage drops about 40%, so for output-heavy workloads the real cost increase typically lands around 1.2x. Cost control tips:
1. Use Batch API / Flex (50% discount)
2. Use cached input ($0.50/MTok, 1/10 of regular)
3. Use reasoning.effort=low for lighter tasks
4. Avoid prompts above 272K tokens
Q5. What's actually different about GPT-5.5 Pro?
It has stronger reasoning, with notable score lifts on complex math (FrontierMath: 35.4% → 39.6%) and scientific research tasks. The catch is the price — 6x the API rate ($30 input / $180 output). Outside research and serious paper-writing use cases, the price/performance often doesn't pencil out.
Q6. Is fine-tuning supported?
Not as of April 2026. Distillation (training smaller models from outputs) is supported, so you can use GPT-5.5 outputs to train something like GPT-5 nano.
Q7. Anything to watch out for when using the 1M context?
Prompts above 272K tokens trigger a surcharge of 2x input and 1.5x output across the entire session. If you're designing an API around 1M-token usage, run the cost numbers up front.
Q8. What's GPT-5.5's knowledge cutoff?
December 1, 2025. Anything after that (Jan 2026 onward) isn't in training data, so the web search tool is effectively required for current information.
Q9. Are hallucinations any better?
Independent analysis says "the knowledge base grew, but so did the model's confidence about things it doesn't know." OpenAI claims safety improvements officially, but for medical, legal, or financial work, fact-checking remains mandatory.
Q10. Will my existing GPT-5 app just work?
API compatibility is preserved — switching the model ID from gpt-5 to gpt-5.5 is enough to migrate. That said, taking advantage of new features (like the reasoning.effort parameter, or specifying the Pro variant ID) is worth a design pass.
Wrapping Up: GPT-5.5 Is the Strongest, but Not the Universal Best
GPT-5.5 holds SOTA on 14 benchmarks and pulls clearly ahead of Claude Opus 4.7 and Gemini 3.1 Pro to reclaim the industry top spot. It's especially strong on agent tasks, PC automation, long-running autonomous coding, and math and scientific research.
At the same time, it still loses to Claude Opus 4.7 on SWE-Bench Pro, shows a "confident hallucination" tendency, and comes with a 2x API price hike — so it's not an unconditional win.
The smarter play is "pick the right one — GPT-5.5, Claude Opus 4.7, or Gemini 3.1 Pro — for the task at hand." All-in on the OpenAI ecosystem? GPT-5.5. Long coding sessions and safety-first work? Claude. Google Workspace integration? Gemini. Multi-model operations are becoming the 2026 standard.
Related Articles
- Claude Opus 4.7 Release Deep Dive — full details on the direct competitor
- Claude Opus 4.7 Migration Guide — moving from 4.6 to 4.7
- Claude vs ChatGPT Pricing Comparison — how the plan structures stack up
- What Is Next.js? — the React framework AI keeps recommending