On April 23, 2026, OpenAI released "ChatGPT 5.5 (GPT-5.5)." Billed by OpenAI itself as "a new class of intelligence for real work and AI agents," the model posted 82.7% on Terminal-Bench 2.0, pulling well ahead of Anthropic's Claude Opus 4.7 (69.4%) and Google's Gemini 3.1 Pro (68.5%) and reclaiming SOTA (state of the art) on 14 benchmarks in the process.

At the same time, the API list price doubled relative to GPT-5.4, to $5/$30 per MTok, and Claude Opus 4.7 still beats it on SWE-Bench Pro. The honest read is "the strongest, but not the universal best." There's also a documented tendency to answer confidently about things it doesn't actually know — something to be careful about in medical, legal, or regulated work.

Drawing on OpenAI's official release, the OpenAI Developer Docs, and several independent benchmark reports, this article gives you the full breakdown: what's new, the benchmarks, pricing, plan availability, how it stacks up against Claude and Gemini, and how to pick the right model for what you're doing.

ChatGPT 5.5 (GPT-5.5) Release Deep Dive

1. Release Overview — Date, Availability, Spec Sheet

| Item | Details |
|---|---|
| Official name | GPT-5.5 (shown as "ChatGPT 5.5" in ChatGPT) |
| Release date | April 23, 2026 |
| Built by | OpenAI |
| Variants | GPT-5.5 (standard) / GPT-5.5 Pro (deep reasoning) |
| Context window | API: 1,050,000 tokens (~1M) / Codex: 400K tokens |
| Max output | 128,000 tokens |
| Knowledge cutoff | December 1, 2025 |
| API pricing (standard) | $5 / 1M tokens (input) / $0.50 (cached input) / $30 (output) |
| API pricing (Pro) | $30 (input) / $180 (output) |
| Long-prompt surcharge | Above 272K tokens: input 2x, output 1.5x |
| Modalities | Text in/out, image input (no audio or video) |
| ChatGPT plans | Plus / Pro / Business / Enterprise (not on Free) |
| Key features | Function calling, structured outputs, streaming, reasoning effort control, Computer Use, MCP support |

2. What's New — 5 Key Improvements

1. A Full Base Model Retrain (First Since GPT-4.5)

GPT-5.5 is the first complete base model retrain since GPT-4.5. GPT-5, 5.1, 5.2, and 5.4 were all fine-tuned variants of the same underlying base, but GPT-5.5 was rebuilt from the ground up. The result is gains in reasoning efficiency and refreshed knowledge in a single release.

2. Major Token Efficiency Gains (~40% Reduction)

GPT-5.5 keeps the same per-token latency as GPT-5.4 while cutting output tokens needed to complete Codex tasks by roughly 40%. The list price doubled, but because output volume drops by 40%, OpenAI says the total cost for the same work usually grows by less than you'd expect.

From OpenAI co-founder Greg Brockman:

"It's a model that thinks faster and sharper with fewer tokens — that kind of model, compared to something like 5.4."

3. ~1M Context Window (API)

The API version expands to 1,050,000 tokens (~1M). The Codex integration is 400K. ~1M tokens is roughly 1,400 pages of A4 text. Just remember the metered surcharge: prompts above 272K tokens incur 2x input and 1.5x output pricing, so very long-context workloads need a cost model.
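The surcharge math is worth sketching out, because a prompt just over the threshold costs disproportionately more than one just under it. The rates and the 272K cutoff below come from this article; the function itself is an illustrative cost model, not an official calculator.

```python
# Illustrative cost model for GPT-5.5 long-prompt pricing.
# Rates and the 272K surcharge threshold are as reported in this article;
# the article states the surcharge applies to the whole request.

INPUT_PER_MTOK = 5.00       # USD per 1M input tokens
OUTPUT_PER_MTOK = 30.00     # USD per 1M output tokens
SURCHARGE_THRESHOLD = 272_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request, including the long-prompt surcharge."""
    in_rate, out_rate = INPUT_PER_MTOK, OUTPUT_PER_MTOK
    if input_tokens > SURCHARGE_THRESHOLD:
        in_rate *= 2.0    # 2x input above 272K tokens
        out_rate *= 1.5   # 1.5x output
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 300K-token prompt costs more than double a 250K-token one,
# even though it is only 20% longer:
print(request_cost(250_000, 8_000))  # ≈ $1.49
print(request_cost(300_000, 8_000))  # ≈ $3.36
```

This is why chunking a 300K-token corpus into two sub-272K requests can be cheaper than one long-context call, if the task tolerates splitting.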

4. Five-Level Reasoning Effort Control

The API exposes reasoning.effort with five settings:

  • none: no reasoning (fastest, cheapest)
  • low: light reasoning
  • medium: default (balanced)
  • high: deep reasoning (complex tasks)
  • xhigh: maximum reasoning (slowest and most expensive, highest accuracy)

This mirrors the output_config.effort knob on Claude Opus 4.7, and the industry as a whole is converging on "let the caller dial reasoning depth."

5. Expert-SWE at 73.1% — Handles 20-Hour-Class Tasks

On OpenAI's newly published internal eval Expert-SWE (extremely complex coding tasks with a median 20-hour human completion time), GPT-5.5 hit 73.1% — up 5.6 points from GPT-5.4's 68.5%. That's a big jump for long-running autonomous coding agent reliability.

3. Benchmarks — Head-to-Head With Claude and Gemini

GPT-5.5 Benchmark Comparison
| Benchmark | GPT-5.5 | Claude Opus 4.7 | Gemini 3.1 Pro | Winner |
|---|---|---|---|---|
| Terminal-Bench 2.0 | 82.7% | 69.4% | 68.5% | GPT-5.5 |
| GDPval (44-occupation knowledge work) | 84.9% | – | – | GPT-5.5 |
| OSWorld-Verified (PC automation) | 78.7% | 78.0% | – | GPT-5.5 (narrowly) |
| BrowseComp | 84.4% (Pro: 90.1%) | – | – | GPT-5.5 Pro |
| FrontierMath Tier 4 | 35.4% (Pro: 39.6%) | 22.9% | 16.7% | GPT-5.5 |
| SWE-Bench Pro | 58.6% | 64.3% | – | Claude Opus 4.7 |
| Tau2-bench Telecom (customer support) | 98.0% | – | – | GPT-5.5 |
| GPQA Diamond | 93.6% | – | – | GPT-5.5 |
| Expert-SWE (OpenAI internal) | 73.1% | – | – | GPT-5.5 |

Bottom Line: GPT-5.5 Holds SOTA on 14 Benchmarks, Claude on 4, Gemini on 2

Across OpenAI's published benchmark set, GPT-5.5 holds SOTA on 14 benchmarks, Claude Opus 4.7 on 4, and Gemini 3.1 Pro on 2. The overall edge clearly belongs to GPT-5.5.

That said, on SWE-Bench Pro (production-grade coding tasks), Claude Opus 4.7 still wins at 64.3% vs GPT-5.5's 58.6%. For coding work, splitting models by task is still worth doing.

Third-Party Verification: CodeRabbit Code Review Eval

Independent code review service CodeRabbit reports the following GPT-5.5 improvements:

  • Curated benchmark: expected issue detection 58.3% → 79.2%, precision 27.9% → 40.6%
  • Real-world dataset: issue detection 55.0% → 65.0%, precision 11.6% → 13.2%

CodeRabbit's read: "the model prefers local changes, preserves behavior, and tends to focus on actual failure points." Translation — instead of sweeping rewrites, it leans toward targeted, accurate fixes.

4. GPT-5.5 vs GPT-5.5 Pro — Which Should You Use?

| Item | GPT-5.5 (standard) | GPT-5.5 Pro |
|---|---|---|
| API pricing (input) | $5 / 1M tokens | $30 / 1M tokens (6x) |
| API pricing (output) | $30 / 1M tokens | $180 / 1M tokens (6x) |
| BrowseComp | 84.4% | 90.1% |
| FrontierMath Tier 4 | 35.4% | 39.6% |
| ChatGPT plans | Plus / Pro / Business / Enterprise | Pro / Business / Enterprise only |
| Best for | Day-to-day tasks, coding, agents | Scientific research, complex math, deep reasoning |

How to Choose

  • Pick standard GPT-5.5: general coding, writing, agent workloads, cost-conscious use
  • Pick GPT-5.5 Pro: math and scientific research, paper drafting, complex decision-making — accuracy over cost

5. Pricing — Why the 2x Hike?

API Pricing (Standard GPT-5.5)

| Item | Price | Notes |
|---|---|---|
| Input | $5.00 / 1M tokens | 2x GPT-5.4 |
| Cached input | $0.50 / 1M tokens | 1/10 of regular input |
| Output | $30.00 / 1M tokens | 2x GPT-5.4 |
| Long prompts (>272K tokens) | Input 2x, output 1.5x | Applied to the whole session |
| Batch API / Flex | 50% discount | For asynchronous workloads |
| Priority processing | 2.5x | For low-latency requirements |
| Regional processing (data residency) | +10% | For compliance use cases |

Why the 2x Hike?

OpenAI hasn't directly explained the price increase, but the likely drivers are:

  1. Cost of a full base model retrain — the first ground-up rebuild since GPT-4.5
  2. Pricing in performance gains — significant improvements on Terminal-Bench and others
  3. Token efficiency offsets some of the pain — 40% fewer output tokens partially balances the higher unit price

For output-heavy workloads, the effective cost increase works out to roughly 1.2x (2x the unit price × 0.6 of the output tokens). But for input-heavy tasks (summarization, analysis), you take the full 2x hit head-on, so keep that in mind.
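That blended-multiplier arithmetic can be made concrete. The 2x price factor and ~40% output-token reduction are the figures reported in this article; the input/output spend mix is whatever your workload looks like.

```python
# Effective cost multiplier vs GPT-5.4 when the unit price doubles but
# output tokens shrink by ~40% (both figures as reported in this article).

def effective_multiplier(input_share: float, output_share: float,
                         price_factor: float = 2.0,
                         output_token_factor: float = 0.6) -> float:
    """Blended cost multiplier, where input_share/output_share are the
    fractions of your GPT-5.4 bill spent on input vs output tokens."""
    assert abs(input_share + output_share - 1.0) < 1e-9
    # Input tokens pay the full price increase; output tokens pay the
    # increase but there are ~40% fewer of them.
    return input_share * price_factor + output_share * price_factor * output_token_factor

print(effective_multiplier(0.0, 1.0))  # pure output: 2.0 * 0.6 = 1.2x
print(effective_multiplier(1.0, 0.0))  # pure input: the full 2.0x
print(effective_multiplier(0.5, 0.5))  # even split: ≈ 1.6x
```

In other words, the more your spend skews toward output, the closer you land to 1.2x; summarization-style workloads sit near the full 2x.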

6. ChatGPT Plan Availability

| Plan | Monthly | GPT-5.5 | GPT-5.5 Pro | Codex |
|---|---|---|---|---|
| Free | $0 | No | No | No |
| Plus | $20/mo | Yes | No | Yes |
| Pro | $200/mo | Yes | Yes | Yes (incl. Fast Mode) |
| Business | Usage-based | Yes | Yes | Yes |
| Enterprise | Custom | Yes | Yes | Yes |

Free Users Stay on GPT-5 (or 5.4)

GPT-5.5 isn't available on the Free plan — Free users continue on GPT-5 (or 5.4). Plus ($20/mo) is the entry point.

7. API Specs and Developer Features

Supported Features

  • Function calling
  • Structured outputs (JSON Schema)
  • Streaming
  • Reasoning effort control (none/low/medium/high/xhigh)
  • Tools: web search, file search, image generation, Code Interpreter, Hosted Shell, Apply Patch, Skills, Computer Use, MCP, Tool Search
  • Distillation (to smaller models)
  • Fine-tuning: not supported at launch
  • Audio / video in or out: not supported
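Of the features above, structured outputs is the one most worth pairing with local validation, since a schema only helps if you also enforce it on your side. The schema and field names below are invented for illustration, and the exact request shape for wiring a JSON Schema into an API call should be checked against the OpenAI Developer Docs; this sketch only shows the local half.

```python
import json

# Hypothetical JSON Schema for a structured-output code-review response.
# The schema and its field names are invented for this example.
REVIEW_SCHEMA = {
    "type": "object",
    "properties": {
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
        "summary": {"type": "string"},
        "files_touched": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["severity", "summary", "files_touched"],
    "additionalProperties": False,
}

def parse_structured_reply(raw: str) -> dict:
    """Parse a model reply and enforce the required keys locally,
    as a second line of defense even when the API enforces the schema."""
    data = json.loads(raw)
    missing = [k for k in REVIEW_SCHEMA["required"] if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

reply = '{"severity": "low", "summary": "Renames a variable.", "files_touched": ["app.py"]}'
print(parse_structured_reply(reply)["severity"])  # low
```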

Rate Limits (Tier 5: highest)

  • RPM (requests per minute): 15,000
  • TPM (tokens per minute): 40,000,000
  • Batch queue limit: 15,000,000,000
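Even at Tier 5, bursty agent workloads can hit those limits, and the standard pattern is exponential backoff on rate-limit errors. The sketch below simulates the retry loop with a stub instead of a live API call; the error class is a stand-in, not the SDK's actual exception type.

```python
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's 429 rate-limit error type."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry fn() with exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Demo with a stub that fails twice before succeeding.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"

print(call_with_backoff(flaky, sleep=lambda s: None))  # ok after 3 attempts
```

Injecting `sleep` as a parameter keeps the loop testable; in production you would leave the default `time.sleep` in place and likely add jitter.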

Reasoning Effort Example (Python)

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.5",
    reasoning={"effort": "high"},  # none / low / medium / high / xhigh
    input="Solve this complex math problem step by step...",
)

print(response.output_text)
```

8. Codex Integration and the Super-app Strategy

Codex Fast Mode

Alongside the GPT-5.5 release, Codex gained a Fast Mode:

  • 1.5x faster processing
  • 2.5x the cost (in credits)
  • Available on Pro / Business / Enterprise plans

The Super-app Strategy

OpenAI also unveiled a "Super-app" vision that bundles "ChatGPT + Codex + AI browser" into one offering. The plan is to deliver these as a single enterprise package — what OpenAI calls "a step toward more agentic, more intuitive computing."

Conceptually, this is the "all-in-one package to maximize developer experience" pattern from platforms like Vercel and its Next.js framework, brought into the AI agent space.

9. GPT-5.5 vs Claude Opus 4.7 vs Gemini 3.1 Pro

| Item | GPT-5.5 | Claude Opus 4.7 | Gemini 3.1 Pro |
|---|---|---|---|
| Released | April 23, 2026 | April 16, 2026 | Early 2026 |
| API input | $5/MTok | $5/MTok | Not disclosed |
| API output | $30/MTok | $25/MTok | Not disclosed |
| Context | 1,050K | 1,000K (200K standard) | 1,000K |
| Knowledge cutoff | Dec 1, 2025 | ~early 2025 | ~early 2025 |
| SOTA benchmarks | 14 | 4 | 2 |
| Terminal-Bench 2.0 | 82.7% | 69.4% | 68.5% |
| SWE-Bench Pro | 58.6% | 64.3% | – |
| FrontierMath T4 | 35.4% | 22.9% | 16.7% |
| Strengths | Agents, long-running tasks, PC operation | Long coding sessions, safety, long-form writing | Multimodal, Google Workspace integration |

How to Choose

  • Best overall + cutting-edge agent performance — GPT-5.5 (if the $30/MTok output price is acceptable)
  • Long autonomous coding and safety-first work — Claude Opus 4.7 (wins on SWE-Bench Pro and has cheaper output pricing)
  • Google Workspace integration and multimodal — Gemini 3.1 Pro

10. The Catch — Watch Out for Overconfidence

Independent analysis (Handy AI) flags a tendency in GPT-5.5 to "answer confidently about things it doesn't actually know."

"The model knows more, but it also answers more confidently about things it doesn't know."

High-Risk Use Cases

  • Medical diagnosis or prescriptions — wrong information can be life-threatening
  • Legal advice or case research — citing hallucinated cases is a professional ethics issue
  • Financial advice or tax work — regulatory exposure
  • Citations in academic writing — known cases of citing non-existent papers

Mitigations

  1. Mandatory fact-checking — never use AI output as-is; verify against primary sources
  2. Use the web search tool — make the model fetch real-time information
  3. Cross-check against Claude Opus 4.7 — for accuracy-critical work, run answers past multiple models
  4. Tell it to say "I don't know" — instruct via system prompt: "if uncertain, say so explicitly"
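Mitigation 4 can be baked into every request rather than repeated ad hoc. The sketch below assembles Responses-API-style request kwargs with an uncertainty guardrail; the instruction wording is just one example, and the exact parameter names should be verified against the OpenAI Developer Docs.

```python
# Assembling a request that tells the model to surface uncertainty explicitly.
# The system-prompt wording is one example; the "instructions"/"input" request
# shape is assumed from the article's earlier code sample.

UNCERTAINTY_INSTRUCTIONS = (
    "If you are not confident in an answer, say 'I don't know' explicitly "
    "and state what additional information would be needed. "
    "Never invent citations, case names, or paper titles."
)

def build_request(question: str) -> dict:
    """Build request kwargs that carry the uncertainty guardrail on every call."""
    return {
        "model": "gpt-5.5",
        "instructions": UNCERTAINTY_INSTRUCTIONS,
        "input": question,
    }

req = build_request("Which court decided Smith v. Jones (2024)?")
print("I don't know" in req["instructions"])  # True
```

Centralizing the guardrail in one helper means no call path can silently drop it, which matters most in the medical, legal, and financial scenarios listed above.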

11. When to Pick GPT-5.5 — By Use Case

Pick GPT-5.5 When

  • Long-running autonomous coding agents — Expert-SWE 73.1% is best-in-class
  • PC automation / Computer Use — OSWorld 78.7% is on par with Opus 4.7
  • Customer support automation — Tau2-bench 98.0% is essentially perfect
  • Advanced math and scientific research — FrontierMath T4 35.4% (well above Opus 4.7's 22.9%)
  • You're committed to the OpenAI ecosystem — integrates with ChatGPT, Codex, Operator

Skip GPT-5.5 When

  • SWE-Bench Pro–level production coding — Claude Opus 4.7 still leads
  • Accuracy-critical work (medical, legal, financial) — watch out for hallucinations
  • Cost is the top priority — $30/MTok output is at the high end
  • You want a free option — not available on the Free plan
  • Audio or video processing — text + image input only

FAQ

Q1. When did GPT-5.5 become available in ChatGPT?

April 23, 2026 (US time), on the Plus, Pro, Business, and Enterprise plans. GPT-5.5 Pro is limited to Pro, Business, and Enterprise.

Q2. Can I use GPT-5.5 on the Free plan?

No. The Free plan stays on GPT-5 (or 5.4). To access GPT-5.5 you need at least the $20/mo Plus plan.

Q3. GPT-5.5 vs Claude Opus 4.7 — which is better?

Overall, GPT-5.5 (SOTA on 14 benchmarks vs Claude's 4). But on SWE-Bench Pro, Claude Opus 4.7 wins 64.3% to 58.6% — so for production-grade coding, Claude has the edge. Claude is also cheaper on output ($25/MTok vs GPT-5.5's $30/MTok).

Q4. The API got more expensive — how do I keep costs under control?

Yes, $5/$30 per MTok is 2x GPT-5.4. But output token usage drops about 40%, so for output-heavy workloads the real cost increase typically lands around 1.2x. Cost control tips:
1. Use Batch API / Flex (50% discount)
2. Use cached input ($0.50/MTok, 1/10 of regular)
3. Use reasoning.effort=low for lighter tasks
4. Avoid prompts above 272K tokens

Q5. What's actually different about GPT-5.5 Pro?

It has stronger reasoning, with notable score lifts on complex math (FrontierMath: 35.4% → 39.6%) and scientific research tasks. The catch is the price — 6x the API rate ($30 input / $180 output). Outside research and serious paper-writing use cases, the price/performance often doesn't pencil out.

Q6. Is fine-tuning supported?

Not as of April 2026. Distillation (training smaller models from outputs) is supported, so you can use GPT-5.5 outputs to train something like GPT-5 nano.

Q7. Anything to watch out for when using the 1M context?

Prompts above 272K tokens trigger a surcharge of 2x input and 1.5x output across the entire session. If you're designing an API around 1M-token usage, run the cost numbers up front.

Q8. What's GPT-5.5's knowledge cutoff?

December 1, 2025. Anything after that (Jan 2026 onward) isn't in training data, so the web search tool is effectively required for current information.

Q9. Are hallucinations any better?

Independent analysis says "the knowledge base grew, but so did the model's confidence about things it doesn't know." OpenAI claims safety improvements officially, but for medical, legal, or financial work, fact-checking remains mandatory.

Q10. Will my existing GPT-5 app just work?

API compatibility is preserved — switching the model ID from gpt-5 to gpt-5.5 is enough to migrate. That said, taking advantage of new features (like the reasoning.effort parameter, or specifying the Pro variant ID) is worth a design pass.

Wrapping Up: GPT-5.5 Is the Strongest, but Not the Universal Best

GPT-5.5 holds SOTA on 14 benchmarks and pulls clearly ahead of Claude Opus 4.7 and Gemini 3.1 Pro to reclaim the industry top spot. It's especially strong on agent tasks, PC automation, long-running autonomous coding, and math and scientific research.

At the same time, it still loses to Claude Opus 4.7 on SWE-Bench Pro, shows a "confident hallucination" tendency, and comes with a 2x API price hike — so it's not an unconditional win.

The smarter play is "pick the right one — GPT-5.5, Claude Opus 4.7, or Gemini 3.1 Pro — for the task at hand." All-in on the OpenAI ecosystem? GPT-5.5. Long coding sessions and safety-first work? Claude. Google Workspace integration? Gemini. Multi-model operations are becoming the 2026 standard.

Related Articles