Table of Contents
- 1. Release Overview — Date, Availability, Spec Sheet
- 2. What's New — 5 Key Improvements
- 3. Benchmarks — Head-to-Head With Claude and Gemini
- 4. GPT-5.5 vs GPT-5.5 Pro — Which Should You Use?
- 5. Pricing — Why the 2x Hike?
- 6. ChatGPT Plan Availability
- 7. API Specs and Developer Features
- 8. Codex Integration and the Super-app Strategy
- 9. GPT-5.5 vs Claude Opus 4.7 vs Gemini 3.1 Pro
- 10. The Catch — Watch Out for Overconfidence
- 11. When to Pick GPT-5.5 — By Use Case
- FAQ
On April 23, 2026, OpenAI released "ChatGPT 5.5 (GPT-5.5)." Billed by OpenAI itself as "a new class of intelligence for real work and AI agents," the model posted 82.7% on Terminal-Bench 2.0, pulling well ahead of Anthropic's Claude Opus 4.7 (69.4%) and Google's Gemini 3.1 Pro (68.5%) and reclaiming SOTA (state of the art) on 14 benchmarks in the process.
At the same time, the API list price doubled compared to GPT-5.4 ($5/$30 per MTok), and Claude Opus 4.7 still beats it on SWE-Bench Pro. The honest read is "the strongest, but not the universal best." There's also a documented tendency to answer confidently about things it doesn't actually know — something to be careful about in medical, legal, or regulated work.
Drawing on OpenAI's official release, the OpenAI Developer Docs, and several independent benchmark reports, this article gives you the full breakdown: what's new, the benchmarks, pricing, plan availability, how it stacks up against Claude and Gemini, and how to pick the right model for what you're doing.
1. Release Overview — Date, Availability, Spec Sheet
| Item | Details |
|---|---|
| Official name | GPT-5.5 (shown as "ChatGPT 5.5" in ChatGPT) |
| Release date | April 23, 2026 |
| Built by | OpenAI |
| Variants | GPT-5.5 (standard) / GPT-5.5 Pro (deep reasoning) |
| Context window | API: 1,050,000 tokens (~1M) / Codex: 400K tokens |
| Max output | 128,000 tokens |
| Knowledge cutoff | December 1, 2025 |
| API pricing (standard) | $5 / 1M tokens (input) / $0.50 (cached input) / $30 (output) |
| API pricing (Pro) | $30 (input) / $180 (output) |
| Long-prompt surcharge | Above 272K tokens: input 2x, output 1.5x |
| Modalities | Text in/out, image input (no audio or video) |
| ChatGPT plans | Plus / Pro / Business / Enterprise (not on Free) |
| Key features | Function calling, structured outputs, streaming, reasoning effort control, Computer Use, MCP support |
2. What's New — 5 Key Improvements
1. A Full Base Model Retrain (First Since GPT-4.5)
GPT-5.5 is the first complete base model retrain since GPT-4.5. GPT-5, 5.1, 5.2, and 5.4 were all fine-tuned variants of the same underlying base, but GPT-5.5 was rebuilt from the ground up. The result is improvements to reasoning efficiency and updated knowledge at the same time.
2. Major Token Efficiency Gains (~40% Reduction)
GPT-5.5 keeps GPT-5.4's per-token latency while cutting the output tokens needed to complete Codex tasks by roughly 40%. The list price doubled, but with 40% fewer output tokens, OpenAI says the total cost for the same work grows far less than the headline 2x suggests.
From OpenAI co-founder Greg Brockman:
"Compared to something like 5.4, it's a model that thinks faster and sharper with fewer tokens."
3. ~1M Context Window (API)
The API version expands to 1,050,000 tokens (~1M); the Codex integration gets 400K. ~1M tokens is roughly 1,400 pages of A4 text. Just remember the long-prompt surcharge: prompts above 272K tokens incur 2x input and 1.5x output pricing, so very long-context workloads need a cost model.
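To see why the cost model matters, here is a minimal sketch of a per-request cost estimate. The rates, the 272K threshold, and the 2x/1.5x multipliers are the figures quoted in this article's pricing table; the function itself is illustrative, not an official OpenAI calculator.

```python
# Standard GPT-5.5 list prices as quoted in this article (USD per 1M tokens).
INPUT_PER_MTOK = 5.00
OUTPUT_PER_MTOK = 30.00
SURCHARGE_THRESHOLD = 272_000  # long-prompt threshold from the pricing table

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost in USD, applying the long-prompt
    surcharge (input 2x, output 1.5x) when input exceeds 272K tokens."""
    in_rate, out_rate = INPUT_PER_MTOK, OUTPUT_PER_MTOK
    if input_tokens > SURCHARGE_THRESHOLD:
        in_rate *= 2.0
        out_rate *= 1.5
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Staying just under the threshold is much cheaper than going slightly over:
print(estimate_cost(270_000, 8_000))  # no surcharge
print(estimate_cost(300_000, 8_000))  # surcharged
```

The cliff at 272K tokens is the key design constraint: if your prompts hover near it, trimming context to stay under the line can roughly halve the bill.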
4. Five-Level Reasoning Effort Control
The API exposes reasoning.effort with five settings:
- none: no reasoning (fastest, cheapest)
- low: light reasoning
- medium: default (balanced)
- high: deep reasoning (complex tasks)
- xhigh: maximum reasoning (slowest and most expensive, highest accuracy)
This mirrors the output_config.effort knob on Claude Opus 4.7, and the industry as a whole is converging on "let the caller dial reasoning depth."
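One practical pattern is to route requests through a task-to-effort table so you only pay for deep reasoning where it helps. The categories and mapping below are my own illustration, not an OpenAI recommendation; only the five effort values come from the docs described above.

```python
# Hypothetical routing table: task categories (my own labels) mapped to the
# five documented reasoning.effort values.
EFFORT_BY_TASK = {
    "classification": "none",   # fastest, cheapest
    "summarization": "low",
    "general_chat": "medium",   # the documented default
    "coding": "high",
    "research_math": "xhigh",   # slowest, most expensive, highest accuracy
}

def pick_effort(task_type: str) -> str:
    """Return the effort level for a task, falling back to the default."""
    return EFFORT_BY_TASK.get(task_type, "medium")
```

The returned string can be passed straight into the `reasoning={"effort": ...}` parameter shown in the API example later in this article.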
5. Expert-SWE at 73.1% — Handles 20-Hour-Class Tasks
On OpenAI's newly published internal eval Expert-SWE (extremely complex coding tasks with a median 20-hour human completion time), GPT-5.5 hit 73.1% — up 4.6 points from GPT-5.4's 68.5%. That's a big jump for long-running autonomous coding agent reliability.
3. Benchmarks — Head-to-Head With Claude and Gemini
| Benchmark | GPT-5.5 | Claude Opus 4.7 | Gemini 3.1 Pro | Winner |
|---|---|---|---|---|
| Terminal-Bench 2.0 | 82.7% | 69.4% | 68.5% | GPT-5.5 |
| GDPval (44-occupation knowledge work) | 84.9% | — | — | GPT-5.5 |
| OSWorld-Verified (PC automation) | 78.7% | 78.0% | — | GPT-5.5 (narrowly) |
| BrowseComp | 84.4% (Pro: 90.1%) | — | — | GPT-5.5 Pro |
| FrontierMath Tier 4 | 35.4% (Pro: 39.6%) | 22.9% | 16.7% | GPT-5.5 |
| SWE-Bench Pro | 58.6% | 64.3% | — | Claude Opus 4.7 |
| Tau2-bench Telecom (customer support) | 98.0% | — | — | GPT-5.5 |
| GPQA Diamond | 93.6% | — | — | GPT-5.5 |
| Expert-SWE (OpenAI internal) | 73.1% | — | — | GPT-5.5 |
Bottom Line: GPT-5.5 Holds SOTA on 14 Benchmarks, Claude on 4, Gemini on 2
Across OpenAI's published benchmark set, GPT-5.5 holds SOTA on 14 benchmarks, Claude Opus 4.7 on 4, and Gemini 3.1 Pro on 2. The overall edge clearly belongs to GPT-5.5.
That said, on SWE-Bench Pro (production-grade coding tasks), Claude Opus 4.7 still wins at 64.3% vs GPT-5.5's 58.6%. For coding work, splitting models by task is still worth doing.
Third-Party Verification: CodeRabbit Code Review Eval
Independent code review service CodeRabbit reports the following GPT-5.5 improvements:
- Curated benchmark: expected issue detection 58.3% → 79.2%, precision 27.9% → 40.6%
- Real-world dataset: issue detection 55.0% → 65.0%, precision 11.6% → 13.2%
CodeRabbit's read: "the model prefers local changes, preserves behavior, and tends to focus on actual failure points." Translation — instead of sweeping rewrites, it leans toward targeted, accurate fixes.
4. GPT-5.5 vs GPT-5.5 Pro — Which Should You Use?
| Item | GPT-5.5 (standard) | GPT-5.5 Pro |
|---|---|---|
| API pricing (input) | $5 / 1M tokens | $30 / 1M tokens (6x) |
| API pricing (output) | $30 / 1M tokens | $180 / 1M tokens (6x) |
| BrowseComp | 84.4% | 90.1% |
| FrontierMath Tier 4 | 35.4% | 39.6% |
| ChatGPT plans | Plus / Pro / Business / Enterprise | Pro / Business / Enterprise only |
| Best for | Day-to-day tasks, coding, agents | Scientific research, complex math, deep reasoning |
How to Choose
- Pick standard GPT-5.5: general coding, writing, agent workloads, cost-conscious use
- Pick GPT-5.5 Pro: math and scientific research, paper drafting, complex decision-making — accuracy over cost
5. Pricing — Why the 2x Hike?
API Pricing (Standard GPT-5.5)
| Item | Price | Notes |
|---|---|---|
| Input | $5.00 / 1M tokens | 2x GPT-5.4 |
| Cached input | $0.50 / 1M tokens | 1/10 of regular input |
| Output | $30.00 / 1M tokens | 2x GPT-5.4 |
| Long prompts (>272K tokens) | Input 2x, output 1.5x | Applied to the whole session |
| Batch API / Flex | 50% discount | For asynchronous workloads |
| Priority processing | 2.5x | For low-latency requirements |
| Regional processing (data residency) | +10% | For compliance use cases |
Why the 2x Hike?
OpenAI hasn't directly explained the price increase, but the likely drivers are:
- Cost of a full base model retrain — the first ground-up rebuild since GPT-4.5
- Pricing in performance gains — significant improvements on Terminal-Bench and others
- Token efficiency offsets some of the pain — 40% fewer output tokens partially balances the higher unit price
For output-heavy workloads, the effective cost increase works out to roughly 2 × 0.6 = 1.2x. For input-heavy tasks (summarization, analysis), though, you take the full 2x hit head-on — keep that in mind.
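That arithmetic generalizes to anything in between the two extremes. The sketch below is my own framing (the `output_share` parameter is an assumption about your spend mix, not an OpenAI metric), using the 2x price ratio and ~40% output-token reduction from this article.

```python
def effective_cost_multiplier(price_ratio: float,
                              output_token_ratio: float,
                              output_share: float) -> float:
    """Blend a unit-price ratio with an output-token reduction.
    output_share: fraction of spend that goes to output tokens
    (1.0 = purely output-heavy, 0.0 = purely input-heavy)."""
    input_share = 1.0 - output_share
    return price_ratio * (input_share + output_share * output_token_ratio)

# The two extremes discussed above: 2x price, 40% fewer output tokens.
print(effective_cost_multiplier(2.0, 0.6, 1.0))  # output-heavy: ~1.2x
print(effective_cost_multiplier(2.0, 0.6, 0.0))  # input-heavy: full 2x
```

A half-and-half workload lands around 1.6x, which is a useful back-of-envelope number when budgeting a migration from GPT-5.4.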
6. ChatGPT Plan Availability
| Plan | Monthly | GPT-5.5 | GPT-5.5 Pro | Codex |
|---|---|---|---|---|
| Free | $0 | No | No | No |
| Plus | $20/mo | Yes | No | Yes |
| Pro | $200/mo | Yes | Yes | Yes (incl. Fast Mode) |
| Business | Usage-based | Yes | Yes | Yes |
| Enterprise | Custom | Yes | Yes | Yes |
Free Users Stay on GPT-5 (or 5.4)
GPT-5.5 isn't available on the Free plan — Free users continue on GPT-5 (or 5.4). Plus ($20/mo) is the entry point.
7. API Specs and Developer Features
Supported Features
- Function calling
- Structured outputs (JSON Schema)
- Streaming
- Reasoning effort control (none/low/medium/high/xhigh)
- Tools: web search, file search, image generation, Code Interpreter, Hosted Shell, Apply Patch, Skills, Computer Use, MCP, Tool Search
- Distillation (to smaller models)
Not Supported at Launch
- Fine-tuning
- Audio or video input or output
Rate Limits (Tier 5: highest)
- RPM (requests per minute): 15,000
- TPM (tokens per minute): 40,000,000
- Batch queue limit: 15,000,000,000
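Even at Tier 5 limits, bursty workloads can hit 429s, so some backoff strategy is worth having. The helper below is a generic exponential-backoff-with-jitter sketch, not an OpenAI SDK feature (the official SDK also does some retrying of its own); `call` stands in for any zero-argument function that raises on a rate-limit error.

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` with exponential backoff plus jitter.
    `call` is any zero-argument function that raises on a 429/limit error."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            # Double the wait each attempt; jitter avoids synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

In practice you would wrap the API call, e.g. `with_backoff(lambda: client.responses.create(...))`, and catch only the SDK's rate-limit exception rather than bare `Exception`.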
Reasoning Effort Example (Python)
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.5",
    reasoning={"effort": "high"},  # one of: none / low / medium / high / xhigh
    input="Solve this complex math problem step by step...",
)
print(response.output_text)
```
8. Codex Integration and the Super-app Strategy
Codex Fast Mode
Alongside the GPT-5.5 release, Codex gained a Fast Mode:
- 1.5x faster processing
- 2.5x the cost (in credits)
- Available on Pro / Business / Enterprise plans
The Super-app Strategy
OpenAI also unveiled a "Super-app" vision that bundles "ChatGPT + Codex + AI browser" into one offering. The plan is to deliver these as a single enterprise package — what OpenAI calls "a step toward more agentic, more intuitive computing."
Conceptually, this is the "all-in-one package to maximize developer experience" pattern from platforms like Vercel and its Next.js framework, brought into the AI agent space.
9. GPT-5.5 vs Claude Opus 4.7 vs Gemini 3.1 Pro
| Item | GPT-5.5 | Claude Opus 4.7 | Gemini 3.1 Pro |
|---|---|---|---|
| Released | April 23, 2026 | April 16, 2026 | Early 2026 |
| API input | $5/MTok | $5/MTok | Not disclosed |
| API output | $30/MTok | $25/MTok | Not disclosed |
| Context | 1,050K | 1,000K (200K standard) | 1,000K |
| Knowledge cutoff | Dec 1, 2025 | ~early 2025 | ~early 2025 |
| SOTA benchmarks | 14 | 4 | 2 |
| Terminal-Bench 2.0 | 82.7% | 69.4% | 68.5% |
| SWE-Bench Pro | 58.6% | 64.3% | — |
| FrontierMath T4 | 35.4% | 22.9% | 16.7% |
| Strengths | Agents, long-running tasks, PC operation | Long coding sessions, safety, long-form writing | Multimodal, Google Workspace integration |
How to Choose
- Best overall + cutting-edge agent performance — GPT-5.5 (if $30/MTok output pricing is acceptable)
- Long autonomous coding and safety-first work — Claude Opus 4.7 (wins on SWE-Bench Pro and has cheaper output pricing)
- Google Workspace integration and multimodal — Gemini 3.1 Pro
10. The Catch — Watch Out for Overconfidence
Independent analysis (Handy AI) flags a tendency in GPT-5.5 to "answer confidently about things it doesn't actually know."
"The model knows more, but it also answers more confidently about things it doesn't know."
High-Risk Use Cases
- Medical diagnosis or prescriptions — wrong information can be life-threatening
- Legal advice or case research — citing hallucinated cases is a professional ethics issue
- Financial advice or tax work — regulatory exposure
- Citations in academic writing — known cases of citing non-existent papers
Mitigations
- Mandatory fact-checking — never use AI output as-is; verify against primary sources
- Use the web search tool — make the model fetch real-time information
- Cross-check against Claude Opus 4.7 — for accuracy-critical work, run answers past multiple models
- Tell it to say "I don't know" — instruct via system prompt: "if uncertain, say so explicitly"
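The last mitigation can be baked into every request via the Responses API's system-level instructions. The wording below is illustrative, not an official prompt, and `build_request` is a hypothetical helper that only assembles the parameter dict; nothing is sent here.

```python
# Mitigation #4 as a reusable instruction string. The phrasing is my own
# example, not an OpenAI-recommended prompt.
UNCERTAINTY_INSTRUCTIONS = (
    "If you are not confident in an answer, say 'I don't know' explicitly "
    "rather than guessing, and cite a primary source for factual claims."
)

def build_request(user_input: str) -> dict:
    """Assemble Responses API parameters with the uncertainty instruction
    attached (returns a dict only; no API call is made)."""
    return {
        "model": "gpt-5.5",
        "instructions": UNCERTAINTY_INSTRUCTIONS,
        "input": user_input,
    }
```

The resulting dict can then be splatted into the call, e.g. `client.responses.create(**build_request("..."))`, so every request in an accuracy-critical pipeline carries the same guardrail.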
11. When to Pick GPT-5.5 — By Use Case
Pick GPT-5.5 When
- Long-running autonomous coding agents — Expert-SWE 73.1% is best-in-class
- PC automation / Computer Use — OSWorld 78.7% is on par with Opus 4.7
- Customer support automation — Tau2-bench 98.0% is essentially perfect
- Advanced math and scientific research — FrontierMath T4 35.4% (well above Opus 4.7's 22.9%)
- You're committed to the OpenAI ecosystem — integrates with ChatGPT, Codex, Operator
Skip GPT-5.5 When
- SWE-Bench Pro–level production coding — Claude Opus 4.7 still leads
- Accuracy-critical work (medical, legal, financial) — watch out for hallucinations
- Cost is the top priority — $30/MTok output is at the high end
- You want a free option — not available on the Free plan
- Audio or video processing — text + image input only
FAQ
Q1. When did GPT-5.5 become available in ChatGPT?
April 23, 2026 (US time), on the Plus, Pro, Business, and Enterprise plans. GPT-5.5 Pro is limited to Pro, Business, and Enterprise.
Q2. Can I use GPT-5.5 on the Free plan?
No. The Free plan stays on GPT-5 (or 5.4). To access GPT-5.5 you need at least the $20/mo Plus plan.
Q3. GPT-5.5 vs Claude Opus 4.7 — which is better?
Overall, GPT-5.5 (SOTA on 14 benchmarks vs Claude's 4). But on SWE-Bench Pro, Claude Opus 4.7 wins 64.3% to 58.6% — so for production-grade coding, Claude has the edge. Claude is also cheaper on output ($25/MTok vs GPT-5.5's $30/MTok).
Q4. The API got more expensive — how do I keep costs under control?
Yes, $5/$30 per MTok is 2x GPT-5.4. But output token usage drops about 40%, so for output-heavy workloads the real cost increase typically lands around 1.2x. Cost control tips:
1. Use Batch API / Flex (50% discount)
2. Use cached input ($0.50/MTok, 1/10 of regular)
3. Use reasoning.effort=low for lighter tasks
4. Avoid prompts above 272K tokens
Q5. What's actually different about GPT-5.5 Pro?
It has stronger reasoning, with notable score lifts on complex math (FrontierMath: 35.4% → 39.6%) and scientific research tasks. The catch is the price — 6x the API rate ($30 input / $180 output). Outside research and serious paper-writing use cases, the price/performance often doesn't pencil out.
Q6. Is fine-tuning supported?
Not as of April 2026. Distillation (training smaller models from outputs) is supported, so you can use GPT-5.5 outputs to train something like GPT-5 nano.
Q7. Anything to watch out for when using the 1M context?
Prompts above 272K tokens trigger a surcharge of 2x input and 1.5x output across the entire session. If you're designing an API around 1M-token usage, run the cost numbers up front.
Q8. What's GPT-5.5's knowledge cutoff?
December 1, 2025. Anything after that (Jan 2026 onward) isn't in training data, so the web search tool is effectively required for current information.
Q9. Are hallucinations any better?
Independent analysis says "the knowledge base grew, but so did the model's confidence about things it doesn't know." OpenAI claims safety improvements officially, but for medical, legal, or financial work, fact-checking remains mandatory.
Q10. Will my existing GPT-5 app just work?
API compatibility is preserved — switching the model ID from gpt-5 to gpt-5.5 is enough to migrate. That said, taking advantage of new features (like the reasoning.effort parameter, or specifying the Pro variant ID) is worth a design pass.
Wrapping Up: GPT-5.5 Is the Strongest, but Not the Universal Best
GPT-5.5 holds SOTA on 14 benchmarks and pulls clearly ahead of Claude Opus 4.7 and Gemini 3.1 Pro to reclaim the industry top spot. It's especially strong on agent tasks, PC automation, long-running autonomous coding, and math and scientific research.
At the same time, it still loses to Claude Opus 4.7 on SWE-Bench Pro, shows a "confident hallucination" tendency, and comes with a 2x API price hike — so it's not an unconditional win.
The smarter play is "pick the right one — GPT-5.5, Claude Opus 4.7, or Gemini 3.1 Pro — for the task at hand." All-in on the OpenAI ecosystem? GPT-5.5. Long coding sessions and safety-first work? Claude. Google Workspace integration? Gemini. Multi-model operations are becoming the 2026 standard.
Related Articles
- Claude Opus 4.7 Release Deep Dive — full details on the direct competitor
- Claude Opus 4.7 Migration Guide — moving from 4.6 to 4.7
- Claude vs ChatGPT Pricing Comparison — how the plan structures stack up
- What Is Next.js? — the React framework AI keeps recommending