1. Who This Guide Is For

As covered in the Claude Opus 4.7 release article, Opus 4.7 is the direct successor to 4.6. That said, there are several breaking changes at the API level, and simply swapping the model name can give you 400 Bad Request errors.

This guide is written for:

  • Developers calling claude-opus-4-6 via the Anthropic API or SDK
  • Teams using Claude through Bedrock or Vertex AI
  • Anyone stuck on Opus 4.5 or 4.1 who wants to jump straight to 4.7
  • Anyone using extended thinking (thinking: enabled) or temperature in production

Anthropic's official migration guide on platform.claude.com is the primary source; this article focuses on the friction points that tend to trip up engineers in real-world projects.

Claude Opus 4.7 migration guide at a glance

2. TL;DR --- The 3-Line Summary

For those in a hurry:

  1. Swap the model name from claude-opus-4-6 to claude-opus-4-7 (and update the SDK)
  2. Three big breaking changes: thinking: enabled is gone (move to adaptive thinking + effort), temperature/top_p/top_k are gone, and the new tokenizer consumes up to 1.35x more tokens for the same text
  3. What you get in return: better coding performance, 1M context at standard pricing, and a new xhigh effort level

The details follow.

3. Update the Model Name

Step one is simple: swap the model identifier.

# Before
model = "claude-opus-4-6"

# After
model = "claude-opus-4-7"

If you manage this via environment variables or config files, having a single switch point makes the next migration easier too.

# .env
# CLAUDE_MODEL=claude-opus-4-6
CLAUDE_MODEL=claude-opus-4-7

// app.ts
const model = process.env.CLAUDE_MODEL ?? "claude-opus-4-7";

However, swapping the model name alone is often not enough --- that's what makes this migration tricky. Let's go through the breaking changes one by one.

4. Breaking Change 1: Extended Thinking Is Gone (Adaptive Thinking Only)

In Opus 4.6, you enabled extended thinking with thinking: {type: "enabled", budget_tokens: N}. In 4.7 this form returns a 400 error.

Instead, you use "adaptive thinking," where the model automatically adjusts thinking effort. Note that 4.7, like 4.6, defaults thinking to OFF --- if you omit the thinking field, the model runs without thinking. To get deep-reasoning behavior, you still have to opt in explicitly, now via the adaptive form.

Before (4.6) / After (4.7)

# Before: Opus 4.6
client.messages.create(
    model="claude-opus-4-6",
    max_tokens=64000,
    thinking={"type": "enabled", "budget_tokens": 32000},
    messages=[{"role": "user", "content": "..."}],
)

# After: Opus 4.7
client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # "max", "xhigh", "high", "medium", "low"
    messages=[{"role": "user", "content": "..."}],
)

// Before: Opus 4.6
await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 64000,
  thinking: { type: "enabled", budget_tokens: 32000 },
  messages: [{ role: "user", content: "..." }],
});

// After: Opus 4.7
await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 64000,
  thinking: { type: "adaptive" },
  output_config: { effort: "high" },
  messages: [{ role: "user", content: "..." }],
});

Key Points

  • You no longer control "how much thinking" directly via budget_tokens
  • Instead, use output_config.effort to say "how seriously should I think about this?"
  • If you don't include the thinking field, the model answers immediately without thinking (the same default-off stance as 4.6, but easy to miss now that the opt-in syntax has changed)

5. Breaking Change 2: Sampling Parameters Are Gone

Passing non-default values for temperature, top_p, or top_k returns a 400 error. This reflects Anthropic's direction that "the model's behavior should be controlled via prompting."

# Before
client.messages.create(
    model="claude-opus-4-6",
    temperature=0.7,
    top_p=0.9,
    top_k=40,
    messages=[...],
)

# After
client.messages.create(
    model="claude-opus-4-7",
    # temperature / top_p / top_k removed completely
    messages=[...],
)

If you were using temperature=0.2 to keep output "as stable as possible," replace it with a prompt directive --- "make output as consistent as possible across identical questions" --- or combine it with structured outputs via a JSON schema.

If you were using temperature=1.2 for "more creative" output, fold it into prompt tone instructions like "use evocative, unexpected phrasing."
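Both replacements can live in one place. Here's a sketch of translating the old temperature intent into a reusable prompt directive; the wording is a starting point to A/B test, not a proven equivalent:

```python
# Sketch: translate a legacy temperature setting into a prompt directive.
# The thresholds and phrasing are assumptions to iterate on, not a
# drop-in behavioral equivalent of the removed parameter.
def style_directive(temperature: float) -> str:
    if temperature <= 0.3:
        return ("Make output as consistent as possible: for identical "
                "questions, return identical answers in a fixed structure.")
    if temperature >= 1.0:
        return "Use evocative, unexpected phrasing; vary structure freely."
    return ""  # mid-range values usually need no directive at all

# Example: a call site that used temperature=0.2 before
system_prompt = "You are a support assistant.\n" + style_directive(0.2)
```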

6. Breaking Change 3: Thinking Content Hidden by Default

In 4.6, with thinking enabled, "summarized thinking" was streamed in the response by default. Many apps surfaced this as a "thinking..." indicator in the UI.

In 4.7 the contract quietly changed: thinking blocks appear in the response, but the thinking field is empty. Content is only returned if you opt in explicitly.

Symptom

Your "thinking..." indicator spins indefinitely, and responses feel unusually slow to start. If users are asking "is it frozen?", this is the likely cause.

Fix

# To stream summarized thinking to the UI, as in 4.6
client.messages.create(
    model="claude-opus-4-7",
    thinking={
        "type": "adaptive",
        "display": "summarized",  # Explicitly set this
    },
    output_config={"effort": "high"},
    messages=[...],
)

// Same call in TypeScript
await client.messages.create({
  model: "claude-opus-4-7",
  thinking: {
    type: "adaptive",
    display: "summarized",
  },
  output_config: { effort: "high" },
  messages: [...],
});

If you process everything server-side and don't expose thinking to the UI, leaving display unset is fine.
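If you do surface thinking in a UI, the streaming handler needs to pick the right deltas out of the event stream. A sketch, assuming thinking arrives as content_block_delta events carrying a thinking_delta payload (the 4.6-era shape; verify against the SDK version you ship):

```python
# Sketch of a stream handler that surfaces summarized thinking in a UI.
# Assumes the event shape from 4.6-era streaming: content_block_delta
# events with a "thinking_delta" payload. Confirm against your SDK.
def collect_thinking(events: list[dict]) -> str:
    parts = []
    for event in events:
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        if delta.get("type") == "thinking_delta":
            parts.append(delta.get("thinking", ""))
    return "".join(parts)
```

In a real app you'd append each delta to the indicator as it arrives rather than joining at the end; the filtering logic is the same.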

7. Breaking Change 4: New Tokenizer (~1.35x)

Opus 4.7 uses a new tokenizer internally. It contributes to quality gains, but as a side effect, the same text consumes roughly 1.0-1.35x as many tokens as under 4.6.

What that causes:

  • Requests designed right at the edge of max_tokens get truncated
  • Any tiktoken-style estimates you use client-side for billing or length checks become inaccurate
  • /v1/messages/count_tokens returns different numbers for 4.7 than for 4.6
  • The same prompt costs slightly more and takes slightly longer

Fix

# Before: allocating a 16k output buffer under 4.6
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    messages=[...],
)

# After: leave headroom at ~1.35x
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=22000,  # 16000 * 1.35 ~ 21600 -> round up
    messages=[...],
)
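To avoid sprinkling the 1.35 factor around the codebase, a small helper can own the headroom calculation. The factor is the upper end of the range observed above; treat it as a starting point and re-benchmark on your own prompts:

```python
import math

# Scale a 4.6-era max_tokens value for the new tokenizer, rounding up
# to a tidy boundary. The default 1.35 factor is the upper end of the
# observed 1.0-1.35x range, not a guaranteed conversion rate.
def scaled_max_tokens(old_max: int, factor: float = 1.35,
                      round_to: int = 1000) -> int:
    return math.ceil(old_max * factor / round_to) * round_to
```

For example, scaled_max_tokens(16000) gives the 22000 used in the snippet above.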

Also, in 4.7, the 1M context window is available at standard API pricing (no long-context surcharge). Token counts go up, but this does make the "just dump everything into the input" strategy more feasible.

8. Breaking Change 5: Prefill Is Gone

This one carries over from 4.6: the assistant message prefill trick --- putting an {role: "assistant", content: "```json"} message at the end of your messages array to force "start the response with JSON" --- already returned a 400 error on 4.6, and still does on 4.7.

# Before (4.5 and earlier): forcing JSON output with prefill
client.messages.create(
    model="claude-opus-4-5",
    messages=[
        {"role": "user", "content": "Return the user info as JSON"},
        {"role": "assistant", "content": "```json\n{"},  # prefill
    ],
)

# After: use structured outputs
client.messages.create(
    model="claude-opus-4-7",
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
            },
        },
    },
    messages=[
        {"role": "user", "content": "Return the user info as JSON"},
    ],
)

There are three replacements for prefill:

  1. Structured outputs (output_config.format) --- constrain output to a JSON schema
  2. System prompt saying "return only JSON --- no markdown or preamble"
  3. Tool use to receive a function call (arguments come back as structured JSON)
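Whichever replacement you pick, it's worth validating the JSON on the client before trusting it downstream --- even schema-constrained output can arrive truncated if max_tokens runs out. A minimal check for the example schema above:

```python
import json

# Defensive parse of the structured output from the example schema
# above. Validating client-side catches truncated or malformed
# responses early instead of deep in downstream code.
def parse_user_info(raw: str) -> dict:
    data = json.loads(raw)
    for key in ("name", "age"):
        if key not in data:
            raise ValueError(f"missing required field: {key}")
    if not isinstance(data["age"], int):
        raise ValueError("age must be an integer")
    return data
```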

Before/After for the 5 breaking changes in Opus 4.7

9. Picking an Effort Level (xhigh Is New)

output_config.effort accepts five values. xhigh is new in 4.7.

effort      | Positioning                | Primary use
----------- | -------------------------- | ------------------------------------------------
max         | No upper bound on thinking | Benchmarks, hard problems. Risk of overthinking, diminishing returns
xhigh (NEW) | Coding / agent optimal     | The default for Claude Code and autonomous tasks
high        | Balanced                   | Minimum bar for cognitively demanding tasks
medium      | Cost-first                 | Trade a bit of quality for cost and speed
low         | Short, routine tasks       | Classification, formatting, summarization --- latency first

If you were manually setting budget_tokens on 4.6, you now just pick an effort value on 4.7. Rules of thumb:

  • For coding agents (Claude Code-style workflows), start at xhigh
  • For Q&A chat or RAG responses, high is a safe default
  • For tagging, JSON extraction, classification workers, use medium or low
  • Use max only when you want to "think deeply on this one question without watching the cost"
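One way to keep these rules of thumb consistent across a codebase is a central lookup keyed by your own task labels (the task names below are hypothetical, stand-ins for whatever your app calls its workloads):

```python
# Central effort selection following the rules of thumb above.
# Task labels are this app's own names, not API values.
EFFORT_BY_TASK = {
    "coding_agent": "xhigh",
    "qa_chat": "high",
    "rag_answer": "high",
    "classification": "low",
    "json_extraction": "medium",
}

def effort_for(task: str) -> str:
    # "high" is the safe fallback for unlisted tasks
    return EFFORT_BY_TASK.get(task, "high")
```

This also makes the next tuning pass a one-file change instead of a codebase-wide search.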

10. Adapting to Behavioral Changes

Even when the API is compatible, how the model responds to prompts has shifted from 4.6. Roll this out to production unaware, and users may tell you the responses feel "cold."

10.1 Response Length Adapts to the Task

4.7 adjusts response length automatically based on task complexity; there's no fixed verbosity like "always three paragraphs." Accordingly, any existing length-control prompts are worth removing first to see how the model behaves on its own.

10.2 Instructions Are Taken More Literally

This is most pronounced at low effort. Write "be concise" and it really will be concise. Write "list three" and it won't throw in a fourth. Handy, but the 4.6-style "reads between the lines and adds helpful context" behavior is reduced.

10.3 The Tone Is More Direct

Fewer validation phrases ("great question!"), fewer decorative emojis, less opening pleasantry. To preserve a friendly tone, be explicit in the system prompt.

10.4 Progress Updates Are Built Into Agent Tracking

If you had scaffolding to get the model to say things like "I'm about to X" or "I'm doing X" mid-run, you can drop it --- 4.7 emits these natively.

10.5 Subagents and Tool Calls Are More Restrained

By default, the model spawns fewer subagents and calls fewer tools. If it judges that "reasoning is enough," it answers without reaching for a tool. Update your agent-design expectations accordingly.

10.6 Real-Time Cybersecurity Safeguards

Even legitimate offensive security work (red-teaming, vulnerability PoCs) can now be refused depending on context. If security work is a production use case, consider applying to Anthropic's Cyber Verification Program.

10.7 High-Resolution Image Support

You can now process high-res images up to 2576px natively. However, a full-resolution image costs about 3x the tokens per image. For image-heavy workloads, either (a) reallocate max_tokens, or (b) downsample before sending.
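For option (b), the resize math is simple enough to keep in a pure helper and pair with whatever image library you already use (e.g. Pillow's resize):

```python
# Compute downsampled dimensions that fit within the 2576px limit
# while preserving aspect ratio. Pure math --- pair the result with
# your image library's resize call.
def fit_within(width: int, height: int, limit: int = 2576) -> tuple[int, int]:
    longest = max(width, height)
    if longest <= limit:
        return width, height  # already within the limit; no resize needed
    scale = limit / longest
    return round(width * scale), round(height * scale)
```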

11. Recommended Tuning Steps

These aren't required for things to work, but they're worth doing.

  1. Re-examine max_tokens: the new tokenizer also inflates output --- retest starting from 1.2-1.35x your existing value
  2. Audit client-side token estimates: if you implement billing calculations or length checks yourself, switch to the count_tokens API or recalibrate coefficients
  3. Adopt task_budgets (beta): useful for agents. Add the task-budgets-2026-03-13 header, minimum 20k. Remember this is an advisory cap, not a hard limit
  4. Set max_tokens to 64k or higher: when using xhigh/max, we recommend 64k+ for thinking + output combined
  5. Downsample images: if high resolution isn't required, resize before sending to save tokens and cost

11.1 Minimal task_budgets Example (Official Python SDK)

Because task_budgets is a beta feature, you must use the client.beta.messages.create endpoint and pass betas explicitly. The call differs from GA features.

response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=128000,
    output_config={
        "effort": "high",
        "task_budget": {"type": "tokens", "total": 128000},
    },
    messages=[
        {"role": "user", "content": "Review the codebase and propose a refactor plan."}
    ],
    betas=["task-budgets-2026-03-13"],
)

Key points:

  • Minimum is 20,000 tokens; anything lower is rejected
  • max_tokens is a per-request hard cap (not visible to the model); task_budget is an advisory cap for the whole agent loop (the model sees the countdown)
  • Use max_tokens to tightly bound cost; use task_budget to balance quality and efficiency
  • For open-ended work where quality > speed, omit task_budget (otherwise it tends to cut things too short)
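A thin builder can enforce the 20k minimum client-side, so a misconfigured budget fails fast in your own code rather than as an API error (the validation rule is the documented minimum above):

```python
# Build an output_config with an advisory task budget, enforcing the
# documented 20,000-token minimum client-side so a typo fails fast
# instead of surfacing as an API rejection.
def output_config_with_budget(effort: str, total_tokens: int) -> dict:
    if total_tokens < 20_000:
        raise ValueError("task_budget minimum is 20,000 tokens")
    return {
        "effort": effort,
        "task_budget": {"type": "tokens", "total": total_tokens},
    }
```

Pass the result as output_config in the beta call shown above.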

12. Migrating From Opus 4.5 / 4.1 or Earlier

Jumping straight from 4.5 or 4.1 to 4.7 requires the following in addition to the above.

  • Remove sampling parameters: long-time Claude 3.x users are probably setting temperature out of habit --- remove it completely
  • Clean up beta headers: effort-2025-11-24, fine-grained-tool-streaming-2025-05-14, interleaved-thinking-2025-05-14, and similar are all now part of the main API --- drop them
  • Switch endpoints: if you were calling client.beta.messages.create, change to client.messages.create
  • Migrate output_format -> output_config.format: the key was renamed
  • Parse tool arguments carefully: since 4.6, JSON escape behavior can differ in some cases. Use a proper parser (JSON.parse / json.loads) rather than raw string parsing
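To see why a real parser matters, compare what json.loads recovers from escaped tool arguments with what naive string slicing would see:

```python
import json

# Escape sequences in tool arguments survive json.loads intact, while
# naive string handling silently corrupts them. This Python literal
# holds the JSON text  {"path": "C:\\temp\\out.txt", "note": "line1\nline2"}
raw_arguments = '{"path": "C:\\\\temp\\\\out.txt", "note": "line1\\nline2"}'

args = json.loads(raw_arguments)
# args["path"] is the literal Windows path with single backslashes;
# args["note"] contains a real newline, not backslash-n as two characters.
```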

For an overview of what's new in Claude Opus 4.7 itself, see the companion article Claude Opus 4.7 Release: Features, Benchmarks, and Pricing.

13. Migration Checklist (Every Item)

Opus 4.7 migration checklist

Here's a flat, printable list you can share with the team.

13.1 Required (skip and you'll see 400 errors or behavior breaks)

  • [ ] Update the model name from claude-opus-4-6 to claude-opus-4-7
  • [ ] Remove temperature / top_p / top_k
  • [ ] Replace thinking: {type: "enabled", budget_tokens: N} with {type: "adaptive"} + output_config.effort
  • [ ] Remove assistant message prefill; replace with structured outputs / system prompts
  • [ ] If you display thinking in a UI, set thinking.display: "summarized" explicitly

13.2 Tuning (cost / quality optimization)

  • [ ] Re-benchmark cost and latency under the new tokenizer
  • [ ] Adjust max_tokens by ~1.35x as a starting point
  • [ ] Retest client-side token estimation code
  • [ ] For image-heavy workloads, reallocate tokens for high-res images
  • [ ] When using xhigh/max, set max_tokens >= 64k
  • [ ] For agent workloads, consider adopting task_budgets (beta)

13.3 Prompt and Operational Review

  • [ ] Verify length adaptation, literal interpretation, and tone shifts on real prompts
  • [ ] Remove existing length-control prompts and re-establish a baseline
  • [ ] For security use cases being refused, apply to the Cyber Verification Program
  • [ ] Simplify agent scaffolding (mid-run progress output, etc.)
  • [ ] For migrations from 4.5 or earlier, remove beta headers and move to client.messages.create

14. Automated Migration Tools

If you use Claude Code, Anthropic provides a Claude API skill that can automate the mechanical rewrites. Invoke the skill from Claude Code with something like:

/claude-api migrate

Please migrate the entire project from Claude Opus 4.6 to 4.7.
- Swap model names
- Remove temperature / top_p / top_k
- Replace thinking: enabled with adaptive + effort: high
- If any prefill remains, replace with structured outputs

The skill scans the repo, identifies files importing the anthropic SDK, and proposes changes. Prompt fine-tuning and re-benchmarking can't be automated, so finish the job with the checklist above.

FAQ

Q. Does swapping the model name alone work?

If your code doesn't use any of temperature, top_p, top_k, thinking: {type: "enabled"}, or prefill, yes, it runs. That said, the new tokenizer can cause output to be truncated mid-response, so it's still worth revisiting max_tokens.

Q. If I don't include the thinking field, will 4.7 answer without thinking?

Yes --- 4.7 defaults thinking to OFF. That's the same as 4.6's default-off stance, but the behavioral changes that come with adaptive thinking only apply if you opt in explicitly. Include thinking: {type: "adaptive"} and set output_config.effort to control depth.

Q. If I remove temperature entirely, will I get identical output every time?

No. Claude still generates probabilistically, so there will be some variation even for the same prompt. If you want strong consistency, (a) use structured outputs (a JSON schema) to constrain the format, and (b) add explicit instructions like "return the same output for identical inputs" or "list items in a fixed order."

Q. Is task_budgets a hard cap?

No --- it's an advisory cap that the model respects. There's no guarantee output stays inside that bound. To strictly limit cost, combine it with max_tokens and app-side timeout/abort logic as before. Beta usage requires the task-budgets-2026-03-13 header.

Q. Does behavior match between Claude Code and direct API calls?

The API contract is the same. That said, Claude Code has recommended settings (xhigh is close to the default for coding) and the skill sometimes sets task_budgets behind the scenes. If you notice differences, log the request JSON on both sides to compare.

Q. My image-heavy app's token use is spiking. What can I do?

Consider: (1) downsample images to under 2576px before sending, (2) combine multiple images into a single sheet, (3) OCR images on the app side in advance and only send text. For use cases where high resolution is mandatory (medical imaging, blueprints), send those at full resolution and give them extra max_tokens headroom --- a practical balance.

Q. Are the steps the same via Bedrock / Vertex AI?

Parameter changes are the same at the core. Model identifiers (like Bedrock's anthropic.claude-opus-4-7) and rollout timing follow the platform's own announcements. The thinking and output_config structures are common across platforms.

Q. How much should I trust automated migration tools?

The Claude API skill (/claude-api migrate) is good at mechanical pieces: "swap model name," "remove sampling parameters," "rewrite extended thinking." Prompt tone, length control, and re-benchmarking require human judgment. The practical approach is to run the automation, then close out the checklist in this article by hand.

This article is based on Anthropic's official Claude Opus 4.7 migration guide (as of April 2026). API specifications can change --- check the official documentation for the latest before going to production.