Models

Context Window Sizes for Every GPT Model

Compare GPT context window sizes across GPT-5, GPT-4.1, GPT-4o, o-series reasoning models, legacy GPT models, and ChatGPT model-picker tiers.

Dashboard with bars labeled 1M tokens, 400K, 200K, and 128K plus an output gauge.

Last verified: May 4, 2026. OpenAI changes model aliases, ChatGPT plan limits, and availability over time, so treat the numbers below as current guidance rather than permanent product guarantees.

A context window is the total amount of text, images, tool results, reasoning tokens, and output a GPT model can consider in one request or conversation turn. For API work, the widest published OpenAI GPT context window remains the GPT-4.1 family at 1,047,576 tokens, while current GPT-5-family models such as GPT-5.5, GPT-5.4, and GPT-5.2 center on a 400,000-token class with much larger output budgets.[1] ChatGPT is different: visible model-picker limits depend on plan, selected mode, tools, and whether the product surface is Instant or Thinking.[13] This context window comparison explains the practical differences, the published limits, and how to choose a model when long documents, codebases, transcripts, or agent traces are part of the job.

Quick answer

The short answer is: use the GPT-4.1 family when you need the widest published API context window; use current GPT-5-family models such as gpt-5.5, gpt-5.5-pro, gpt-5.4, or gpt-5.2 when you need frontier reasoning with a large 400,000-token working space; and use GPT-4o or GPT-4o mini when 128,000 tokens is enough and you want a balanced multimodal model.[1]

Context length is only one part of model selection. The all GPT models compared side by side guide is the better starting point if you also need benchmarks, latency, tools, and cost. If the same long-context job will run many times, compare it with OpenAI API pricing before you choose the biggest model by default.

For ChatGPT users, the key rule is simpler but more plan-dependent. As last verified in May 2026, OpenAI’s ChatGPT help article lists GPT-5.3 Instant at different limits by plan and lists manually selected GPT-5.4 Thinking at higher paid-tier limits, with Pro reaching a 400K total window split between input and maximum output.[13] Those ChatGPT limits can change and do not mean the same thing as API model limits.

Three decision cards labeled GPT-4.1, GPT-5 family, and GPT-4o with document, gear, and image icons.

Full context window table

This table focuses on GPT text, multimodal, audio, realtime, chat, coding, and reasoning models where a token context window is relevant. It excludes image-only, video-only, embedding, moderation, and speech-to-text models because those do not behave like general GPT chat models. The API model ID column is included for developers who need the exact string or alias to use in code.

Model or surfaceAPI model ID or aliasPublished context windowPublished max outputBest fit
GPT-4.1gpt-4.11,047,576 tokens32,768 tokensLargest published API context for long files and codebases.[5]
GPT-4.1 minigpt-4.1-mini1,047,576 tokens32,768 tokensLower-cost long-context extraction and routing.[21]
GPT-4.1 nanogpt-4.1-nano1,047,576 tokens32,768 tokensHigh-volume long-context classification.[22]
GPT-5.5gpt-5.5, gpt-5.5-pro400,000-token class128,000-token classCurrent top-tier GPT-5 chat and reasoning work where quality matters more than maximum raw input length.
GPT-5.4gpt-5.4, gpt-5.4-pro, gpt-5.4-mini, gpt-5.4-nano400,000-token class128,000-token classCurrent GPT-5-family work across frontier, mini, and nano variants.
GPT-5.3 Chat / GPT-5.3 Codexgpt-5.3-chat-latest, gpt-5.3-codexProduct- or endpoint-dependentProduct- or endpoint-dependentChatGPT-aligned and coding-specific GPT-5.3 surfaces; verify the active endpoint before deploying.
GPT-5.2gpt-5.2, gpt-5.2-pro, gpt-5.2-codex400,000 tokens for gpt-5.2128,000 tokens for gpt-5.2Frontier reasoning, coding, tool use, and long agent tasks.[2]
GPT-5.1gpt-5.1, gpt-5.1-codex, gpt-5.1-codex-max, gpt-5.1-codex-mini400,000 tokens for gpt-5.1128,000 tokens for gpt-5.1Large-context GPT-5 work with configurable reasoning effort.[3]
GPT-5gpt-5400,000 tokens128,000 tokensEarlier GPT-5 reasoning and agentic workloads.[4]
GPT-5 minigpt-5-mini400,000 tokens128,000 tokensCost-optimized GPT-5-family tasks with large context.[19]
GPT-5 nanogpt-5-nano400,000 tokens128,000 tokensFast classification and summarization at GPT-5 context size.[20]
GPT-5.2 Chatgpt-5.2-chat-latest128,000 tokens16,384 tokensAPI testing against the ChatGPT GPT-5.2 snapshot.[24]
GPT-5.1 Chatgpt-5.1-chat-latest128,000 tokens16,384 tokensAPI testing against the ChatGPT GPT-5.1 snapshot.[25]
o3o3, o3-pro, o3-deep-research200,000 tokens for o3100,000 tokens for o3Reasoning workflows that need long answers.[7]
o4-minio4-mini200,000 tokens100,000 tokensFast, lower-cost reasoning with image input.[8]
o3-minio3-mini200,000 tokens100,000 tokensSmall reasoning tasks with structured outputs.[17]
o1o1200,000 tokens100,000 tokensEarlier full o-series reasoning.[9]
o1-proo1-pro200,000 tokens100,000 tokensHigher-compute o1 reasoning through the Responses API.[18]
GPT-4ogpt-4o128,000 tokens16,384 tokensGeneral multimodal work with text and image input.[6]
GPT-4o minigpt-4o-mini128,000 tokens16,384 tokensAffordable multimodal classification and extraction.[23]
GPT-4o Audiogpt-4o-audio-preview128,000 tokens16,384 tokensChat completions with audio input and output.[26]
GPT-4.5 Previewgpt-4.5-preview128,000 tokens16,384 tokensDeprecated preview model; use newer models when possible.[16]
GPT-4 Turbogpt-4-turbo128,000 tokens4,096 tokensOlder GPT-4 generation with image input.[10]
GPT-4gpt-48,192 tokens8,192 tokensOlder GPT-4 chat compatibility.[11]
GPT-3.5 Turbogpt-3.5-turbo16,385 tokens4,096 tokensLegacy chat and fine-tuning compatibility.[12]

OpenAI’s model comparison page confirms the broad pattern: GPT-4.1 exposes the largest published API window, GPT-5-era models occupy a large 400,000-token class, and GPT-4o sits at 128,000 tokens.[1] OpenAI’s GPT-4.1 launch article describes that family as supporting up to 1 million tokens of context, which matches the model pages while rounding the exact API figure.[14]

Ranked bars labeled 1M, 400K, 200K, and 128K aligned with model-family rows and output chips.

API context windows are not the same as ChatGPT context windows

Most confusion comes from mixing two products. The OpenAI API exposes model-specific token windows, output limits, endpoints, and rate limits. ChatGPT exposes a product experience with plans, model-picker modes, tools, memory, files, and automatic routing. A model name in ChatGPT can have a different practical context limit than a related API model.

As last verified in May 2026, OpenAI’s ChatGPT help page lists GPT-5.3 Instant as the default for logged-in users and shows different context windows by plan: 16K for Free, 32K for Plus and Business, and 128K for Pro and Enterprise.[13] The same page lists manually selected GPT-5.4 Thinking at 256K for all paid tiers and 400K for Pro, with the Pro limit split as 272K input plus 128K maximum output.[13] These are ChatGPT product limits, not universal API limits.

That means a ChatGPT Plus user and an API developer can both say they are using a GPT-5-era model while seeing different limits. This is normal. ChatGPT wraps models in product rules. The API exposes more direct model parameters, but it also requires you to manage token budgets, truncation, file retrieval, and billing yourself.

Where you use the modelWhat controls the limitWhat to check first
OpenAI APIModel page, exact model ID, endpoint, max output setting, and rate limitsPublished context window and max output tokens for the ID you call
ChatGPTPlan, selected mode, tools, model routing, conversation state, and product capsHelp Center model and limits page, checked close to the time you rely on it
Custom GPTs and filesChatGPT product rules plus retrieval behaviorWhether the content is loaded into context or retrieved in chunks

If you are comparing paid ChatGPT plans, pair this guide with our ChatGPT Plus price in 2026 breakdown. If you are building an app, the API table matters more than the ChatGPT model picker.

Split diagram with an API token ruler and ChatGPT plan cards showing that product limits vary by plan.

How to read the numbers

A model’s context window is not just how much you can paste. It is the combined budget for the prompt, conversation history, system and developer instructions, tool outputs, retrieved file chunks, image representations, reasoning tokens where applicable, and the model’s answer. OpenAI’s token guide explains that both input and output tokens count toward usage, cost, latency, and whether a request fits within the model limit.[15]

Context window

The context window is the total token budget. A 128,000-token window does not guarantee that you can paste 128,000 tokens and still receive a long answer. If you need a 10,000-token response, you must leave room for that response and for any hidden or intermediate tokens the model uses.

Illustrative chart showing that more reserved output space leaves less room for input.
Illustrative only — not a measured benchmark.

Maximum output tokens

Maximum output is the largest answer budget the model page publishes. GPT-5.2 lists a 128,000-token maximum output, while GPT-4.1 lists 32,768 tokens, and GPT-4o lists 16,384 tokens.[2][5][6] This matters for report generation, code generation, transcript cleanup, and any workflow where the answer itself must be long.

Reasoning tokens

Reasoning models can spend part of the context budget on internal reasoning before producing the visible answer. OpenAI’s pricing page notes that reasoning tokens are not visible through the API, but they still occupy context-window space and are billed as output tokens.[27] For long prompts, a high reasoning setting can reduce the remaining room for the final answer.

This is why long-context reasoning work needs a margin. Do not fill a 400,000-token model to the top and expect a 128,000-token answer. Leave space for tool results, retries, reasoning, and formatting. For coding tasks, also leave room for diffs, tests, compiler output, and follow-up instructions. Our best GPT model for coding guide covers that tradeoff in more detail.

Which model has the largest context window

The GPT-4.1 family has the largest published API context window among the GPT models covered here. GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano each list 1,047,576 tokens of context and 32,768 maximum output tokens.[5][21][22] OpenAI’s launch post rounds that family to up to 1 million tokens and says the jump was from 128,000 tokens for previous GPT-4o models.[14]

The current GPT-5 family is not the largest by raw context size, but it gives a different balance. GPT-5.5 and GPT-5.5-pro are the current top tier for chat-style GPT-5 work, while GPT-5.4 and GPT-5.2 remain important production choices. GPT-5.2 lists a 400,000-token context window and a 128,000-token maximum output, and OpenAI describes it as a frontier model for coding and agentic tasks.[2] That combination can be better than GPT-4.1 when the task needs deep reasoning, tool use, and a very long generated answer rather than the largest possible input.

The o-series reasoning models sit between those two tiers. o3, o4-mini, o3-mini, o1, and o1-pro each list 200,000 tokens of context and 100,000 maximum output tokens in their model pages.[7][8][17][9][18] They remain relevant for existing integrations, though new complex work should be compared against current GPT-5-family models before standardizing on an older o-series ID.

For everyday multimodal work, GPT-4o and GPT-4o mini remain in the 128,000-token class, with 16,384 maximum output tokens.[6][23] That is still a large window for many user-facing applications. It can cover long documents, batches of support messages, and many image-plus-text prompts without paying for a larger reasoning model.

Podium chart labeled GPT-4.1 1M, GPT-5 family 400K, o-series 200K, and GPT-4o 128K.

When a bigger context window is not better

A bigger context window can solve the wrong problem. It lets a model receive more material, but it does not guarantee perfect recall, lower cost, or better reasoning. Long prompts increase latency and can make instructions harder to prioritize. They also raise the risk that irrelevant content distracts the model from the few facts that matter.

Illustrative chart showing that attention work grows quickly as prompt length increases.
Illustrative only — the curve explains the concept and is not a measured latency benchmark.

Use long context when the model genuinely needs to compare, transform, or reason over the whole input. Good examples include contract comparison, repository-wide code review, a long customer history, a multi-file incident report, or a transcript where the answer depends on events across the entire conversation. The best GPT model for writing guide has separate advice for long drafts, outlines, and editing passes.

Use retrieval or chunking when the user asks narrow questions over a large collection. A file-search workflow can retrieve only the relevant sections, which is often cheaper and easier to debug than sending the entire collection on every turn. Use summarization when older conversation turns need to remain available but do not need exact wording. Use structured extraction when you only need fields, labels, or decisions.

Process with five stages: question, index, retrieve, compose, and answer for a narrow file-search workflow.

Cost matters too. A 1,047,576-token context model can be the right tool for a rare, high-value analysis. It may be the wrong tool for thousands of short classification calls. If speed is the main constraint, compare this article with our fastest GPT model guide. If budget is the main constraint, start with the cheapest GPT model guide before you optimize for maximum window size.

Practical selection rules

Start with the smallest context window that can reliably hold the task plus a safety margin. Then upgrade only when the model loses important information, needs to compare distant sections, or fails because the prompt must be truncated. The best context window comparison is not the row with the largest number. It is the smallest row that keeps the job accurate.

  • Use GPT-4.1 family when the input is extremely long and the answer can fit inside a 32,768-token output budget.[5]
  • Use GPT-5.5 or GPT-5.5-pro when you want the current top GPT-5 tier for high-value chat, reasoning, coding, or agent tasks.
  • Use GPT-5.4 or GPT-5.2 when the task needs a large context, strong GPT-5 reasoning, and up to the 128,000-token output class.[2]
  • Use GPT-5 mini, GPT-5 nano, or GPT-5.4 mini/nano when you want GPT-5-family context but the task is narrower or high volume.[19][20]
  • Use GPT-4o or GPT-4o mini when 128,000 tokens is enough and the task benefits from text-plus-image input.[6][23]
  • Use o-series models for existing reasoning integrations that depend on o1, o3, or o4-mini behavior.[7][8][9]
  • Avoid legacy GPT-3.5 or GPT-4 models for new long-context work unless compatibility is the main requirement.[11][12]

For a codebase, use GPT-4.1 when the main problem is fitting many files into one request. Use a current GPT-5-family model when the problem is planning, refactoring, tool use, and generating a long patch. For a 90-minute interview transcript, GPT-4o mini may be enough for summaries and tags. For a legal brief with exhibits, GPT-4.1 or a high-tier GPT-5 model is more defensible. For image-heavy prompts, read our GPT-4 Vision guide and our best GPT model for image generation guide separately, because image generation models such as GPT-image models have different limits than GPT chat models.

If you are trying to pick the strongest overall model rather than the widest context window, use the most powerful GPT model benchmark article. Context helps only when the task needs the extra room.

Frequently asked questions

What is a context window?

A context window is the token budget a model can consider at once. It includes the prompt, conversation history, instructions, retrieved content, tool results, reasoning tokens where applicable, and the response. OpenAI’s token guide explains that input and output tokens both affect cost, latency, and whether a request fits.[15]

Which GPT model has the biggest context window?

For the OpenAI API, GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano list the largest published GPT context window at 1,047,576 tokens.[5][21][22] Current GPT-5-family models such as GPT-5.5, GPT-5.4, and GPT-5.2 are in a smaller 400,000-token class, but they offer larger output budgets and stronger current-generation reasoning for many tasks.[2]

Does ChatGPT have the same context window as the API?

No. ChatGPT limits depend on plan, selected mode, automatic routing, tools, and product rules. As last verified in May 2026, OpenAI’s ChatGPT help page lists GPT-5.3 Instant at 16K, 32K, or 128K depending on plan, while manually selected GPT-5.4 Thinking reaches 256K for paid tiers and 400K for Pro.[13] Recheck the Help Center if the exact ChatGPT limit matters.

Is a 1 million-token context window always better?

No. A larger window can increase latency and cost, and irrelevant material can distract the model. Use the largest window only when the task requires whole-document or whole-codebase context. Otherwise, retrieval, chunking, or summaries often work better.

Do output tokens count against the context window?

Yes. You need room for both input and output. For reasoning models, invisible reasoning tokens can also use part of the budget, and OpenAI says those reasoning tokens are billed as output tokens.[27]

Why does GPT-5.2 have less context than GPT-4.1?

OpenAI has not stated a single public reason for that product design. The published model pages show different tradeoffs: GPT-4.1 emphasizes a 1,047,576-token window, while GPT-5.2 emphasizes frontier reasoning, coding, agentic tasks, a 400,000-token window, and 128,000 maximum output tokens.[5][2] Newer GPT-5.4 and GPT-5.5 models should be evaluated as current GPT-5-family choices, but raw context size is still not the only selection criterion.

Editorial independence. chatai.guide is reader-supported and not affiliated with OpenAI. We don’t accept paid placements or sponsored reviews — every recommendation reflects our own testing.