
Last verified: May 4, 2026. OpenAI changes model aliases, ChatGPT plan limits, and availability over time, so treat the numbers below as current guidance rather than permanent product guarantees.
A context window is the total amount of text, images, tool results, reasoning tokens, and output a GPT model can consider in one request or conversation turn. For API work, the widest published OpenAI GPT context window remains the GPT-4.1 family at 1,047,576 tokens, while current GPT-5-family models such as GPT-5.5, GPT-5.4, and GPT-5.2 center on a 400,000-token class with much larger output budgets.[1] ChatGPT is different: visible model-picker limits depend on plan, selected mode, tools, and whether the product surface is Instant or Thinking.[13] This context window comparison explains the practical differences, the published limits, and how to choose a model when long documents, codebases, transcripts, or agent traces are part of the job.
Quick answer
The short answer is: use the GPT-4.1 family when you need the widest published API context window; use current GPT-5-family models such as gpt-5.5, gpt-5.5-pro, gpt-5.4, or gpt-5.2 when you need frontier reasoning with a large 400,000-token working space; and use GPT-4o or GPT-4o mini when 128,000 tokens is enough and you want a balanced multimodal model.[1]
Context length is only one part of model selection. The all GPT models compared side by side guide is the better starting point if you also need benchmarks, latency, tools, and cost. If the same long-context job will run many times, compare it with OpenAI API pricing before you choose the biggest model by default.
For ChatGPT users, the key rule is simpler but more plan-dependent. As last verified in May 2026, OpenAI’s ChatGPT help article lists GPT-5.3 Instant at different limits by plan and lists manually selected GPT-5.4 Thinking at higher paid-tier limits, with Pro reaching a 400K total window split between input and maximum output.[13] Those ChatGPT limits can change and do not mean the same thing as API model limits.

Full context window table
This table focuses on GPT text, multimodal, audio, realtime, chat, coding, and reasoning models where a token context window is relevant. It excludes image-only, video-only, embedding, moderation, and speech-to-text models because those do not behave like general GPT chat models. The API model ID column is included for developers who need the exact string or alias to use in code.
| Model or surface | API model ID or alias | Published context window | Published max output | Best fit |
|---|---|---|---|---|
| GPT-4.1 | gpt-4.1 | 1,047,576 tokens | 32,768 tokens | Largest published API context for long files and codebases.[5] |
| GPT-4.1 mini | gpt-4.1-mini | 1,047,576 tokens | 32,768 tokens | Lower-cost long-context extraction and routing.[21] |
| GPT-4.1 nano | gpt-4.1-nano | 1,047,576 tokens | 32,768 tokens | High-volume long-context classification.[22] |
| GPT-5.5 | gpt-5.5, gpt-5.5-pro | 400,000-token class | 128,000-token class | Current top-tier GPT-5 chat and reasoning work where quality matters more than maximum raw input length. |
| GPT-5.4 | gpt-5.4, gpt-5.4-pro, gpt-5.4-mini, gpt-5.4-nano | 400,000-token class | 128,000-token class | Current GPT-5-family work across frontier, mini, and nano variants. |
| GPT-5.3 Chat / GPT-5.3 Codex | gpt-5.3-chat-latest, gpt-5.3-codex | Product- or endpoint-dependent | Product- or endpoint-dependent | ChatGPT-aligned and coding-specific GPT-5.3 surfaces; verify the active endpoint before deploying. |
| GPT-5.2 | gpt-5.2, gpt-5.2-pro, gpt-5.2-codex | 400,000 tokens for gpt-5.2 | 128,000 tokens for gpt-5.2 | Frontier reasoning, coding, tool use, and long agent tasks.[2] |
| GPT-5.1 | gpt-5.1, gpt-5.1-codex, gpt-5.1-codex-max, gpt-5.1-codex-mini | 400,000 tokens for gpt-5.1 | 128,000 tokens for gpt-5.1 | Large-context GPT-5 work with configurable reasoning effort.[3] |
| GPT-5 | gpt-5 | 400,000 tokens | 128,000 tokens | Earlier GPT-5 reasoning and agentic workloads.[4] |
| GPT-5 mini | gpt-5-mini | 400,000 tokens | 128,000 tokens | Cost-optimized GPT-5-family tasks with large context.[19] |
| GPT-5 nano | gpt-5-nano | 400,000 tokens | 128,000 tokens | Fast classification and summarization at GPT-5 context size.[20] |
| GPT-5.2 Chat | gpt-5.2-chat-latest | 128,000 tokens | 16,384 tokens | API testing against the ChatGPT GPT-5.2 snapshot.[24] |
| GPT-5.1 Chat | gpt-5.1-chat-latest | 128,000 tokens | 16,384 tokens | API testing against the ChatGPT GPT-5.1 snapshot.[25] |
| o3 | o3, o3-pro, o3-deep-research | 200,000 tokens for o3 | 100,000 tokens for o3 | Reasoning workflows that need long answers.[7] |
| o4-mini | o4-mini | 200,000 tokens | 100,000 tokens | Fast, lower-cost reasoning with image input.[8] |
| o3-mini | o3-mini | 200,000 tokens | 100,000 tokens | Small reasoning tasks with structured outputs.[17] |
| o1 | o1 | 200,000 tokens | 100,000 tokens | Earlier full o-series reasoning.[9] |
| o1-pro | o1-pro | 200,000 tokens | 100,000 tokens | Higher-compute o1 reasoning through the Responses API.[18] |
| GPT-4o | gpt-4o | 128,000 tokens | 16,384 tokens | General multimodal work with text and image input.[6] |
| GPT-4o mini | gpt-4o-mini | 128,000 tokens | 16,384 tokens | Affordable multimodal classification and extraction.[23] |
| GPT-4o Audio | gpt-4o-audio-preview | 128,000 tokens | 16,384 tokens | Chat completions with audio input and output.[26] |
| GPT-4.5 Preview | gpt-4.5-preview | 128,000 tokens | 16,384 tokens | Deprecated preview model; use newer models when possible.[16] |
| GPT-4 Turbo | gpt-4-turbo | 128,000 tokens | 4,096 tokens | Older GPT-4 generation with image input.[10] |
| GPT-4 | gpt-4 | 8,192 tokens | 8,192 tokens | Older GPT-4 chat compatibility.[11] |
| GPT-3.5 Turbo | gpt-3.5-turbo | 16,385 tokens | 4,096 tokens | Legacy chat and fine-tuning compatibility.[12] |
OpenAI’s model comparison page confirms the broad pattern: GPT-4.1 exposes the largest published API window, GPT-5-era models occupy a large 400,000-token class, and GPT-4o sits at 128,000 tokens.[1] OpenAI’s GPT-4.1 launch article describes that family as supporting up to 1 million tokens of context, which matches the model pages while rounding the exact API figure.[14]

API context windows are not the same as ChatGPT context windows
Most confusion comes from mixing two products. The OpenAI API exposes model-specific token windows, output limits, endpoints, and rate limits. ChatGPT exposes a product experience with plans, model-picker modes, tools, memory, files, and automatic routing. A model name in ChatGPT can have a different practical context limit than a related API model.
As last verified in May 2026, OpenAI’s ChatGPT help page lists GPT-5.3 Instant as the default for logged-in users and shows different context windows by plan: 16K for Free, 32K for Plus and Business, and 128K for Pro and Enterprise.[13] The same page lists manually selected GPT-5.4 Thinking at 256K for all paid tiers and 400K for Pro, with the Pro limit split as 272K input plus 128K maximum output.[13] These are ChatGPT product limits, not universal API limits.
That means a ChatGPT Plus user and an API developer can both say they are using a GPT-5-era model while seeing different limits. This is normal. ChatGPT wraps models in product rules. The API exposes more direct model parameters, but it also requires you to manage token budgets, truncation, file retrieval, and billing yourself.
| Where you use the model | What controls the limit | What to check first |
|---|---|---|
| OpenAI API | Model page, exact model ID, endpoint, max output setting, and rate limits | Published context window and max output tokens for the ID you call |
| ChatGPT | Plan, selected mode, tools, model routing, conversation state, and product caps | Help Center model and limits page, checked close to the time you rely on it |
| Custom GPTs and files | ChatGPT product rules plus retrieval behavior | Whether the content is loaded into context or retrieved in chunks |
If you are comparing paid ChatGPT plans, pair this guide with our ChatGPT Plus price in 2026 breakdown. If you are building an app, the API table matters more than the ChatGPT model picker.

How to read the numbers
A model’s context window is not just how much you can paste. It is the combined budget for the prompt, conversation history, system and developer instructions, tool outputs, retrieved file chunks, image representations, reasoning tokens where applicable, and the model’s answer. OpenAI’s token guide explains that both input and output tokens count toward usage, cost, latency, and whether a request fits within the model limit.[15]
Context window
The context window is the total token budget. A 128,000-token window does not guarantee that you can paste 128,000 tokens and still receive a long answer. If you need a 10,000-token response, you must leave room for that response and for any hidden or intermediate tokens the model uses.

Maximum output tokens
Maximum output is the largest answer budget the model page publishes. GPT-5.2 lists a 128,000-token maximum output, while GPT-4.1 lists 32,768 tokens, and GPT-4o lists 16,384 tokens.[2][5][6] This matters for report generation, code generation, transcript cleanup, and any workflow where the answer itself must be long.
Reasoning tokens
Reasoning models can spend part of the context budget on internal reasoning before producing the visible answer. OpenAI’s pricing page notes that reasoning tokens are not visible through the API, but they still occupy context-window space and are billed as output tokens.[27] For long prompts, a high reasoning setting can reduce the remaining room for the final answer.
This is why long-context reasoning work needs a margin. Do not fill a 400,000-token model to the top and expect a 128,000-token answer. Leave space for tool results, retries, reasoning, and formatting. For coding tasks, also leave room for diffs, tests, compiler output, and follow-up instructions. Our best GPT model for coding guide covers that tradeoff in more detail.
Which model has the largest context window
The GPT-4.1 family has the largest published API context window among the GPT models covered here. GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano each list 1,047,576 tokens of context and 32,768 maximum output tokens.[5][21][22] OpenAI’s launch post rounds that family to up to 1 million tokens and says the jump was from 128,000 tokens for previous GPT-4o models.[14]
The current GPT-5 family is not the largest by raw context size, but it gives a different balance. GPT-5.5 and GPT-5.5-pro are the current top tier for chat-style GPT-5 work, while GPT-5.4 and GPT-5.2 remain important production choices. GPT-5.2 lists a 400,000-token context window and a 128,000-token maximum output, and OpenAI describes it as a frontier model for coding and agentic tasks.[2] That combination can be better than GPT-4.1 when the task needs deep reasoning, tool use, and a very long generated answer rather than the largest possible input.
The o-series reasoning models sit between those two tiers. o3, o4-mini, o3-mini, o1, and o1-pro each list 200,000 tokens of context and 100,000 maximum output tokens in their model pages.[7][8][17][9][18] They remain relevant for existing integrations, though new complex work should be compared against current GPT-5-family models before standardizing on an older o-series ID.
For everyday multimodal work, GPT-4o and GPT-4o mini remain in the 128,000-token class, with 16,384 maximum output tokens.[6][23] That is still a large window for many user-facing applications. It can cover long documents, batches of support messages, and many image-plus-text prompts without paying for a larger reasoning model.

When a bigger context window is not better
A bigger context window can solve the wrong problem. It lets a model receive more material, but it does not guarantee perfect recall, lower cost, or better reasoning. Long prompts increase latency and can make instructions harder to prioritize. They also raise the risk that irrelevant content distracts the model from the few facts that matter.

Use long context when the model genuinely needs to compare, transform, or reason over the whole input. Good examples include contract comparison, repository-wide code review, a long customer history, a multi-file incident report, or a transcript where the answer depends on events across the entire conversation. The best GPT model for writing guide has separate advice for long drafts, outlines, and editing passes.
Use retrieval or chunking when the user asks narrow questions over a large collection. A file-search workflow can retrieve only the relevant sections, which is often cheaper and easier to debug than sending the entire collection on every turn. Use summarization when older conversation turns need to remain available but do not need exact wording. Use structured extraction when you only need fields, labels, or decisions.

Cost matters too. A 1,047,576-token context model can be the right tool for a rare, high-value analysis. It may be the wrong tool for thousands of short classification calls. If speed is the main constraint, compare this article with our fastest GPT model guide. If budget is the main constraint, start with the cheapest GPT model guide before you optimize for maximum window size.
Practical selection rules
Start with the smallest context window that can reliably hold the task plus a safety margin. Then upgrade only when the model loses important information, needs to compare distant sections, or fails because the prompt must be truncated. The best context window comparison is not the row with the largest number. It is the smallest row that keeps the job accurate.
- Use GPT-4.1 family when the input is extremely long and the answer can fit inside a 32,768-token output budget.[5]
- Use GPT-5.5 or GPT-5.5-pro when you want the current top GPT-5 tier for high-value chat, reasoning, coding, or agent tasks.
- Use GPT-5.4 or GPT-5.2 when the task needs a large context, strong GPT-5 reasoning, and up to the 128,000-token output class.[2]
- Use GPT-5 mini, GPT-5 nano, or GPT-5.4 mini/nano when you want GPT-5-family context but the task is narrower or high volume.[19][20]
- Use GPT-4o or GPT-4o mini when 128,000 tokens is enough and the task benefits from text-plus-image input.[6][23]
- Use o-series models for existing reasoning integrations that depend on o1, o3, or o4-mini behavior.[7][8][9]
- Avoid legacy GPT-3.5 or GPT-4 models for new long-context work unless compatibility is the main requirement.[11][12]
For a codebase, use GPT-4.1 when the main problem is fitting many files into one request. Use a current GPT-5-family model when the problem is planning, refactoring, tool use, and generating a long patch. For a 90-minute interview transcript, GPT-4o mini may be enough for summaries and tags. For a legal brief with exhibits, GPT-4.1 or a high-tier GPT-5 model is more defensible. For image-heavy prompts, read our GPT-4 Vision guide and our best GPT model for image generation guide separately, because image generation models such as GPT-image models have different limits than GPT chat models.
If you are trying to pick the strongest overall model rather than the widest context window, use the most powerful GPT model benchmark article. Context helps only when the task needs the extra room.
Frequently asked questions
What is a context window?
A context window is the token budget a model can consider at once. It includes the prompt, conversation history, instructions, retrieved content, tool results, reasoning tokens where applicable, and the response. OpenAI’s token guide explains that input and output tokens both affect cost, latency, and whether a request fits.[15]
Which GPT model has the biggest context window?
For the OpenAI API, GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano list the largest published GPT context window at 1,047,576 tokens.[5][21][22] Current GPT-5-family models such as GPT-5.5, GPT-5.4, and GPT-5.2 are in a smaller 400,000-token class, but they offer larger output budgets and stronger current-generation reasoning for many tasks.[2]
Does ChatGPT have the same context window as the API?
No. ChatGPT limits depend on plan, selected mode, automatic routing, tools, and product rules. As last verified in May 2026, OpenAI’s ChatGPT help page lists GPT-5.3 Instant at 16K, 32K, or 128K depending on plan, while manually selected GPT-5.4 Thinking reaches 256K for paid tiers and 400K for Pro.[13] Recheck the Help Center if the exact ChatGPT limit matters.
Is a 1 million-token context window always better?
No. A larger window can increase latency and cost, and irrelevant material can distract the model. Use the largest window only when the task requires whole-document or whole-codebase context. Otherwise, retrieval, chunking, or summaries often work better.
Do output tokens count against the context window?
Yes. You need room for both input and output. For reasoning models, invisible reasoning tokens can also use part of the budget, and OpenAI says those reasoning tokens are billed as output tokens.[27]
Why does GPT-5.2 have less context than GPT-4.1?
OpenAI has not stated a single public reason for that product design. The published model pages show different tradeoffs: GPT-4.1 emphasizes a 1,047,576-token window, while GPT-5.2 emphasizes frontier reasoning, coding, agentic tasks, a 400,000-token window, and 128,000 maximum output tokens.[5][2] Newer GPT-5.4 and GPT-5.5 models should be evaluated as current GPT-5-family choices, but raw context size is still not the only selection criterion.
