
OpenAI API Pricing: Every Model Compared

OpenAI API pricing guide for May 2026: model-by-model token costs, Batch discounts, cached input pricing, tools, embeddings, audio, and tips.

Pricing dashboard with token meter and lanes labeled INPUT, CACHE, OUTPUT, BATCH, PRO, and TOOLS.

OpenAI API pricing is usage-based, separate from ChatGPT Plus, and sensitive to model choice, output length, caching, tools, and processing mode. As of May 2026, the top chat/API tier includes gpt-5.5 and gpt-5.5-pro. The gpt-5.4 and gpt-5.4-pro models remain important comparison points because their launch pricing is published: $2.50 per 1M input tokens and $15 per 1M output tokens for gpt-5.4, and $30 per 1M input tokens and $180 per 1M output tokens for gpt-5.4-pro.[1][2][5] Smaller GPT-5 mini/nano-class models, GPT-4.1 mini/nano, o-series reasoning models, embeddings, realtime audio, image generation, video generation, fine-tuning, storage, and tools can all change the final bill. Treat this guide as a comparison of the major current pricing families, then verify the exact live rate on OpenAI’s pricing page before launch.[1]

Short answer: what OpenAI API pricing costs now

OpenAI API pricing is metered by model, fresh input tokens, cached input tokens, output tokens, and separate tool or storage usage. In May 2026, gpt-5.5 and gpt-5.5-pro are the current frontier chat/API models, with gpt-5.4, gpt-5.4-pro, GPT-5 mini/nano-class models, GPT-4.1 mini/nano, and o-series models still relevant for cost planning.[1]

The practical answer: use the cheapest model that passes your evals, not the newest model by default. Start with mini/nano-class or GPT-4.1 mini/nano models for routine extraction, classification, routing, and formatting. Use gpt-5.5 or gpt-5.4-class models when quality matters, and reserve pro models such as gpt-5.5-pro or gpt-5.4-pro for high-value hard tasks where better answers reduce review time, failed runs, or business risk.[1][7][10][11]

Two pricing cards labeled GPT-5.4 and PRO with chips $2.50, $0.25, $15, $30, and $180.

How OpenAI API billing works

The API bill is separate from ChatGPT subscriptions. A ChatGPT Plus or Pro plan does not bundle API credits. If you are comparing app subscriptions with developer usage, start with "ChatGPT API vs ChatGPT Plus" and "Does ChatGPT Plus Include API Access?"

For most language models, OpenAI bills three token categories. Input tokens are the prompt and supplied context. Cached input tokens are repeated prompt tokens that OpenAI can reuse at a discounted rate. Output tokens are the generated answer, including reasoning tokens where the model exposes or accounts for them.[1][3]

Token billing means two apps using the same model can have very different costs. A short classification endpoint with a tiny JSON response may stay inexpensive even at high request volume. A long agent run that sends files, tool definitions, retrieved documents, and retry turns can become expensive even on a cheaper per-token model if the workflow burns too many tokens.

OpenAI also charges separately for some platform tools and resources. Web search, containers, file/vector storage, realtime audio, image/video generation, and fine-tuning can have different meters from standard text generation, so a full budget must include both model tokens and product features used in the request path.[1]

If you are estimating production spend, do not multiply only the visible user prompt. Count the system prompt, developer instructions, retrieved context, function schemas, tool outputs, hidden retries, and streaming completions. For a calculator-style workflow, use the OpenAI API cost calculator after you choose the model family.
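As a rough illustration of that count, the sketch below sums the input components of one request. Every token count is a hypothetical placeholder rather than a measurement, and the assumption that a retry resends the full input is a simplification; in practice you would substitute values logged from your own traffic.

```python
# Hypothetical per-request input token counts; replace with logged values.
request_components = {
    "system_prompt": 1_200,
    "developer_instructions": 400,
    "user_prompt": 150,
    "retrieved_context": 6_000,
    "function_schemas": 900,
    "tool_outputs": 2_500,
}
retries = 1  # assumption: each hidden retry resends the full input

input_per_attempt = sum(request_components.values())
total_input = input_per_attempt * (1 + retries)
visible_share = request_components["user_prompt"] / input_per_attempt

print(f"input tokens per attempt: {input_per_attempt}")
print(f"input tokens with retries: {total_input}")
print(f"visible user prompt share: {visible_share:.1%}")
```

With these placeholder numbers the visible user prompt is a small fraction of the billed input, which is why multiplying only the user prompt tends to underestimate spend.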

Text and reasoning model pricing compared

The table below compares the major public API model families that matter for new builds as of May 2026. It includes current frontier models and important lower-cost families, but it does not list every deprecated dated snapshot. When a model has both a stable alias and dated snapshots, pin a dated snapshot in production if behavior stability matters.

| Model or family | Best fit | Input price | Cached input price | Output price | Notes |
| --- | --- | --- | --- | --- | --- |
| gpt-5.5-pro | Current highest-compute chat/API reasoning for expensive, high-stakes tasks | Check live pricing page | Check live pricing page | Check live pricing page | Current May 2026 pro-tier option. Use as an escalation model, not a default.[1] |
| gpt-5.5 | Current frontier chat/API model for demanding production workloads | Check live pricing page | Check live pricing page | Check live pricing page | Evaluate first for new high-quality builds, then compare against cheaper models on your own eval set.[1] |
| gpt-5.4-pro | Prior pro-tier reasoning where published launch pricing is useful for comparison | $30 / 1M tokens | Not listed in the launch coverage | $180 / 1M tokens | Available as gpt-5.4-pro for complex tasks.[2][5] |
| gpt-5.4 | Prior frontier model, still useful for compatibility and price comparisons | $2.50 / 1M tokens | $0.25 / 1M tokens | $15 / 1M tokens | Released to the API on March 5, 2026.[2][5] |
| gpt-5.2 | Regression testing and existing GPT-5.2 workloads | $1.75 / 1M tokens | $0.175 / 1M tokens | $14 / 1M tokens | Useful when an existing app is tuned around this model’s behavior.[8] |
| gpt-5.1 and gpt-5 | General agentic tasks at a lower price point than newer frontier models | $1.25 / 1M tokens | $0.125 / 1M tokens | $10 / 1M tokens | Both were listed at the same text-token prices in referenced model/pricing docs.[9][14] |
| GPT-5 mini / GPT-5 nano class | High-volume routing, extraction, classification, and lightweight generation | Check live pricing page | Check live pricing page | Check live pricing page | Current small-model lane for workloads where cost and latency matter more than maximum reasoning.[1] |
| gpt-4.1 | Non-reasoning instruction following, coding, and tool calling | $2 / 1M tokens | $0.50 / 1M tokens | $8 / 1M tokens | OpenAI launched GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano in the API with 1M-token context support.[7] |
| gpt-4.1-mini | Balanced low-cost production work | $0.40 / 1M tokens | $0.10 / 1M tokens | $1.60 / 1M tokens | Useful for routing, structured extraction, and drafts that can be verified.[7][14] |
| gpt-4.1-nano | Very high-volume classification, autocomplete, and simple extraction | $0.10 / 1M tokens | $0.025 / 1M tokens | $0.40 / 1M tokens | OpenAI described it as the fastest and cheapest GPT-4.1 model at launch.[7][14] |
| o3, o3-pro, o3-mini, o3-deep-research | Reasoning, research-style workflows, math, science, and technical analysis | Varies by model; check live pricing page | Varies by model; check live pricing page | Varies by model; check live pricing page | Keep only if their behavior, latency, or tool support beats your GPT-5-family alternative.[1][10] |
| o4-mini | Fast, cost-efficient reasoning and visual tasks | $1.10 / 1M tokens | $0.275 / 1M tokens | $4.40 / 1M tokens | OpenAI lists o4-mini as succeeded by GPT-5 mini.[11][14] |
| Codex models, including GPT-5 Codex variants | Coding agents, repo edits, review, and developer workflows | Check live pricing page | Check live pricing page | Check live pricing page | Budget separately from general chat if your app runs long coding-agent loops.[1] |

The key pattern is unchanged: output tokens dominate the bill on premium models. In the GPT-5.4 launch pricing, gpt-5.4-pro output was 12 times the gpt-5.4 output price, and pro input was also 12 times the standard gpt-5.4 input price.[2][5] Current pro-tier models should be routed the same way: use them when the value of better reasoning is higher than the extra token cost.

For more model-selection context, see all GPT models compared side by side, the GPT-5 API getting started guide, and context window sizes for every GPT model.

Horizontal price bars labeled NANO, MINI, GPT-5.4, and PRO increasing sharply in length.

Special pricing rules that change the bill

The base token table is only the starting point. OpenAI has processing modes and surcharges that can cut or raise total cost depending on latency, region, context length, and whether the request uses tools or storage.

| Pricing rule | What changes | When to use it | Source-grounded figure |
| --- | --- | --- | --- |
| Batch API | Runs jobs asynchronously instead of immediately | Backfills, nightly processing, evaluations, dataset labeling | OpenAI says Batch can save 50% on inputs and outputs and run asynchronously over 24 hours.[1][2] |
| Flex processing | Trades speed and availability for lower cost | Non-production or low-priority jobs | OpenAI stated Batch and Flex pricing were available at half the standard API rate for gpt-5.4.[2] |
| Priority processing | Pays more for reliable high-speed processing | User-facing workloads where latency matters | OpenAI stated Priority processing was available at twice the standard API rate for gpt-5.4.[2] |
| Cached input | Repeated prompt context is billed lower than fresh input | Long system prompts, repeated instructions, stable retrieval context | gpt-5.4 cached input was listed at $0.25 per 1M tokens versus $2.50 per 1M fresh input tokens.[2][5] |
| Long-context surcharge | Very long prompts can be priced above the standard rate | Only when the model must see the entire large context | For GPT-5.4 and GPT-5.4 pro, prompts above 272K input tokens were priced at 2x input and 1.5x output for the full session.[3][4] |
| Regional processing | Data residency endpoints add a surcharge | Compliance-sensitive workloads that require supported regions | OpenAI lists a 10% uplift for GPT-5.4 and GPT-5.4 pro regional processing endpoints.[3][4] |
| Tool, file, and vector storage meters | Charges can be attached to retrieval, containers, file search, stored data, or external-tool calls | RAG apps, agents, code execution, long-running assistants | OpenAI’s pricing page is the authoritative place to confirm current tool and storage meters.[1] |

The Batch API is the easiest discount to understand. If users do not need an answer immediately, you can often redesign the job around asynchronous processing. See the OpenAI Batch API breakdown for implementation details.
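To compare processing modes side by side, here is a minimal sketch that scales a standard-rate bill by the multipliers quoted above for gpt-5.4. The $900 starting bill is a hypothetical figure, and the regional uplift is shown on its own because this guide does not state how it stacks with other modes.

```python
# Multipliers relative to the standard rate, as quoted in this guide for gpt-5.4.
MODE_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.0}
REGIONAL_UPLIFT = 0.10  # 10% uplift quoted for regional processing endpoints


def mode_cost(standard_cost_usd: float, mode: str) -> float:
    """Scale a standard-rate token bill by the processing-mode multiplier."""
    return standard_cost_usd * MODE_MULTIPLIER[mode]


standard_bill = 900.0  # hypothetical monthly bill at standard rates, in USD
for mode in MODE_MULTIPLIER:
    print(f"{mode:>8}: ${mode_cost(standard_bill, mode):,.2f}")

# Regional uplift applied alone; stacking with other modes is not assumed here.
print(f"regional: ${standard_bill * (1 + REGIONAL_UPLIFT):,.2f}")
```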

The long-context surcharge deserves special attention. A giant prompt can make a cheaper model more expensive than a smarter model with a smaller, better-curated context. Before sending a whole repository, transcript library, or document vault, try retrieval, chunk ranking, summaries, or a two-stage router.
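The effect of the surcharge can be sketched with the gpt-5.4 figures quoted above. This example assumes a single-request session with no cached input, and the 300K and 40K prompt sizes are hypothetical; it compares the same model with a full-context prompt and a curated one.

```python
LONG_CONTEXT_THRESHOLD = 272_000  # input tokens, per the gpt-5.4 figures in this guide
INPUT_SURCHARGE = 2.0             # multiplier on input price above the threshold
OUTPUT_SURCHARGE = 1.5            # multiplier on output price above the threshold


def gpt_5_4_cost(input_tokens: int, output_tokens: int,
                 input_price: float = 2.50, output_price: float = 15.00) -> float:
    """USD cost of one request; prices are USD per 1M tokens, no caching assumed."""
    long_context = input_tokens > LONG_CONTEXT_THRESHOLD
    in_mult = INPUT_SURCHARGE if long_context else 1.0
    out_mult = OUTPUT_SURCHARGE if long_context else 1.0
    return (input_tokens * input_price * in_mult
            + output_tokens * output_price * out_mult) / 1_000_000


full_context = gpt_5_4_cost(300_000, 2_000)  # hypothetical: whole repository in the prompt
curated = gpt_5_4_cost(40_000, 2_000)        # hypothetical: retrieval keeps the top chunks

print(f"full context: ${full_context:.3f}")
print(f"curated:      ${curated:.3f}")
```

Under these assumptions the full-context request costs more than ten times the curated one, and most of the gap comes from input volume rather than from the surcharge alone.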

Flow diagram with paths labeled BATCH -50%, CACHE, LONG 2X, and REGION +10% merging into an invoice.

Embeddings, image, audio, and tool pricing

Not every OpenAI API cost is a chat-completion cost. Retrieval systems, voice agents, image tools, video generation, moderation/safety checks, fine-tuning, and agent platforms may combine several meters in one request path.

| Category | Model or feature | Published price or billing basis | Planning note |
| --- | --- | --- | --- |
| Embeddings | text-embedding-3-small | $0.00002 per 1K tokens | OpenAI said this was a 5x reduction versus text-embedding-ada-002.[12] |
| Embeddings | text-embedding-3-large | $0.00013 per 1K tokens | OpenAI said the large model creates embeddings with up to 3,072 dimensions.[12] |
| Image generation | gpt-image-1, gpt-image-1.5, gpt-image-2 | Check live image-input, cached-image-input, and image-output prices | As of May 2026, gpt-image-2 is the current top image model. Image costs depend on tokenized image inputs and outputs, resolution, and workflow.[1] |
| Video generation | sora-2 and sora-2-pro | Check live video-generation pricing | sora-2-pro is the current pro video option; budget by generation settings, length, and retry rate.[1] |
| Realtime voice | Realtime API model variants | Audio input, cached audio input, and audio output are billed separately | Realtime apps need separate budgeting for text, image, and audio streams.[1] |
| Text to speech | tts-1 | $15 per 1M characters | OpenAI lists tts-1 as optimized for realtime text-to-speech use cases.[13] |
| Fine-tuning | Fine-tunable model families | Training, input, and output charges can differ from base inference | Use fine-tuning only after prompt, retrieval, and routing baselines are measured.[1] |
| Files, vector storage, and file search | Storage/retrieval platform features | May include storage and tool-use meters | RAG systems should budget both embedding/generation tokens and ongoing storage or retrieval charges.[1] |
| Moderation and safety | Moderation or safety endpoints/tools | Check current pricing page and endpoint docs | Do not skip safety checks just to reduce cost; include them in the request-path budget.[1] |
| Web search | Built-in web search tool | $10 per 1,000 calls | OpenAI says search content tokens are free for this tool on the pricing page.[1] |
| Containers | Code and tool containers | Container/session billing; verify current live rate | After the scheduled March 31, 2026 change, container costs should be checked as a per-session/platform-tool line item rather than treated as plain token cost.[1] |

Embeddings are usually cheap compared with generation, but vector storage and retrieval quality still matter. If you are building search, recommendation, or RAG features, start with the OpenAI embeddings API guide before optimizing language-model prompts.

Image, video, and voice apps need their own budget model. For images, see the DALL-E API and image generation guide. For speech-to-text workloads, see the Whisper API pricing and code samples. For low-latency voice apps, see the OpenAI Realtime API guide.

Analysis: the cost ladder decision framework

The best way to control OpenAI API pricing is to build a cost ladder. A cost ladder routes each request to the cheapest model that can meet the required accuracy, latency, and safety standard. It is not the same as always choosing the cheapest model.

Start with the failure cost. If a wrong answer creates legal exposure, financial loss, customer churn, or hours of staff review, the higher model price may be rational. If a wrong answer simply triggers a retry or human review, a cheaper model plus validation may win.

Then separate the task into stages. A common production pattern is: cheap model for classification, embeddings retrieval for context, mid-tier model for draft output, and frontier/pro model only for escalation. This design also works well with structured outputs and function calling, because reliable schemas make it easier to route, validate, and retry.
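One way to express the staged pattern is a small routing helper. The sketch below is only illustrative: the tier names, the starting tier per task kind, and the $500 failure-cost threshold are all hypothetical and should be replaced by whatever your own evals support.

```python
from typing import Optional

# Hypothetical tiers, cheapest first; map each to a model that passes your evals.
LADDER = ["nano", "mini", "frontier", "pro"]

# Hypothetical starting tier per task kind.
START_TIER = {
    "classification": "nano",
    "extraction": "nano",
    "draft": "mini",
    "analysis": "frontier",
}

PRO_FAILURE_COST_USD = 500.0  # hypothetical: only pay for pro when failures are this costly


def next_tier(current: str, failure_cost_usd: float) -> Optional[str]:
    """Tier to escalate to after a failed validation, or None to hand off to a human."""
    position = LADDER.index(current)
    if position + 1 >= len(LADDER):
        return None  # already at the top of the ladder
    candidate = LADDER[position + 1]
    if candidate == "pro" and failure_cost_usd < PRO_FAILURE_COST_USD:
        return None  # pro tier is not justified for low-stakes failures
    return candidate


tier = START_TIER["classification"]
print(tier, "->", next_tier(tier, failure_cost_usd=10.0))
print("frontier ->", next_tier("frontier", failure_cost_usd=10.0))
print("frontier ->", next_tier("frontier", failure_cost_usd=2_000.0))
```

The design choice here is that escalation is triggered by a failed validation, not chosen up front, so most traffic stays on the cheapest tier that works.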

The third step is to measure total tokens, not prompt tokens. Agentic tasks often look affordable at the first turn and expensive after tool calls, retrieved files, code execution, and repeated context. TechCrunch reported that GPT-5.4 introduced Tool Search so the model can look up tool definitions as needed instead of placing every tool definition in the prompt, a change aimed at faster and cheaper requests in systems with many tools.[6]

Illustrative line chart showing full-history context growing faster than a rolling-summary approach over multiple agent turns.
Illustrative pattern only — not measured benchmark data.
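The shape of that growth can be reproduced with a toy model. This sketch assumes every turn adds the same number of tokens and that a rolling summary has a fixed size; both numbers are hypothetical, and the output tokens spent generating each summary are ignored.

```python
from typing import Optional


def cumulative_input_tokens(turns: int, tokens_per_turn: int,
                            summary_tokens: Optional[int] = None) -> int:
    """Total input tokens billed over a conversation.

    Full history (summary_tokens is None): each turn resends all earlier turns.
    Rolling summary: each turn after the first sends a fixed-size summary plus the new turn.
    """
    total = 0
    for turn in range(1, turns + 1):
        if summary_tokens is None:
            total += turn * tokens_per_turn
        elif turn == 1:
            total += tokens_per_turn
        else:
            total += summary_tokens + tokens_per_turn
    return total


full_history = cumulative_input_tokens(turns=10, tokens_per_turn=2_000)
rolling = cumulative_input_tokens(turns=10, tokens_per_turn=2_000, summary_tokens=1_500)

print(f"full history:    {full_history:,} input tokens")
print(f"rolling summary: {rolling:,} input tokens")
```

Full-history input grows quadratically with the number of turns, while the rolling-summary variant grows linearly.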

This is the pattern across the pricing data: OpenAI gives developers a ladder from very cheap high-volume models to high-compute pro models, but the platform rewards architecture. A smaller model with caching, retrieval, and validation often beats a premium model that receives bloated context.

Cost ladder routing diagram with steps labeled CLASSIFY, DRAFT, VERIFY, and ESCALATE.

Example API cost math

Here is the basic formula for language-model calls:

Total cost = (fresh input tokens / 1,000,000 × input price)
           + (cached input tokens / 1,000,000 × cached input price)
           + (output tokens / 1,000,000 × output price)
           + tool, storage, fine-tuning, or media charges

Suppose a gpt-5.4 request uses 20,000 fresh input tokens, 80,000 cached input tokens, and 5,000 output tokens. Using $2.50 per 1M fresh input tokens, $0.25 per 1M cached input tokens, and $15 per 1M output tokens, the token cost is $0.145 before tool charges.[2][5] The same token counts on gpt-5.4-pro would be much higher because the pro model is $30 per 1M input tokens and $180 per 1M output tokens.[2][5]
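The formula and the gpt-5.4 example translate directly into a few lines of Python. The `Rates` class and `token_cost` function are illustrative names, not part of any OpenAI SDK, and the prices are the launch figures quoted in this guide, so verify them against the live pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Prices in USD per 1M tokens."""
    input: float
    cached_input: float
    output: float


# Launch prices quoted in this guide; verify against the live pricing page.
GPT_5_4 = Rates(input=2.50, cached_input=0.25, output=15.00)


def token_cost(rates: Rates, fresh_input: int, cached_input: int, output: int) -> float:
    """Token cost in USD, excluding tool, storage, fine-tuning, and media charges."""
    per_million = 1_000_000
    return (
        fresh_input / per_million * rates.input
        + cached_input / per_million * rates.cached_input
        + output / per_million * rates.output
    )


cost = token_cost(GPT_5_4, fresh_input=20_000, cached_input=80_000, output=5_000)
print(f"${cost:.3f}")  # prints $0.145
```

The three terms are $0.05 for fresh input, $0.02 for cached input, and $0.075 for output, so output is the largest share even though it is the smallest token count.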

Illustrative monthly workload 1: support bot. A support bot handles 100,000 chats per month on gpt-4.1-mini. Each chat averages 2,000 fresh input tokens, 1,000 cached instruction/context tokens, and 500 output tokens. At $0.40 input, $0.10 cached input, and $1.60 output per 1M tokens, the base generation cost is about $170/month. If 5% of chats escalate to gpt-5.4 with an extra 4,000 input tokens and 1,000 output tokens, add about $125/month, for roughly $295/month before web search, storage, or other tools.[7][14][2]

Illustrative monthly workload 2: RAG search. A company embeds 1,000,000 documents averaging 800 tokens each with text-embedding-3-small. The one-time embedding pass is about 800,000 units of 1K tokens, or about $16 at $0.00002 per 1K tokens. If the app then answers 50,000 monthly queries with gpt-4.1-mini using 3,000 input tokens and 300 output tokens per answer, the generation portion is about $84/month before vector storage, file search, reranking, and tool charges.[12][7][14]

Illustrative monthly workload 3: batch summarization. A job that runs nightly summarizes a total of 20,000 documents per month with gpt-5.4, averaging 12,000 input tokens and 1,000 output tokens per document. The standard token cost would be about $900 per month ($600 for input plus $300 for output). If the job qualifies for Batch pricing, the same workload costs roughly half that, about $450, because OpenAI says Batch can save 50% on inputs and outputs.[1][2]
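The three illustrative workloads can be checked with a short script. It uses only the prices quoted in this guide and assumes the escalated support chats carry no cached tokens, since the example lists only extra input and output; the `usd` helper is an illustrative name.

```python
PER_MILLION = 1_000_000


def usd(tokens: int, price_per_million: float) -> float:
    """Cost in USD for a token count at a USD-per-1M-token price."""
    return tokens * price_per_million / PER_MILLION


# Workload 1: support bot on gpt-4.1-mini, 5% escalated to gpt-5.4.
chats = 100_000
support_base = chats * (usd(2_000, 0.40) + usd(1_000, 0.10) + usd(500, 1.60))
escalations = chats * 5 // 100
support_escalation = escalations * (usd(4_000, 2.50) + usd(1_000, 15.00))

# Workload 2: one-time embedding pass, then monthly gpt-4.1-mini answers.
embedding_once = 1_000_000 * 800 / 1_000 * 0.00002  # priced per 1K tokens
rag_generation = 50_000 * (usd(3_000, 0.40) + usd(300, 1.60))

# Workload 3: gpt-5.4 summarization at standard and Batch (50% off) rates.
summaries_standard = 20_000 * (usd(12_000, 2.50) + usd(1_000, 15.00))
summaries_batch = summaries_standard * 0.5

print(f"support bot: ${support_base:.0f} + ${support_escalation:.0f} escalation")
print(f"RAG: ${embedding_once:.0f} one-time embedding, ${rag_generation:.0f}/month generation")
print(f"summarization: ${summaries_standard:.0f} standard, ${summaries_batch:.0f} Batch")
```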

For production systems, log prompt tokens, cached tokens, completion tokens, model name, tool calls, storage activity, latency, retries, and user-facing outcome. Without these fields, cost optimization becomes guesswork. If something fails in production, the OpenAI API errors breakdown can help distinguish pricing issues from rate limits, authentication failures, and request-shape problems.
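A minimal sketch of such a log record follows. The field names are hypothetical and do not mirror any OpenAI response schema; the point is simply to write one structured line per request so cost can later be grouped by model and outcome.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UsageRecord:
    """One log line per API request; field names are illustrative."""
    model: str
    prompt_tokens: int
    cached_tokens: int
    completion_tokens: int
    tool_calls: int
    storage_bytes: int
    latency_ms: float
    retries: int
    outcome: str  # e.g. "resolved", "escalated", "failed"


record = UsageRecord(
    model="gpt-4.1-mini",
    prompt_tokens=3_000,
    cached_tokens=1_000,
    completion_tokens=500,
    tool_calls=1,
    storage_bytes=0,
    latency_ms=820.0,
    retries=0,
    outcome="resolved",
)

log_line = json.dumps(asdict(record))
print(log_line)
```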

How to reduce OpenAI API costs

First, route by task. Use cheaper mini/nano-class models for classification, extraction, moderation prechecks, and simple transformations. Reserve gpt-5.5, gpt-5.4, or pro-tier models for tasks that need complex reasoning, long-horizon tool use, or high-stakes synthesis.

Second, cache stable context. Repeated instructions, policies, schemas, and reference material are strong candidates for cached input pricing. On gpt-5.4, cached input was listed at $0.25 per 1M tokens instead of $2.50 per 1M fresh input tokens.[2][5]
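To see how much caching moves the effective input price, the sketch below blends the two gpt-5.4 rates by the share of input tokens served from cache. The hit ratios are hypothetical; your actual ratio depends on how stable your prompt prefix is.

```python
def blended_input_price(fresh_price: float, cached_price: float,
                        cache_hit_ratio: float) -> float:
    """Effective USD per 1M input tokens when a share of input is billed as cached."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return fresh_price * (1 - cache_hit_ratio) + cached_price * cache_hit_ratio


# gpt-5.4 launch prices quoted in this guide.
for ratio in (0.0, 0.5, 0.8):
    price = blended_input_price(2.50, 0.25, ratio)
    print(f"cache hit ratio {ratio:.0%}: ${price:.3f} per 1M input tokens")
```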

Third, reduce output length. Output tokens are often the most expensive part of a request. Use concise response formats, structured JSON, diff-style edits, and citations only when needed. Streaming does not change the number of tokens billed, so enable it only when it improves the product experience. For implementation details, see streaming responses with the OpenAI API.

Fourth, use Batch for non-urgent work. OpenAI says Batch can save 50% on inputs and outputs and can run asynchronous work over 24 hours.[1][2] If a user is not waiting on the answer, a live request may be the wrong billing mode.

Fifth, avoid sending huge context by default. Use embeddings, file search, chunk ranking, and summarization. GPT-5.4 and GPT-5.4-pro had a specific surcharge above 272K input tokens, priced at 2x input and 1.5x output for the full session.[3][4] For newer models, verify the current context and surcharge rules before assuming a long prompt is economical.[1]

Finally, treat price as a production metric. Review cost per successful task, not cost per request. A model that costs more per token can still be cheaper if it needs fewer retries, shorter prompts, or less human review. For production patterns, see OpenAI API best practices for production and OpenAI Responses API examples.
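Cost per successful task can be estimated with a simple expected-value model. This sketch assumes each attempt succeeds independently with a fixed probability and is retried until it succeeds, and that every failed attempt incurs a fixed review cost; the per-attempt costs, success rates, and review cost are hypothetical.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float,
                     review_cost_per_failure: float = 0.0) -> float:
    """Expected USD spent per successful task, including failed attempts and their review."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    expected_attempts = 1 / success_rate
    expected_failures = expected_attempts - 1
    return cost_per_attempt * expected_attempts + review_cost_per_failure * expected_failures


# Hypothetical routes: a cheap model that fails more often vs. a premium model.
cheap = cost_per_success(0.002, success_rate=0.70, review_cost_per_failure=0.50)
premium = cost_per_success(0.020, success_rate=0.97, review_cost_per_failure=0.50)

print(f"cheap route:   ${cheap:.4f} per successful task")
print(f"premium route: ${premium:.4f} per successful task")
```

With a review cost of zero the cheap route wins easily; it is the failure penalty that flips the comparison, which is why the failure cost needs to be measured rather than assumed.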

Illustrative line chart showing a cheaper route becoming more expensive than a premium route when failure penalties rise.
Illustrative cost tradeoff only — not measured benchmark data.

Frequently asked questions

Is OpenAI API pricing included with ChatGPT Plus?

No. API usage is billed separately from ChatGPT subscriptions. ChatGPT Plus, Pro, Team, or Enterprise access does not automatically include API credits. For a user-facing subscription comparison, read "Does ChatGPT Plus Include API Access?"

What is the cheapest OpenAI API model?

Among the cited prices in this guide, GPT-4.1 nano is one of the lowest-cost general text options at $0.10 per 1M input tokens, $0.025 per 1M cached input tokens, and $0.40 per 1M output tokens.[7][14] Current GPT-5 nano-class models may also be appropriate for very high-volume simple tasks; check the live pricing page and test accuracy before routing production traffic.[1]

Why are pro models so much more expensive?

Pro models are priced for higher-compute reasoning. For example, gpt-5.4-pro launched at $30 per 1M input tokens and $180 per 1M output tokens, 12 times the standard gpt-5.4 input and output price.[2][5] Use pro-tier models for escalations where better answers reduce expensive failures, not for routine chat.

Does the Batch API really cut costs?

Yes, when the workload fits asynchronous processing. OpenAI says the Batch API can save 50% on inputs and outputs and run jobs asynchronously over 24 hours.[1][2] It is a strong fit for evaluations, document backfills, enrichment jobs, and offline summarization.

How do cached input tokens affect OpenAI API pricing?

Cached input tokens are repeated prompt tokens billed at a lower rate. For gpt-5.4, the cached input price was listed at $0.25 per 1M tokens, compared with $2.50 per 1M fresh input tokens.[2][5] Apps with stable instructions, repeated schemas, or reusable context can benefit more from caching than apps with entirely new prompts every turn.

Do tools like web search have separate API costs?

Yes. OpenAI lists built-in web search at $10 per 1,000 calls and states that search content tokens are free for this tool on the pricing page.[1] Other tools, containers, storage, fine-tuning, image/video generation, and realtime audio can add separate charges. Always log tool calls along with token usage.

What changed on March 5, 2026?

OpenAI released GPT-5.4 in ChatGPT, the API, and Codex on March 5, 2026, and also made gpt-5.4-pro available in the API.[2][6] That release made GPT-5.4 the main model to evaluate at the time. As of May 2026, however, the newer GPT-5.5 and GPT-5.5-pro models should be included in any fresh API evaluation, with GPT-5.4 retained as a useful price and compatibility reference.[1]

Bottom line

OpenAI API pricing is easiest to manage when you design for routing, caching, Batch processing, short verified outputs, and explicit tool/storage budgets from the beginning. In May 2026, evaluate gpt-5.5 or gpt-5.5-pro for the hardest new workloads, but start production routing with the cheapest model that passes your evals and escalate only the requests that need frontier reasoning.

The categories to watch are not only headline text-token prices. Image generation now includes gpt-image-2, video includes sora-2-pro, realtime apps have audio-specific meters, and fine-tuning, file/vector storage, web search, containers, and regional processing can affect total cost as much as the model row in a pricing table.[1]


Editorial independence. chatai.guide is reader-supported and not affiliated with OpenAI. We don’t accept paid placements or sponsored reviews — every recommendation reflects our own testing.