
OpenAI API Pricing: Every Model Compared

OpenAI API pricing guide for May 2026: model-by-model token costs, Batch discounts, cached input pricing, tools, embeddings, audio, and tips.

By ChatAI Guide Editorial | Updated May 5, 2026 | 16 min read

Pricing dashboard with token meter and lanes labeled INPUT, CACHE, OUTPUT, BATCH, PRO, and TOOLS.

OpenAI API pricing is usage-based, separate from ChatGPT Plus, and sensitive to model choice, output length, caching, tools, and processing mode. As of May 2026, the top chat/API tier includes gpt-5.5 and gpt-5.5-pro. gpt-5.4 and gpt-5.4-pro remain important comparison points because their launch pricing was published: $2.50 per 1M input tokens and $15 per 1M output tokens for gpt-5.4, and $30 per 1M input tokens and $180 per 1M output tokens for gpt-5.4-pro.^[1]^[2]^[5] Smaller GPT-5 mini/nano-class models, GPT-4.1 mini/nano, o-series reasoning models, embeddings, realtime audio, image generation, video generation, fine-tuning, storage, and tools can all change the final bill. Treat this guide as a comparison of the major current pricing families, then verify the exact live rate on OpenAI’s pricing page before launch.^[1]

Short answer: what OpenAI API pricing costs now

OpenAI API pricing is metered by model, fresh input tokens, cached input tokens, output tokens, and separate tool or storage usage. In May 2026, gpt-5.5 and gpt-5.5-pro are the current frontier chat/API models, with gpt-5.4, gpt-5.4-pro, GPT-5 mini/nano-class models, GPT-4.1 mini/nano, and o-series models still relevant for cost planning.^[1]

The practical answer: use the cheapest model that passes your evals, not the newest model by default. Start with mini/nano-class or GPT-4.1 mini/nano models for routine extraction, classification, routing, and formatting. Use gpt-5.5 or gpt-5.4-class models when quality matters, and reserve pro models such as gpt-5.5-pro or gpt-5.4-pro for high-value hard tasks where better answers reduce review time, failed runs, or business risk.^[1]^[7]^[10]^[11]

Two pricing cards labeled GPT-5.4 and PRO with chips $2.50, $0.25, $15, $30, and $180.

How OpenAI API billing works

The API bill is separate from ChatGPT subscriptions. A ChatGPT Plus or Pro plan does not bundle API credits. If you are comparing app subscriptions with developer usage, start with the guides "ChatGPT API vs ChatGPT Plus" and "Does ChatGPT Plus Include API Access?"

For most language models, OpenAI bills three token categories. Input tokens are the prompt and supplied context. Cached input tokens are repeated prompt tokens that OpenAI can reuse at a discounted rate. Output tokens are the generated answer, including reasoning tokens where the model exposes or accounts for them.^[1]^[3]

Token billing means two apps using the same model can have very different costs. A short classification endpoint with a tiny JSON response may stay inexpensive even at high request volume. A long agent run that sends files, tool definitions, retrieved documents, and retry turns can become expensive even on a model with a low per-token price, because the workflow burns so many tokens.

OpenAI also charges separately for some platform tools and resources. Web search, containers, file/vector storage, realtime audio, image/video generation, and fine-tuning can have different meters from standard text generation, so a full budget must include both model tokens and product features used in the request path.^[1]

If you are estimating production spend, do not multiply only the visible user prompt. Count the system prompt, developer instructions, retrieved context, function schemas, tool outputs, hidden retries, and streaming completions. For a calculator-style workflow, use the OpenAI API cost calculator after you choose the model family.
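As a rough illustration of why the visible prompt understates spend, the sketch below sums token counts for the components listed above. Every number and key name is a made-up placeholder, not a measurement, and the single retry is an assumption; real values should come from logged usage data.

```python
# Hypothetical per-request input token counts; replace with measured values.
input_components = {
    "system_prompt": 1_200,
    "developer_instructions": 400,
    "retrieved_context": 6_000,
    "function_schemas": 1_500,
    "tool_outputs": 2_500,
    "user_prompt": 150,
}
hidden_retries = 1  # assumption: one retry resends the full input once

# Input tokens sent on a single attempt, across all components.
tokens_per_attempt = sum(input_components.values())

# Input tokens billed once the retry is included.
billed_input_tokens = tokens_per_attempt * (1 + hidden_retries)

# How far off an estimate based only on the user prompt would be.
underestimate = billed_input_tokens / input_components["user_prompt"]

print(tokens_per_attempt)                                # 11750
print(billed_input_tokens)                               # 23500
print(f"{underestimate:.0f}x the visible user prompt")   # 157x the visible user prompt
```

With these placeholder values, the user prompt is under 1% of the billed input, which is why estimates based on it alone tend to fail.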

Text and reasoning model pricing compared

The table below compares the major public API model families that matter for new builds as of May 2026. It includes current frontier models and important lower-cost families, but it does not list every deprecated dated snapshot. When a model has both a stable alias and dated snapshots, pin a dated snapshot in production if behavior stability matters.

Model or family	Best fit	Input price	Cached input price	Output price	Notes
gpt-5.5-pro	Current highest-compute chat/API reasoning for expensive, high-stakes tasks	Check live pricing page	Check live pricing page	Check live pricing page	Current May 2026 pro-tier option. Use as an escalation model, not a default.^[1]
gpt-5.5	Current frontier chat/API model for demanding production workloads	Check live pricing page	Check live pricing page	Check live pricing page	Evaluate first for new high-quality builds, then compare against cheaper models on your own eval set.^[1]
gpt-5.4-pro	Prior pro-tier reasoning where published launch pricing is useful for comparison	$30 / 1M tokens	Not listed in the launch coverage	$180 / 1M tokens	Available as gpt-5.4-pro for complex tasks.^[2]^[5]
gpt-5.4	Prior frontier model, still useful for compatibility and price comparisons	$2.50 / 1M tokens	$0.25 / 1M tokens	$15 / 1M tokens	Released to the API on March 5, 2026.^[2]^[5]
gpt-5.2	Regression testing and existing GPT-5.2 workloads	$1.75 / 1M tokens	$0.175 / 1M tokens	$14 / 1M tokens	Useful when an existing app is tuned around this model’s behavior.^[8]
gpt-5.1 and gpt-5	General agentic tasks at a lower price point than newer frontier models	$1.25 / 1M tokens	$0.125 / 1M tokens	$10 / 1M tokens	Both were listed at the same text-token prices in referenced model/pricing docs.^[9]^[14]
GPT-5 mini / GPT-5 nano class	High-volume routing, extraction, classification, and lightweight generation	Check live pricing page	Check live pricing page	Check live pricing page	Current small-model lane for workloads where cost and latency matter more than maximum reasoning.^[1]
gpt-4.1	Non-reasoning instruction following, coding, and tool calling	$2 / 1M tokens	$0.50 / 1M tokens	$8 / 1M tokens	OpenAI launched GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano in the API with 1M-token context support.^[7]
gpt-4.1-mini	Balanced low-cost production work	$0.40 / 1M tokens	$0.10 / 1M tokens	$1.60 / 1M tokens	Useful for routing, structured extraction, and drafts that can be verified.^[7]^[14]
gpt-4.1-nano	Very high-volume classification, autocomplete, and simple extraction	$0.10 / 1M tokens	$0.025 / 1M tokens	$0.40 / 1M tokens	OpenAI described it as the fastest and cheapest GPT-4.1 model at launch.^[7]^[14]
o3, o3-pro, o3-mini, o3-deep-research	Reasoning, research-style workflows, math, science, and technical analysis	Varies by model; check live pricing page	Varies by model; check live pricing page	Varies by model; check live pricing page	Keep only if their behavior, latency, or tool support beats your GPT-5-family alternative.^[1]^[10]
o4-mini	Fast, cost-efficient reasoning and visual tasks	$1.10 / 1M tokens	$0.275 / 1M tokens	$4.40 / 1M tokens	OpenAI lists o4-mini as succeeded by GPT-5 mini.^[11]^[14]
Codex models, including GPT-5 Codex variants	Coding agents, repo edits, review, and developer workflows	Check live pricing page	Check live pricing page	Check live pricing page	Budget separately from general chat if your app runs long coding-agent loops.^[1]

The key pattern is unchanged: output tokens dominate the bill on premium models. In the GPT-5.4 launch pricing, gpt-5.4-pro output was 12 times the gpt-5.4 output price, and pro input was also 12 times the standard gpt-5.4 input price.^[2]^[5] Current pro-tier models should be routed the same way: use them when the value of better reasoning is higher than the extra token cost.

For more model-selection context, see all GPT models compared side by side, the GPT-5 API getting started guide, and context window sizes for every GPT model.

Horizontal price bars labeled NANO, MINI, GPT-5.4, and PRO increasing sharply in length.

Special pricing rules that change the bill

The base token table is only the starting point. OpenAI has processing modes and surcharges that can cut or raise total cost depending on latency, region, context length, and whether the request uses tools or storage.

Pricing rule	What changes	When to use it	Source-grounded figure
Batch API	Runs jobs asynchronously instead of immediately	Backfills, nightly processing, evaluations, dataset labeling	OpenAI says Batch can save 50% on inputs and outputs and run asynchronously over 24 hours.^[1]^[2]
Flex processing	Trades speed and availability for lower cost	Non-production or low-priority jobs	OpenAI stated Batch and Flex pricing were available at half the standard API rate for gpt-5.4.^[2]
Priority processing	Pays more for reliable high-speed processing	User-facing workloads where latency matters	OpenAI stated Priority processing was available at twice the standard API rate for gpt-5.4.^[2]
Cached input	Repeated prompt context is billed lower than fresh input	Long system prompts, repeated instructions, stable retrieval context	gpt-5.4 cached input was listed at $0.25 per 1M tokens versus $2.50 per 1M fresh input tokens.^[2]^[5]
Long-context surcharge	Very long prompts can be priced above the standard rate	Only when the model must see the entire large context	For GPT-5.4 and GPT-5.4 pro, prompts above 272K input tokens were priced at 2x input and 1.5x output for the full session.^[3]^[4]
Regional processing	Data residency endpoints add a surcharge	Compliance-sensitive workloads that require supported regions	OpenAI lists a 10% uplift for GPT-5.4 and GPT-5.4 pro regional processing endpoints.^[3]^[4]
Tool, file, and vector storage meters	Charges can be attached to retrieval, containers, file search, stored data, or external-tool calls	RAG apps, agents, code execution, long-running assistants	OpenAI’s pricing page is the authoritative place to confirm current tool and storage meters.^[1]

The Batch API is the easiest discount to understand. If users do not need an answer immediately, you can often redesign the job around asynchronous processing. See the OpenAI Batch API breakdown for implementation details.

The long-context surcharge deserves special attention. A giant prompt can make a cheaper model more expensive than a smarter model with a smaller, better-curated context. Before sending a whole repository, transcript library, or document vault, try retrieval, chunk ranking, summaries, or a two-stage router.
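The rules in the table can be approximated in a few lines. This is only a sketch: it uses the gpt-5.4 figures cited above, ignores cached input, and assumes the modifiers simply multiply together, which the cited sources do not confirm. The function name and parameters are hypothetical.

```python
TOKENS_PER_MILLION = 1_000_000
LONG_CONTEXT_THRESHOLD = 272_000  # input tokens, per the gpt-5.4 figures above

# Multipliers relative to the standard rate, from the figures cited in the table.
MODE_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.0}


def adjusted_cost(input_tokens, output_tokens, input_price, output_price,
                  mode="standard", regional=False):
    """Estimate USD cost for one request (prices are USD per 1M tokens).

    Assumption: processing mode, long-context, and regional modifiers stack
    multiplicatively. Verify against the live pricing page before relying on it.
    """
    in_mult = out_mult = MODE_MULTIPLIER[mode]
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_mult *= 2.0   # 2x input above the threshold
        out_mult *= 1.5  # 1.5x output above the threshold
    if regional:
        in_mult *= 1.10  # 10% regional uplift
        out_mult *= 1.10
    return (input_tokens / TOKENS_PER_MILLION * input_price * in_mult
            + output_tokens / TOKENS_PER_MILLION * output_price * out_mult)


short_prompt = adjusted_cost(200_000, 2_000, 2.50, 15.00)
long_prompt = adjusted_cost(300_000, 2_000, 2.50, 15.00)
batched = adjusted_cost(200_000, 2_000, 2.50, 15.00, mode="batch")

print(f"${short_prompt:.3f}")  # $0.530
print(f"${long_prompt:.3f}")   # $1.545
print(f"${batched:.3f}")       # $0.265
```

In this sketch, growing the prompt from 200K to 300K tokens nearly triples the estimated cost, because the surcharge applies to the whole request rather than only to the tokens above the threshold.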

Flow diagram with paths labeled BATCH -50%, CACHE, LONG 2X, and REGION +10% merging into an invoice.

Embeddings, image, audio, and tool pricing

Not every OpenAI API cost is a chat-completion cost. Retrieval systems, voice agents, image tools, video generation, moderation/safety checks, fine-tuning, and agent platforms may combine several meters in one request path.

Category	Model or feature	Published price or billing basis	Planning note
Embeddings	text-embedding-3-small	$0.00002 per 1K tokens ($0.02 per 1M tokens)	OpenAI said this was a 5x reduction versus text-embedding-ada-002.^[12]
Embeddings	text-embedding-3-large	$0.00013 per 1K tokens ($0.13 per 1M tokens)	OpenAI said the large model creates embeddings with up to 3,072 dimensions.^[12]
Image generation	gpt-image-1, gpt-image-1.5, gpt-image-2	Check live image-input, cached-image-input, and image-output prices	As of May 2026, gpt-image-2 is the current top image model. Image costs depend on tokenized image inputs and outputs, resolution, and workflow.^[1]
Video generation	sora-2 and sora-2-pro	Check live video-generation pricing	sora-2-pro is the current pro video option; budget by generation settings, length, and retry rate.^[1]
Realtime voice	Realtime API model variants	Audio input, cached audio input, and audio output are billed separately	Realtime apps need separate budgeting for text, image, and audio streams.^[1]
Text to speech	tts-1	$15 per 1M characters	OpenAI lists tts-1 as optimized for realtime text-to-speech use cases.^[13]
Fine-tuning	Fine-tunable model families	Training, input, and output charges can differ from base inference	Use fine-tuning only after prompt, retrieval, and routing baselines are measured.^[1]
Files, vector storage, and file search	Storage/retrieval platform features	May include storage and tool-use meters	RAG systems should budget both embedding/generation tokens and ongoing storage or retrieval charges.^[1]
Moderation and safety	Moderation or safety endpoints/tools	Check current pricing page and endpoint docs	Do not skip safety checks just to reduce cost; include them in the request-path budget.^[1]
Web search	Built-in web search tool	$10 per 1,000 calls	OpenAI says search content tokens are free for this tool on the pricing page.^[1]
Containers	Code and tool containers	Container/session billing; verify current live rate	After the scheduled March 31, 2026 change, container costs should be checked as a per-session/platform-tool line item rather than treated as plain token cost.^[1]

Embeddings are usually cheap compared with generation, but vector storage and retrieval quality still matter. If you are building search, recommendation, or RAG features, start with the OpenAI embeddings API guide before optimizing language-model prompts.

Image, video, and voice apps need their own budget model. For images, see the DALL-E API and image generation guide. For speech-to-text workloads, see the Whisper API pricing and code samples. For low-latency voice apps, see the OpenAI Realtime API guide.

Analysis: the cost ladder decision framework

The best way to control OpenAI API pricing is to build a cost ladder. A cost ladder routes each request to the cheapest model that can meet the required accuracy, latency, and safety standard. It is not the same as always choosing the cheapest model.

Start with the failure cost. If a wrong answer creates legal exposure, financial loss, customer churn, or hours of staff review, the higher model price may be rational. If a wrong answer simply triggers a retry or human review, a cheaper model plus validation may win.

Then separate the task into stages. A common production pattern is: cheap model for classification, embeddings retrieval for context, mid-tier model for draft output, and frontier/pro model only for escalation. This design also works well with structured outputs and function calling, because reliable schemas make it easier to route, validate, and retry.
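The staged pattern can be sketched as a small router that tries tiers from cheapest to most expensive and stops at the first answer that passes validation. The tier names, stub model functions, and validator below are hypothetical stand-ins; a real system would call the API and validate against a schema or eval.

```python
from typing import Callable, Sequence, Tuple

Tier = Tuple[str, Callable[[str], str]]


def run_ladder(task: str, tiers: Sequence[Tier],
               is_valid: Callable[[str], bool]) -> Tuple[str, str, bool]:
    """Try tiers cheapest-first; return (tier_name, answer, accepted)."""
    name, answer = "", ""
    for name, call_model in tiers:
        answer = call_model(task)
        if is_valid(answer):
            return name, answer, True
    # Every tier failed validation: surface the last answer for human review.
    return name, answer, False


# Hypothetical stubs standing in for real model calls.
def cheap_classifier(task: str) -> str:
    return "refund_request" if "refund" in task.lower() else ""


def frontier_model(task: str) -> str:
    return f"detailed analysis of: {task}"


def is_valid(answer: str) -> bool:
    return len(answer) > 0


tiers = [("mini-tier", cheap_classifier), ("frontier-tier", frontier_model)]

easy = run_ladder("Customer wants a refund", tiers, is_valid)
hard = run_ladder("Interpret this indemnity clause", tiers, is_valid)
print(easy)  # ('mini-tier', 'refund_request', True)
print(hard)  # ('frontier-tier', 'detailed analysis of: Interpret this indemnity clause', True)
```

The design choice worth noting is that escalation is driven by validation, not by guessing task difficulty up front, so the expensive tier only runs when the cheap one demonstrably fails.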

The third step is to measure total tokens, not prompt tokens. Agentic tasks often look affordable at the first turn and expensive after tool calls, retrieved files, code execution, and repeated context. TechCrunch reported that GPT-5.4 introduced Tool Search so the model can look up tool definitions as needed instead of placing every tool definition in the prompt, a change aimed at faster and cheaper requests in systems with many tools.^[6]

Illustrative line chart showing full-history context growing faster than a rolling-summary approach over multiple agent turns. — Illustrative pattern only — not measured benchmark data.
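The shape of that illustrative curve can be reproduced with simple arithmetic. In the sketch below, the turn count, tokens per turn, and summary cap are hypothetical values, and the cost of generating the summaries themselves is ignored.

```python
def full_history_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when turn n resends all n turns so far."""
    return sum(n * tokens_per_turn for n in range(1, turns + 1))


def rolling_summary_tokens(turns: int, tokens_per_turn: int, summary_cap: int) -> int:
    """Total input tokens when prior history is replaced by a capped summary."""
    total = 0
    for n in range(1, turns + 1):
        history = (n - 1) * tokens_per_turn
        # Send the new turn plus either the raw history or the capped summary.
        total += min(history, summary_cap) + tokens_per_turn
    return total


full = full_history_tokens(turns=10, tokens_per_turn=2_000)
rolling = rolling_summary_tokens(turns=10, tokens_per_turn=2_000, summary_cap=1_500)

print(full)     # 110000
print(rolling)  # 33500
```

Full-history input grows quadratically with the number of turns, while the rolling-summary input grows linearly, which is why the gap widens on long agent runs.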

This is the pattern across the pricing data: OpenAI gives developers a ladder from very cheap high-volume models to high-compute pro models, but the platform rewards careful architecture. A smaller model with caching, retrieval, and validation often beats a premium model that receives bloated context.

Cost ladder routing diagram with steps labeled CLASSIFY, DRAFT, VERIFY, and ESCALATE.

Example API cost math

Here is the basic formula for language-model calls:

Total cost = (fresh input tokens / 1,000,000 × input price)
           + (cached input tokens / 1,000,000 × cached input price)
           + (output tokens / 1,000,000 × output price)
           + tool, storage, fine-tuning, or media charges

Suppose a gpt-5.4 request uses 20,000 fresh input tokens, 80,000 cached input tokens, and 5,000 output tokens. At $2.50 per 1M fresh input tokens, $0.25 per 1M cached input tokens, and $15 per 1M output tokens, the token cost is $0.05 + $0.02 + $0.075 = $0.145 before tool charges.^[2]^[5] On gpt-5.4-pro, at $30 per 1M input tokens and $180 per 1M output tokens, the fresh input and output alone would cost $0.60 + $0.90 = $1.50, before counting the 80,000 cached tokens, whose pro rate was not listed in the launch coverage.^[2]^[5]
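The formula translates directly into code. This is a minimal sketch using the gpt-5.4 launch prices quoted above; the `Rates` class and `token_cost` function are illustrative names, not part of any OpenAI library.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Rates:
    """Prices in USD per 1M tokens."""
    input: float
    cached_input: float
    output: float


def token_cost(rates: Rates, fresh_input: int, cached_input: int,
               output: int, extras: float = 0.0) -> float:
    """Apply the formula above; `extras` covers tool, storage, or media charges."""
    return (fresh_input / TOKENS_PER_MILLION * rates.input
            + cached_input / TOKENS_PER_MILLION * rates.cached_input
            + output / TOKENS_PER_MILLION * rates.output
            + extras)


GPT_5_4 = Rates(input=2.50, cached_input=0.25, output=15.00)

cost = token_cost(GPT_5_4, fresh_input=20_000, cached_input=80_000, output=5_000)
print(f"${cost:.3f}")  # $0.145
```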

Illustrative monthly workload 1: support bot. A support bot handles 100,000 chats per month on gpt-4.1-mini. Each chat averages 2,000 fresh input tokens, 1,000 cached instruction/context tokens, and 500 output tokens. At $0.40 input, $0.10 cached input, and $1.60 output per 1M tokens, the base generation cost is about $170/month. If 5% of chats escalate to gpt-5.4 with an extra 4,000 input tokens and 1,000 output tokens, add about $125/month, for roughly $295/month before web search, storage, or other tools.^[7]^[14]^[2]

Illustrative monthly workload 2: RAG search. A company embeds 1,000,000 documents averaging 800 tokens each with text-embedding-3-small. The one-time embedding pass is about 800,000 units of 1K tokens, or about $16 at $0.00002 per 1K tokens. If the app then answers 50,000 monthly queries with gpt-4.1-mini using 3,000 input tokens and 300 output tokens per answer, the generation portion is about $84/month before vector storage, file search, reranking, and tool charges.^[12]^[7]^[14]

Illustrative monthly workload 3: batch summarization. A nightly job summarizes 20,000 documents per month in total with gpt-5.4, averaging 12,000 input tokens and 1,000 output tokens per document. Standard token cost would be about $900/month: $600 for 240M input tokens plus $300 for 20M output tokens. If the job qualifies for Batch pricing, the same workload costs roughly half that, about $450/month, because OpenAI says Batch can save 50% on inputs and outputs.^[1]^[2]
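The three illustrative workloads above can be re-derived with a short script, which is a convenient way to rerun them with your own volumes. It uses only the prices quoted in this guide and treats the 20,000 summarized documents as a monthly total; the helper name is hypothetical.

```python
TOKENS_PER_MILLION = 1_000_000


def cost(tokens: int, price_per_million: float) -> float:
    """USD cost for a token count at a per-1M-token price."""
    return tokens / TOKENS_PER_MILLION * price_per_million


# Workload 1: support bot on gpt-4.1-mini, 5% escalated to gpt-5.4.
chats = 100_000
support_base = (cost(chats * 2_000, 0.40)      # fresh input
                + cost(chats * 1_000, 0.10)    # cached input
                + cost(chats * 500, 1.60))     # output
escalated_chats = chats * 5 // 100
support_escalation = (cost(escalated_chats * 4_000, 2.50)
                      + cost(escalated_chats * 1_000, 15.00))

# Workload 2: RAG search. Embedding price is quoted per 1K tokens.
embedding_pass = (1_000_000 * 800) / 1_000 * 0.00002
rag_generation = cost(50_000 * 3_000, 0.40) + cost(50_000 * 300, 1.60)

# Workload 3: summarization on gpt-5.4, standard versus Batch (50% off).
docs = 20_000
summarize_standard = cost(docs * 12_000, 2.50) + cost(docs * 1_000, 15.00)
summarize_batch = summarize_standard * 0.5

print(f"{support_base:.2f} {support_escalation:.2f}")     # 170.00 125.00
print(f"{embedding_pass:.2f} {rag_generation:.2f}")       # 16.00 84.00
print(f"{summarize_standard:.2f} {summarize_batch:.2f}")  # 900.00 450.00
```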

For production systems, log prompt tokens, cached tokens, completion tokens, model name, tool calls, storage activity, latency, retries, and user-facing outcome. Without these fields, cost optimization becomes guesswork. If something fails in production, the OpenAI API errors breakdown can help distinguish pricing issues from rate limits, authentication failures, and request-shape problems.
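One possible shape for such a log record is sketched below, emitted as a JSON line per request. The field names, units, and sample values are assumptions for illustration; map them to whatever your API responses and tracing system actually provide.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UsageRecord:
    """One log line per model call; field names are illustrative."""
    model: str
    prompt_tokens: int
    cached_tokens: int
    completion_tokens: int
    tool_calls: int
    storage_ops: int   # assumed unit: count of storage or file-search operations
    latency_ms: float
    retries: int
    outcome: str       # user-facing result, e.g. "resolved" or "escalated"


record = UsageRecord(
    model="gpt-4.1-mini",
    prompt_tokens=3_000,
    cached_tokens=1_000,
    completion_tokens=500,
    tool_calls=1,
    storage_ops=0,
    latency_ms=820.0,
    retries=0,
    outcome="resolved",
)

# Serialize with sorted keys so log lines are stable and easy to diff.
line = json.dumps(asdict(record), sort_keys=True)
print(line)
```

Logging the outcome alongside token counts is what makes cost per successful task computable later.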

How to reduce OpenAI API costs

First, route by task. Use cheaper mini/nano-class models for classification, extraction, moderation prechecks, and simple transformations. Reserve gpt-5.5, gpt-5.4, or pro-tier models for tasks that need complex reasoning, long-horizon tool use, or high-stakes synthesis.

Second, cache stable context. Repeated instructions, policies, schemas, and reference material are strong candidates for cached input pricing. On gpt-5.4, cached input was listed at $0.25 per 1M tokens instead of $2.50 per 1M fresh input tokens.^[2]^[5]
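To see how much caching can matter, the sketch below computes a blended input price for a given share of cached tokens, using the gpt-5.4 figures above. The cached share is a hypothetical input; the share actually billed as cached depends on OpenAI's caching behavior, not on this calculation.

```python
def effective_input_price(fresh_price: float, cached_price: float,
                          cached_fraction: float) -> float:
    """Blended USD price per 1M input tokens for a given cached share."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    return (1 - cached_fraction) * fresh_price + cached_fraction * cached_price


# gpt-5.4 launch prices: $2.50 fresh, $0.25 cached, per 1M input tokens.
for fraction in (0.0, 0.5, 0.8):
    blended = effective_input_price(2.50, 0.25, fraction)
    print(f"{fraction:.0%} cached -> ${blended:.3f} per 1M input tokens")
# 0% cached -> $2.500 per 1M input tokens
# 50% cached -> $1.375 per 1M input tokens
# 80% cached -> $0.700 per 1M input tokens
```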

Third, reduce output length. Output tokens are often the most expensive part of a request. Use concise response formats, structured JSON, diff-style edits, citations only when needed, and streaming only when it improves the product experience. For implementation details, see streaming responses with the OpenAI API.

Fourth, use Batch for non-urgent work. OpenAI says Batch can save 50% on inputs and outputs and can run asynchronous work over 24 hours.^[1]^[2] If a user is not waiting on the answer, a live request may be the wrong billing mode.

Fifth, avoid sending huge context by default. Use embeddings, file search, chunk ranking, and summarization. GPT-5.4 and GPT-5.4-pro had a specific surcharge above 272K input tokens, priced at 2x input and 1.5x output for the full session.^[3]^[4] For newer models, verify the current context and surcharge rules before assuming a long prompt is economical.^[1]

Finally, treat price as a production metric. Review cost per successful task, not cost per request. A model that costs more per token can still be cheaper if it needs fewer retries, shorter prompts, or less human review. For production patterns, see OpenAI API best practices for production and OpenAI Responses API examples.

Illustrative line chart showing a cheaper route becoming more expensive than a premium route when failure penalties rise. — Illustrative cost tradeoff only — not measured benchmark data.
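The tradeoff in that chart can be expressed as a simple expected-cost model. The token costs, success rates, and penalties below are hypothetical, and the model assumes one fixed penalty per failed task, which is a simplification.

```python
def expected_cost(token_cost: float, success_rate: float,
                  failure_penalty: float) -> float:
    """Expected USD cost per task: token spend plus the expected failure penalty."""
    return token_cost + (1 - success_rate) * failure_penalty


def break_even_penalty(cheap_cost: float, cheap_success: float,
                       premium_cost: float, premium_success: float) -> float:
    """Failure penalty at which both routes have the same expected cost."""
    return (premium_cost - cheap_cost) / (premium_success - cheap_success)


# Hypothetical routes: (token cost per task in USD, success rate).
cheap = (0.002, 0.90)
premium = (0.020, 0.98)

for penalty in (0.10, 1.00):
    c = expected_cost(*cheap, penalty)
    p = expected_cost(*premium, penalty)
    winner = "cheap" if c < p else "premium"
    print(f"penalty ${penalty:.2f}: cheap ${c:.3f}, premium ${p:.3f} -> {winner}")
# penalty $0.10: cheap $0.012, premium $0.022 -> cheap
# penalty $1.00: cheap $0.102, premium $0.040 -> premium

threshold = break_even_penalty(*cheap, *premium)
print(f"break-even penalty: ${threshold:.3f}")  # break-even penalty: $0.225
```

Under these assumptions, the premium route becomes the cheaper one once a failure costs more than about $0.23 in retries, review time, or lost business.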

Frequently asked questions

Is OpenAI API pricing included with ChatGPT Plus?

No. API usage is billed separately from ChatGPT subscriptions. ChatGPT Plus, Pro, Team, or Enterprise access does not automatically include API credits. For a user-facing subscription comparison, read "Does ChatGPT Plus Include API Access?"

What is the cheapest OpenAI API model?

Among the cited prices in this guide, GPT-4.1 nano is one of the lowest-cost general text options at $0.10 per 1M input tokens, $0.025 per 1M cached input tokens, and $0.40 per 1M output tokens.^[7]^[14] Current GPT-5 nano-class models may also be appropriate for very high-volume simple tasks; check the live pricing page and test accuracy before routing production traffic.^[1]

Why are pro models so much more expensive?

Pro models are priced for higher-compute reasoning. For example, gpt-5.4-pro launched at $30 per 1M input tokens and $180 per 1M output tokens, 12 times the standard gpt-5.4 input and output price.^[2]^[5] Use pro-tier models for escalations where better answers reduce expensive failures, not for routine chat.

Does the Batch API really cut costs?

Yes, when the workload fits asynchronous processing. OpenAI says the Batch API can save 50% on inputs and outputs and run jobs asynchronously over 24 hours.^[1]^[2] It is a strong fit for evaluations, document backfills, enrichment jobs, and offline summarization.

How do cached input tokens affect OpenAI API pricing?

Cached input tokens are repeated prompt tokens billed at a lower rate. For gpt-5.4, the cached input price was listed at $0.25 per 1M tokens, compared with $2.50 per 1M fresh input tokens.^[2]^[5] Apps with stable instructions, repeated schemas, or reusable context can benefit more from caching than apps with entirely new prompts every turn.

Do tools like web search have separate API costs?

Yes. OpenAI lists built-in web search at $10 per 1,000 calls and states that search content tokens are free for this tool on the pricing page.^[1] Other tools, containers, storage, fine-tuning, image/video generation, and realtime audio can add separate charges. Always log tool calls along with token usage.

What changed on March 5, 2026?

OpenAI released GPT-5.4 in ChatGPT, the API, and Codex on March 5, 2026, and also made gpt-5.4-pro available in the API.^[2]^[6] That release made GPT-5.4 the main model to evaluate at the time. As of May 2026, however, the newer GPT-5.5 and GPT-5.5-pro models should be included in any fresh API evaluation, with GPT-5.4 retained as a useful price and compatibility reference.^[1]

Bottom line

OpenAI API pricing is easiest to manage when you design for routing, caching, Batch processing, short verified outputs, and explicit tool/storage budgets from the beginning. In May 2026, evaluate gpt-5.5 or gpt-5.5-pro for the hardest new workloads, but start production routing with the cheapest model that passes your evals and escalate only the requests that need frontier reasoning.

The categories to watch are not only headline text-token prices. Image generation now includes gpt-image-2, video includes sora-2-pro, realtime apps have audio-specific meters, and fine-tuning, file/vector storage, web search, containers, and regional processing can affect total cost as much as the model row in a pricing table.^[1]


Sources & references

14 cited

Each fact in this article was checked against the sources below. Numbers in the body link to the matching entry here.

[1] OpenAI API Pricing. OpenAI, openai.com. Accessed March 10, 2026.
[2] Introducing GPT-5.4. OpenAI, openai.com. Accessed March 10, 2026.
[3] GPT-5.4 Model | OpenAI API. OpenAI Developers, developers.openai.com. Accessed March 10, 2026.
[4] GPT-5.4 pro Model | OpenAI API. OpenAI Developers, developers.openai.com. Accessed March 10, 2026.
[5] ChatGPT 5.4 is apparently a big spreadsheet fan — and even comes with some special Excel and Google Sheets tools. TechRadar, techradar.com. Accessed March 10, 2026.
[6] OpenAI launches GPT-5.4 with Pro and Thinking versions. TechCrunch, techcrunch.com. Accessed March 10, 2026.
[7] Introducing GPT-4.1 in the API. OpenAI, openai.com. Accessed March 10, 2026.
[8] GPT-5.2 Model | OpenAI API. OpenAI Developers, developers.openai.com. Accessed March 10, 2026.
[9] GPT-5.1 Model | OpenAI API. OpenAI Developers, developers.openai.com. Accessed March 10, 2026.
[10] o3 Model | OpenAI API. OpenAI Developers, developers.openai.com. Accessed March 10, 2026.
[11] o4-mini Model | OpenAI API. OpenAI Developers, developers.openai.com. Accessed March 10, 2026.
[12] New embedding models and API updates. OpenAI, openai.com. Accessed March 10, 2026.
[13] TTS-1 Model | OpenAI API. OpenAI Developers, developers.openai.com. Accessed March 10, 2026.
[14] OpenAI API Pricing 2026 — GPT-5.4, O3, O1 & GPT-4o Cost Per Token. TLDL, tldl.io. Accessed March 10, 2026.

Sources were retrieved from official documentation when available. Prices, message limits, and feature lists change — verify against the linked source for production decisions.