API

OpenAI Responses API: Documentation and Examples

API request card connected to lanes labeled INPUT, MODEL, TOOLS, STATE, and OUTPUT.

The OpenAI Responses API is the main OpenAI endpoint for generating model outputs, building multi-turn workflows, and giving models access to hosted tools such as web search, file search, Code Interpreter, image generation, computer use, and remote MCP servers. It replaces much of the orchestration that developers previously split across Chat Completions, Assistants, tool-calling loops, and separate retrieval layers. For a new API integration in 2026, start with the Responses API unless you have a specific reason to keep an older Chat Completions implementation. This guide explains the endpoint shape, shows practical examples, covers structured outputs and tools, and highlights pricing and migration details you should know before shipping.

What the Responses API is

The Responses API is OpenAI’s unified interface for model responses. OpenAI describes it as an advanced interface that supports text and image inputs, text outputs, stateful interactions, hosted tools, and function calling.[1] It is not only a text-completion endpoint. It is the API surface where OpenAI is putting newer agent-oriented features.

OpenAI introduced the Responses API on March 11, 2025 as part of its agent-building tools. The launch positioned it as a combination of Chat Completions simplicity and Assistants-style tool use.[7] OpenAI later added support for remote MCP servers, image generation, Code Interpreter, file search improvements, background mode, reasoning summaries, and encrypted reasoning items.[8]

Use it when you want one API call pattern for normal text generation, JSON output, image inputs, retrieval, web search, custom functions, or agent-like workflows. If you are still deciding which endpoint to build on, this article pairs well with our OpenAI API best practices for production and our broader OpenAI API pricing guide.

Pipeline with INPUT card, MODEL hexagon, tool nodes WEB, FILES, CODE, and OUTPUT card.

Endpoint and request shape

The core endpoint is POST /v1/responses. A basic request includes a model and an input. You can also pass top-level instructions, tool definitions, structured output settings, conversation state, streaming settings, and reasoning options.[1]

The most important conceptual change from Chat Completions is that Responses uses a broader item model. Inputs and outputs can include messages, tool calls, tool results, reasoning items, image inputs, file inputs, and other typed items. The API reference lists response items such as reasoning items, image generation calls, Code Interpreter calls, web search calls, and function call outputs.[1]

NeedResponses API field or patternWhy it matters
Plain text answermodel + inputSmallest useful request shape.
System-style behaviorinstructionsKeeps developer instructions separate from user input.
Tool usetools and tool_choiceLets the model call hosted tools or your functions.
Structured JSONtext.formatConstrains the model’s final answer to a schema.
Follow-up turnprevious_response_id or managed conversation stateAvoids rebuilding full message arrays for simple threads.
Long-running taskbackgroundRuns work asynchronously when latency may be high.

For most applications, the output you display is response.output_text. OpenAI highlighted that helper at launch as a simpler way to access the model’s text output.[7] Keep the full response object in logs during development, because tool calls and structured outputs are easier to debug when you can inspect the typed output items.

Basic examples

The examples below use the same request idea in JavaScript, Python, and cURL. The model name shown here follows OpenAI’s current documentation examples for the Responses API.[2] Swap in the model your account and use case require, then test cost and latency with real prompts.

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await client.responses.create({
  model: "gpt-5",
  instructions: "You write concise API documentation for developers.",
  input: "Explain what a webhook retry policy is in two paragraphs."
});

console.log(response.output_text);

Python

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    instructions="You write concise API documentation for developers.",
    input="Explain what a webhook retry policy is in two paragraphs."
)

print(response.output_text)

cURL

curl https://api.openai.com/v1/responses 
  -H "Content-Type: application/json" 
  -H "Authorization: Bearer $OPENAI_API_KEY" 
  -d '{
    "model": "gpt-5",
    "instructions": "You write concise API documentation for developers.",
    "input": "Explain what a webhook retry policy is in two paragraphs."
  }'

Do not expose your API key in browser code. Keep calls on your server, pass only the user input your app needs, and store the response ID only if you need follow-up turns, audit logs, or debugging. If you are still setting up credentials, see Free OpenAI API Key: Is It Possible in 2026? and Does ChatGPT Plus Include API Access?.

Three code panels labeled JS, PYTHON, and CURL feeding one OUTPUT response card.

Multi-turn state

Chat Completions normally makes you manage the full message array yourself. OpenAI’s migration documentation contrasts that with Responses, where you can pass outputs from one response into another or use response state to continue a conversation.[2] This matters for chat apps, research workflows, coding agents, and support assistants that need context across turns.

A simple pattern is to store the response ID returned from the first call, then pass it as previous_response_id on the next call. That lets your application continue the interaction without resending the entire conversation every time. For compliance-sensitive systems, evaluate what state you store, what OpenAI stores, and whether you need stateless context management.

const first = await client.responses.create({
  model: "gpt-5",
  input: "Draft a short refund policy for a SaaS product."
});

const followup = await client.responses.create({
  model: "gpt-5",
  previous_response_id: first.id,
  input: "Now make it friendlier and add a 14-day trial note."
});

console.log(followup.output_text);

Keep your own durable record of user-visible messages even if you use response state. Product teams need that record for user history, moderation review, analytics, exports, and support. Response state is useful, but it should not be your only application database.

Tools and agent workflows

The Responses API can use hosted tools and custom tools through the tools parameter. OpenAI’s tools documentation lists function calling, web search, remote MCP servers, Skills, Shell, computer use, image generation, file search, and tool search as available tool categories in the platform.[4] The model can decide whether to use a configured tool based on the prompt unless you constrain that behavior with tool_choice.[4]

For web search, new Responses API integrations should use { "type": "web_search" }. OpenAI says the older web_search_preview tool remains for legacy integrations, but it lacks newer controls such as filters and external web access control.[5] Web search responses can include inline citations and a sources field, which is useful when your interface must show where retrieved information came from.[5]

const response = await client.responses.create({
  model: "gpt-5.5",
  tools: [
    { type: "web_search", search_context_size: "low" }
  ],
  input: "Find one current source about U.S. EV tax credit changes and summarize it."
});

console.log(response.output_text);

File search is best for private or app-specific documents. Upload files, attach them to vector stores, and let the file search tool retrieve relevant chunks. This can replace a custom retrieval pipeline for many support, policy, documentation, and internal knowledge-base applications. If your app depends heavily on image understanding, read our OpenAI Vision API guide as well.

Function calling remains the right choice when the model must call your own code, database, or business logic. Use hosted tools for OpenAI-managed capabilities, and use functions when your application owns the side effect. Our function calling in OpenAI API article covers that pattern in more detail.

Tool matrix tiles labeled WEB, FILES, FUNCTION, CODE, IMAGE, and MCP connected to a model node.

Structured outputs

Structured Outputs let you force the final answer into a JSON Schema. In the Responses API, OpenAI’s migration guide says structured output settings moved from response_format to text.format.[2] OpenAI’s structured output documentation also distinguishes schema adherence from older JSON mode: JSON mode produces valid JSON, while Structured Outputs are designed to match your supplied schema.[6]

const response = await client.responses.create({
  model: "gpt-5",
  input: "Extract the company name, renewal date, and cancellation window from this note: Acme renews on June 30. Cancel at least 30 days before renewal.",
  text: {
    format: {
      type: "json_schema",
      name: "contract_terms",
      strict: true,
      schema: {
        type: "object",
        additionalProperties: false,
        properties: {
          company: { type: "string" },
          renewal_date: { type: "string" },
          cancellation_window_days: { type: "number" }
        },
        required: ["company", "renewal_date", "cancellation_window_days"]
      }
    }
  }
});

console.log(response.output_text);

Use Structured Outputs when your next step is a parser, database write, UI component, workflow engine, or validation rule. Use function calling when the model should choose an action and pass arguments to your code. If you need a deeper treatment of schemas, validation, and failure handling, see Structured Outputs with the OpenAI API.

Keep schemas small at first. Add fields only when your application needs them. The more optional branches you add, the more testing you need across empty inputs, ambiguous inputs, adversarial inputs, and tool-call paths.

Streaming, background mode, and reasoning

Use streaming when users should see output as it arrives. This is usually the right choice for chat interfaces, writing tools, coding assistants, and long answers. For implementation details, see our separate guide to streaming responses with the OpenAI API.

Use background mode when a task may take longer than a normal request-response cycle. OpenAI added background mode to the Responses API to support long-running tasks asynchronously.[8] Good candidates include multi-step research, long document analysis, tool-heavy workflows, and jobs that should keep running even if the user closes the page.

Reasoning models can return reasoning-related items and summaries. The Responses API reference describes reasoning items as summaries of reasoning output and notes that encrypted reasoning content can be returned when reasoning.encrypted_content is included.[1] Treat these as operational artifacts, not user-facing explanations unless your product has a clear reason to display them.

For production systems, separate three things: the visible answer, the hidden workflow state, and the audit trail. Users need the answer. Your app may need state for continuity. Your team needs logs that are useful without storing unnecessary sensitive data.

Pricing and limits

The Responses API is not priced as a separate API line item. OpenAI’s pricing documentation says Responses API, Chat Completions API, Realtime API, Batch API, and Assistants API are not priced separately; tokens are billed at the chosen model’s input and output rates.[3] Hosted tools can add their own charges.

Cost areaCurrent published pricing detailPlanning note
Model tokensBilled at the selected model’s input and output token rates.[3]Measure real prompts and outputs, not toy prompts.
Web search$10.00 per 1,000 calls for web search, plus search content tokens billed at model rates.[3]Cache results when freshness is not required.
Web search preview for non-reasoning models$25.00 per 1,000 calls, with search content tokens free.[3]Use the non-preview tool for new Responses integrations when possible.
File search storage$0.10 per GB per day, with 1 GB free.[3]Delete stale vector stores and uploaded files.
File search tool calls$2.50 per 1,000 calls.[3]Batch document questions when product UX allows it.
Containers for Hosted Shell and Code Interpreter1 GB at $0.03, 4 GB at $0.12, 16 GB at $0.48, and 64 GB at $1.92 per 20-minute session per container.[3]Avoid starting containers for tasks that simple model reasoning can handle.

Web search has specific operational limits. OpenAI’s web search documentation says domain filtering in Responses can configure up to 100 allowed domains or up to 100 blocked domains.[5] It also says Responses API web search uses a 128k search context window, even when the underlying model has a larger context window.[5]

For cost planning, do not look only at model token rates. Add tool calls, retrieved content tokens, file storage, container sessions, retries, and background jobs. Our OpenAI API cost calculator, Batch API guide, and context window comparison can help estimate real workloads.

Cost dashboard with bars labeled WEB $10, FILE $2.50, CODE $0.03, and storage labeled 1 GB FREE.

Migration guidance

OpenAI calls the Responses API a superset of Chat Completions and says Chat Completions will continue to be supported.[2] That means you do not need to migrate every stable workflow immediately. A reasonable approach is to move flows that benefit from hosted tools, reasoning features, structured outputs, or easier state management first.

The Assistants API has a clearer deadline. OpenAI’s migration documentation says the Assistants API was deprecated as of August 26, 2025 and has a sunset date of August 26, 2026.[2] If you still run Assistants-based Threads and Runs, plan a migration path now rather than waiting for the final month.

Existing implementationMove now ifWait if
Chat Completions, plain text onlyYou want one endpoint for future tool use.The workflow is stable, cheap, and well tested.
Chat Completions with manual toolsYou want hosted web search, file search, or cleaner tool orchestration.Your custom orchestrator is core product IP.
Assistants APIYou need to prepare before the August 26, 2026 sunset date.[2]Only short-term maintenance remains and migration risk is higher than sunset risk.
Custom RAG pipelineFile search meets your retrieval quality and governance needs.You need custom indexing, ranking, permissions, or hybrid search.

When migrating, test output shape, tool-call behavior, latency, token usage, and error handling. Do not only compare final answer quality. The surrounding workflow often changes more than the model response itself. If you hit failures, our OpenAI API Errors guide can help decode status codes and retry patterns.

Production checklist

  • Choose the smallest model that meets quality needs. Test with real prompts, not only demos.
  • Set clear instructions. Keep durable behavior in instructions rather than mixing it into user input.
  • Log typed output items during development. Tool calls, reasoning items, and structured outputs are easier to debug with the full object.
  • Validate every structured output. A schema helps, but your application should still handle refusals, empty values, and downstream validation errors.
  • Control tool use. Use tool_choice when a tool must or must not run.
  • Budget for tools. Web search, file search, and containers can change the cost profile of an otherwise cheap model call.
  • Design for retries. Network errors, rate limits, and tool failures should not corrupt user state.
  • Separate product history from API state. Store the user-visible conversation and business records in your own database.
  • Review sensitive data paths. Decide what can be sent to the model, stored in files, logged, or used in tools.

The Responses API is a strong default for new OpenAI applications because it covers text generation, multimodal inputs, structured outputs, tools, and agent workflows through one interface. It does not remove the need for product design, security review, cost controls, and evaluation. It gives you a cleaner base to build those systems.

Frequently asked questions

Is the Responses API replacing Chat Completions?

Not immediately. OpenAI says the Responses API is a superset of Chat Completions and recommends it for newer features, but also says Chat Completions will continue to be supported.[2] For new projects, Responses is usually the safer default.

What endpoint does the Responses API use?

The main endpoint is POST /v1/responses.[1] You send a model, input, and optional settings such as instructions, tools, structured output format, state, or streaming options.

Can the Responses API search the web?

Yes. New Responses API integrations should use the hosted web_search tool, while web_search_preview remains available for legacy integrations.[5] If search must always happen, do not rely on automatic tool choice alone; configure tool choice accordingly.

Does the Responses API support JSON output?

Yes. Use text.format with json_schema for Structured Outputs in the Responses API.[2] Prefer Structured Outputs over older JSON mode when your application needs schema adherence.

Is the Responses API priced separately?

No separate Responses API fee is listed. OpenAI says tokens are billed at the chosen model’s input and output rates, while hosted tools can add tool-specific charges.[3] Always estimate both token usage and tool usage.

Should I migrate from the Assistants API?

Yes, if the application will still be active after the sunset window. OpenAI says the Assistants API was deprecated on August 26, 2025 and has a sunset date of August 26, 2026.[2] Start with one workflow, compare behavior, then migrate the rest in stages.

Editorial independence. chatai.guide is reader-supported and not affiliated with OpenAI. We don’t accept paid placements or sponsored reviews — every recommendation reflects our own testing.