Create a response with the OpenAI-compatible Responses API
The `/v1/responses` API is an OpenAI Responses API-compatible endpoint for creating AI model responses. It supports:

- `POST /v1/responses` - Create a new response from the model
- `GET /v1/responses` - Returns endpoint information
- `GET /v1/responses/{id}` - Retrieve a stored response by ID
- `DELETE /v1/responses/{id}` - Delete a stored response (soft delete)

Use the `X-Provider` header or saved preferences to choose a provider. If you are on a subscription and want provider selection for a subscription-included model, force paid routing with the pay-as-you-go billing override (`billing_mode: "paygo"` or `X-Billing-Mode: paygo`). See Provider Selection and Pay-As-You-Go Billing Override.

`POST /v1/responses` accepts the following parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The model to use (e.g., `openai/gpt-5.2`, `anthropic/claude-opus-4.5`) |
| `input` | string or array | Yes | The input prompt or array of input items |
| `instructions` | string | No | System instructions for the model |
| `max_output_tokens` | integer | No | Maximum tokens in the response (minimum: 16) |
| `max_tool_calls` | integer | No | Maximum number of tool calls allowed |
| `temperature` | number | No | Sampling temperature (0-2). Not supported by reasoning-capable models |
| `top_p` | number | No | Nucleus sampling parameter. Not supported by reasoning-capable models |
| `presence_penalty` | number | No | Presence penalty for sampling (-2.0 to 2.0) |
| `frequency_penalty` | number | No | Frequency penalty for sampling (-2.0 to 2.0) |
| `top_logprobs` | integer | No | Number of top logprobs to return (0-20) |
| `tools` | array | No | Array of tools available to the model |
| `tool_choice` | string or object | No | Tool use: `auto`, `none`, `required`, `{ type: "function", name: "..." }`, or `{ type: "allowed_tools", ... }` |
| `parallel_tool_calls` | boolean | No | Allow multiple tool calls in parallel |
| `stream` | boolean | No | Enable streaming responses (default: false) |
| `stream_options` | object | No | Streaming options: `{ include_obfuscation?: boolean }` |
| `store` | boolean | No | Store response for later retrieval (default: true) |
| `previous_response_id` | string | No | Link to previous response for conversation threading |
| `reasoning` | object | No | Reasoning configuration for reasoning-capable models |
| `text` | object | No | Text output configuration (format + verbosity) |
| `metadata` | object | No | Custom metadata (max 16 keys, 64-char keys, 512-char values) |
| `truncation` | string | No | Truncation strategy: `auto` or `disabled` |
| `user` | string | No | Unique user identifier |
| `seed` | integer | No | Random seed for reproducibility |
| `conversation` | object | No | Conversation context: `{ id?: string, messages?: InputItem[] }` |
| `include` | string[] | No | Additional fields to include in the response |
| `safety_identifier` | string | No | Safety tracking identifier |
| `prompt_cache_key` | string | No | Key for prompt caching |
| `background` | boolean | No | Enable background/async processing |
| `service_tier` | string | No | Service tier. Use `"priority"` where supported. See Service tiers (priority) near the end. |
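As a sketch, a minimal `POST /v1/responses` call might look like the following. The base URL and API key are hypothetical placeholders; substitute your actual gateway host and token.

```python
import json
import urllib.request

# Hypothetical host and key -- substitute your actual gateway URL and token.
BASE_URL = "https://api.example.com"
API_KEY = "sk-..."

# A minimal request body using the parameters from the table above.
payload = {
    "model": "openai/gpt-5.2",
    "input": "Write a haiku about the sea.",
    "instructions": "You are a concise poet.",
    "max_output_tokens": 128,  # must be >= 16
}

def create_response(body: dict) -> dict:
    """POST the body to /v1/responses and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```
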
The `input` parameter accepts either a simple string or an array of input items.
| Type | Description |
|---|---|
| `message` | A message with role and content |
| `function_call` | A tool/function call made by the model |
| `function_call_output` | The result of a tool/function call |
Message roles: `user`, `assistant`, `system`, `developer`.
Content can be a string or an array of content parts:
| Type | Description |
|---|---|
| `input_text` | Text input |
| `input_image` | Image input (via URL or `file_id`) |
| `input_file` | File input |
| `output_text` | Text output (includes annotations/logprobs) |
| `refusal` | Model refusal |
For `input_image` content, the `detail` parameter can be `auto`, `low`, or `high`.
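A structured `input` array combining the item and content-part types above might look like this sketch (the image URL is illustrative):

```python
# One user message mixing a text part and an image part.
structured_input = [
    {
        "type": "message",
        "role": "user",
        "content": [
            {"type": "input_text", "text": "What is shown in this image?"},
            {
                "type": "input_image",
                "image_url": "https://example.com/photo.jpg",  # or a file_id
                "detail": "auto",  # auto | low | high
            },
        ],
    }
]

payload = {"model": "openai/gpt-5.2", "input": structured_input}
```
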
Use `tool_choice` with type `allowed_tools` to restrict which tools the model may choose from.
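A sketch of an `allowed_tools` restriction; the inner `mode` and `tools` fields follow the OpenAI convention and are an assumption here, since the section above shows only `{ type: "allowed_tools", ... }`:

```python
# Declare two tools, then restrict the model to one of them.
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    {
        "type": "function",
        "name": "get_time",
        "parameters": {"type": "object", "properties": {}},
    },
]

tool_choice = {
    "type": "allowed_tools",
    "mode": "auto",  # assumed field: model may pick among the allowed set
    "tools": [{"type": "function", "name": "get_weather"}],
}

payload = {
    "model": "openai/gpt-5.2",
    "input": "What's the weather in Oslo?",
    "tools": tools,
    "tool_choice": tool_choice,
}
```
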
The `reasoning` object accepts:

| Parameter | Values | Description |
|---|---|---|
| `effort` | `low`, `medium`, `high` | How much effort the model puts into reasoning |
| `summary` | `none`, `auto`, `detailed`, `concise` | Reasoning summary format |
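For example, a request enabling medium-effort reasoning with an automatic summary:

```python
payload = {
    "model": "anthropic/claude-opus-4.5",
    "input": "Prove that the sum of two even integers is even.",
    "reasoning": {
        "effort": "medium",  # low | medium | high
        "summary": "auto",   # none | auto | detailed | concise
    },
    # temperature / top_p are deliberately omitted: reasoning-capable
    # models do not support them.
}
```
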
The `text` object configures output format and verbosity.

Format options:

- `{ "type": "text" }` - Plain text (default)
- `{ "type": "json_object" }` - JSON object output
- `{ "type": "json_schema", "json_schema": { ... } }` - Structured JSON with schema

Verbosity levels:

- `low` - Short, compact responses
- `medium` - Balanced detail
- `high` - Most detailed output

The response object contains the following fields:

| Field | Type | Description |
|---|---|---|
| `id` | string | Unique response identifier (format: `resp_*`) |
| `object` | string | Always `"response"` |
| `created_at` | integer | Unix timestamp of creation |
| `completed_at` | integer or null | Unix timestamp when the response completed |
| `model` | string | Model used for the response |
| `status` | string | Response status |
| `instructions` | string or null | System instructions used |
| `previous_response_id` | string or null | ID of the previous response in the conversation |
| `tools` | array | Tools available (normalized with nullable fields) |
| `tool_choice` | string or object | Tool choice setting used |
| `parallel_tool_calls` | boolean | Whether parallel tool calls were enabled |
| `truncation` | string | Truncation strategy: `auto` or `disabled` |
| `text` | object | Resolved text configuration |
| `reasoning` | object or null | Reasoning configuration |
| `temperature` | number | Temperature used |
| `top_p` | number | Top-p value used |
| `presence_penalty` | number | Presence penalty used |
| `frequency_penalty` | number | Frequency penalty used |
| `top_logprobs` | number | Top logprobs setting |
| `max_output_tokens` | integer or null | Max output tokens setting |
| `max_tool_calls` | integer or null | Max tool calls setting |
| `user` | string or null | User identifier |
| `store` | boolean | Whether the response was stored |
| `background` | boolean | Whether processed in the background |
| `safety_identifier` | string or null | Safety identifier |
| `prompt_cache_key` | string or null | Prompt cache key |
| `output` | array | Array of output items |
| `output_text` | string | Convenience field with concatenated text output |
| `usage` | object | Token usage statistics |
| `error` | object | Error details (if status is `failed`) |
| `incomplete_details` | object | Details if status is `incomplete` |
| `metadata` | object | Custom metadata (if provided) |
| `service_tier` | string | Service tier used (echoed when provided) |
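A sketch of reading the fields above from a completed response. The literal response object here is hand-written for illustration; only the field names come from the table:

```python
# Illustrative (hand-written) response object showing the fields above.
response = {
    "id": "resp_abc123",
    "object": "response",
    "status": "completed",
    "model": "openai/gpt-5.2",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "status": "completed",
            "content": [
                {"type": "output_text", "text": "Hello!", "annotations": []}
            ],
        }
    ],
    "output_text": "Hello!",
    "usage": {"input_tokens": 12, "output_tokens": 3, "total_tokens": 15},
}

def extract_text(resp: dict) -> str:
    """Prefer the convenience field; fall back to concatenating output items."""
    if resp.get("output_text"):
        return resp["output_text"]
    parts = []
    for item in resp.get("output", []):
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)
```
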
The `usage` object always includes token details.

The top-level `status` field can take the following values:

| Status | Description |
|---|---|
| `queued` | Background request is queued |
| `in_progress` | Request is being processed |
| `completed` | Request completed successfully |
| `incomplete` | Response was truncated |
| `failed` | Request failed with error |
| `cancelled` | Request was cancelled |
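With `background: true`, a client can poll `GET /v1/responses/{id}` until the status leaves `queued`/`in_progress`. A sketch, assuming a hypothetical host and treating the four non-running statuses above as terminal:

```python
import json
import time
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host
API_KEY = "sk-..."

# Statuses from the table above that mean the request has finished.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}

def poll_response(response_id: str, interval: float = 1.0) -> dict:
    """Poll GET /v1/responses/{id} until the status is terminal."""
    url = f"{BASE_URL}/v1/responses/{response_id}"
    while True:
        req = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {API_KEY}"}
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        if body["status"] in TERMINAL_STATUSES:
            return body
        time.sleep(interval)  # still queued / in_progress
```
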
Each output item also has its own `status` field:
| Status | Description |
|---|---|
| `completed` | Item finished successfully |
| `in_progress` | Item still being generated |
| `incomplete` | Item was truncated/interrupted |
When `stream: true` is set, the server emits the following server-sent events:

| Event | Description |
|---|---|
| `response.created` | Response object created |
| `response.in_progress` | Processing started |
| `response.output_item.added` | New output item started |
| `response.output_item.done` | Output item completed |
| `response.content_part.added` | Content part started |
| `response.content_part.done` | Content part completed |
| `response.output_text.delta` | Incremental text chunk |
| `response.output_text.done` | Text content completed |
| `response.reasoning.delta` | Incremental reasoning text |
| `response.reasoning.done` | Reasoning content completed |
| `response.function_call_arguments.delta` | Incremental function arguments |
| `response.function_call_arguments.done` | Function call completed |
| `response.completed` | Response completed successfully |
| `response.incomplete` | Response truncated |
| `response.failed` | Response failed |
Each delta event includes an `item_id` identifying its parent output item. When `top_logprobs` is requested, logprobs are included on `response.output_text.delta` events.
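A minimal parser for the event stream above. The `delta` payload field follows the common OpenAI event shape and is an assumption here; the example stream is hand-written:

```python
import json

def iter_text_deltas(sse_lines):
    """Yield text chunks from response.output_text.delta events in an SSE stream."""
    event = None
    for line in sse_lines:
        line = line.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = json.loads(line[len("data:"):].strip())
            if event == "response.output_text.delta":
                yield data.get("delta", "")

# Hand-written example stream for illustration:
stream = [
    'event: response.output_text.delta',
    'data: {"item_id": "msg_1", "delta": "Hel"}',
    'event: response.output_text.delta',
    'data: {"item_id": "msg_1", "delta": "lo"}',
    'event: response.completed',
    'data: {"id": "resp_abc123"}',
]
```
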
Use `previous_response_id` or the `conversation` object (`id` or `messages`) to manage context.
For example, pass the earlier response's `id` (e.g., `resp_abc123`).
Using `previous_response_id` requires authentication and `store: true` (the default) on the previous responses.
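Threading two turns might look like this sketch (the response ID is illustrative):

```python
# First turn: stored so it can be referenced later.
first = {
    "model": "openai/gpt-5.2",
    "input": "My name is Ada.",
    "store": True,  # default, shown for clarity
}

# ...after receiving {"id": "resp_abc123", ...} for the first request,
# thread a follow-up turn by referencing that id:
follow_up = {
    "model": "openai/gpt-5.2",
    "input": "What is my name?",
    "previous_response_id": "resp_abc123",
}
```
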
A response reaches a terminal state when its `status` is `completed`, `failed`, or `incomplete`.
Constraints:

- `stream: true`
- `404` - Response not found or belongs to a different account
- `401` - Authentication required/invalid

| HTTP Status | Description |
|---|---|
| 400 | Invalid request parameters |
| 401 | Missing or invalid API key |
| 403 | Insufficient permissions |
| 404 | Resource not found |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 503 | Service unavailable |
| Code | Description |
|---|---|
| `missing_required_parameter` | Required parameter not provided |
| `model_not_found` | Specified model does not exist |
| `response_not_found` | Response ID not found |
| `invalid_response_id` | Invalid response ID format |
| `authentication_required` | No API key provided |
| `invalid_api_key` | API key is invalid or inactive |
Use `/v1/chat/completions` for these models. Set `service_tier: "priority"` to request priority processing on providers that support service tiers.
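A sketch of requesting the priority tier:

```python
payload = {
    "model": "openai/gpt-5.2",
    "input": "Summarize the following in one sentence: ...",
    "service_tier": "priority",  # honored only where the routed provider supports tiers
}
# The response echoes the tier actually applied in its `service_tier` field.
```
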
Behavior notes:
Provider overrides (`X-Provider`) and explicit provider selection are honored for pricing and x402 estimates.

| Header | Description |
|---|---|
| `X-Request-ID` | Unique request/response identifier |
| `Content-Type` | `application/json` or `text/event-stream` |
`Authorization`: Bearer authentication header of the form `Bearer <token>`, where `<token>` is your auth token.

`X-Provider`: Optional provider override for pay-as-you-go requests on supported open-source models (case-insensitive). Subscription requests ignore this header.

`X-Billing-Mode`: Optional billing override to force pay-as-you-go (e.g., `paygo`). Header name is case-insensitive.
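Putting the request headers together (hypothetical host; the `X-Provider` value is illustrative):

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host
API_KEY = "sk-..."

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
    "X-Provider": "some-provider",  # illustrative provider override (paygo only)
    "X-Billing-Mode": "paygo",      # force pay-as-you-go billing
}

body = json.dumps({
    "model": "openai/gpt-5.2",
    "input": "Hello",
}).encode("utf-8")

req = urllib.request.Request(
    f"{BASE_URL}/v1/responses", data=body, headers=headers, method="POST"
)
# urllib.request.urlopen(req) would send the request (omitted here).
```
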
Parameters for the response request:

- `model`: Model ID to use for the response
- `input`: Prompt string or array of input items
- `billing_mode`: Billing override to force pay-as-you-go. Accepted values (case-insensitive): `paygo`, `pay-as-you-go`, `pay_as_you_go`, `paid`, `payg`. An alias for `billing_mode` is also accepted.
- `instructions`: System instructions for the model
- `max_output_tokens`: Maximum tokens in the response (`x >= 16`)
- `temperature`: Sampling temperature, `0 <= x <= 2` (not supported by reasoning models)
- `top_p`: Nucleus sampling parameter (`0 <= x <= 1`)
- `tools`: Function tools available to the model
- `tool_choice`: How the model should use tools
- `parallel_tool_calls`: Allow multiple tool calls in parallel
- `stream`: Enable streaming responses
- `store`: Store response for later retrieval
- `previous_response_id`: Link to previous response for conversation threading
- `reasoning`: Reasoning configuration for reasoning-capable models
- `text`: Text/format configuration
- `metadata`: Custom metadata
- `truncation`: Truncation strategy (`auto`, `disabled`)
- `user`: Unique user identifier
- `seed`: Random seed for reproducibility
- `background`: Enable background/async processing
- `service_tier`: Optional service tier (`auto`, `default`, `flex`, `priority`). Set to `"priority"` to request priority processing when supported by the routed provider.

Response created: the Response object returned by the Responses API.