Responses
Create a response with the OpenAI-compatible Responses API
Overview
The/v1/responses API is an OpenAI Responses API-compatible endpoint for creating AI model responses. It supports:
- Stateless and stateful (conversation threading) chat completions
- Streaming responses via Server-Sent Events (SSE)
- Background (async) processing for long-running requests
- Response storage and retrieval
- Function/tool calling support
- Multimodal inputs (images, files) for supported models
POST /api/v1/responses requests can be quoted without an account or API key on supported deployments. Streaming and background Responses have implementation coverage but are not part of the stable public accountless contract. See Accountless x402 API Payments and GET /api/v1/x402/endpoints.X-Provider explicitly selects a provider for the request and is always billed pay-as-you-go at the selected provider’s price, including provider-selection markup. For provider-selection-capable models, model may include routing preference suffixes such as :fast (alias for :speed) and :cheap (alias for :price). These are billed like explicit provider selection and follow the same conflict rules. For subscription users, sending X-Provider bypasses subscription coverage for that request; X-Billing-Mode: paygo is only needed when forcing pay-as-you-go without an explicit provider or when saved provider preferences should apply to subscription-included traffic. See Provider Selection, Model Suffixes, and Pay-As-You-Go Billing Override.Authentication
Use an API key for normal authenticated billing:x-team-id to choose team context when team defaults are evaluated (for example, retention defaults).
For supported accountless x402 requests, omit Authorization and x-api-key to receive a payment quote.
Endpoints
POST /v1/responses- Create a new response from the modelGET /v1/responses- Returns endpoint informationGET /v1/responses/{id}- Retrieve a stored response by IDDELETE /v1/responses/{id}- Delete a stored response (soft delete)
BYOK Encryption (Stored Responses)
If you setstore: true, you can optionally encrypt the stored response at rest using your own key or passphrase.
To encrypt a stored response, include one of these headers on POST /v1/responses:
x-encryption-key: YOUR_ENCRYPTION_KEYx-encryption-passphrase: YOUR_PASSPHRASE
Create Response
Request
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model ID to use for the response. Provider-selection-capable models may include routing preference suffixes such as :fast, :speed, :cheap, :price, :latency, :throughput, :floor, :tools, :caching, :cache, or :cached. |
input | string or array | Yes | The input prompt or array of input items |
instructions | string | No | System instructions for the model |
max_output_tokens | integer | No | Maximum tokens in the response (minimum: 16) |
max_tool_calls | integer | No | Maximum number of tool calls allowed |
temperature | number | No | Sampling temperature (0-2). If omitted, NanoGPT does not force a value and the routed provider/model default applies (OpenAI defaults to 1.0). Not supported by reasoning-capable models |
top_p | number | No | Nucleus sampling parameter. Not supported by reasoning-capable models |
presence_penalty | number | No | Presence penalty for sampling (-2.0 to 2.0) |
frequency_penalty | number | No | Frequency penalty for sampling (-2.0 to 2.0) |
top_logprobs | integer | No | Number of top logprobs to return (0-20) |
tools | array | No | Array of tools available to the model |
tool_choice | string or object | No | Tool use: auto, none, required, { type: "function", name: "..." }, or { type: "allowed_tools", ... } |
parallel_tool_calls | boolean | No | Allow multiple tool calls in parallel |
stream | boolean | No | Enable streaming responses (default: false) |
stream_options | object | No | Streaming options: { include_obfuscation?: boolean } |
store | boolean | No | Store response for later retrieval (default: false) |
retention_days | integer or null | No | Per-request retention override in days (0..365). null means no request-level override |
retentionDays | integer or null | No | Alias for retention_days. If both are sent, values must match |
previous_response_id | string | No | Link to previous response for conversation threading |
reasoning | object | No | Reasoning configuration. Setting reasoning.effort to any non-none value explicitly requests reasoning mode. |
text | object | No | Text output configuration (format + verbosity) |
metadata | object | No | Custom metadata (max 16 keys, 64 char keys, 512 char values) |
truncation | string | No | Truncation strategy: auto or disabled |
user | string | No | Unique user identifier |
seed | integer | No | Random seed for reproducibility |
conversation | object | No | Conversation context: { id?: string, messages?: InputItem[] } |
include | string[] | No | Additional fields to include in response |
safety_identifier | string | No | Safety tracking identifier |
prompt_cache_key | string | No | Key for prompt caching |
background | boolean | No | Enable background/async processing |
service_tier | string | No | Service tier: "auto", "default", "flex", or "priority". See Service tiers (flex and priority) near the end. |
Retention Resolution
Effective retention for/v1/responses resolves in this order:
- Request override (
retention_days/retentionDays) - Team setting (
responses_retention_days) - User setting (
responsesRetentionDays) - Platform default (
7days)
retention_daysandretentionDaysaccept integer values0..365, ornull.nullmeans “no request override” and falls back to team/user/platform defaults.- If both request fields are provided, they must match.
- Invalid retention values return
400withinvalid_request_error. 0enables zero-retention behavior for that request.- Existing clients that omit retention fields keep default behavior (team/user/platform retention resolution).
0:
previous_response_idis rejected.backgroundis rejected.
- If
x-team-idis present and the caller is a member, that team is used. - Otherwise, the API uses the caller session’s default team (
default_team_uuid/default_team_id) when membership is valid.
Input Types
Theinput parameter accepts either a simple string or an array of input items.
Simple String Input
Array Input
Input Item Types
| Type | Description |
|---|---|
message | A message with role and content |
function_call | A tool/function call made by the model |
function_call_output | The result of a tool/function call |
Message Item
user, assistant, system, developer
Content can be a string or an array of content parts:
Content Part Types
| Type | Description |
|---|---|
input_text | Text input |
input_image | Image input (via URL or file_id) |
input_file | File input |
output_text | Text output (includes annotations/logprobs) |
refusal | Model refusal |
Image Input
detail parameter can be: auto, low, or high.
Function Call Item
Function Call Output Item
Tools
Provide function tools and built-in tools the model can use:Function Tool
Define functions that the model can call:Web Search Tool
File Search Tool
Code Interpreter Tool
MCP Tool
Image Generation Tool
Tool Choice
Useallowed_tools to restrict which tools the model may choose from:
Function Tool Normalization
Function tools in responses always include nullable fields:Reasoning Configuration
Usereasoning to control depth and visibility of reasoning output:
| Parameter | Values | Description |
|---|---|---|
effort | none, minimal, low, medium, high, xhigh | Reasoning depth. Any value other than none explicitly requests reasoning mode. |
summary | none, auto, detailed, concise | Reasoning summary format |
exclude | true, false | Controls output visibility (hides reasoning fields/blocks). It does not inherently disable reasoning compute. |
Text/Format Configuration
Control response format and verbosity:Text Parameter Structure
Format Types
{ "type": "text" }- Plain text (default){ "type": "json_object" }- JSON object output{ "type": "json_schema", "json_schema": { ... } }- Structured JSON with schema
Verbosity Values
low- Short, compact responsesmedium- Balanced detailhigh- Most detailed output
JSON Schema Format
Response Format
Successful Response
Response Fields
All fields below are always present; nullable values indicate an option was not set.| Field | Type | Description |
|---|---|---|
id | string | Unique response identifier (format: resp_*) |
object | string | Always "response" |
created_at | integer | Unix timestamp of creation |
completed_at | integer or null | Unix timestamp when response completed |
model | string | Model used for the response |
status | string | Response status |
instructions | string or null | System instructions used |
previous_response_id | string or null | ID of previous response in conversation |
tools | array | Tools available (normalized with nullable fields) |
tool_choice | string or object | Tool choice setting used |
parallel_tool_calls | boolean | Whether parallel tool calls were enabled |
truncation | string | Truncation strategy: auto or disabled |
text | object | Resolved text configuration |
reasoning | object or null | Reasoning configuration |
temperature | number | Temperature used |
top_p | number | Top-p value used |
presence_penalty | number | Presence penalty used |
frequency_penalty | number | Frequency penalty used |
top_logprobs | number | Top logprobs setting |
max_output_tokens | integer or null | Max output tokens setting |
max_tool_calls | integer or null | Max tool calls setting |
user | string or null | User identifier |
store | boolean | Whether response was stored |
background | boolean | Whether processed in background |
safety_identifier | string or null | Safety identifier |
prompt_cache_key | string or null | Prompt cache key |
output | array | Array of output items |
output_text | string | Convenience field with concatenated text output |
usage | object | Token usage statistics |
error | object | Error details (if status is failed) |
incomplete_details | object | Details if status is incomplete |
metadata | object | Custom metadata (if provided) |
service_tier | string | Service tier used (echoed when provided) |
Usage Object
Theusage object always includes token details:
Response Status Values
| Status | Description |
|---|---|
queued | Background request is queued |
in_progress | Request is being processed |
completed | Request completed successfully |
incomplete | Response was truncated |
failed | Request failed with error |
cancelled | Request was cancelled |
reasoning Response Field
text Response Field (Resolved)
Output Item Types
All output items include astatus field.
Message Output
Function Call Output
Reasoning Output (reasoning-capable models)
Web Search Call Output
Image Generation Call Output
Computer Call Output
Output Item Status Values
| Status | Description |
|---|---|
completed | Item finished successfully |
in_progress | Item still being generated |
incomplete | Item was truncated/interrupted |
Output Text Parts
Output text parts include annotations and logprobs:Annotation Types
URL Citation
File Citation
File Path
Streaming
See also: Streaming Protocol (SSE). Enable streaming to receive incremental response updates:Streaming Response
The response is delivered as Server-Sent Events (SSE):Streaming Event Types
| Event | Description |
|---|---|
response.created | Response object created |
response.in_progress | Processing started |
response.output_item.added | New output item started |
response.output_item.done | Output item completed |
response.content_part.added | Content part started |
response.content_part.done | Content part completed |
response.output_text.delta | Incremental text chunk |
response.output_text.done | Text content completed |
response.reasoning.delta | Incremental reasoning text |
response.reasoning.done | Reasoning content completed |
response.function_call_arguments.delta | Incremental function arguments |
response.function_call_arguments.done | Function call completed |
response.completed | Response completed successfully |
response.incomplete | Response truncated |
response.failed | Response failed |
Updated Event Fields
- All content/output events include
item_idfor the parent output item. - Text delta/done events include
logprobs.
response.output_text.delta:
Conversation Threading
Chain responses together for multi-turn conversations. You can useprevious_response_id or the conversation object (id or messages) to manage context.
First Request
id: "resp_abc123"
Follow-up Request
previous_response_id requires authentication, store: true on previous responses, and effective retention greater than 0.
Background Mode
For long-running requests, use background mode to receive an immediate response and poll for results.Initiate Background Request
Immediate Response (202 Accepted)
Poll for Completion
status is completed, failed, or incomplete.
Constraints:
- Cannot be combined with
stream: true - Requires authentication
- Effective retention must be greater than
0 - Maximum processing time: approximately 800 seconds
Retrieve Response
Response
Returns the full response object (same format as POST response).Errors
404- Response not found or belongs to different account401- Authentication required/invalid
Delete Response
Response
Error Handling
Error Response Format
HTTP Status Codes
| HTTP Status | Description |
|---|---|
400 | Invalid request parameters |
401 | Missing or invalid API key |
403 | Insufficient permissions |
404 | Resource not found |
429 | Rate limit exceeded |
500 | Internal server error |
503 | Service unavailable |
Common Error Codes
| Code | Description |
|---|---|
missing_required_parameter | Required parameter not provided |
model_not_found | Specified model does not exist |
response_not_found | Response ID not found |
invalid_response_id | Invalid response ID format |
invalid_request_error | Invalid request shape/value (for example retention out of range or mismatched alias fields) |
authentication_required | No API key provided |
invalid_api_key | API key is invalid or inactive |
Complete Examples
Simple Text Completion
Multi-turn Conversation
Streaming Response
Per-request Retention Override
Function Calling
Submitting Tool Results
Image Input (Vision)
JSON Output
Background Processing
Limitations
- Deep research models: Deep research variants are not supported.
- GPU-TEE streaming: Streaming is not supported for GPU-TEE models. Use
/v1/chat/completionsfor these models. - Background mode: Maximum duration is approximately 800 seconds.
- Metadata limits: Maximum 16 keys, 64 character key names, 512 character values.
Service tiers (flex and priority)
Setservice_tier to request a non-default capacity tier on providers that support service tiers:
autoor omitted: use NanoGPT’s normal routing and the provider default.default: request the provider’s standard tier where the provider accepts an explicit default value.flex: request lower-cost, variable-capacity processing where supported.priority: request higher-cost priority processing where supported.
- Service tier availability is model- and provider-specific. Model pages show which tiers are supported.
- Flex and priority tiers are only applied when the routed provider supports them.
- Header provider overrides (like
X-Provider) and explicit provider selection are honored for pricing and x402 estimates. - Provider-native web search can force routing; tier pricing follows that routing.
- If you explicitly force a provider that does not support service tiers, the requested tier may be ignored by the upstream provider, or routing and pricing may differ from the default route.
- Flex tier billing uses flex pricing where applicable.
- Priority tier billing uses priority pricing where applicable.
- High-context pricing may also apply for models and providers with separate high-context SKUs, such as
es2kpricing for GPT-5.5/GPT-5.4 where available.
Example: flex tier
Example: priority tier
Response Headers
All responses include:| Header | Description |
|---|---|
X-Request-ID | Unique request/response identifier |
Content-Type | application/json or text/event-stream |
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Headers
Optional explicit provider override for supported open-source models (case-insensitive). Explicit provider selection is billed pay-as-you-go at the selected provider's price, including provider-selection markup; for subscription users it bypasses subscription coverage for that request.
Optional billing override to force pay-as-you-go without an explicit provider, or to apply saved provider preferences to subscription-included traffic (e.g., paygo). Header name is case-insensitive.
Optional team context override for API-key requests. If provided, it must reference a team the caller belongs to.
Body
Parameters for the response request
Model ID to use for the response. Provider-selection-capable models may include routing preference suffixes such as ':fast', ':speed', ':cheap', ':price', ':latency', ':throughput', ':floor', ':tools', ':caching', ':cache', or ':cached'.
Prompt string or array of input items
Billing override to force pay-as-you-go without an explicit provider, or to apply saved provider preferences to subscription-included traffic. Accepted values (case-insensitive): paygo, pay-as-you-go, pay_as_you_go, paid, payg.
Alias for billing_mode.
System instructions for the model
Maximum tokens in the response
x >= 16Sampling temperature (not supported by reasoning models)
0 <= x <= 2Nucleus sampling parameter
0 <= x <= 1Function tools available to the model
How the model should use tools
Allow multiple tool calls in parallel
Enable streaming responses
Store response for later retrieval
Per-request retention override in days. Use null to disable request-level override.
0 <= x <= 365Alias for retention_days. If both are provided, values must match.
0 <= x <= 365Link to previous response for conversation threading
Reasoning configuration. Setting reasoning.effort to any non-none value explicitly requests reasoning mode.
Text/format configuration
Custom metadata
Truncation strategy
auto, disabled Unique user identifier
Random seed for reproducibility
Enable background/async processing
Optional service tier: "auto", "default", "flex", or "priority". Use "flex" for lower-cost variable-capacity processing or "priority" for higher-cost priority processing where supported by the routed model/provider.
auto, default, flex, priority Response
Response created
Response object returned by the Responses API