Create a response with the OpenAI-compatible Responses API
Documentation Index
Fetch the complete documentation index at: https://docs.nano-gpt.com/llms.txt
Use this file to discover all available pages before exploring further.
/v1/responses API is an OpenAI Responses API-compatible endpoint for creating AI model responses. It supports:
X-X402: true header. See X-402 Micropayments for details.X-Provider explicitly selects a provider for the request and is always billed pay-as-you-go at the selected provider’s price, including provider-selection markup. For provider-selection-capable models, model may include routing preference suffixes such as :fast (alias for :speed) and :cheap (alias for :price). These are billed like explicit provider selection and follow the same conflict rules. For subscription users, sending X-Provider bypasses subscription coverage for that request; X-Billing-Mode: paygo is only needed when forcing pay-as-you-go without an explicit provider or when saved provider preferences should apply to subscription-included traffic. See Provider Selection, Model Suffixes, and Pay-As-You-Go Billing Override.x-team-id to choose team context when team defaults are evaluated (for example, retention defaults).
POST /v1/responses - Create a new response from the modelGET /v1/responses - Returns endpoint informationGET /v1/responses/{id} - Retrieve a stored response by IDDELETE /v1/responses/{id} - Delete a stored response (soft delete)store: true, you can optionally encrypt the stored response at rest using your own key or passphrase.
To encrypt a stored response, include one of these headers on POST /v1/responses:
x-encryption-key: YOUR_ENCRYPTION_KEYx-encryption-passphrase: YOUR_PASSPHRASE| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Model ID to use for the response. Provider-selection-capable models may include routing preference suffixes such as :fast, :speed, :cheap, :price, :latency, :throughput, :floor, or :tools. |
input | string or array | Yes | The input prompt or array of input items |
instructions | string | No | System instructions for the model |
max_output_tokens | integer | No | Maximum tokens in the response (minimum: 16) |
max_tool_calls | integer | No | Maximum number of tool calls allowed |
temperature | number | No | Sampling temperature (0-2). If omitted, NanoGPT does not force a value and the routed provider/model default applies (OpenAI defaults to 1.0). Not supported by reasoning-capable models |
top_p | number | No | Nucleus sampling parameter. Not supported by reasoning-capable models |
presence_penalty | number | No | Presence penalty for sampling (-2.0 to 2.0) |
frequency_penalty | number | No | Frequency penalty for sampling (-2.0 to 2.0) |
top_logprobs | integer | No | Number of top logprobs to return (0-20) |
tools | array | No | Array of tools available to the model |
tool_choice | string or object | No | Tool use: auto, none, required, { type: "function", name: "..." }, or { type: "allowed_tools", ... } |
parallel_tool_calls | boolean | No | Allow multiple tool calls in parallel |
stream | boolean | No | Enable streaming responses (default: false) |
stream_options | object | No | Streaming options: { include_obfuscation?: boolean } |
store | boolean | No | Store response for later retrieval (default: false) |
retention_days | integer or null | No | Per-request retention override in days (0..365). null means no request-level override |
retentionDays | integer or null | No | Alias for retention_days. If both are sent, values must match |
previous_response_id | string | No | Link to previous response for conversation threading |
reasoning | object | No | Reasoning configuration. Setting reasoning.effort to any non-none value explicitly requests reasoning mode. |
text | object | No | Text output configuration (format + verbosity) |
metadata | object | No | Custom metadata (max 16 keys, 64 char keys, 512 char values) |
truncation | string | No | Truncation strategy: auto or disabled |
user | string | No | Unique user identifier |
seed | integer | No | Random seed for reproducibility |
conversation | object | No | Conversation context: { id?: string, messages?: InputItem[] } |
include | string[] | No | Additional fields to include in response |
safety_identifier | string | No | Safety tracking identifier |
prompt_cache_key | string | No | Key for prompt caching |
background | boolean | No | Enable background/async processing |
service_tier | string | No | Service tier: "auto", "default", "flex", or "priority". See Service tiers (flex and priority) near the end. |
/v1/responses resolves in this order:
retention_days / retentionDays)responses_retention_days)responsesRetentionDays)7 days)retention_days and retentionDays accept integer values 0..365, or null.null means “no request override” and falls back to team/user/platform defaults.400 with invalid_request_error.0 enables zero-retention behavior for that request.0:
previous_response_id is rejected.background is rejected.x-team-id is present and the caller is a member, that team is used.default_team_uuid / default_team_id) when membership is valid.input parameter accepts either a simple string or an array of input items.
| Type | Description |
|---|---|
message | A message with role and content |
function_call | A tool/function call made by the model |
function_call_output | The result of a tool/function call |
user, assistant, system, developer
Content can be a string or an array of content parts:
| Type | Description |
|---|---|
input_text | Text input |
input_image | Image input (via URL or file_id) |
input_file | File input |
output_text | Text output (includes annotations/logprobs) |
refusal | Model refusal |
detail parameter can be: auto, low, or high.
allowed_tools to restrict which tools the model may choose from:
reasoning to control depth and visibility of reasoning output:
| Parameter | Values | Description |
|---|---|---|
effort | none, minimal, low, medium, high, xhigh | Reasoning depth. Any value other than none explicitly requests reasoning mode. |
summary | none, auto, detailed, concise | Reasoning summary format |
exclude | true, false | Controls output visibility (hides reasoning fields/blocks). It does not inherently disable reasoning compute. |
{ "type": "text" } - Plain text (default){ "type": "json_object" } - JSON object output{ "type": "json_schema", "json_schema": { ... } } - Structured JSON with schemalow - Short, compact responsesmedium - Balanced detailhigh - Most detailed output| Field | Type | Description |
|---|---|---|
id | string | Unique response identifier (format: resp_*) |
object | string | Always "response" |
created_at | integer | Unix timestamp of creation |
completed_at | integer or null | Unix timestamp when response completed |
model | string | Model used for the response |
status | string | Response status |
instructions | string or null | System instructions used |
previous_response_id | string or null | ID of previous response in conversation |
tools | array | Tools available (normalized with nullable fields) |
tool_choice | string or object | Tool choice setting used |
parallel_tool_calls | boolean | Whether parallel tool calls were enabled |
truncation | string | Truncation strategy: auto or disabled |
text | object | Resolved text configuration |
reasoning | object or null | Reasoning configuration |
temperature | number | Temperature used |
top_p | number | Top-p value used |
presence_penalty | number | Presence penalty used |
frequency_penalty | number | Frequency penalty used |
top_logprobs | number | Top logprobs setting |
max_output_tokens | integer or null | Max output tokens setting |
max_tool_calls | integer or null | Max tool calls setting |
user | string or null | User identifier |
store | boolean | Whether response was stored |
background | boolean | Whether processed in background |
safety_identifier | string or null | Safety identifier |
prompt_cache_key | string or null | Prompt cache key |
output | array | Array of output items |
output_text | string | Convenience field with concatenated text output |
usage | object | Token usage statistics |
error | object | Error details (if status is failed) |
incomplete_details | object | Details if status is incomplete |
metadata | object | Custom metadata (if provided) |
service_tier | string | Service tier used (echoed when provided) |
usage object always includes token details:
| Status | Description |
|---|---|
queued | Background request is queued |
in_progress | Request is being processed |
completed | Request completed successfully |
incomplete | Response was truncated |
failed | Request failed with error |
cancelled | Request was cancelled |
reasoning Response Fieldtext Response Field (Resolved)status field.
| Status | Description |
|---|---|
completed | Item finished successfully |
in_progress | Item still being generated |
incomplete | Item was truncated/interrupted |
| Event | Description |
|---|---|
response.created | Response object created |
response.in_progress | Processing started |
response.output_item.added | New output item started |
response.output_item.done | Output item completed |
response.content_part.added | Content part started |
response.content_part.done | Content part completed |
response.output_text.delta | Incremental text chunk |
response.output_text.done | Text content completed |
response.reasoning.delta | Incremental reasoning text |
response.reasoning.done | Reasoning content completed |
response.function_call_arguments.delta | Incremental function arguments |
response.function_call_arguments.done | Function call completed |
response.completed | Response completed successfully |
response.incomplete | Response truncated |
response.failed | Response failed |
item_id for the parent output item.logprobs.response.output_text.delta:
previous_response_id or the conversation object (id or messages) to manage context.
id: "resp_abc123"
previous_response_id requires authentication, store: true on previous responses, and effective retention greater than 0.
status is completed, failed, or incomplete.
Constraints:
stream: true0404 - Response not found or belongs to different account401 - Authentication required/invalid| HTTP Status | Description |
|---|---|
400 | Invalid request parameters |
401 | Missing or invalid API key |
403 | Insufficient permissions |
404 | Resource not found |
429 | Rate limit exceeded |
500 | Internal server error |
503 | Service unavailable |
| Code | Description |
|---|---|
missing_required_parameter | Required parameter not provided |
model_not_found | Specified model does not exist |
response_not_found | Response ID not found |
invalid_response_id | Invalid response ID format |
invalid_request_error | Invalid request shape/value (for example retention out of range or mismatched alias fields) |
authentication_required | No API key provided |
invalid_api_key | API key is invalid or inactive |
/v1/chat/completions for these models.service_tier to request a non-default capacity tier on providers that support service tiers:
auto or omitted: use NanoGPT’s normal routing and the provider default.default: request the provider’s standard tier where the provider accepts an explicit default value.flex: request lower-cost, variable-capacity processing where supported.priority: request higher-cost priority processing where supported.X-Provider) and explicit provider selection are honored for pricing and x402 estimates.es2k pricing for GPT-5.5/GPT-5.4 where available.| Header | Description |
|---|---|
X-Request-ID | Unique request/response identifier |
Content-Type | application/json or text/event-stream |
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Optional explicit provider override for supported open-source models (case-insensitive). Explicit provider selection is billed pay-as-you-go at the selected provider's price, including provider-selection markup; for subscription users it bypasses subscription coverage for that request.
Optional billing override to force pay-as-you-go without an explicit provider, or to apply saved provider preferences to subscription-included traffic (e.g., paygo). Header name is case-insensitive.
Optional team context override for API-key requests. If provided, it must reference a team the caller belongs to.
Parameters for the response request
Model ID to use for the response. Provider-selection-capable models may include routing preference suffixes such as ':fast', ':speed', ':cheap', ':price', ':latency', ':throughput', ':floor', or ':tools'.
Prompt string or array of input items
Billing override to force pay-as-you-go without an explicit provider, or to apply saved provider preferences to subscription-included traffic. Accepted values (case-insensitive): paygo, pay-as-you-go, pay_as_you_go, paid, payg.
Alias for billing_mode.
System instructions for the model
Maximum tokens in the response
x >= 16Sampling temperature (not supported by reasoning models)
0 <= x <= 2Nucleus sampling parameter
0 <= x <= 1Function tools available to the model
How the model should use tools
Allow multiple tool calls in parallel
Enable streaming responses
Store response for later retrieval
Per-request retention override in days. Use null to disable request-level override.
0 <= x <= 365Alias for retention_days. If both are provided, values must match.
0 <= x <= 365Link to previous response for conversation threading
Reasoning configuration. Setting reasoning.effort to any non-none value explicitly requests reasoning mode.
Text/format configuration
Custom metadata
Truncation strategy
auto, disabled Unique user identifier
Random seed for reproducibility
Enable background/async processing
Optional service tier: "auto", "default", "flex", or "priority". Use "flex" for lower-cost variable-capacity processing or "priority" for higher-cost priority processing where supported by the routed model/provider.
auto, default, flex, priority Response created
Response object returned by the Responses API