
Overview

The /v1/responses API is an OpenAI Responses API-compatible endpoint for creating AI model responses. It supports:
  • Stateless and stateful (conversation threading) chat completions
  • Streaming responses via Server-Sent Events (SSE)
  • Background (async) processing for long-running requests
  • Response storage and retrieval
  • Function/tool calling support
  • Multimodal inputs (images, files) for supported models
Provider selection is available for pay-as-you-go requests on supported open-source models. Set the X-Provider header or save preferences to choose a provider. If you are on a subscription and want provider selection for a subscription-included model, force paid routing with the pay-as-you-go billing override (billing_mode: "paygo" or X-Billing-Mode: paygo). See Provider Selection and Pay-As-You-Go Billing Override.

Authentication

All requests require authentication via API key:
Authorization: Bearer YOUR_API_KEY
Or alternatively:
x-api-key: YOUR_API_KEY
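As a sketch of both header styles, here is a minimal Python request builder using only the standard library (the payload and model name are illustrative; substitute your real API key):

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # substitute your real key

def build_request(payload: dict, use_bearer: bool = True) -> urllib.request.Request:
    """Build a POST to /v1/responses using either supported auth header."""
    headers = {"Content-Type": "application/json"}
    if use_bearer:
        headers["Authorization"] = f"Bearer {API_KEY}"
    else:
        headers["x-api-key"] = API_KEY
    return urllib.request.Request(
        "https://nano-gpt.com/api/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

req = build_request({"model": "openai/gpt-5.2", "input": "Hello"})
# Send with urllib.request.urlopen(req) when ready.
```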

Endpoints

  • POST /v1/responses - Create a new response from the model
  • GET /v1/responses - Returns endpoint information
  • GET /v1/responses/{id} - Retrieve a stored response by ID
  • DELETE /v1/responses/{id} - Delete a stored response (soft delete)

Create Response

Request

POST /v1/responses
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

Request Body

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The model to use (e.g., openai/gpt-5.2, anthropic/claude-opus-4.5) |
| input | string or array | Yes | The input prompt or array of input items |
| instructions | string | No | System instructions for the model |
| max_output_tokens | integer | No | Maximum tokens in the response (minimum: 16) |
| max_tool_calls | integer | No | Maximum number of tool calls allowed |
| temperature | number | No | Sampling temperature (0-2). Not supported by reasoning-capable models |
| top_p | number | No | Nucleus sampling parameter. Not supported by reasoning-capable models |
| presence_penalty | number | No | Presence penalty for sampling (-2.0 to 2.0) |
| frequency_penalty | number | No | Frequency penalty for sampling (-2.0 to 2.0) |
| top_logprobs | integer | No | Number of top logprobs to return (0-20) |
| tools | array | No | Array of tools available to the model |
| tool_choice | string or object | No | Tool use: auto, none, required, { type: "function", name: "..." }, or { type: "allowed_tools", ... } |
| parallel_tool_calls | boolean | No | Allow multiple tool calls in parallel |
| stream | boolean | No | Enable streaming responses (default: false) |
| stream_options | object | No | Streaming options: { include_obfuscation?: boolean } |
| store | boolean | No | Store response for later retrieval (default: true) |
| previous_response_id | string | No | Link to previous response for conversation threading |
| reasoning | object | No | Reasoning configuration for reasoning-capable models |
| text | object | No | Text output configuration (format + verbosity) |
| metadata | object | No | Custom metadata (max 16 keys, 64-char keys, 512-char values) |
| truncation | string | No | Truncation strategy: auto or disabled |
| user | string | No | Unique user identifier |
| seed | integer | No | Random seed for reproducibility |
| conversation | object | No | Conversation context: { id?: string, messages?: InputItem[] } |
| include | string[] | No | Additional fields to include in the response |
| safety_identifier | string | No | Safety tracking identifier |
| prompt_cache_key | string | No | Key for prompt caching |
| background | boolean | No | Enable background/async processing |
| service_tier | string | No | Service tier. Use "priority" where supported. See Service tiers (priority) near the end |

Input Types

The input parameter accepts either a simple string or an array of input items.

Simple String Input

{
  "model": "openai/gpt-5.2",
  "input": "What is the capital of France?"
}

Array Input

{
  "model": "openai/gpt-5.2",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": "What is the capital of France?"
    }
  ]
}

Input Item Types

| Type | Description |
|---|---|
| message | A message with role and content |
| function_call | A tool/function call made by the model |
| function_call_output | The result of a tool/function call |

Message Item

{
  "type": "message",
  "role": "user",
  "content": "Hello, how are you?"
}
Supported roles: user, assistant, system, developer.
Content can be a string or an array of content parts:
{
  "type": "message",
  "role": "user",
  "content": [
    { "type": "input_text", "text": "What's in this image?" },
    { "type": "input_image", "image_url": "https://example.com/image.jpg" }
  ]
}

Content Part Types

| Type | Description |
|---|---|
| input_text | Text input |
| input_image | Image input (via URL or file_id) |
| input_file | File input |
| output_text | Text output (includes annotations/logprobs) |
| refusal | Model refusal |

Image Input

{
  "type": "input_image",
  "image_url": "https://example.com/image.jpg",
  "detail": "auto"
}
The detail parameter can be: auto, low, or high.

Function Call Item

{
  "type": "function_call",
  "id": "fc_123",
  "call_id": "call_abc123",
  "name": "get_weather",
  "arguments": "{\"location\": \"Paris\"}"
}

Function Call Output Item

{
  "type": "function_call_output",
  "call_id": "call_abc123",
  "output": "{\"temperature\": 22, \"condition\": \"sunny\"}"
}
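These two item types form the tool-call round trip: the model emits a function_call, you execute the function, and you send back a function_call_output with the same call_id. A minimal Python sketch of that hand-off (the weather stub and dispatch table are illustrative):

```python
import json

def get_weather(location: str) -> dict:
    # Stand-in for a real weather lookup.
    return {"temperature": 22, "condition": "sunny"}

LOCAL_TOOLS = {"get_weather": get_weather}

def run_tool_call(call_item: dict) -> dict:
    """Execute a function_call item locally and wrap the result as a
    function_call_output item, preserving call_id for the next request."""
    fn = LOCAL_TOOLS[call_item["name"]]
    args = json.loads(call_item["arguments"])
    result = fn(**args)
    return {
        "type": "function_call_output",
        "call_id": call_item["call_id"],
        "output": json.dumps(result),
    }

call = {
    "type": "function_call",
    "id": "fc_123",
    "call_id": "call_abc123",
    "name": "get_weather",
    "arguments": "{\"location\": \"Paris\"}",
}
output_item = run_tool_call(call)
```

The output_item goes into the next request's input array, alongside the original message and function_call items.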

Tools

Provide function tools and built-in tools the model can use:

Function Tool

Define functions that the model can call:
{
  "model": "openai/gpt-5.2",
  "input": "What's the weather in Paris?",
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City name"
          }
        },
        "required": ["location"]
      },
      "strict": false
    }
  ],
  "tool_choice": "auto"
}

Web Search Tool

{
  "type": "web_search_preview",
  "search_context_size": "low",
  "user_location": {
    "type": "approximate",
    "country": "US",
    "city": "San Francisco",
    "region": "California"
  }
}

File Search Tool

{
  "type": "file_search",
  "vector_store_ids": ["vs_..."],
  "max_num_results": 10,
  "ranking_options": {
    "ranker": "auto",
    "score_threshold": 0.5
  }
}

Code Interpreter Tool

{
  "type": "code_interpreter",
  "container": { "type": "auto" }
}

MCP Tool

{
  "type": "mcp",
  "server_label": "my-server",
  "server_url": "https://...",
  "headers": { "Authorization": "Bearer ..." },
  "require_approval": "auto"
}

Image Generation Tool

{
  "type": "image_generation"
}

Tool Choice

Use allowed_tools to restrict which tools the model may choose from:
{
  "tool_choice": {
    "type": "allowed_tools",
    "tools": [{ "type": "function", "name": "get_weather" }],
    "mode": "auto"
  }
}

Function Tool Normalization

Function tools in responses always include nullable fields:
{
  "type": "function",
  "name": "get_weather",
  "description": null,
  "parameters": null,
  "strict": null
}

Reasoning Configuration

For reasoning-capable models:
{
  "model": "anthropic/claude-opus-4.5",
  "input": "Solve this complex problem...",
  "reasoning": {
    "effort": "high",
    "summary": "auto"
  }
}
| Parameter | Values | Description |
|---|---|---|
| effort | low, medium, high | How much effort the model puts into reasoning |
| summary | none, auto, detailed, concise | Reasoning summary format |

Text/Format Configuration

Control response format and verbosity:
{
  "model": "openai/gpt-5.2",
  "input": "List 3 colors",
  "text": {
    "format": { "type": "json_object" },
    "verbosity": "medium"
  }
}

Text Parameter Structure

{
  "format": { "type": "text" } | { "type": "json_object" } | { "type": "json_schema", "json_schema": { ... } },
  "verbosity": "low" | "medium" | "high"
}

Format Types

  • { "type": "text" } - Plain text (default)
  • { "type": "json_object" } - JSON object output
  • { "type": "json_schema", "json_schema": { ... } } - Structured JSON with schema

Verbosity Values

  • low - Short, compact responses
  • medium - Balanced detail
  • high - Most detailed output

JSON Schema Format

{
  "text": {
    "format": {
      "type": "json_schema",
      "json_schema": {
        "name": "color_list",
        "schema": {
          "type": "object",
          "properties": {
            "colors": {
              "type": "array",
              "items": { "type": "string" }
            }
          }
        },
        "strict": true
      }
    }
  }
}
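Since the json_schema format nests three levels deep, a small builder helps avoid mis-nesting. A Python sketch (the helper name is illustrative, not part of any SDK):

```python
def json_schema_format(name: str, schema: dict, strict: bool = True) -> dict:
    """Build the text.format payload for structured JSON output."""
    return {
        "text": {
            "format": {
                "type": "json_schema",
                "json_schema": {"name": name, "schema": schema, "strict": strict},
            }
        }
    }

body = {
    "model": "openai/gpt-5.2",
    "input": "List 3 colors",
    **json_schema_format(
        "color_list",
        {
            "type": "object",
            "properties": {"colors": {"type": "array", "items": {"type": "string"}}},
        },
    ),
}
```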

Response Format

Successful Response

{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1699000000,
  "completed_at": 1699000001,
  "model": "openai/gpt-5.2",
  "status": "completed",
  "instructions": null,
  "previous_response_id": null,
  "tools": [],
  "tool_choice": "auto",
  "parallel_tool_calls": false,
  "truncation": "disabled",
  "text": {
    "format": { "type": "text" },
    "verbosity": "medium"
  },
  "reasoning": null,
  "temperature": 1,
  "top_p": 1,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "top_logprobs": 0,
  "max_output_tokens": null,
  "max_tool_calls": null,
  "user": null,
  "store": true,
  "background": false,
  "safety_identifier": null,
  "prompt_cache_key": null,
  "output": [
    {
      "type": "message",
      "id": "msg_xyz789",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris.",
          "annotations": [],
          "logprobs": []
        }
      ]
    }
  ],
  "output_text": "The capital of France is Paris.",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 10,
    "total_tokens": 25,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens_details": { "reasoning_tokens": 0 }
  },
  "metadata": {},
  "service_tier": "auto"
}

Response Fields

All fields below are always present; nullable values indicate an option was not set.
| Field | Type | Description |
|---|---|---|
| id | string | Unique response identifier (format: resp_*) |
| object | string | Always "response" |
| created_at | integer | Unix timestamp of creation |
| completed_at | integer or null | Unix timestamp when the response completed |
| model | string | Model used for the response |
| status | string | Response status |
| instructions | string or null | System instructions used |
| previous_response_id | string or null | ID of the previous response in the conversation |
| tools | array | Tools available (normalized with nullable fields) |
| tool_choice | string or object | Tool choice setting used |
| parallel_tool_calls | boolean | Whether parallel tool calls were enabled |
| truncation | string | Truncation strategy: auto or disabled |
| text | object | Resolved text configuration |
| reasoning | object or null | Reasoning configuration |
| temperature | number | Temperature used |
| top_p | number | Top-p value used |
| presence_penalty | number | Presence penalty used |
| frequency_penalty | number | Frequency penalty used |
| top_logprobs | number | Top logprobs setting |
| max_output_tokens | integer or null | Max output tokens setting |
| max_tool_calls | integer or null | Max tool calls setting |
| user | string or null | User identifier |
| store | boolean | Whether the response was stored |
| background | boolean | Whether processed in the background |
| safety_identifier | string or null | Safety identifier |
| prompt_cache_key | string or null | Prompt cache key |
| output | array | Array of output items |
| output_text | string | Convenience field with concatenated text output |
| usage | object | Token usage statistics |
| error | object | Error details (if status is failed) |
| incomplete_details | object | Details if status is incomplete |
| metadata | object | Custom metadata (if provided) |
| service_tier | string | Service tier used (echoed when provided) |

Usage Object

The usage object always includes token details:
{
  "input_tokens": 100,
  "output_tokens": 50,
  "total_tokens": 150,
  "input_tokens_details": {
    "cached_tokens": 0
  },
  "output_tokens_details": {
    "reasoning_tokens": 0
  }
}
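In the examples here, total_tokens is the sum of input_tokens and output_tokens. A small Python check for that invariant (useful when reconciling usage against your own accounting; the helper is illustrative):

```python
def check_usage(usage: dict) -> bool:
    """Sanity-check that total_tokens equals input + output tokens."""
    return usage["total_tokens"] == usage["input_tokens"] + usage["output_tokens"]

usage = {
    "input_tokens": 100,
    "output_tokens": 50,
    "total_tokens": 150,
    "input_tokens_details": {"cached_tokens": 0},
    "output_tokens_details": {"reasoning_tokens": 0},
}
ok = check_usage(usage)
```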

Response Status Values

| Status | Description |
|---|---|
| queued | Background request is queued |
| in_progress | Request is being processed |
| completed | Request completed successfully |
| incomplete | Response was truncated |
| failed | Request failed with an error |
| cancelled | Request was cancelled |

reasoning Response Field

{
  "effort": "low" | "medium" | "high" | null,
  "summary": "none" | "auto" | "detailed" | "concise" | null
}

text Response Field (Resolved)

{
  "format": { "type": "text" | "json_object" | "json_schema", "...": "..." },
  "verbosity": "low" | "medium" | "high" | undefined
}

Output Item Types

All output items include a status field.

Message Output

{
  "type": "message",
  "id": "msg_123",
  "role": "assistant",
  "status": "completed",
  "content": [
    {
      "type": "output_text",
      "text": "Response text here",
      "annotations": [],
      "logprobs": []
    }
  ]
}

Function Call Output

{
  "type": "function_call",
  "id": "fc_123",
  "call_id": "call_abc",
  "name": "get_weather",
  "arguments": "{\"location\": \"Paris\"}",
  "status": "completed"
}

Reasoning Output (reasoning-capable models)

{
  "type": "reasoning",
  "id": "reasoning_123",
  "status": "completed",
  "summary": [
    {
      "type": "summary_text",
      "text": "I analyzed the problem by..."
    }
  ],
  "content": [
    {
      "type": "reasoning_text",
      "text": "Detailed reasoning goes here."
    }
  ],
  "encrypted_content": null
}

Web Search Call Output

{
  "type": "web_search_call",
  "id": "ws_123",
  "status": "completed",
  "action": { "query": "search query" },
  "results": [{ "url": "...", "title": "...", "snippet": "..." }]
}

Image Generation Call Output

{
  "type": "image_generation_call",
  "id": "ig_123",
  "status": "completed",
  "result": {
    "b64_json": "...",
    "url": "...",
    "revised_prompt": "..."
  }
}

Computer Call Output

{
  "type": "computer_call",
  "id": "cc_123",
  "call_id": "call_abc123",
  "status": "completed",
  "action": { "type": "click" },
  "pending_safety_checks": [{ "id": "...", "code": "...", "message": "..." }]
}

Output Item Status Values

| Status | Description |
|---|---|
| completed | Item finished successfully |
| in_progress | Item still being generated |
| incomplete | Item was truncated/interrupted |

Output Text Parts

Output text parts include annotations and logprobs:
{
  "type": "output_text",
  "text": "Hello world",
  "annotations": [],
  "logprobs": [
    {
      "token": "Hello",
      "logprob": -0.5,
      "bytes": [72, 101, 108, 108, 111],
      "top_logprobs": [
        { "token": "Hello", "logprob": -0.5, "bytes": [72, 101, 108, 108, 111] },
        { "token": "Hi", "logprob": -1.2, "bytes": [72, 105] }
      ]
    }
  ]
}
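The top-level output_text field is the concatenation of every output_text part across message output items. If you work with the output array directly (e.g., to skip reasoning items), you can recompute it; a Python sketch with illustrative data:

```python
def collect_output_text(response: dict) -> str:
    """Recompute the output_text convenience field by concatenating
    every output_text part from message output items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)

resp = {
    "output": [
        {"type": "message", "content": [
            {"type": "output_text", "text": "Hello ", "annotations": [], "logprobs": []},
            {"type": "output_text", "text": "world", "annotations": [], "logprobs": []},
        ]},
        {"type": "reasoning", "summary": []},  # non-message items are skipped
    ]
}
text = collect_output_text(resp)
```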

Annotation Types

URL Citation

{
  "type": "url_citation",
  "start_index": 0,
  "end_index": 10,
  "url": "https://...",
  "title": "Page Title"
}

File Citation

{
  "type": "file_citation",
  "start_index": 0,
  "end_index": 10,
  "file_id": "file_..."
}

File Path

{
  "type": "file_path",
  "start_index": 0,
  "end_index": 10,
  "file_id": "file_..."
}

Streaming

Enable streaming to receive incremental response updates:
{
  "model": "openai/gpt-5.2",
  "input": "Write a short story",
  "stream": true
}

Streaming Response

The response is delivered as Server-Sent Events (SSE):
data: {"type":"response.created","response":{...},"sequence_number":0}

data: {"type":"response.in_progress","response":{...},"sequence_number":1}

data: {"type":"response.output_item.added","output_index":0,"item":{...},"sequence_number":2}

data: {"type":"response.output_text.delta","item_id":"msg_...","output_index":0,"content_index":0,"delta":"The ","logprobs":[...],"sequence_number":3}

data: {"type":"response.output_text.delta","item_id":"msg_...","output_index":0,"content_index":0,"delta":"capital ","logprobs":[...],"sequence_number":4}

data: {"type":"response.output_text.done","item_id":"msg_...","output_index":0,"content_index":0,"text":"The capital of France is Paris.","logprobs":[...],"sequence_number":10}

data: {"type":"response.completed","response":{...},"sequence_number":11}

data: [DONE]

Streaming Event Types

| Event | Description |
|---|---|
| response.created | Response object created |
| response.in_progress | Processing started |
| response.output_item.added | New output item started |
| response.output_item.done | Output item completed |
| response.content_part.added | Content part started |
| response.content_part.done | Content part completed |
| response.output_text.delta | Incremental text chunk |
| response.output_text.done | Text content completed |
| response.reasoning.delta | Incremental reasoning text |
| response.reasoning.done | Reasoning content completed |
| response.function_call_arguments.delta | Incremental function arguments |
| response.function_call_arguments.done | Function call completed |
| response.completed | Response completed successfully |
| response.incomplete | Response truncated |
| response.failed | Response failed |

Updated Event Fields

  • All content/output events include item_id for the parent output item.
  • Text delta/done events include logprobs.
Example response.output_text.delta:
{
  "type": "response.output_text.delta",
  "item_id": "msg_...",
  "output_index": 0,
  "content_index": 0,
  "delta": "Hello",
  "logprobs": [...],
  "sequence_number": 5
}
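An SSE stream like the one above can be consumed line by line: each event is a `data:` line carrying JSON, terminated by the `[DONE]` sentinel. A minimal Python parser sketch (the sample lines below are illustrative, not a real capture):

```python
import json

def iter_sse_events(lines):
    """Parse 'data: ...' lines from an SSE stream into event dicts,
    stopping at the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)

sample = [
    'data: {"type":"response.created","sequence_number":0}',
    '',
    'data: {"type":"response.output_text.delta","delta":"The ","sequence_number":3}',
    'data: {"type":"response.output_text.delta","delta":"capital","sequence_number":4}',
    'data: [DONE]',
]
text = "".join(
    e["delta"] for e in iter_sse_events(sample)
    if e["type"] == "response.output_text.delta"
)
```

With a real connection, iterate over the HTTP response body's decoded lines instead of `sample`.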

Conversation Threading

Chain responses together for multi-turn conversations. You can use previous_response_id or the conversation object (id or messages) to manage context.

First Request

{
  "model": "openai/gpt-5.2",
  "input": "My name is Alice."
}
Response includes id: "resp_abc123"

Follow-up Request

{
  "model": "openai/gpt-5.2",
  "input": "What is my name?",
  "previous_response_id": "resp_abc123"
}
The model has access to the conversation history and responds: “Your name is Alice.” Note: previous_response_id requires authentication and store: true (default) on previous responses.
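The threading pattern above reduces to carrying forward the id of each response. A Python sketch of building the follow-up request body (the helper name is illustrative):

```python
def follow_up(previous: dict, model: str, user_input: str) -> dict:
    """Build the next turn's request body, threading context via
    previous_response_id. `previous` is the response object returned
    by the prior POST /v1/responses call (store must be true)."""
    return {
        "model": model,
        "input": user_input,
        "previous_response_id": previous["id"],
    }

first_response = {"id": "resp_abc123", "status": "completed"}
second_request = follow_up(first_response, "openai/gpt-5.2", "What is my name?")
```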

Background Mode

For long-running requests, use background mode to receive an immediate response and poll for results.

Initiate Background Request

{
  "model": "openai/gpt-5.2",
  "input": "Write a detailed analysis...",
  "background": true
}

Immediate Response (202 Accepted)

{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1699000000,
  "model": "openai/gpt-5.2",
  "status": "queued",
  "output": []
}

Poll for Completion

GET /v1/responses/resp_abc123
Authorization: Bearer YOUR_API_KEY
Keep polling until status is completed, failed, or incomplete.

Constraints:
  • Cannot be combined with stream: true
  • Requires authentication
  • Maximum processing time: approximately 800 seconds
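The polling loop can be sketched in Python as follows; the `fetch` callable stands in for a GET /v1/responses/{id} call (here simulated, so the example runs without network), and the interval and timeout values are illustrative:

```python
import time

TERMINAL = {"completed", "failed", "incomplete"}

def poll_response(fetch, response_id, interval=2.0, timeout=800.0, sleep=time.sleep):
    """Poll the injected `fetch` callable until the response reaches a
    terminal status or the timeout lapses."""
    waited = 0.0
    while True:
        resp = fetch(response_id)
        if resp["status"] in TERMINAL:
            return resp
        if waited >= timeout:
            raise TimeoutError(f"{response_id} still {resp['status']} after {timeout}s")
        sleep(interval)
        waited += interval

# Simulated fetcher: queued -> in_progress -> completed.
states = iter(["queued", "in_progress", "completed"])
final = poll_response(
    lambda rid: {"id": rid, "status": next(states)},
    "resp_abc123",
    sleep=lambda _: None,  # no real waiting in this demo
)
```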

Retrieve Response

GET /v1/responses/{id}
Authorization: Bearer YOUR_API_KEY

Response

Returns the full response object (same format as POST response).

Errors

  • 404 - Response not found or belongs to different account
  • 401 - Authentication required/invalid

Delete Response

DELETE /v1/responses/{id}
Authorization: Bearer YOUR_API_KEY

Response

{
  "id": "resp_abc123",
  "object": "response.deleted",
  "deleted": true
}

Error Handling

Error Response Format

{
  "error": {
    "code": "missing_required_parameter",
    "message": "model is required"
  }
}
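Since successful responses carry an empty error object while failures populate code and message, client code can branch on the code field. A Python sketch (the exception class is illustrative, not part of any SDK):

```python
class ResponsesAPIError(Exception):
    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code

def raise_for_error(body: dict) -> dict:
    """Raise when the body carries a populated error object;
    pass successful bodies (error absent or empty) through unchanged."""
    err = body.get("error")
    if err and err.get("code"):
        raise ResponsesAPIError(err["code"], err.get("message", ""))
    return body

try:
    raise_for_error({"error": {"code": "missing_required_parameter",
                               "message": "model is required"}})
    caught = None
except ResponsesAPIError as e:
    caught = e.code
```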

HTTP Status Codes

| HTTP Status | Description |
|---|---|
| 400 | Invalid request parameters |
| 401 | Missing or invalid API key |
| 403 | Insufficient permissions |
| 404 | Resource not found |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 503 | Service unavailable |

Common Error Codes

| Code | Description |
|---|---|
| missing_required_parameter | Required parameter not provided |
| model_not_found | Specified model does not exist |
| response_not_found | Response ID not found |
| invalid_response_id | Invalid response ID format |
| authentication_required | No API key provided |
| invalid_api_key | API key is invalid or inactive |

Complete Examples

Simple Text Completion

curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "Explain quantum computing in one sentence."
  }'

Multi-turn Conversation

# First turn
curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "I want to learn Python programming."
  }'

# Second turn (using response ID from first request)
curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "Where should I start?",
    "previous_response_id": "resp_abc123"
  }'

Streaming Response

curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "Write a haiku about programming",
    "stream": true
  }'

Function Calling

curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "What is the weather in Tokyo?",
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    ]
  }'

Submitting Tool Results

curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": "What is the weather in Tokyo?"
      },
      {
        "type": "function_call",
        "id": "fc_1",
        "call_id": "call_123",
        "name": "get_weather",
        "arguments": "{\"location\": \"Tokyo\"}"
      },
      {
        "type": "function_call_output",
        "call_id": "call_123",
        "output": "{\"temperature\": 18, \"condition\": \"cloudy\"}"
      }
    ]
  }'

Image Input (Vision)

curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": [
          { "type": "input_text", "text": "What is in this image?" },
          { "type": "input_image", "image_url": "https://example.com/photo.jpg", "detail": "auto" }
        ]
      }
    ]
  }'

JSON Output

curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "List the planets in our solar system",
    "text": {
      "format": { "type": "json_object" }
    }
  }'

Background Processing

# Start background request
curl -X POST https://nano-gpt.com/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5.2",
    "input": "Generate a comprehensive report...",
    "background": true
  }'

# Poll for results
curl https://nano-gpt.com/api/v1/responses/resp_abc123 \
  -H "Authorization: Bearer YOUR_API_KEY"

Limitations

  1. Deep research models: Deep research model variants are not supported.
  2. GPU-TEE streaming: Streaming is not supported for GPU-TEE models. Use /v1/chat/completions for these models.
  3. Background mode: Maximum duration is approximately 800 seconds.
  4. Metadata limits: Maximum 16 keys, 64 character key names, 512 character values.
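The metadata limits can be checked client-side before sending. A Python sketch of such a validator (the function is illustrative, not part of any SDK):

```python
def validate_metadata(metadata: dict) -> list:
    """Check metadata against the documented limits:
    at most 16 keys, key names <= 64 chars, values <= 512 chars."""
    problems = []
    if len(metadata) > 16:
        problems.append(f"too many keys: {len(metadata)} > 16")
    for key, value in metadata.items():
        if len(key) > 64:
            problems.append(f"key too long ({len(key)} > 64 chars)")
        if len(str(value)) > 512:
            problems.append(f"value too long for key: {key}")
    return problems

ok = validate_metadata({"ticket": "ABC-123"})
bad = validate_metadata({"k" * 65: "x"})
```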

Service tiers (priority)

Set service_tier: "priority" to request priority processing on providers that support service tiers. Behavior notes:
  • Priority tiers apply only when the routed provider supports them; gating is on the routed provider, not just the model name.
  • Header provider overrides (like X-Provider) and explicit provider selection are honored for pricing and x402 estimates.
  • Provider-native web search can force routing; priority pricing follows that routing.
Billing note:
  • Priority tier billing uses priority pricing when applicable.

Example: priority tier

{
  "model": "gpt-5.2",
  "input": "Say hi in one sentence.",
  "service_tier": "priority"
}

Response Headers

All responses include:
| Header | Description |
|---|---|
| X-Request-ID | Unique request/response identifier |
| Content-Type | application/json or text/event-stream |

Authorizations

Authorization (string, header, required)
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Headers

X-Provider (string)
Optional provider override for pay-as-you-go requests on supported open-source models (case-insensitive). Subscription requests ignore this header.

X-Billing-Mode (string)
Optional billing override to force pay-as-you-go (e.g., paygo). Header name is case-insensitive.

Body

application/json. Parameters for the response request.

model (string, required): Model ID to use for the response
input (string or array, required): Prompt string or array of input items
billing_mode (string): Billing override to force pay-as-you-go. Accepted values (case-insensitive): paygo, pay-as-you-go, pay_as_you_go, paid, payg
billingMode (string): Alias for billing_mode
instructions (string): System instructions for the model
max_output_tokens (integer): Maximum tokens in the response. Required range: x >= 16
temperature (number): Sampling temperature (not supported by reasoning models). Required range: 0 <= x <= 2
top_p (number): Nucleus sampling parameter. Required range: 0 <= x <= 1
tools (object[]): Function tools available to the model
tool_choice (string or object): How the model should use tools
parallel_tool_calls (boolean): Allow multiple tool calls in parallel
stream (boolean, default: false): Enable streaming responses
store (boolean, default: true): Store response for later retrieval
previous_response_id (string): Link to previous response for conversation threading
reasoning (object): Reasoning configuration for reasoning-capable models
text (object): Text/format configuration
metadata (object): Custom metadata
truncation (enum<string>): Truncation strategy. Available options: auto, disabled
user (string): Unique user identifier
seed (integer): Random seed for reproducibility
background (boolean): Enable background/async processing
service_tier (enum<string>): Optional service tier. Set to "priority" to request priority processing when supported by the routed provider. Available options: auto, default, flex, priority

Response

Response created. Response object returned by the Responses API.

id (string)
object (string)
created_at (integer)
model (string)
status (enum<string>): Available options: queued, in_progress, completed, incomplete, failed, cancelled
output (object[])
output_text (string)
usage (object)
error (object)
incomplete_details (object)
metadata (object)
service_tier (string)