Provider Selection

Provider selection chooses the upstream provider for a supported model. It does not change the model ID. For example, to use Kimi K2.6 through Novita, keep the model as moonshotai/kimi-k2.6 and send the provider separately with X-Provider.
Provider selection is optional; if you do nothing, the platform picks the provider. X-Provider, a body provider string, or a body provider object can select or constrain providers for a request.
Explicit provider selection is always pay-as-you-go and is charged at the selected provider’s price, including provider-selection markup. For subscription users, sending an explicit provider selection bypasses subscription coverage for that request. X-Billing-Mode: paygo is only needed when forcing pay-as-you-go billing without an explicit provider, or when saved provider preferences should apply to subscription-included traffic. See Pay-As-You-Go Billing Override.
This page intentionally documents only public provider IDs returned by the provider-discovery endpoints. Internal routing/provider names are never part of the public API contract.

When Provider Selection Applies

  • Provider selection only applies when a model reports supportsProviderSelection: true.
  • Use GET /api/models/:canonicalId/providers to discover eligible models and provider IDs.
  • supportsProviderSelection is exposed on the model-specific provider endpoint. Do not rely on /api/v1/models or /api/v1/models?detailed=true to determine provider-selection support.
  • If a model does not support provider selection, the request ignores provider preferences.
  • You cannot currently force a provider and have that same request count as subscription-included usage.

Discover Providers and Pricing

Use this endpoint to list available providers and the pricing you will pay when selecting one.
GET /api/models/:canonicalId/providers
Provider rows may also include distillationPolicy, which indicates whether that specific hosted provider route is allowed for output-based model training or distillation under NanoGPT’s recorded provider-terms and model-license rules. See Distillation Policy.
This endpoint is not under /api/v1. If your base URL is https://nano-gpt.com/api/v1, do not append this path to that base URL. Use:
GET https://nano-gpt.com/api/models/moonshotai%2Fkimi-k2.6/providers
If the model ID contains /, URL-encode the slash when placing it in the path:
GET https://nano-gpt.com/api/models/moonshotai%2Fkimi-k2.6/providers
Do not use the unencoded form, because the slash is treated as a path separator:
GET https://nano-gpt.com/api/models/moonshotai/kimi-k2.6/providers
Here :canonicalId means the model ID or one of its supported aliases. For Kimi K2.6, moonshotai/kimi-k2.6 works.
Example response:
{
  "canonicalId": "moonshotai/kimi-k2.6",
  "displayName": "Kimi K2.6",
  "supportsProviderSelection": true,
  "defaultPrice": {
    "inputPer1kTokens": 0.0005,
    "outputPer1kTokens": 0.0026
  },
  "providers": [
    {
      "provider": "moonshot",
      "available": true
    },
    {
      "provider": "novita",
      "available": true,
      "distillationPolicy": {
        "status": "allowed",
        "label": "License permits distillation",
        "basis": "permissive-open-weights",
        "sourceUrl": "https://example.com/license-or-terms",
        "note": "Provider terms do not record an output-training restriction, and the model-level policy allows distillation."
      }
    },
    {
      "provider": "cloudflare",
      "available": true
    },
    {
      "provider": "baseten",
      "available": true
    }
  ]
}
Notes:
  • defaultPrice is used when you do not select a provider.
  • If providers[].pricing is present, it is what you pay when you select that provider (includes the markup).
  • Use the exact value from providers[].provider in the X-Provider header.
  • Provider-specific distillationPolicy can differ from the model-level policy. Explicit provider restrictions override model-level allowances.
  • If a model does not support provider selection, the response includes supportsProviderSelection: false.
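The discovery step can be sketched in a few lines of Python. This is a minimal illustration, not an official client: the helper names providers_url and selectable_providers are hypothetical, and the sample dictionary is a trimmed copy of the example response above. It builds the URL with the slash percent-encoded, then reads the provider IDs that could be sent in X-Provider.

```python
from urllib.parse import quote

# Assumption: host and path are copied from the examples on this page.
BASE_URL = "https://nano-gpt.com"


def providers_url(model_id: str) -> str:
    """Build the provider-discovery URL, percent-encoding any "/" in the ID."""
    # quote() leaves "/" untouched by default; safe="" forces it to %2F.
    return f"{BASE_URL}/api/models/{quote(model_id, safe='')}/providers"


def selectable_providers(response: dict) -> list[str]:
    """Return the provider IDs a client could send in X-Provider."""
    if not response.get("supportsProviderSelection"):
        return []  # provider preferences are ignored for this model
    return [
        row["provider"]
        for row in response.get("providers", [])
        if row.get("available")
    ]


# Trimmed copy of the example response above.
example = {
    "canonicalId": "moonshotai/kimi-k2.6",
    "supportsProviderSelection": True,
    "providers": [
        {"provider": "moonshot", "available": True},
        {"provider": "novita", "available": True},
    ],
}

print(providers_url(example["canonicalId"]))
print(selectable_providers(example))
```

The URL helper only formats a string; fetching it and decoding the JSON is left to whichever HTTP client you already use.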

Per-Request Provider Override

For a single request, send the provider ID in the X-Provider header.
curl https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Provider: novita" \
  -d '{
    "model": "kimi-k2.6",
    "messages": [
      { "role": "user", "content": "Hello" }
    ]
  }'
Notes:
  • X-Provider is case-insensitive (x-provider also works).
  • The provider ID must be one of the values returned by the providers endpoint.
  • The body provider field also accepts the same provider ID as a string on POST /api/v1/chat/completions, POST /api/v1/completions, and POST /api/talk-to-gpt.
  • Recommended: use X-Provider or body provider for explicit provider overrides. The API also accepts a trailing provider suffix for user-selectable providers, such as model-id:cerebras, but request fields are clearer and easier to validate against provider-discovery responses.
  • To avoid Moonshot for Kimi K2.6, select any available non-Moonshot provider ID returned by the provider-discovery endpoint, such as novita, cloudflare, baseten, or inceptron.
  • If the request would otherwise be subscription-covered, explicit provider selection still bypasses subscription coverage and bills the request as pay-as-you-go. You do not need to also send billing_mode: "paygo" or X-Billing-Mode: paygo.
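The same request as the curl example can be assembled in Python. This is a sketch under stated assumptions: build_chat_request is a hypothetical helper, and it only prepares the URL, headers, and body without sending anything, so it can be handed to any HTTP client.

```python
import json
from typing import Optional

# Assumption: endpoint copied from the curl example above.
CHAT_URL = "https://nano-gpt.com/api/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str,
                       provider: Optional[str] = None):
    """Return (url, headers, body) for a chat completion request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if provider is not None:
        # Explicit selection: the request is billed as pay-as-you-go.
        headers["X-Provider"] = provider
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return CHAT_URL, headers, body


url, headers, body = build_chat_request(
    "YOUR_API_KEY", "moonshotai/kimi-k2.6", "Hello", provider="novita"
)
print(headers["X-Provider"])
```

Leaving provider unset omits the header entirely, which keeps the request on default routing and default billing.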

Provider Routing Object

POST /api/v1/chat/completions, POST /api/v1/completions, and POST /api/talk-to-gpt also accept a structured provider object for per-request routing controls.
{
  "model": "model-id",
  "provider": {
    "order": ["provider-a", "provider-b"],
    "only": ["provider-b"],
    "ignore": ["provider-c"],
    "sort": "throughput",
    "max_price": {
      "prompt": 0.5,
      "completion": 2
    },
    "allow_fallbacks": false,
    "require_parameters": true
  },
  "messages": [
    { "role": "user", "content": "Hello" }
  ]
}
Provider entries can use public NanoGPT provider IDs or accepted display names/aliases. Use GET /api/models/:canonicalId/providers to discover public provider IDs for a model. Do not rely on internal provider names that are not returned by public provider-discovery responses.
| Field | Type | Behavior |
| --- | --- | --- |
| provider.order | string[] | Soft preference order. NanoGPT tries these providers first when possible, but may route elsewhere. Unknown providers are ignored. |
| provider.only | string[] | Hard provider pin. NanoGPT restricts routing to these providers and fails instead of routing outside the list. Unknown providers return 400 with error.code: "provider_unknown_provider". |
| provider.ignore | string[] | Excludes providers from routing. Unknown providers are ignored. |
| provider.sort | string | Routing preference. Supported values are speed, throughput, latency, price, auto, none, and default. auto, none, and default suppress stored/default routing preferences without requesting a sort. |
| provider.max_price | object | Optional prompt and/or completion price caps in USD per 1 million tokens. Pricing checks are best-effort when provider pricing is unknown. |
| provider.allow_fallbacks | boolean | When false, disables cross-provider fallback for this request. |
| provider.require_parameters | boolean | When true, requires the selected provider to support the requested parameters on routes that support parameter-level capability checks. |
Examples:
{
  "provider": {
    "order": ["provider-a", "provider-b"]
  }
}
{
  "provider": {
    "only": ["provider-b"]
  }
}
{
  "provider": {
    "ignore": ["provider-c"]
  }
}
{
  "provider": {
    "max_price": {
      "prompt": 0.5,
      "completion": 2
    }
  }
}
{
  "provider": {
    "require_parameters": true
  }
}
{
  "provider": {
    "order": ["provider-b"],
    "allow_fallbacks": false
  }
}

Routing Semantics

  • order is soft. Use the only field instead when the request must fail rather than use any provider outside the list.
  • only is hard. It restricts routing to the listed providers and disables internal fallback outside the pinned set.
  • ignore and max_price can resolve to a concrete provider. For direct provider routing, NanoGPT cannot represent “any provider except X” as a multi-provider pool. If filters like ignore or max_price are provided without order or only, NanoGPT resolves to one cheapest concrete provider that satisfies those filters. That is treated as explicit provider selection for billing and routing because the request changed the provider candidate set.
  • If max_price is provided without order or an explicit sort, NanoGPT treats it as a request for price-aware routing among providers that satisfy the cap.
  • Per-request provider object controls take precedence over saved provider preferences for that request. They are not merged with saved preferences.
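The way ignore and max_price collapse to a single provider can be illustrated with a small sketch. The price table below is invented for illustration, the function name is hypothetical, and ranking "cheapest" by prompt plus completion price is an assumption; the actual routing logic is not published on this page.

```python
from typing import Optional

# Hypothetical prices in USD per 1 million tokens (not real data).
PRICES = {
    "provider-a": {"prompt": 0.4, "completion": 1.8},
    "provider-b": {"prompt": 0.6, "completion": 1.5},
    "provider-c": {"prompt": 0.3, "completion": 1.2},
}


def resolve_concrete_provider(prices, ignore=(), max_price=None) -> Optional[str]:
    """Pick one cheapest provider that passes the ignore and max_price filters."""
    caps = max_price or {}
    candidates = []
    for name, price in prices.items():
        if name in ignore:
            continue
        # A cap applies only to the side (prompt or completion) it names.
        if any(price[side] > cap for side, cap in caps.items()):
            continue
        # Assumption: "cheapest" is ranked by prompt + completion price.
        candidates.append((price["prompt"] + price["completion"], name))
    return min(candidates)[1] if candidates else None


print(resolve_concrete_provider(PRICES, ignore=["provider-c"]))
print(resolve_concrete_provider(PRICES, max_price={"prompt": 0.5, "completion": 2}))
```

Because the result is always one concrete provider, a filter-only request behaves like an explicit selection, which matches the billing treatment described above.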

Validation

Invalid object shapes return a structured 400 error:
  • provider.order, provider.only, and provider.ignore must be arrays of strings.
  • Unknown providers in only return provider_unknown_provider.
  • Unknown providers in order and ignore are ignored.
  • provider.max_price.prompt and provider.max_price.completion must be non-negative numbers.
  • provider.allow_fallbacks and provider.require_parameters must be booleans.
Example error:
{
  "error": {
    "message": "Unknown or unavailable provider id in provider.only: not-a-provider",
    "type": "invalid_request_error",
    "param": "provider.only",
    "code": "provider_unknown_provider"
  }
}
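A client can catch most shape errors before sending the request. The following is a client-side pre-check sketch that mirrors the rules listed above; validate_provider_object is a hypothetical helper, its messages are not the server's messages, and it does not check provider IDs against the discovery endpoint.

```python
def validate_provider_object(provider: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks valid."""
    problems = []
    for field in ("order", "only", "ignore"):
        value = provider.get(field)
        if value is not None and not (
            isinstance(value, list) and all(isinstance(v, str) for v in value)
        ):
            problems.append(f"provider.{field} must be an array of strings")
    max_price = provider.get("max_price")
    if max_price is not None:
        if not isinstance(max_price, dict):
            problems.append("provider.max_price must be an object")
        else:
            for side in ("prompt", "completion"):
                cap = max_price.get(side)
                if cap is None:
                    continue
                # bool is a subclass of int in Python, so reject it explicitly.
                if isinstance(cap, bool) or not isinstance(cap, (int, float)) or cap < 0:
                    problems.append(
                        f"provider.max_price.{side} must be a non-negative number"
                    )
    for field in ("allow_fallbacks", "require_parameters"):
        value = provider.get(field)
        if value is not None and not isinstance(value, bool):
            problems.append(f"provider.{field} must be a boolean")
    return problems


print(validate_provider_object({"only": "provider-b"}))
```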

Consecutive Prompts and Prompt Caching

When you explicitly select a provider with X-Provider or body provider, NanoGPT routes that request to the selected provider. For consecutive prompts, using the same provider helps keep routing stable, which is the setup you want for provider-side prompt-cache reuse when that provider and model support caching.
For capability-based routing, use top-level caching: true or the :caching model suffix instead of choosing a specific provider. NanoGPT will route to any available provider that supports prompt/input caching for the requested provider-selection model, and it defaults to sticky provider routing so later matching requests prefer the same provider.
General automatic routing may use cache-affinity routing for eligible cache-capable providers, but it does not require a cache-capable provider and may still choose different upstream providers when request shape, provider availability, or routing constraints change.
For cache-sensitive workflows, either consistently send the same X-Provider value or use caching: true / :caching when any cache-capable provider is acceptable.

Cache-Capable Provider Routing

Set caching: true when you want NanoGPT to route a chat completion request to any available provider that supports prompt/input caching. This is capability-based provider selection, not explicit provider selection and not prompt-cache annotation. It does not add Anthropic-style cache_control markers or configure cache TTLs.
{
  "model": "model-id",
  "caching": true,
  "messages": [
    { "role": "user", "content": "Hello" }
  ]
}
If no usable cache-capable provider exists for the model, the request fails instead of falling back to a non-cache-capable provider.
By default, caching: true is sticky. The first successful matching request records the selected provider by API key or session. Later matching requests prefer that same provider when it is still usable, improving the chance of provider-side cache hits. NanoGPT does not guarantee that the request will be served from cache.
To require a cache-capable provider without stickiness:
{
  "model": "model-id",
  "caching": true,
  "stickyprovider": false,
  "messages": [
    { "role": "user", "content": "Hello" }
  ]
}
Top-level stickyProvider is accepted as a camelCase alias for stickyprovider.
The model suffixes :caching, :cache, and :cached request the same cache-capable provider routing:
{
  "model": "moonshotai/kimi-k2.6:thinking:caching",
  "messages": [
    { "role": "user", "content": "Hello" }
  ]
}
Routing order for caching: true:
  1. Filter to providers that are available, not excluded by preferences, and marked as prompt-caching capable.
  2. If stickiness is enabled, prefer the previously recorded provider for the same cache-relevant request shape when still usable.
  3. Otherwise choose the cheapest cache-capable provider by base input + output price.
  4. Use cache write/read pricing only as tie-breakers.
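The four steps above can be sketched as a single selection function. Everything here is illustrative: the provider rows, field names, and prices are hypothetical, and applying the cache write price before the cache read price as the tie-breaker is an assumption, since the list does not specify the order.

```python
from typing import Optional

# Hypothetical provider rows; field names and prices are invented.
PROVIDERS = [
    {"id": "provider-a", "available": True, "caching": True,
     "input": 1.0, "output": 2.0, "cache_write": 0.5, "cache_read": 0.1},
    {"id": "provider-b", "available": True, "caching": True,
     "input": 1.0, "output": 2.0, "cache_write": 0.25, "cache_read": 0.1},
    {"id": "provider-c", "available": True, "caching": False,
     "input": 0.5, "output": 1.0, "cache_write": 0.0, "cache_read": 0.0},
]


def pick_cache_provider(providers, excluded=(), sticky: Optional[str] = None):
    """Choose a cache-capable provider following the documented routing order."""
    # Step 1: keep available, non-excluded, cache-capable providers.
    usable = [
        p for p in providers
        if p["available"] and p["caching"] and p["id"] not in excluded
    ]
    if not usable:
        return None  # the real request fails instead of falling back
    # Step 2: prefer the previously recorded provider while it is still usable.
    if sticky is not None and any(p["id"] == sticky for p in usable):
        return sticky
    # Steps 3 and 4: cheapest base input + output, cache prices break ties.
    best = min(
        usable,
        key=lambda p: (p["input"] + p["output"], p["cache_write"], p["cache_read"]),
    )
    return best["id"]


print(pick_cache_provider(PROVIDERS))
print(pick_cache_provider(PROVIDERS, sticky="provider-a"))
```

In this sample, provider-c is the cheapest overall but is filtered out in step 1 because it is not cache-capable.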

Per-Request Routing Preference

For provider-selection-capable models, you can append a routing preference suffix to the model ID:
  • :speed / :fast: best estimated completion time using TTFT plus TPS.
  • :throughput: highest tokens per second.
  • :latency: lowest time to first token.
  • :price / :cheap: lowest base input-plus-output token price.
  • :floor: alias for :price.
  • :tools: route to a tools-capable provider path when the model supports tools provider selection.
Example:
{
  "model": "zai-org/glm-5:fast",
  "messages": [{ "role": "user", "content": "Hello" }]
}
Notes:
  • Routing preference suffixes are billed like explicit provider-selection requests.
  • They cannot be combined with X-Provider, body provider, or provider model suffixes.
  • :tools cannot be combined with routing preference suffixes or caching: true.
  • If the model does not support provider selection, the request returns an invalid request error for these suffixes.
See Model Suffixes for the complete suffix reference and composition rules.

Persistent Provider Preferences

These endpoints let a user save provider preferences in their session metadata.
GET /api/user/provider-preferences
PATCH /api/user/provider-preferences
DELETE /api/user/provider-preferences
These endpoints require a logged-in web session. API-key-only requests should use per-request provider selection with X-Provider, unless preferences were already saved on the associated user session.
Saved provider preferences apply to pay-as-you-go traffic. For subscription-included traffic, send X-Billing-Mode: paygo or billing_mode: "paygo" when you want saved provider preferences to apply.
Example GET response (placeholders):
{
  "preferredProviders": ["provider-a", "provider-b"],
  "excludedProviders": ["provider-c"],
  "enableFallback": true,
  "modelOverrides": {
    "model-id": {
      "preferredProviders": ["provider-b"],
      "enableFallback": false
    }
  },
  "availableProviders": ["provider-a", "provider-b", "provider-c"]
}
Example PATCH payload:
{
  "preferredProviders": ["provider-a", "provider-b"],
  "excludedProviders": ["provider-c"],
  "enableFallback": false,
  "modelOverrides": {
    "model-id": {
      "preferredProviders": ["provider-b"],
      "enableFallback": true
    }
  }
}
Field details:
  • preferredProviders: ordered list of allowed providers; the system tries each in order.
  • excludedProviders: providers that should never be used.
  • enableFallback: when true (default), fall back to the platform default if no preferred provider is available.
  • modelOverrides: optional per-model overrides for the fields above.
  • availableProviders: full set of provider IDs available to your account for the model.

Resolution Order

When caching: true is set, use the routing order in Cache-Capable Provider Routing. Otherwise, when multiple selections exist, the system resolves providers in this order:
  1. Per-request explicit provider selection (X-Provider, body provider string/object, or provider suffix)
  2. Per-request routing preference suffix (:fast, :cheap, etc.)
  3. Per-model preferredProviders
  4. Global preferredProviders
  5. Platform default (only if enableFallback is true)
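The resolution order above can be read as a chain of fallbacks. The sketch below is a simplified model with hypothetical names: it assumes an unavailable per-model preference falls through to the global list, and it treats the routing-suffix result as an already-resolved provider ID. Neither detail is confirmed by this page.

```python
from typing import Optional, Sequence

PLATFORM_DEFAULT = "platform-default"  # placeholder for the platform's own choice


def first_available(preferred: Sequence[str], available: Sequence[str]) -> Optional[str]:
    """Return the first preferred provider that is currently available."""
    return next((p for p in preferred if p in available), None)


def resolve_provider(
    available: Sequence[str],
    explicit: Optional[str] = None,
    suffix_choice: Optional[str] = None,
    model_preferred: Sequence[str] = (),
    global_preferred: Sequence[str] = (),
    enable_fallback: bool = True,
) -> Optional[str]:
    # Steps 1 and 2: per-request choices win over anything saved.
    if explicit is not None:
        return explicit
    if suffix_choice is not None:
        return suffix_choice
    # Steps 3 and 4: saved preferences, per-model before global.
    saved = first_available(model_preferred, available) or first_available(
        global_preferred, available
    )
    if saved is not None:
        return saved
    # Step 5: platform default only when fallback is enabled.
    return PLATFORM_DEFAULT if enable_fallback else None


print(resolve_provider(["provider-a", "provider-b"],
                       model_preferred=["provider-c", "provider-b"],
                       global_preferred=["provider-a"]))
```

A None result here corresponds to the case where the API reports that no fallback is available.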

Billing and Markup

  • If you explicitly select or constrain providers with X-Provider or body provider, billing uses the resolved provider-specific price plus a 5% markup and the request is treated as pay-as-you-go.
  • If saved preferences select a provider on pay-as-you-go traffic, billing uses the provider-specific price plus a 5% markup.
  • If you do not select a provider, billing uses the model’s default price.
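A short worked example makes the 5% markup concrete. This is an arithmetic sketch only: request_cost is a hypothetical helper, the token counts are invented, and the provider's pre-markup price is assumed to equal the defaultPrice from the example discovery response. Note that providers[].pricing, when present, already includes the markup, so it should not be marked up again.

```python
from decimal import Decimal

MARKUP = Decimal("1.05")  # 5% provider-selection markup


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_1k: str, output_per_1k: str,
                 explicit_provider: bool) -> Decimal:
    """Cost of one request from per-1k-token prices, with optional markup."""
    base = (
        Decimal(input_tokens) * Decimal(input_per_1k)
        + Decimal(output_tokens) * Decimal(output_per_1k)
    ) / 1000
    # Markup applies only when a provider was selected or constrained.
    return base * MARKUP if explicit_provider else base


# 2,000 input and 1,000 output tokens at 0.0005 / 0.0026 per 1k tokens.
print(request_cost(2000, 1000, "0.0005", "0.0026", explicit_provider=False))
print(request_cost(2000, 1000, "0.0005", "0.0026", explicit_provider=True))
```

Decimal is used instead of float so that small per-token prices do not pick up binary rounding noise.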

Error Behavior

  • If enableFallback is false and no preferred provider is available, /api/v1/chat/completions returns 400 with error.code: "no_fallback_available".
  • Invalid body provider objects return 400 with error.type: "invalid_request_error".
  • PATCH /api/user/provider-preferences validates provider IDs and returns:
    • 422 INVALID_INPUT for malformed payloads.
    • 400 INVALID_EXCLUSIONS if exclusions would leave a model with no usable provider.
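For the request-time errors, a client might branch on the status and error fields as in the sketch below. It assumes the error body has the shape shown in the Validation example (an error object with type and code); classify_provider_error and its return labels are hypothetical, and the preferences-endpoint errors are left out because their body shape is not shown on this page.

```python
def classify_provider_error(status: int, payload: dict) -> str:
    """Map a documented provider-routing error to a suggested client action."""
    error = payload.get("error") or {}
    code = error.get("code")
    if status == 400 and code == "no_fallback_available":
        # No preferred provider is usable and fallback is disabled.
        return "enable_fallback_or_change_preferences"
    if status == 400 and code == "provider_unknown_provider":
        # A provider in provider.only is unknown or unavailable.
        return "fix_provider_id"
    if status == 400 and error.get("type") == "invalid_request_error":
        # The provider object has an invalid shape.
        return "fix_request_shape"
    return "unhandled"


example_error = {
    "error": {
        "message": "Unknown or unavailable provider id in provider.only: not-a-provider",
        "type": "invalid_request_error",
        "param": "provider.only",
        "code": "provider_unknown_provider",
    }
}
print(classify_provider_error(400, example_error))
```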