Overview
The NanoGPT API offers multiple ways to generate text, including OpenAI-compatible endpoints and our legacy options. This guide covers all available text generation methods. If you are using a TEE-backed model (e.g., prefixed withTEE/), you can also verify the enclave attestation and signatures for your chat completions. See the TEE Model Verification guide for more details.
OpenAI Compatible Endpoints
Chat Completions (v1/chat/completions)
This endpoint mimics OpenAI’s chat completions API:Text Completions (v1/completions)
This endpoint mimics OpenAI’s legacy text completions API:Legacy Text Completions
For the older, non-OpenAI compatible endpoint:Prompt Caching (Claude Models)
Claude caching follows Anthropic’s Messages schema:cache_control lives on the message content blocks you want to reuse. NanoGPT simply forwards those markers to Anthropic, so you decide where each cache breakpoint sits. The first invocation costs 1.25× (5 min TTL) or 2× (1 hour TTL); cached replays discount the same tokens by ~90%.
Explicit cache_control markers
cache_controlbelongs to the individual content blocks (system,user, tool definitions, etc.). Each marker caches the entire prefix up to and including that block, matching Anthropic’s behavior.- Supported TTLs are
5mand1h. Omitttlto use the default5mwindow. - Set the
anthropic-beta: prompt-caching-2024-07-31header on any request that contains cache markers; Anthropic rejects cache requests without the beta flag. - Check
usage.prompt_tokens_details.cached_tokensin NanoGPT’s response to confirm what was billed at the discounted rate.
Using the prompt_caching helper
If you prefer not to duplicate cache_control entries manually, NanoGPT accepts a helper object that tags the leading prefix for you.
cut_after_message_index is zero-based and points at the last message in the static prefix. NanoGPT will attach a cache_control block with your TTL to each message up to that index before forwarding the request to Anthropic—no additional heuristics are applied. If you need different cache durations or non-contiguous breakpoints, fall back to explicit cache_control markers in your messages array.
Chat Completions with Web Search
Enable real-time web access for any model by appending special suffixes:Web Search Options
:online- Standard search with 10 results ($0.006 per request):online/linkup-deep- Deep iterative search ($0.06 per request)