Image-conditioned models accept eitherimageDataUrl(base64) or a publicimageUrl. The service uses the explicit value you provide before checking any saved attachments.
Overview
POST /generate-video submits an asynchronous job to create, extend, or edit a video with one of NanoGPT’s provider integrations. The endpoint responds immediately with runId, model, and status: "pending". Poll the Video Status endpoint with that runId until you receive final assets. Duration-based billing is assessed after completion; align any pricing tables with lib/credits/videoPricingConfig.ts.
Provider errors include descriptive JSON payloads. Surface the error.message (and HTTP status) to help users correct content-policy or validation issues.
Request Schema
Only include the fields required by your chosenmodel. Unknown keys are ignored, but some providers fail when extra media fields are present.
Core Fields
| field | type | required | details |
|---|---|---|---|
model | string | yes | See Model Directory for the complete list. Use longstories-movie / longstories-pixel-art; promptchan-video has been removed. |
prompt | string | conditional | Required for text-to-video and edit models unless a structured script is supplied. |
negative_prompt | string | no | Suppresses specific content. Respected by Veo, Wan, Runway, Pixverse, and other models noted below. |
script | string | conditional | LongStories models accept full scripts instead of relying on prompt. |
storyConfig | object | conditional | LongStories structured payload (e.g. scenes, narration, voice). |
duration | string | conditional | Seconds as a string ("5", "8", "60"). Limits vary per model; see individual entries. |
seconds | string | conditional | Sora-specific duration selector ("4", "8", "12"). |
aspect_ratio | string | conditional | Provider-specific ratios such as 16:9, 9:16, 1:1, 3:4, 4:3, 21:9, auto. |
orientation | string | conditional | landscape or portrait for Sora and Wan text/image flows. |
resolution | string | conditional | Resolution tokens (480p, 580p, 720p, 1080p, 1792x1024, 2k, 4k). |
size | string | conditional | Provider preset for vidu-video, pixverse-*, wan-wavespeed-22-plus, wan-wavespeed-25, and upscalers. |
generateAudio | boolean | no | Adds AI audio on Veo 3 and Lightricks models. Defaults to false. |
enhancePrompt | boolean | no | Optional Veo 3 prompt optimizer. Defaults to false. |
pro_mode / pro | boolean | no | High-quality toggle for Sora and Hunyuan families. Defaults to false. |
enable_prompt_expansion | boolean | no | Prompt booster for Wan/Seedance/Minimax variants. Disabled by default. |
enable_safety_checker | boolean | no | Wan 2.2 Turbo safety switch. Defaults to provider configuration. |
camera_fix / camera_fixed / cameraFixed | boolean | no | Locks the virtual camera for Seedance and Wan variants. |
seed | number or string | no | Deterministic seed when supported (Veo, Wan, Pixverse). |
voice_id | string | conditional | Required by kling-lipsync-t2v. |
voice_language | string | conditional | en or zh for kling-lipsync-t2v. |
voice_speed | number | conditional | Range 0.8-2.0 for kling-lipsync-t2v. |
videoDuration / billedDuration | number | no | Optional overrides for upscaler billing calculations. |
adjust_fps_for_interpolation | boolean | no | Optional toggle for interpolation-aware upscaling. Defaults to false. |
Media Inputs
| field | type | required | details |
|---|---|---|---|
imageDataUrl | string | conditional | Base64-encoded data URL. Recommended for private assets or files larger than 4 MB. |
imageUrl | string | conditional | HTTPS link to a source image. |
image | string | conditional | Some Wavespeed/ByteDance providers expect this property instead of imageUrl. |
reference_image | string | conditional | Optional still image guiding runwayml-gen4-aleph. |
audioDataUrl | string | conditional | Base64 data URL for audio-driven models. |
audioUrl | string | conditional | HTTPS audio input. |
audio | string | conditional | Alternate audio field accepted by ByteDance and Kling lipsync providers. |
video | string | conditional | HTTPS link or data URL to the source video (edit, extend, upscaler, or lipsync jobs). |
videoUrl | string | conditional | Alias accepted by select providers. |
swapImage | string | conditional | Required by magicapi-video-face-swap. |
targetVideo | string | conditional | Required by magicapi-video-face-swap. |
targetFaceIndex | number | no | Optional face index for MagicAPI swaps. |
Provide only the media fields that your target model expects. Extra media inputs often trigger provider validation errors.
Advanced Controls
| field | type | models |
|---|---|---|
num_frames | integer | Wan 2.2 families, Seedance 22 5B, Wan image-to-video. |
frames_per_second | integer | Wan 2.2 5B. |
num_inference_steps | integer | Wan 2.2 families. |
guidance_scale | number | Wan 2.2 5B. |
shift | number | Wan 2.2 5B. |
interpolator_model | string | Wan 2.2 5B. |
num_interpolated_frames | integer | Wan 2.2 5B. |
movementAmplitude | string | vidu-video (auto, small, medium, large). |
motion | string | midjourney-video (low, high). |
style | string | vidu-video (general, anime), pixverse-* (various presets). |
effectType, effect, cameraMovement, motionMode, soundEffectSwitch, soundEffectPrompt | varies | Pixverse v4.5/v5. |
mode | string | wan-wavespeed-22-animate (animate, replace). |
prompt_optimizer | boolean | minimax-hailuo-02, minimax-hailuo-02-pro. |
Model Directory
Model strings are grouped by upstream provider. Each row lists the required media inputs and notable request fields. Code references point to the backend implementation.Core & LongStories Models
| model | input types | key request fields | code ref |
|---|---|---|---|
kling-video | text-to-video | prompt, optional negative_prompt, duration (defaults to 5) | lib/modelProviders/videoModels.ts |
kling-video-v2 | text/image-to-video | prompt, optional imageDataUrl/imageUrl, duration (5, 10), Kling guidance handled server-side | lib/modelProviders/videoModels.ts |
veo2-video | text/image-to-video | prompt, duration (5s-30s), aspect_ratio (16:9, 9:16, 1:1, 4:3, 3:4), optional negative_prompt, seed | lib/modelProviders/videoModels.ts |
minimax-video | text/image-to-video | duration (6, 10), optional enable_prompt_expansion | lib/modelProviders/videoModels.ts |
hunyuan-video | text-to-video | pro (boolean), resolution (480p, 720p, 1080p), num_frames, num_inference_steps, optional negative_prompt | lib/modelProviders/videoModels.ts |
hunyuan-video-image-to-video | image-to-video | Requires imageDataUrl/imageUrl, supports pro, resolution, num_frames, num_inference_steps | lib/modelProviders/videoModels.ts |
wan-video-image-to-video | image-to-video | Requires imageDataUrl/imageUrl, optional prompt, num_frames, frames_per_second, resolution, negative_prompt, seed | lib/modelProviders/videoModels.ts |
kling-v21-standard | image-to-video | Requires imageDataUrl/imageUrl, duration (5), optional negative_prompt, seed | lib/modelProviders/runwareVideo.ts |
kling-v21-pro | text/image-to-video | duration (5, 10), optional negative_prompt, seed | lib/modelProviders/runwareVideo.ts |
kling-v21-master | text/image-to-video | duration (5 default), negative_prompt, seed, Runware tuning parameters | lib/modelProviders/runwareVideo.ts |
longstories-movie | scripted generation | duration (30, 60, 180, 300, 600), accepts script/storyConfig payloads (use duration instead of legacy targetLengthInWords) | lib/modelProviders/longstoriesModel.ts |
longstories-pixel-art | scripted generation | duration (15, 30, 60, 180, 300, 600), same structured payloads as longstories-movie | lib/modelProviders/longstoriesPixelArtModel.ts |
Wavespeed & Partner Models
| model | input types | key request fields | code ref |
|---|---|---|---|
sora-2 | text-to-video, image-to-video | pro_mode (boolean), resolution (720p, 1792x1024), orientation (landscape, portrait), seconds (4, 8, 12), optional imageDataUrl/imageUrl | lib/modelProviders/sora2Video.ts |
veo3-1-video | text-to-video, image-to-video | duration (4, 6, 8), resolution (720p, 1080p), aspect_ratio (16:9, 9:16), generateAudio (default false), optional negative_prompt, seed | lib/modelProviders/wavespeedVeo31.ts |
wan-wavespeed-s2v | image + audio → video | Requires image + audio (URL or data URI), optional prompt, resolution (480p, 720p) | lib/modelProviders/wavespeedWanS2V.ts |
veed-fabric-1.0 | image + audio → talking head | Requires imageDataUrl/imageUrl + audioDataUrl/audioUrl, resolution (480p, 720p) | lib/modelProviders/wavespeedVeedFabric.ts, app/api/generate-video/route.ts |
bytedance-avatar-omni-human-1.5 | image + audio → avatar | Requires image + audio, duration derived from audio, optional prompt | lib/modelProviders/videoModels.ts:1115 |
kling-lipsync-t2v | focal video + script | Requires video URL, prompt/text, voice_id, voice_language (en, zh), voice_speed (0.8-2.0) | lib/modelProviders/wavespeedKlingLipsync.ts, lib/modelProviders/videoModels.ts:881 |
kling-lipsync-a2v | focal video + audio | Requires video + audio URLs, 2-10 s clip length | lib/modelProviders/wavespeedKlingLipsync.ts, lib/modelProviders/videoModels.ts:938 |
wan-wavespeed-video-edit | video edit | Requires source video, text prompt, resolution (480p, 720p) | lib/modelProviders/videoModels.ts:1075 |
wan-wavespeed-22-spicy-extend | video extend | Requires source video, resolution (480p, 720p), duration (5, 8) | lib/modelProviders/videoModels.ts:1094 |
wan-wavespeed-22-plus | text/image generative | resolution (480p, 720p, 1080p), orientation (text-only), fixed 5 s clips, optional enable_prompt_expansion | lib/modelProviders/videoModels.ts:1024, lib/modelProviders/wavespeedWan22Plus.ts |
wan-wavespeed-22-animate | reference image + driver video | Requires image + driver video, resolution (480p, 720p), mode (animate, replace) | lib/modelProviders/videoModels.ts:1602 |
wan-wavespeed-25 | text/image generative | resolution (480p, 720p, 1080p via resolution or size), duration (5, 10), optional enable_prompt_expansion | lib/modelProviders/videoModels.ts:1205 |
lightricks-ltx-2-fast | text/image, optional audio | duration (6-20 s), generateAudio toggle | lib/modelProviders/videoModels.ts:1638 |
lightricks-ltx-2-pro | text/image, optional audio | duration (6, 8, 10 s), generateAudio toggle | lib/modelProviders/videoModels.ts:1664 |
runwayml-gen4-aleph | video-to-video (+ optional reference image) | Requires video, optional reference_image, aspect_ratio (16:9, 4:3, 1:1, 3:4, 9:16, auto) | lib/modelProviders/wavespeedRunwayGen4Aleph.ts |
video-upscaler | video-to-video | Requires video, target_resolution (720p, 1080p, 2k, 4k); optional videoDuration / billedDuration | lib/modelProviders/videoModels.ts:1186, app/api/generate-video/route.ts |
bytedance-seedance-upscaler | video-to-video | Requires video, target_resolution (1080p, 2k, 4k) | lib/modelProviders/videoModels.ts:1197 |
bytedance-waver-1.0 | image-to-video | Requires image, fixed 5 s duration | lib/modelProviders/videoModels.ts:1231 |
bytedance-seedance-v1-pro-fast | text/image | resolution (480p, 720p, 1080p), duration (2-12 s), aspect_ratio (16:9, 9:16, 1:1, 3:4, 4:3, 21:9), camera_fixed | lib/modelProviders/videoModels.ts:1260 |
kling-v25-turbo-pro | text/image | duration (5, 10), aspect_ratio (16:9, 9:16, 1:1) | lib/modelProviders/videoModels.ts:1344 |
kling-v25-turbo-std | image-to-video | Requires image, duration (5, 10); Kling guidance handled within the Runware provider | lib/modelProviders/videoModels.ts:1391, lib/modelProviders/runwareVideo.ts |
minimax-hailuo-23-standard | text/image | duration (6, 10), optional enable_prompt_expansion | lib/modelProviders/wavespeedMinimaxHailuo23Standard.ts |
minimax-hailuo-23-pro | text/image | Fixed 5 s clips, optional enable_prompt_expansion | lib/modelProviders/wavespeedMinimaxHailuo23Pro.ts |
FAL-hosted Models
| model | input types | key request fields | code ref |
|---|---|---|---|
veo3-video | text/image | generateAudio, enhancePrompt, aspect_ratio (16:9), duration (5s-8s), resolution (720p, 1080p), optional negative_prompt, seed | lib/modelProviders/veo3Video.ts |
veo3-fast-video | text/image | generateAudio, enhancePrompt, aspect_ratio (16:9, 9:16), resolution (720p, 1080p) | lib/modelProviders/veo3FastVideo.ts |
veo2-video-image-to-video | image-to-video | Requires imageDataUrl/imageUrl, aspect_ratio (16:9, 9:16), duration (5s-8s) | lib/modelProviders/veo2VideoImageToVideo.ts |
wan-video-22 | text/image | resolution (480p, 720p), orientation (landscape, portrait), duration (5, 8), optional enable_prompt_expansion | lib/modelProviders/wanVideo22.ts |
wan-video-22-5b | text/image | negative_prompt, num_frames (81-121), frames_per_second (4-60), resolution (580p, 720p), aspect_ratio, num_inference_steps, enable_safety_checker, enable_prompt_expansion, guidance_scale, shift, interpolator_model, num_interpolated_frames | lib/modelProviders/wanVideo22-5b.ts |
wan-video-22-turbo | text/image | resolution (480p, 580p, 720p), aspect_ratio (auto, 16:9, 9:16, 1:1), enable_safety_checker, enable_prompt_expansion, optional seed | lib/modelProviders/wanVideo22Turbo.ts |
seedance-video | text/image | resolution (480p, 720p), aspect_ratio (16:9, 1:1, 3:4, 9:16, 21:9, 9:21), duration (5, 10), camera_fix | lib/modelProviders/seedanceVideo.ts |
seedance-lite-video | text/image | Default resolution 720p, aspect_ratio 16:9, duration 5; supports camera_fixed, optional seed | lib/modelProviders/falSeedanceLiteVideo.ts, app/api/generate-video/route.ts |
minimax-hailuo-02 | text/image | duration (6, 10), prompt_optimizer toggle | lib/modelProviders/minimaxHailuoVideo02.ts |
minimax-hailuo-02-pro | text/image | Fixed duration 6, prompt_optimizer toggle | lib/modelProviders/minimaxHailuoVideo02Pro.ts |
Runware-backed Models
| model | input types | key request fields | code ref |
|---|---|---|---|
kling-v21-master | text/image-to-video | Documented above; Runware-specific tuning parameters are auto-populated | lib/modelProviders/runwareVideo.ts, lib/modelProviders/videoModels.ts:369 |
veo3-fast-video | text/image | Same payload as the FAL variant; this route uses Runware infrastructure | lib/modelProviders/veo3FastVideo.ts |
vidu-video | text/image | size (16:9 / 1920×1080), style (general, anime), movementAmplitude (auto, small, medium, large), fixed 5 s duration | lib/modelProviders/viduVideo.ts |
pixverse-v45 | text/image | size presets (multiple aspect ratios), duration (5, 8), style, effectType, effect, cameraMovement, motionMode, soundEffectSwitch, soundEffectPrompt | lib/modelProviders/pixverseVideo.ts |
pixverse-v5 | text/image | Same field set as v4.5 with updated defaults | lib/modelProviders/pixverseVideoV5.ts |
Doubao / ByteDance (direct)
These models are also available outside Wavespeed; their field requirements mirror the entries above.| model | input types | notes | code ref |
|---|---|---|---|
bytedance-avatar-omni-human-1.5 | image + audio → avatar | Requires image + audio, duration derived from audio length, optional prompt | lib/modelProviders/videoModels.ts:1115 |
bytedance-waver-1.0 | image-to-video | Fixed 5 s duration, NSFW content may be filtered | lib/modelProviders/videoModels.ts:1231 |
bytedance-seedance-v1-pro-fast | text/image | resolution (480p, 720p, 1080p), duration (2-12 s), aspect_ratio (16:9, 9:16, 1:1, 3:4, 4:3, 21:9), camera_fixed | lib/modelProviders/videoModels.ts:1260 |
Other Providers
| model | input types | key request fields | code ref |
|---|---|---|---|
midjourney-video | image-to-video | Requires image (imageDataUrl or library asset), motion (low, high) | lib/modelProviders/midjourneyVideo.ts |
magicapi-video-face-swap | face swap | Requires swapImage, targetVideo, optional targetFaceIndex; supports videos ≤ 4 min; pricing varies by resolution | lib/modelProviders/magicapivideoModel.ts, app/api/generate-video/route.ts |
Async Processing & Status Polling
- The submission response includes
{ runId, model, status: "pending" }. - Poll
/video-statuswith the returnedrunIduntil the job reachesstatus: "succeeded"orstatus: "failed". - Many providers emit intermediate states (
queued,processing,generating,delivering). Persist them if you need audit trails. - Failed jobs include an
errorobject mirroring the upstream provider response. Surface the message and adjust prompts or inputs before retrying. - Duration and resolution determine credit usage; reconcile charges against
lib/credits/videoPricingConfig.ts.
Content & Safety Notes
Wan 2.2 Turbo, Veo 3, Kling, and other providers may block prompts that violate content policies. Non-200 responses describe the violation reason; relay these messages verbatim to users or implement automated prompt adjustments.Next Steps
- Poll the Video Status endpoint after every submission to retrieve final assets.
- Keep customer-facing pricing tables in sync with
lib/credits/videoPricingConfig.ts. - Remove any external references to
promptchan-video; the provider is disabled in code.
Authorizations
Body
Parameters for video generation across different models and providers
The video model to use for generation
longstories, longstories-kids, kling-video, kling-video-v2, veo2-video, minimax-video, hunyuan-video, hunyuan-video-image-to-video, wan-video-image-to-video, kling-v21-standard, kling-v21-pro, kling-v21-master, promptchan-video Text prompt describing the video to generate
"A serene lake at sunset with gentle ripples on the water"
Fully-written script for LongStories models (takes precedence over prompt)
UUID for conversation tracking
Project identifier for LongStories models
Story framework for LongStories models
default, emotional_story, product_showcase, tutorial Smart Enhancement: if true, automatically choose better framework and add Director Notes if necessary
Target length in words for LongStories models (legacy parameter)
Target length in seconds (alternative to words)
Prompt for the image generation engine (LongStories). Example: 'Warm lighting' or 'Make the first image very impactful'
"Warm, cozy lighting with focus on people interacting"
Video aspect ratio for LongStories
9:16, 16:9 Script generation configuration for LongStories
Image generation configuration for LongStories
Video generation configuration for LongStories
Voiceover configuration for LongStories
Captions configuration for LongStories
Effects configuration for LongStories
Music configuration for LongStories
Legacy: Voice ID for narration (use voiceoverConfig.voiceId instead)
"pNInz6obpgDQGcFmaJgB"
Legacy: Whether to show captions (use captionsConfig.captionsEnabled instead)
Legacy: Style for captions (use captionsConfig.captionsStyle instead)
default, minimal, neon, cinematic, fancy, tiktok, highlight, gradient, instagram, vida, manuscripts Legacy: Video effects configuration (use effectsConfig instead)
Legacy: Video quality (handled by videoConfig now)
low, medium, high Legacy: Motion configuration (handled by videoConfig now)
Legacy: Music track (use musicConfig instead)
"video-creation/music/dramatic_cinematic_score.mp3"
Video duration (format varies by model - '5s' for Veo2, '5' for Kling, etc.)
"5s"
Aspect ratio for FAL models
16:9, 9:16, 1:1, 4:3, 3:4 Negative prompt to avoid certain elements
"blur, distort, and low quality"
Classifier-free guidance scale
0 <= x <= 20Base64 data URL of input image for image-to-video models
Library attachment ID for input image
Whether to optimize the prompt (MiniMax model)
Number of inference steps
1 <= x <= 50Enable pro mode for Hunyuan Video
Video resolution
720p, 1080p, 540p Number of frames to generate
Frames per second
5 <= x <= 24Random seed for reproducible results
Enable safety content filtering
Allow explicit content (inverse of safety checker)
Enable automatic prompt expansion
Enable acceleration for faster processing
Shift parameter for certain models
Age setting for PromptChan model
18 <= x <= 60Enable audio for PromptChan model
Video quality for PromptChan model
Standard, High Aspect setting for PromptChan model
Portrait, Landscape, Square Response
Video generation request submitted successfully (asynchronous processing)
Unique identifier for the video generation request
Current status of the generation
pending, processing, completed, failed The model used for generation
Project identifier (for LongStories models)
Cost of the video generation
Payment source used (USD or XNO)
Remaining balance after the generation