POST /generate-video
cURL
curl --request POST \
  --url https://nano-gpt.com/api/generate-video \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
  "model": "longstories",
  "prompt": "A serene lake at sunset with gentle ripples on the water",
  "script": "<string>",
  "conversationUUID": "<string>",
  "projectId": "<string>",
  "framework": "default",
  "shortRequestEnhancer": false,
  "targetLengthInWords": 70,
  "targetLengthInSeconds": 123,
  "directorNotes": "Warm, cozy lighting with focus on people interacting",
  "aspectRatio": "9:16",
  "scriptConfig": {
    "style": "default",
    "targetLengthInSeconds": 30
  },
  "imageConfig": {
    "model": "hidream_dev",
    "loraConfig": {
      "loraSlug": "ghibsky-comic-book"
    }
  },
  "videoConfig": {
    "enabled": true,
    "model": "kling_v2_1_std_5s"
  },
  "voiceoverConfig": {
    "enabled": true,
    "voiceId": "zWDA589rUKXuLnPRDtAG"
  },
  "captionsConfig": {
    "captionsEnabled": true,
    "captionsStyle": "tiktok"
  },
  "effectsConfig": {
    "transition": "fade",
    "floating": true
  },
  "musicConfig": {
    "enabled": true,
    "musicSlug": "gentle_ambient_loop",
    "volume": 0.3,
    "loop": true
  },
  "voice": "pNInz6obpgDQGcFmaJgB",
  "captionsShow": true,
  "captionsStyle": "default",
  "effects": {
    "transition": "fade",
    "floating": false
  },
  "quality": "medium",
  "motion": {
    "enabled": false,
    "strength": 3
  },
  "music": "video-creation/music/dramatic_cinematic_score.mp3",
  "duration": "5s",
  "aspect_ratio": "16:9",
  "negative_prompt": "blur, distort, and low quality",
  "cfg_scale": 0.5,
  "imageDataUrl": "<string>",
  "imageAttachmentId": "<string>",
  "prompt_optimizer": true,
  "num_inference_steps": 30,
  "pro_mode": false,
  "resolution": "720p",
  "num_frames": 123,
  "frames_per_second": 16,
  "seed": 123,
  "enable_safety_checker": true,
  "showExplicitContent": false,
  "enable_prompt_expansion": true,
  "acceleration": true,
  "shift": 123,
  "age_slider": 18,
  "audioEnabled": false,
  "video_quality": "Standard",
  "aspect": "Portrait"
}'
Example response:

{
  "runId": "<string>",
  "projectId": "<string>",
  "status": "pending",
  "model": "<string>",
  "cost": 123,
  "paymentSource": "<string>",
  "remainingBalance": 123
}
Image-conditioned models accept either imageDataUrl (base64) or a public imageUrl. The service uses the explicit value you provide before checking any saved attachments.

Overview

POST /generate-video submits an asynchronous job to create, extend, or edit a video with one of NanoGPT’s provider integrations. The endpoint responds immediately with runId, model, and status: "pending". Poll the Video Status endpoint with that runId until you receive final assets. Duration-based billing is assessed after completion; align any pricing tables with lib/credits/videoPricingConfig.ts. Provider errors include descriptive JSON payloads. Surface the error.message (and HTTP status) to help users correct content-policy or validation issues.
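The submit step can be sketched as follows. This is a minimal illustration, not an official client: the endpoint URL and header names come from the cURL example above, and the HTTP transport is injected as a callable so the function can be exercised without network access.

```python
import json

API_URL = "https://nano-gpt.com/api/generate-video"

def submit_video_job(payload, post, api_key):
    """Submit a generation job and return the runId to poll with.

    `post` is any callable with the shape post(url, headers, body) -> (status, dict),
    so the transport (requests, httpx, a test stub) stays pluggable.
    """
    status, body = post(
        API_URL,
        {"Content-Type": "application/json", "x-api-key": api_key},
        json.dumps(payload),
    )
    if status != 200:
        # Provider errors include a descriptive JSON payload; surface its message.
        raise RuntimeError(body.get("error", {}).get("message", "submission failed"))
    return body["runId"]
```

Store the returned runId immediately; it is the only handle you have on the job while it is pending.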

Request Schema

Only include the fields required by your chosen model. Unknown keys are ignored, but some providers fail when extra media fields are present.

Core Fields

| field | type | required | details |
|---|---|---|---|
| model | string | yes | See Model Directory for the complete list. Use longstories-movie / longstories-pixel-art; promptchan-video has been removed. |
| prompt | string | conditional | Required for text-to-video and edit models unless a structured script is supplied. |
| negative_prompt | string | no | Suppresses specific content. Respected by Veo, Wan, Runway, Pixverse, and other models noted below. |
| script | string | conditional | LongStories models accept full scripts instead of relying on prompt. |
| storyConfig | object | conditional | LongStories structured payload (e.g. scenes, narration, voice). |
| duration | string | conditional | Seconds as a string ("5", "8", "60"). Limits vary per model; see individual entries. |
| seconds | string | conditional | Sora-specific duration selector ("4", "8", "12"). |
| aspect_ratio | string | conditional | Provider-specific ratios such as 16:9, 9:16, 1:1, 3:4, 4:3, 21:9, auto. |
| orientation | string | conditional | landscape or portrait for Sora and Wan text/image flows. |
| resolution | string | conditional | Resolution tokens (480p, 580p, 720p, 1080p, 1792x1024, 2k, 4k). |
| size | string | conditional | Provider preset for vidu-video, pixverse-*, wan-wavespeed-22-plus, wan-wavespeed-25, and upscalers. |
| generateAudio | boolean | no | Adds AI audio on Veo 3 and Lightricks models. Defaults to false. |
| enhancePrompt | boolean | no | Optional Veo 3 prompt optimizer. Defaults to false. |
| pro_mode / pro | boolean | no | High-quality toggle for Sora and Hunyuan families. Defaults to false. |
| enable_prompt_expansion | boolean | no | Prompt booster for Wan/Seedance/Minimax variants. Disabled by default. |
| enable_safety_checker | boolean | no | Wan 2.2 Turbo safety switch. Defaults to provider configuration. |
| camera_fix / camera_fixed / cameraFixed | boolean | no | Locks the virtual camera for Seedance and Wan variants. |
| seed | number or string | no | Deterministic seed when supported (Veo, Wan, Pixverse). |
| voice_id | string | conditional | Required by kling-lipsync-t2v. |
| voice_language | string | conditional | en or zh for kling-lipsync-t2v. |
| voice_speed | number | conditional | Range 0.8-2.0 for kling-lipsync-t2v. |
| videoDuration / billedDuration | number | no | Optional overrides for upscaler billing calculations. |
| adjust_fps_for_interpolation | boolean | no | Optional toggle for interpolation-aware upscaling. Defaults to false. |
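Because the duration field varies by family ("5s"-style strings for Veo2, bare "5" for Kling, a separate seconds field for Sora), a small normalizer can keep client code tidy. This helper is hypothetical, based only on the formats documented on this page:

```python
def format_duration(model: str, seconds: int) -> dict:
    """Map a duration in seconds onto the field each model family expects.

    Hypothetical helper: the mappings reflect the tables on this page,
    not an official client library. Check each model's allowed values
    before sending.
    """
    if model.startswith("sora"):
        return {"seconds": str(seconds)}    # Sora uses `seconds` ("4", "8", "12")
    if model.startswith("veo2"):
        return {"duration": f"{seconds}s"}  # Veo2 expects "5s"-style strings
    return {"duration": str(seconds)}       # most models take bare seconds as a string
```

Merge the returned dict into the request payload for the chosen model.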

Media Inputs

| field | type | required | details |
|---|---|---|---|
| imageDataUrl | string | conditional | Base64-encoded data URL. Recommended for private assets or files larger than 4 MB. |
| imageUrl | string | conditional | HTTPS link to a source image. |
| image | string | conditional | Some Wavespeed/ByteDance providers expect this property instead of imageUrl. |
| reference_image | string | conditional | Optional still image guiding runwayml-gen4-aleph. |
| audioDataUrl | string | conditional | Base64 data URL for audio-driven models. |
| audioUrl | string | conditional | HTTPS audio input. |
| audio | string | conditional | Alternate audio field accepted by ByteDance and Kling lipsync providers. |
| video | string | conditional | HTTPS link or data URL to the source video (edit, extend, upscaler, or lipsync jobs). |
| videoUrl | string | conditional | Alias accepted by select providers. |
| swapImage | string | conditional | Required by magicapi-video-face-swap. |
| targetVideo | string | conditional | Required by magicapi-video-face-swap. |
| targetFaceIndex | number | no | Optional face index for MagicAPI swaps. |
Provide only the media fields that your target model expects. Extra media inputs often trigger provider validation errors.
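Building the imageDataUrl value is a one-liner with the standard library; the data URL shape (`data:<mime>;base64,<payload>`) is the generic scheme, and the default MIME type below is an assumption you should match to your actual file:

```python
import base64

def to_image_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as the base64 data URL that imageDataUrl expects."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Read the file in binary mode (`open(path, "rb").read()`) before passing it in; base64 inflates the payload by roughly a third, which matters near upload limits.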

Advanced Controls

| field | type | models |
|---|---|---|
| num_frames | integer | Wan 2.2 families, Seedance 22 5B, Wan image-to-video. |
| frames_per_second | integer | Wan 2.2 5B. |
| num_inference_steps | integer | Wan 2.2 families. |
| guidance_scale | number | Wan 2.2 5B. |
| shift | number | Wan 2.2 5B. |
| interpolator_model | string | Wan 2.2 5B. |
| num_interpolated_frames | integer | Wan 2.2 5B. |
| movementAmplitude | string | vidu-video (auto, small, medium, large). |
| motion | string | midjourney-video (low, high). |
| style | string | vidu-video (general, anime), pixverse-* (various presets). |
| effectType, effect, cameraMovement, motionMode, soundEffectSwitch, soundEffectPrompt | varies | Pixverse v4.5/v5. |
| mode | string | wan-wavespeed-22-animate (animate, replace). |
| prompt_optimizer | boolean | minimax-hailuo-02, minimax-hailuo-02-pro. |

Model Directory

Model strings are grouped by upstream provider. Each row lists the required media inputs and notable request fields. Code references point to the backend implementation.

Core & LongStories Models

| model | input types | key request fields | code ref |
|---|---|---|---|
| kling-video | text-to-video | prompt, optional negative_prompt, duration (defaults to 5) | lib/modelProviders/videoModels.ts |
| kling-video-v2 | text/image-to-video | prompt, optional imageDataUrl/imageUrl, duration (5, 10), Kling guidance handled server-side | lib/modelProviders/videoModels.ts |
| veo2-video | text/image-to-video | prompt, duration (5s-30s), aspect_ratio (16:9, 9:16, 1:1, 4:3, 3:4), optional negative_prompt, seed | lib/modelProviders/videoModels.ts |
| minimax-video | text/image-to-video | duration (6, 10), optional enable_prompt_expansion | lib/modelProviders/videoModels.ts |
| hunyuan-video | text-to-video | pro (boolean), resolution (480p, 720p, 1080p), num_frames, num_inference_steps, optional negative_prompt | lib/modelProviders/videoModels.ts |
| hunyuan-video-image-to-video | image-to-video | Requires imageDataUrl/imageUrl, supports pro, resolution, num_frames, num_inference_steps | lib/modelProviders/videoModels.ts |
| wan-video-image-to-video | image-to-video | Requires imageDataUrl/imageUrl, optional prompt, num_frames, frames_per_second, resolution, negative_prompt, seed | lib/modelProviders/videoModels.ts |
| kling-v21-standard | image-to-video | Requires imageDataUrl/imageUrl, duration (5), optional negative_prompt, seed | lib/modelProviders/runwareVideo.ts |
| kling-v21-pro | text/image-to-video | duration (5, 10), optional negative_prompt, seed | lib/modelProviders/runwareVideo.ts |
| kling-v21-master | text/image-to-video | duration (5 default), negative_prompt, seed, Runware tuning parameters | lib/modelProviders/runwareVideo.ts |
| longstories-movie | scripted generation | duration (30, 60, 180, 300, 600), accepts script/storyConfig payloads (use duration instead of legacy targetLengthInWords) | lib/modelProviders/longstoriesModel.ts |
| longstories-pixel-art | scripted generation | duration (15, 30, 60, 180, 300, 600), same structured payloads as longstories-movie | lib/modelProviders/longstoriesPixelArtModel.ts |

Wavespeed & Partner Models

| model | input types | key request fields | code ref |
|---|---|---|---|
| sora-2 | text-to-video, image-to-video | pro_mode (boolean), resolution (720p, 1792x1024), orientation (landscape, portrait), seconds (4, 8, 12), optional imageDataUrl/imageUrl | lib/modelProviders/sora2Video.ts |
| veo3-1-video | text-to-video, image-to-video | duration (4, 6, 8), resolution (720p, 1080p), aspect_ratio (16:9, 9:16), generateAudio (default false), optional negative_prompt, seed | lib/modelProviders/wavespeedVeo31.ts |
| wan-wavespeed-s2v | image + audio → video | Requires image + audio (URL or data URI), optional prompt, resolution (480p, 720p) | lib/modelProviders/wavespeedWanS2V.ts |
| veed-fabric-1.0 | image + audio → talking head | Requires imageDataUrl/imageUrl + audioDataUrl/audioUrl, resolution (480p, 720p) | lib/modelProviders/wavespeedVeedFabric.ts, app/api/generate-video/route.ts |
| bytedance-avatar-omni-human-1.5 | image + audio → avatar | Requires image + audio, duration derived from audio, optional prompt | lib/modelProviders/videoModels.ts:1115 |
| kling-lipsync-t2v | focal video + script | Requires video URL, prompt/text, voice_id, voice_language (en, zh), voice_speed (0.8-2.0) | lib/modelProviders/wavespeedKlingLipsync.ts, lib/modelProviders/videoModels.ts:881 |
| kling-lipsync-a2v | focal video + audio | Requires video + audio URLs, 2-10 s clip length | lib/modelProviders/wavespeedKlingLipsync.ts, lib/modelProviders/videoModels.ts:938 |
| wan-wavespeed-video-edit | video edit | Requires source video, text prompt, resolution (480p, 720p) | lib/modelProviders/videoModels.ts:1075 |
| wan-wavespeed-22-spicy-extend | video extend | Requires source video, resolution (480p, 720p), duration (5, 8) | lib/modelProviders/videoModels.ts:1094 |
| wan-wavespeed-22-plus | text/image generative | resolution (480p, 720p, 1080p), orientation (text-only), fixed 5 s clips, optional enable_prompt_expansion | lib/modelProviders/videoModels.ts:1024, lib/modelProviders/wavespeedWan22Plus.ts |
| wan-wavespeed-22-animate | reference image + driver video | Requires image + driver video, resolution (480p, 720p), mode (animate, replace) | lib/modelProviders/videoModels.ts:1602 |
| wan-wavespeed-25 | text/image generative | resolution (480p, 720p, 1080p via resolution or size), duration (5, 10), optional enable_prompt_expansion | lib/modelProviders/videoModels.ts:1205 |
| lightricks-ltx-2-fast | text/image, optional audio | duration (6-20 s), generateAudio toggle | lib/modelProviders/videoModels.ts:1638 |
| lightricks-ltx-2-pro | text/image, optional audio | duration (6, 8, 10 s), generateAudio toggle | lib/modelProviders/videoModels.ts:1664 |
| runwayml-gen4-aleph | video-to-video (+ optional reference image) | Requires video, optional reference_image, aspect_ratio (16:9, 4:3, 1:1, 3:4, 9:16, auto) | lib/modelProviders/wavespeedRunwayGen4Aleph.ts |
| video-upscaler | video-to-video | Requires video, target_resolution (720p, 1080p, 2k, 4k); optional videoDuration / billedDuration | lib/modelProviders/videoModels.ts:1186, app/api/generate-video/route.ts |
| bytedance-seedance-upscaler | video-to-video | Requires video, target_resolution (1080p, 2k, 4k) | lib/modelProviders/videoModels.ts:1197 |
| bytedance-waver-1.0 | image-to-video | Requires image, fixed 5 s duration | lib/modelProviders/videoModels.ts:1231 |
| bytedance-seedance-v1-pro-fast | text/image | resolution (480p, 720p, 1080p), duration (2-12 s), aspect_ratio (16:9, 9:16, 1:1, 3:4, 4:3, 21:9), camera_fixed | lib/modelProviders/videoModels.ts:1260 |
| kling-v25-turbo-pro | text/image | duration (5, 10), aspect_ratio (16:9, 9:16, 1:1) | lib/modelProviders/videoModels.ts:1344 |
| kling-v25-turbo-std | image-to-video | Requires image, duration (5, 10); Kling guidance handled within the Runware provider | lib/modelProviders/videoModels.ts:1391, lib/modelProviders/runwareVideo.ts |
| minimax-hailuo-23-standard | text/image | duration (6, 10), optional enable_prompt_expansion | lib/modelProviders/wavespeedMinimaxHailuo23Standard.ts |
| minimax-hailuo-23-pro | text/image | Fixed 5 s clips, optional enable_prompt_expansion | lib/modelProviders/wavespeedMinimaxHailuo23Pro.ts |

FAL-hosted Models

| model | input types | key request fields | code ref |
|---|---|---|---|
| veo3-video | text/image | generateAudio, enhancePrompt, aspect_ratio (16:9), duration (5s-8s), resolution (720p, 1080p), optional negative_prompt, seed | lib/modelProviders/veo3Video.ts |
| veo3-fast-video | text/image | generateAudio, enhancePrompt, aspect_ratio (16:9, 9:16), resolution (720p, 1080p) | lib/modelProviders/veo3FastVideo.ts |
| veo2-video-image-to-video | image-to-video | Requires imageDataUrl/imageUrl, aspect_ratio (16:9, 9:16), duration (5s-8s) | lib/modelProviders/veo2VideoImageToVideo.ts |
| wan-video-22 | text/image | resolution (480p, 720p), orientation (landscape, portrait), duration (5, 8), optional enable_prompt_expansion | lib/modelProviders/wanVideo22.ts |
| wan-video-22-5b | text/image | negative_prompt, num_frames (81-121), frames_per_second (4-60), resolution (580p, 720p), aspect_ratio, num_inference_steps, enable_safety_checker, enable_prompt_expansion, guidance_scale, shift, interpolator_model, num_interpolated_frames | lib/modelProviders/wanVideo22-5b.ts |
| wan-video-22-turbo | text/image | resolution (480p, 580p, 720p), aspect_ratio (auto, 16:9, 9:16, 1:1), enable_safety_checker, enable_prompt_expansion, optional seed | lib/modelProviders/wanVideo22Turbo.ts |
| seedance-video | text/image | resolution (480p, 720p), aspect_ratio (16:9, 1:1, 3:4, 9:16, 21:9, 9:21), duration (5, 10), camera_fix | lib/modelProviders/seedanceVideo.ts |
| seedance-lite-video | text/image | Default resolution 720p, aspect_ratio 16:9, duration 5; supports camera_fixed, optional seed | lib/modelProviders/falSeedanceLiteVideo.ts, app/api/generate-video/route.ts |
| minimax-hailuo-02 | text/image | duration (6, 10), prompt_optimizer toggle | lib/modelProviders/minimaxHailuoVideo02.ts |
| minimax-hailuo-02-pro | text/image | Fixed duration 6, prompt_optimizer toggle | lib/modelProviders/minimaxHailuoVideo02Pro.ts |

Runware-backed Models

| model | input types | key request fields | code ref |
|---|---|---|---|
| kling-v21-master | text/image-to-video | Documented above; Runware-specific tuning parameters are auto-populated | lib/modelProviders/runwareVideo.ts, lib/modelProviders/videoModels.ts:369 |
| veo3-fast-video | text/image | Same payload as the FAL variant; this route uses Runware infrastructure | lib/modelProviders/veo3FastVideo.ts |
| vidu-video | text/image | size (16:9 / 1920×1080), style (general, anime), movementAmplitude (auto, small, medium, large), fixed 5 s duration | lib/modelProviders/viduVideo.ts |
| pixverse-v45 | text/image | size presets (multiple aspect ratios), duration (5, 8), style, effectType, effect, cameraMovement, motionMode, soundEffectSwitch, soundEffectPrompt | lib/modelProviders/pixverseVideo.ts |
| pixverse-v5 | text/image | Same field set as v4.5 with updated defaults | lib/modelProviders/pixverseVideoV5.ts |

Doubao / ByteDance (direct)

These models are also available outside Wavespeed; their field requirements mirror the entries above.

| model | input types | notes | code ref |
|---|---|---|---|
| bytedance-avatar-omni-human-1.5 | image + audio → avatar | Requires image + audio, duration derived from audio length, optional prompt | lib/modelProviders/videoModels.ts:1115 |
| bytedance-waver-1.0 | image-to-video | Fixed 5 s duration, NSFW content may be filtered | lib/modelProviders/videoModels.ts:1231 |
| bytedance-seedance-v1-pro-fast | text/image | resolution (480p, 720p, 1080p), duration (2-12 s), aspect_ratio (16:9, 9:16, 1:1, 3:4, 4:3, 21:9), camera_fixed | lib/modelProviders/videoModels.ts:1260 |

Other Providers

| model | input types | key request fields | code ref |
|---|---|---|---|
| midjourney-video | image-to-video | Requires image (imageDataUrl or library asset), motion (low, high) | lib/modelProviders/midjourneyVideo.ts |
| magicapi-video-face-swap | face swap | Requires swapImage, targetVideo, optional targetFaceIndex; supports videos ≤ 4 min; pricing varies by resolution | lib/modelProviders/magicapivideoModel.ts, app/api/generate-video/route.ts |

Async Processing & Status Polling

  • The submission response includes { runId, model, status: "pending" }.
  • Poll /video-status with the returned runId until the job reaches status: "succeeded" or status: "failed".
  • Many providers emit intermediate states (queued, processing, generating, delivering). Persist them if you need audit trails.
  • Failed jobs include an error object mirroring the upstream provider response. Surface the message and adjust prompts or inputs before retrying.
  • Duration and resolution determine credit usage; reconcile charges against lib/credits/videoPricingConfig.ts.
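The polling steps above can be sketched as a small loop. The transport is again an injected callable so the sketch stays testable; the terminal-state names and the intermediate states it records come from the bullets above, and the interval/attempt limits are illustrative defaults, not API requirements:

```python
import time

TERMINAL_STATES = {"succeeded", "failed"}

def poll_video_status(run_id, get, interval=5.0, max_attempts=120):
    """Poll until the job reaches a terminal state.

    `get` is any callable get(run_id) -> dict mirroring the /video-status
    response. Intermediate states (queued, processing, generating,
    delivering) are collected in `history` for audit trails.
    """
    history = []
    for _ in range(max_attempts):
        body = get(run_id)
        history.append(body["status"])
        if body["status"] in TERMINAL_STATES:
            return body, history
        time.sleep(interval)
    raise TimeoutError(f"run {run_id} not terminal after {max_attempts} polls")
```

In production, prefer a backoff schedule over a fixed interval so long-running jobs do not hammer the status endpoint.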

Content & Safety Notes

Wan 2.2 Turbo, Veo 3, Kling, and other providers may block prompts that violate content policies. Non-200 responses describe the violation reason; relay these messages verbatim to users or implement automated prompt adjustments.
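A minimal sketch of relaying those messages, assuming the failure body mirrors the upstream error as an `error` object with a `message` field (the exact nesting may differ per provider, so the helper falls back gracefully):

```python
def describe_failure(status_code: int, body: dict) -> str:
    """Build a user-facing message from a provider error response.

    Assumes body["error"]["message"] carries the violation reason, as
    described in the status-polling notes; degrades to a generic message
    when the shape differs.
    """
    error = body.get("error") or {}
    message = error.get("message") or "Unknown provider error"
    return f"Generation failed (HTTP {status_code}): {message}"
```

Showing the HTTP status alongside the message helps users distinguish content-policy blocks (typically 4xx) from transient provider failures.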

Next Steps

  • Poll the Video Status endpoint after every submission to retrieve final assets.
  • Keep customer-facing pricing tables in sync with lib/credits/videoPricingConfig.ts.
  • Remove any external references to promptchan-video; the provider is disabled in code.

Authorizations

x-api-key
string
header
required

Body

application/json

Parameters for video generation across different models and providers

model
enum<string>
default:longstories
required

The video model to use for generation

Available options:
longstories,
longstories-kids,
kling-video,
kling-video-v2,
veo2-video,
minimax-video,
hunyuan-video,
hunyuan-video-image-to-video,
wan-video-image-to-video,
kling-v21-standard,
kling-v21-pro,
kling-v21-master,
promptchan-video (removed; see Model Directory)

prompt
string

Text prompt describing the video to generate

Example:

"A serene lake at sunset with gentle ripples on the water"

script
string

Fully-written script for LongStories models (takes precedence over prompt)

conversationUUID
string

UUID for conversation tracking

projectId
string

Project identifier for LongStories models

framework
enum<string>
default:default

Story framework for LongStories models

Available options:
default,
emotional_story,
product_showcase,
tutorial

shortRequestEnhancer
boolean
default:false

Smart Enhancement: if true, automatically choose better framework and add Director Notes if necessary

targetLengthInWords
integer
default:70

Target length in words for LongStories models (legacy parameter)

targetLengthInSeconds
integer

Target length in seconds (alternative to words)

directorNotes
string

Prompt for the image generation engine (LongStories). Example: 'Warm lighting' or 'Make the first image very impactful'

Example:

"Warm, cozy lighting with focus on people interacting"

aspectRatio
enum<string>
default:9:16

Video aspect ratio for LongStories

Available options:
9:16,
16:9

scriptConfig
object

Script generation configuration for LongStories

imageConfig
object

Image generation configuration for LongStories

videoConfig
object

Video generation configuration for LongStories

voiceoverConfig
object

Voiceover configuration for LongStories

captionsConfig
object

Captions configuration for LongStories

effectsConfig
object

Effects configuration for LongStories

musicConfig
object

Music configuration for LongStories

voice
string

Legacy: Voice ID for narration (use voiceoverConfig.voiceId instead)

Example:

"pNInz6obpgDQGcFmaJgB"

captionsShow
boolean
default:true

Legacy: Whether to show captions (use captionsConfig.captionsEnabled instead)

captionsStyle
enum<string>
default:default

Legacy: Style for captions (use captionsConfig.captionsStyle instead)

Available options:
default,
minimal,
neon,
cinematic,
fancy,
tiktok,
highlight,
gradient,
instagram,
vida,
manuscripts

effects
object

Legacy: Video effects configuration (use effectsConfig instead)

quality
enum<string>
default:medium

Legacy: Video quality (handled by videoConfig now)

Available options:
low,
medium,
high

motion
object

Legacy: Motion configuration (handled by videoConfig now)

music
string

Legacy: Music track (use musicConfig instead)

Example:

"video-creation/music/dramatic_cinematic_score.mp3"

duration
string

Video duration (format varies by model - '5s' for Veo2, '5' for Kling, etc.)

Example:

"5s"

aspect_ratio
enum<string>
default:16:9

Aspect ratio for FAL models

Available options:
16:9,
9:16,
1:1,
4:3,
3:4

negative_prompt
string

Negative prompt to avoid certain elements

Example:

"blur, distort, and low quality"

cfg_scale
number
default:0.5

Classifier-free guidance scale

Required range: 0 <= x <= 20

imageDataUrl
string

Base64 data URL of input image for image-to-video models

imageAttachmentId
string

Library attachment ID for input image

prompt_optimizer
boolean
default:true

Whether to optimize the prompt (MiniMax model)

num_inference_steps
integer
default:30

Number of inference steps

Required range: 1 <= x <= 50

pro_mode
boolean
default:false

Enable pro mode for Hunyuan Video

resolution
enum<string>
default:720p

Video resolution

Available options:
720p,
1080p,
540p

num_frames
integer
default:81

Number of frames to generate

frames_per_second
integer
default:16

Frames per second

Required range: 5 <= x <= 24

seed
integer

Random seed for reproducible results

enable_safety_checker
boolean
default:true

Enable safety content filtering

showExplicitContent
boolean
default:false

Allow explicit content (inverse of safety checker)

enable_prompt_expansion
boolean

Enable automatic prompt expansion

acceleration
boolean

Enable acceleration for faster processing

shift
number

Shift parameter for certain models

age_slider
integer
default:18

Age setting for PromptChan model

Required range: 18 <= x <= 60

audioEnabled
boolean
default:false

Enable audio for PromptChan model

video_quality
enum<string>
default:Standard

Video quality for PromptChan model

Available options:
Standard,
High

aspect
enum<string>
default:Portrait

Aspect setting for PromptChan model

Available options:
Portrait,
Landscape,
Square

Response

Video generation request submitted successfully (asynchronous processing)

runId
string
required

Unique identifier for the video generation request

status
enum<string>
default:pending
required

Current status of the generation

Available options:
pending,
processing,
completed,
failed

model
string
required

The model used for generation

projectId
string

Project identifier (for LongStories models)

cost
number

Cost of the video generation

paymentSource
string

Payment source used (USD or XNO)

remainingBalance
number

Remaining balance after the generation