
Generative Media Quickstart

Welcome! This guide walks developers through the fastest path to creating images, music, sound effects, voices, and short video clips using the VideoDB Python SDK. It also covers transcript translation, automated dubbing, and YouTube search utilities.
Audience: Python developers who already have a VideoDB account and want to add generative features to their media workflows or localize the media in an application.

1. Installation

pip install --upgrade videodb
The SDK supports Python ≥ 3.8 on Linux, macOS, and Windows.

2. Authentication & First Collection

import videodb

API_KEY = "YOUR_API_KEY" # ▶️ Replace with the key from https://console.videodb.io
conn = videodb.connect(api_key=API_KEY)

# The default collection to store assets
coll = conn.get_collection()
print("Connected to collection:", coll.id)

If your organisation uses multiple collections, pass a collection_id argument to get_collection() to select a specific collection instead of the default one.
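To keep the key out of source control, one option is to read it from an environment variable before connecting. The sketch below is a minimal helper; the variable name VIDEODB_API_KEY and the function load_api_key are illustrative assumptions, not names defined by the SDK.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 name: str = "VIDEODB_API_KEY") -> str:
    """Return the API key stored under `name`, or raise if it is missing.

    `name` is a hypothetical variable name chosen for this example.
    """
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key


# Usage (requires the videodb package and a real key):
#   import videodb
#   conn = videodb.connect(api_key=load_api_key())
#   coll = conn.get_collection()
```

Passing a plain dict as env makes the helper easy to exercise in tests without touching the real environment.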

At‑a‑Glance Cheat Sheet
# Image
coll.generate_image(prompt, aspect_ratio='1:1', callback_url=None)
# Music
coll.generate_music(prompt, duration=5, callback_url=None)
# SFX
coll.generate_sound_effect(prompt, duration=2, config={}, callback_url=None)
# Voice
coll.generate_voice(text, voice_name='Default', config={}, callback_url=None)
# Dub
coll.dub_video(video_id, language_code, callback_url=None)
# Video
coll.generate_video(prompt, duration=5, callback_url=None)
# YouTube
conn.youtube_search(query, result_threshold=10, duration='medium')
# Translate
video.translate_transcript(language, additional_notes='', callback_url=None)


3. Generative Endpoints

Each generative call is asynchronous: the SDK returns an asset object (Audio, Video, or Image) immediately.
Call .generate_url() (or .play() for video) to fetch the finished file.
Optionally supply a callback_url to receive a webhook when rendering completes.

generate_image()
| Parameter | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | str | Yes | | Text description of the desired image. |
| aspect_ratio | Literal['1:1','9:16','16:9','4:3','3:4'] \| None | No | '1:1' | Any other ratio raises ValueError. |
| callback_url | str \| None | No | None | POSTed JSON when ready. |
# returns an Image object
image = coll.generate_image(
    prompt="Green neon jellyfish photography",
    aspect_ratio="9:16",
)
print(image.generate_url())
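Because generate_image() accepts only the five ratios listed above, it can help to map a target canvas size to the nearest supported value before calling it. The helper below is a sketch under that assumption; pick_aspect_ratio is a hypothetical name and is not part of the SDK.

```python
# Ratios documented for generate_image().
ALLOWED_ASPECT_RATIOS = ("1:1", "9:16", "16:9", "4:3", "3:4")


def pick_aspect_ratio(width: int, height: int) -> str:
    """Return the documented ratio closest to width:height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = width / height

    def distance(ratio: str) -> float:
        w, h = ratio.split(":")
        return abs(int(w) / int(h) - target)

    return min(ALLOWED_ASPECT_RATIOS, key=distance)


# Usage (with a connected collection):
#   coll.generate_image(prompt="...", aspect_ratio=pick_aspect_ratio(1080, 1920))
```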


generate_music()
| Parameter | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | str | Yes | | Musical style & mood. |
| duration | int | No | 5 | Total seconds. Values <1 or >300 raise ValueError. |
| callback_url | str \| None | No | None | |
# returns Audio object
music = coll.generate_music(prompt="Upbeat electronic background", duration=10)


generate_sound_effect()
| Parameter | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | str | Yes | | |
| duration | int | No | 2 | Seconds. Short SFX ≤30 s recommended. |
| config | dict | No | {} | Model-specific options such as prompt_influence. |
| callback_url | str \| None | No | None | |
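This section has no call example, so here is a minimal sketch of how generate_sound_effect() might be wrapped. The keyword names follow the cheat sheet; the wrapper name make_sound_effect and the prompt_influence value shown in the usage comment are illustrative assumptions.

```python
import warnings
from typing import Any, Optional


def make_sound_effect(coll: Any, prompt: str, duration: int = 2,
                      prompt_influence: Optional[float] = None) -> Any:
    """Call coll.generate_sound_effect(), adding prompt_influence only when given."""
    if duration > 30:
        # The parameter notes recommend short effects of 30 s or less.
        warnings.warn("sound effects longer than 30 s are not recommended")
    config = {}
    if prompt_influence is not None:
        config["prompt_influence"] = prompt_influence
    return coll.generate_sound_effect(prompt=prompt, duration=duration, config=config)


# Usage (with a connected collection; 0.4 is an arbitrary example value):
#   sfx = make_sound_effect(coll, "Wooden door creaking open", 3, prompt_influence=0.4)
#   print(sfx.generate_url())
```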
generate_voice() (Text‑to‑Speech)
| Parameter | Type | Required | Default | Notes |
|---|---|---|---|---|
| text | str | Yes | | Up to 5,000 characters. |
| voice_name | str | No | 'Default' | |
| config | dict | No | {} | Provider-specific keys such as stability, style, similarity_boost. |
| callback_url | str \| None | No | None | |
Config Parameters
config = {
    "stability": 0.0,         # Lower = more emotional variation; higher = more monotone
    "similarity_boost": 1.0,  # Higher = closer match to original voice
    "style": 0.0,             # Higher = more exaggerated speaking style
}

Voice catalogue (built‑in presets)

| Name | Voice Style | Accent | Gender |
|---|---|---|---|
| Aria | Expressive | American | Female |
| Roger | Confident | American | Male |
| Sarah | Soft | American | Young Female |
| Laura | Upbeat | American | Young Female |
| Charlie | Natural | Australian | Male |
| George | Warm | British | Middle-aged Male |
| Callum | Intense | Transatlantic | Male |
| River | Confident | American | Non-binary |
| Liam | Articulate | American | Young Male |
| Charlotte | Seductive | Swedish | Young Female |
| Alice | Confident | British | Middle-aged Female |
| Matilda | Friendly | American | Middle-aged Female |
| Will | Friendly | American | Young Male |
| Jessica | Expressive | American | Young Female |
| Eric | Friendly | American | Middle-aged Male |
| Chris | Casual | American | Middle-aged Male |
| Brian | Deep | American | Middle-aged Male |
| Daniel | Authoritative | British | Middle-aged Male |
| Lily | Warm | British | Middle-aged Female |
| Bill | Trustworthy | American | Old Male |
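Since text is limited to 5,000 characters per call, longer scripts need to be split first. The sketch below shows one possible approach: break on sentence endings, then call generate_voice() once per chunk with a voice from the catalogue. The helper names split_for_tts and narrate are hypothetical, and the sentence-splitting rule is a simple heuristic rather than anything the SDK prescribes.

```python
import re
from typing import Any, List

MAX_TTS_CHARS = 5000  # limit stated in the generate_voice() parameter notes


def split_for_tts(text: str, limit: int = MAX_TTS_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def narrate(coll: Any, text: str, voice_name: str = "Aria") -> List[Any]:
    """Generate one audio asset per chunk using coll.generate_voice()."""
    return [coll.generate_voice(text=chunk, voice_name=voice_name)
            for chunk in split_for_tts(text)]
```

Each returned asset corresponds to one chunk, in order, so the pieces can be placed back to back in a later step.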
generate_video()
| Parameter | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | str | Yes | | |
| duration | int | No | 5 | Must be 5 to 8 s inclusive. Invalid values raise ValueError. |
| callback_url | str \| None | No | None | |
# returns a video object
clip = coll.generate_video(prompt="Cinematic lion close‑up", duration=7)
clip.play()
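The duration limits for generate_music() (1 to 300 s) and generate_video() (5 to 8 s) can also be checked locally before a request is made, for example when the value comes from user input. This is a small sketch; check_duration and DURATION_LIMITS are hypothetical names used only for illustration.

```python
# Inclusive limits taken from the parameter notes for each method.
DURATION_LIMITS = {"music": (1, 300), "video": (5, 8)}


def check_duration(kind: str, seconds: int) -> int:
    """Return `seconds` unchanged if it is within the documented range for `kind`."""
    low, high = DURATION_LIMITS[kind]
    if not low <= seconds <= high:
        raise ValueError(f"{kind} duration must be {low}-{high} s, got {seconds}")
    return seconds


# Usage (with a connected collection):
#   clip = coll.generate_video(prompt="...", duration=check_duration("video", 7))
```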

4. Dub video

Dub a video into the target language. The call returns a new video object that can be used for downstream tasks.
dubbed = coll.dub_video(video_id=video.id, language_code="hi")
dubbed.play()


dub_video() parameters
| Parameter | Type | Required | Default | Notes |
|---|---|---|---|---|
| video_id | str | Yes | | Must belong to caller's collection. |
| language_code | str | Yes | | ISO 639-1. Supported languages listed in docs. |
| callback_url | str \| None | No | None | |
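For localization into several languages, dub_video() can simply be called once per language code. The sketch below assumes that pattern; dub_into_languages is a hypothetical helper, and its two-letter check only reflects the ISO 639-1 format, not the list of languages VideoDB actually supports.

```python
from typing import Any, Dict, Iterable


def dub_into_languages(coll: Any, video_id: str,
                       language_codes: Iterable[str]) -> Dict[str, Any]:
    """Dub one video into each language and return {code: dubbed_video}."""
    codes = [code.strip().lower() for code in language_codes]
    # Validate every code before making any request.
    for code in codes:
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"expected a two-letter ISO 639-1 code, got {code!r}")
    return {code: coll.dub_video(video_id=video_id, language_code=code)
            for code in codes}


# Usage (with a connected collection and an uploaded video):
#   dubbed = dub_into_languages(coll, video.id, ["hi", "fr", "es"])
#   dubbed["fr"].play()
```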

5. YouTube Utilities

Search YouTube directly from the Python SDK.
results = conn.youtube_search(
    query="learn python programming",
    result_threshold=3,
    duration="long",
)
youtube_search()
| Parameter | Type | Required | Default | Notes |
|---|---|---|---|---|
| query | str | Yes | | |
| result_threshold | int \| None | No | 10 | Max results. None returns all. |
| duration | str | No | 'medium' | Duration filter: 'short', 'medium', or 'long'. |
youtube_search() returns a list of dicts containing at minimum title and link keys.
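As a short illustration of consuming that return value, the helper below turns the list into numbered lines. It relies only on the title and link keys mentioned above; format_results is a hypothetical name.

```python
from typing import Dict, Iterable, List


def format_results(results: Iterable[Dict[str, str]]) -> List[str]:
    """Return one 'N. title <link>' line per search result."""
    lines = []
    for index, item in enumerate(results, start=1):
        lines.append(f"{index}. {item['title']} <{item['link']}>")
    return lines


# Usage (with an open connection):
#   for line in format_results(conn.youtube_search(query="learn python programming")):
#       print(line)
```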

6. Transcript Translation

# Upload
video = coll.upload(url="https://youtu.be/…")
video.play()
# Index spoken words (required once)
video.index_spoken_words()
# Translate transcript
fr_text = video.translate_transcript(language="fr")


translate_transcript() parameters
| Parameter | Type | Required | Default | Notes |
|---|---|---|---|---|
| language | str | Yes | | ISO 639-1 code. |
| additional_notes | str | No | "" | Style guidance for the model. |
| callback_url | str \| None | No | None | |
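Because spoken-word indexing is needed only once per video, translating into several languages can reuse a single index. The sketch below wraps the two calls shown above; translate_into and its index_first flag are hypothetical, and the example notes string is only a placeholder.

```python
from typing import Any, Dict, Iterable


def translate_into(video: Any, languages: Iterable[str], notes: str = "",
                   index_first: bool = True) -> Dict[str, Any]:
    """Index the video once (optional), then translate its transcript per language."""
    if index_first:
        # Skip this with index_first=False if the video is already indexed.
        video.index_spoken_words()
    return {
        language: video.translate_transcript(language=language, additional_notes=notes)
        for language in languages
    }


# Usage (with an uploaded video):
#   texts = translate_into(video, ["fr", "de"], notes="Keep product names in English")
```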


Callback Workflow (Optional)

All generative calls accept callback_url. VideoDB will POST a JSON payload when processing finishes:
{
  "asset_id": "abc123",
  "status": "completed",
  "url": "https://cdn.videodb.io/..."
}
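A receiving endpoint for this payload could look roughly like the sketch below, which uses only the Python standard library. It assumes the payload always carries the asset_id and status fields shown above; the handler class, the serve() function, and port 8000 are illustrative choices, and a production service would normally sit behind a proper web framework and verify the sender.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict


def parse_callback(body: bytes) -> Dict[str, Any]:
    """Decode a callback body and check the fields shown in the sample payload."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("callback payload must be a JSON object")
    missing = {"asset_id", "status"} - payload.keys()
    if missing:
        raise ValueError(f"callback payload is missing: {sorted(missing)}")
    return payload


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = parse_callback(self.rfile.read(length))
        except ValueError:  # also covers malformed JSON and bad encodings
            self.send_response(400)
            self.end_headers()
            return
        print(payload["asset_id"], payload["status"], payload.get("url"))
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block forever, handling callbacks on the given port."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

Call serve() from a script to start listening, then pass the public URL of that endpoint as callback_url.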

Error Handling & Webhooks

All generative methods can raise ValueError, VideoDBAPIError, or VideoDBRateLimitError.
400 – invalid parameters
401 – bad API key
429 – rate limit (check Retry-After header)
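One way to handle the 429 case is to retry with a growing delay. The sketch below uses plain exponential backoff rather than reading the Retry-After header, because how the SDK exposes that header is not described here. The exception class to retry on is passed in by the caller, so no import path for VideoDBRateLimitError is assumed; call_with_retry is a hypothetical helper.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(call: Callable[[], T],
                    retry_on: Tuple[Type[BaseException], ...],
                    attempts: int = 4,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    raise AssertionError("unreachable")


# Usage (the exception's import location is an assumption to verify in your SDK version):
#   music = call_with_retry(
#       lambda: coll.generate_music(prompt="Upbeat electronic background", duration=10),
#       retry_on=(VideoDBRateLimitError,),
#   )
```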

Next Steps

Create powerful video editing and creation workflow automations.
Check out our open-source VideoDB Director framework for inspiration.
🎥 Must-See Tutorials: Check out these powerful demos and see GenAI integration in action:
Initial usage limits apply. DM us if you'd like additional access to explore fully before making your decision.