VideoDB: Adding AI Generated voiceovers to silent footage

Case: Automatically Creating Voiceover for Silent Footage of the Underwater World

Overview

Voiceovers are the secret sauce that turns silent footage into captivating stories. They add depth, emotion, and excitement, elevating the viewing experience.
Traditionally, this workflow required stitching together multiple tools: one for script writing (LLM), one for voice generation (TTS), and another for video editing.
VideoDB simplifies this by bringing everything under one roof. In this tutorial, we will:
Upload a silent video.
Analyze the video to understand its visual content.
Generate a narration script using VideoDB’s text generation.
Generate a professional AI voiceover using VideoDB’s voice generation.
Merge them instantly into a final video.

Setup

📦 Installing VideoDB

%pip install videodb

🔑 API Keys

You only need your VideoDB API Key.
Get your API key from the VideoDB Console (free for the first 50 uploads, no credit card required).
import videodb
import os
from getpass import getpass

# Prompt user for API key securely
api_key = getpass("Please enter your VideoDB API Key: ")
os.environ["VIDEO_DB_API_KEY"] = api_key
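If you rerun the notebook often, you may not want to be prompted every time. The following is a minimal sketch, assuming you want to reuse a key that is already present in the environment and only prompt when it is missing. The helper name `ensure_api_key` and the demo value are hypothetical and not part of the VideoDB SDK.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping

ENV_VAR = "VIDEO_DB_API_KEY"  # same variable name as in the setup above


def ensure_api_key(
    env: MutableMapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the key from `env` if present; otherwise prompt once and store it."""
    key = env.get(ENV_VAR, "").strip()
    if not key:
        key = prompt("Please enter your VideoDB API Key: ").strip()
        if not key:
            raise ValueError("An API key is required")
        env[ENV_VAR] = key
    return key


# Demonstration with a stand-in environment and prompt (hypothetical values),
# so nothing is read from the terminal here.
demo_env = {}
ensure_api_key(demo_env, prompt=lambda _message: "demo-key")
print(sorted(demo_env))
```

In a notebook you would call `ensure_api_key()` with no arguments so that it works against the real environment and the real `getpass` prompt.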

Implementation


🌐 Step 1: Connect to VideoDB

Connect to VideoDB to establish a session. Called without arguments, connect() reads your API key from the VIDEO_DB_API_KEY environment variable set above.
from videodb import connect

# Connect to VideoDB
conn = connect()
coll = conn.get_collection()

🎥 Step 2: Upload Video

We’ll upload the silent underwater footage directly from YouTube.
# Upload a video by URL
video = coll.upload(url='https://youtu.be/RcRjY5kzia8')

🔍 Step 3: Analyze Visuals

We need to know what is happening in the video to write a script for it. We’ll use index_scenes() to analyze the visual content.
video_scenes_id = video.index_scenes()

Let's view the description of the first scene in the video:
video_scenes = video.get_scene_index(video_scenes_id)

import json
print(json.dumps(video_scenes[0], indent=2))
Output:
{
  "description": "The scene immerses the viewer in a vibrant, fluid expanse dominated by myriad blue and aqua forms. These countless, somewhat irregular shapes are densely packed, giving the impression of an immense, teeming mass in constant, gentle motion. Each form possesses a darker core that gradually lightens towards its edges, creating a translucent, almost glowing effect, as if illuminated from within. The varying shades, ranging from deep sapphire to brilliant turquoise, blend and shift across the frame, conjuring the image of a vast underwater environment. It evokes a colossal school of luminous marine creatures, perhaps fish or jellyfish, drifting together in a mesmerizing, organic dance, filling the visual field with their shimmering presence and dynamic, watery energy.",
  "end": 15.033,
  "metadata": {},
  "scene_metadata": {},
  "start": 0.0
}
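Printing every scene as full JSON gets long quickly. As a rough sketch, assuming each scene is a dict with the "start", "end", and "description" keys shown in the output above, a small helper can print one compact line per scene. The helper name `summarize_scenes` is hypothetical, and the sample record reuses the first sentence of the description above.

```python
from typing import Dict, List


def summarize_scenes(scenes: List[Dict], width: int = 60) -> List[str]:
    """Return one line per scene: index, time range, and a shortened description."""
    lines = []
    for i, scene in enumerate(scenes):
        # Collapse whitespace so multi-line descriptions stay on one line
        text = " ".join(str(scene.get("description", "")).split())
        if len(text) > width:
            text = text[: width - 3].rstrip() + "..."
        start = float(scene["start"])
        end = float(scene["end"])
        lines.append(f"{i:02d} [{start:.2f}s - {end:.2f}s] {text}")
    return lines


# Sample record shaped like the scene printed above
sample_scenes = [
    {
        "start": 0.0,
        "end": 15.033,
        "description": "The scene immerses the viewer in a vibrant, fluid "
        "expanse dominated by myriad blue and aqua forms.",
    }
]

for line in summarize_scenes(sample_scenes):
    print(line)
```

With the real index you would pass `video_scenes` instead of `sample_scenes`.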

📝 Step 4: Generate Script

Now, we use VideoDB’s generate_text method to write a voiceover script based on the scene descriptions we just retrieved.
# Construct a prompt with the scene context
scene_context = "\n".join([f"- {scene['description']}" for scene in video_scenes])

prompt = f"""
Here is a visual description of a video about the underwater world:
{scene_context}

Based on this, write a short, engaging voiceover script in the style of a nature documentary narrator (like David Attenborough).
Keep it synced to the flow of the visuals described.
Return ONLY the raw text of the narration, no stage directions or titles.
"""

# Generate the script using VideoDB
script_response = coll.generate_text(
    prompt=prompt,
    model_name="pro"
)

print("--- Generated Script ---")
print(script_response)
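The prompt asks for a "short" script, but the narration also has to fit the length of the footage. One option, sketched below, is to turn the video duration into a word budget and add it to the prompt. The 150 words-per-minute speaking rate is an assumption, not a VideoDB figure, and both helper names are hypothetical.

```python
import math


def narration_word_budget(duration_seconds: float, words_per_minute: float = 150.0) -> int:
    """Estimate how many words fit in the given duration at the assumed speaking rate."""
    if duration_seconds <= 0 or words_per_minute <= 0:
        raise ValueError("duration_seconds and words_per_minute must be positive")
    return math.floor(duration_seconds * words_per_minute / 60.0)


def length_instruction(duration_seconds: float, words_per_minute: float = 150.0) -> str:
    """Build a sentence that can be appended to the script prompt."""
    budget = narration_word_budget(duration_seconds, words_per_minute)
    return (
        f"Keep the narration under {budget} words so it fits "
        f"{duration_seconds:.0f} seconds of footage."
    )


# Example with a one-minute clip; with the real video you could pass float(video.length)
print(length_instruction(60))
```

Slower, documentary-style delivery may need a lower rate, so treat the budget as a starting point and adjust after listening to the result.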

🎙️ Step 5: Generate Voiceover Audio

We can now turn that text into speech using generate_voice. This returns an Audio object directly, so we don’t need to save or upload files manually.
# Generate speech directly as a VideoDB Audio Asset
audio = coll.generate_voice(
    # Step 4 stored the response in script_response; its "output" field holds the text
    text=script_response["output"],
    voice_name="Default"
)

print(f"Generated Audio Asset ID: {audio.id}")
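Before sending the script to the voice generator, it can help to check that there is actually text to speak. The sketch below assumes the Step 4 response is either a dict carrying the narration under an "output" key (as the code above reads it) or a plain string; the exact response shape is an assumption here, and `extract_script_text` is a hypothetical helper.

```python
from typing import Any


def extract_script_text(response: Any) -> str:
    """Pull the narration text out of a generate_text response and validate it."""
    # Accept either {"output": "..."} or a bare string (assumed shapes)
    text = response.get("output") if isinstance(response, dict) else response
    if not isinstance(text, str) or not text.strip():
        raise ValueError("No narration text found in the generate_text response")
    return text.strip()


# Stand-in responses for illustration only
print(extract_script_text({"output": "  Beneath the surface, a world awakens.  "}))
print(extract_script_text("Beneath the surface, a world awakens."))
```

You could then pass `extract_script_text(script_response)` as the `text` argument, which fails early with a clear message if the script came back empty.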

🎬 Step 6: Compose the Video

We have the video and the generated voiceover. Now we merge them using the Timeline Editor.
from videodb.editor import Timeline, Track, Clip, VideoAsset, AudioAsset

# Create a timeline
timeline = Timeline(conn)

# 1. Create a Video Track
video_track = Track()
video_asset = VideoAsset(id=video.id)
# Add the video clip
video_clip = Clip(asset=video_asset, duration=float(video.length))
video_track.add_clip(0, video_clip)

# 2. Create an Audio Track for the voiceover
audio_track = Track()
# Use the audio object we generated in Step 5
audio_asset = AudioAsset(id=audio.id)
audio_clip = Clip(asset=audio_asset, duration=float(audio.length))
audio_track.add_clip(0, audio_clip)

# Add tracks to timeline
timeline.add_track(video_track)
timeline.add_track(audio_track)
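The generated voiceover will rarely match the video length exactly. The tutorial does not say how the timeline behaves when the audio runs longer than the footage, so the following sketch assumes you want to cap the audio clip at the video length and report any silent tail. The helper `fit_voiceover` is hypothetical and works on plain numbers only.

```python
def fit_voiceover(video_length, audio_length) -> dict:
    """Choose clip durations so the voiceover never outlasts the video."""
    # Lengths may arrive as strings, so convert them first
    video_length = float(video_length)
    audio_length = float(audio_length)
    if video_length <= 0 or audio_length <= 0:
        raise ValueError("Lengths must be positive")
    return {
        "video_duration": video_length,
        "audio_duration": min(audio_length, video_length),
        "audio_truncated": audio_length > video_length,
        "silent_tail": round(max(0.0, video_length - audio_length), 3),
    }


# Example numbers for illustration: a 60 s video with a 72.5 s voiceover
plan = fit_voiceover(60, 72.5)
print(plan)
```

With the real assets you would call `fit_voiceover(video.length, audio.length)` and use `plan["audio_duration"]` as the `duration` of the audio clip. If `audio_truncated` is true, regenerating a shorter script is usually better than cutting the narration mid-sentence.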

🪄 Step 7: Review and Share

Generate the final stream URL and watch your AI-narrated video!
from videodb import play_stream

stream_url = timeline.generate_stream()
play_stream(stream_url)

🎉 Conclusion:

Congratulations! You have automated the creation of a custom voiceover from raw video footage and a simple prompt, using VideoDB for scene analysis, script writing, voice generation, and editing.
Experiment with different prompts and scene-indexing settings to improve the quality and accuracy of the narration. Enjoy creating captivating narratives with AI-powered voiceovers using VideoDB!
