Case: Automatically Creating Voiceover for Silent Footage of the Underwater World

Overview

Voiceovers are the secret sauce that turns silent footage into captivating stories. They add depth, emotion, and excitement, elevating the viewing experience. Traditionally, this workflow required stitching together multiple tools: one for script writing (LLM), one for voice generation (TTS), and another for video editing. VideoDB simplifies this by bringing everything under one roof. In this tutorial, we will:
  1. Upload a silent video.
  2. Analyze the video to understand its visual content.
  3. Generate a narration script using VideoDB’s text generation.
  4. Generate a professional AI voiceover using VideoDB’s voice generation.
  5. Merge them instantly into a final video.

Setup

Installing VideoDB

!pip install videodb

API Keys

You only need a VideoDB API key. Get yours from the VideoDB Console; it's free for the first 50 uploads, and no credit card is required.
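
To keep the key out of your notebook, you can load it from an environment variable instead. A minimal sketch, assuming you have exported a variable named VIDEODB_API_KEY (that name is our own convention, not something the SDK requires):

import os

# Read the key from the environment (hypothetical variable name),
# falling back to a placeholder so the cell still runs.
api_key = os.environ.get("VIDEODB_API_KEY", "your_api_key")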

Implementation

Step 1: Connect to VideoDB

Connect to VideoDB using your API key to establish a session.
import videodb

# Set your API key
api_key = "your_api_key"

# Connect to VideoDB
conn = videodb.connect(api_key=api_key)
coll = conn.get_collection()

Step 2: Upload Video

We’ll upload the silent underwater footage directly from YouTube.
# Upload a video by URL
video = coll.upload(url='https://youtu.be/RcRjY5kzia8')
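
Before moving on, it can be worth confirming the upload succeeded by inspecting the returned Video object. A quick sanity check using the same id and length attributes this tutorial relies on later:

# Confirm the upload by inspecting the returned asset
print(f"Video ID: {video.id}")
print(f"Duration (seconds): {video.length}")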

Step 3: Analyze Visuals

We need to know what is happening in the video to write a script for it. We’ll use index_scenes() to analyze the visual content.
video_scenes_id = video.index_scenes()
Let’s view the description of the first scene in the video:
video_scenes = video.get_scene_index(video_scenes_id)

import json
print(json.dumps(video_scenes[0], indent=2))
Output:
{
  "description": "The scene immerses the viewer in a vibrant, fluid expanse dominated by myriad blue and aqua forms. These countless, somewhat irregular shapes are densely packed, giving the impression of an immense, teeming mass in constant, gentle motion. Each form possesses a darker core that gradually lightens towards its edges, creating a translucent, almost glowing effect, as if illuminated from within. The varying shades, ranging from deep sapphire to brilliant turquoise, blend and shift across the frame, conjuring the image of a vast underwater environment. It evokes a colossal school of luminous marine creatures, perhaps fish or jellyfish, drifting together in a mesmerizing, organic dance, filling the visual field with their shimmering presence and dynamic, watery energy.",
  "end": 15.033,
  "metadata": {},
  "scene_metadata": {},
  "start": 0.0
}
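
Before writing a script against these descriptions, you may want to sanity-check the full scene list. A small loop over the same fields shown above prints each scene's time range and a preview of its description:

# Summarize every indexed scene: time range plus a short preview
for scene in video_scenes:
    preview = scene["description"][:80]
    print(f"{scene['start']:.1f}s to {scene['end']:.1f}s: {preview}...")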

Step 4: Generate Script

Now, we use VideoDB’s generate_text method to write a voiceover script based on the scene descriptions we just retrieved.
# Construct a prompt with the scene context
scene_context = "\n".join([f"- {scene['description']}" for scene in video_scenes])

prompt = f"""
Here is a visual description of a video about the underwater world:
{scene_context}

Based on this, write a short, engaging voiceover script in the style of a nature documentary narrator (like David Attenborough).
Keep it synced to the flow of the visuals described.
Return ONLY the raw text of the narration, no stage directions or titles.
"""

# Generate the script using VideoDB
script_response = coll.generate_text(
    prompt=prompt,
    model_name="pro")

print("--- Generated Script ---")
print(script_response)
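
Before synthesizing audio, a rough pacing check helps confirm the narration will fit the footage. The ~150 words-per-minute rate below is an assumed typical narration speed, not a VideoDB constant:

# Estimate narration length from word count (assumes ~150 wpm)
words = len(script_response.split())
estimated_seconds = words / 150 * 60
print(f"Script is {words} words, roughly {estimated_seconds:.0f}s of speech "
      f"for {float(video.length):.0f}s of video.")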

Step 5: Generate Voiceover Audio

We can now turn that text into speech using generate_voice. This returns an Audio object directly, so we don’t need to save or upload files manually.
# Generate speech directly as a VideoDB Audio Asset
audio = coll.generate_voice(
    text=script_response,
    voice_name="Default")

print(f"Generated Audio Asset ID: {audio.id}")

Step 6: Compose the Video

We have the video and the generated voiceover. Now we merge them using the Timeline Editor.
from videodb.editor import Timeline, Track, Clip, VideoAsset, AudioAsset

# Create a timeline
timeline = Timeline(conn)

# 1. Create a Video Track
video_track = Track()
video_asset = VideoAsset(id=video.id)
# Add the video clip
video_clip = Clip(asset=video_asset, duration=float(video.length))
video_track.add_clip(0, video_clip)

# 2. Create an Audio Track for the voiceover
audio_track = Track()
# Use the audio object we generated in Step 5
audio_asset = AudioAsset(id=audio.id)
audio_clip = Clip(asset=audio_asset, duration=float(audio.length))
audio_track.add_clip(0, audio_clip)

# Add tracks to timeline
timeline.add_track(video_track)
timeline.add_track(audio_track)
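
If the generated narration runs longer than the footage, the stream will outlast the visuals. One way to guard against that, sketched with the same Clip API used above (the trimming strategy is our own choice, not a VideoDB requirement), is to cap the audio clip at the video's duration when you build it:

# Alternative to the audio clip above: cap the voiceover at the
# video's duration so the stream never outlasts the footage.
voiceover_duration = min(float(audio.length), float(video.length))
audio_clip = Clip(asset=audio_asset, duration=voiceover_duration)
# ...then add it to audio_track as before.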

Step 7: Review and Share

Generate the final stream URL and watch your AI-narrated video!
from videodb import play_stream

stream_url = timeline.generate_stream()
play_stream(stream_url)

Conclusion

Congratulations! You have automated the entire voiceover pipeline, from raw silent footage to a narrated video, with a single tool and a simple prompt. Experiment with different prompts, voices, and scene-analysis settings to refine the tone and accuracy of the narration. Enjoy creating captivating narratives with AI-powered voiceovers using VideoDB!

Explore Full Notebook

Open the complete implementation in Google Colab with all code examples.