Case: Automatically Creating Voiceover for Silent Footage of the Underwater World
Overview
Voiceovers are the secret sauce that turns silent footage into captivating stories. They add depth, emotion, and excitement, elevating the viewing experience.
Traditionally, this workflow required stitching together multiple tools: one for script writing (LLM), one for voice generation (TTS), and another for video editing.
VideoDB simplifies this by bringing everything under one roof. In this tutorial, we will:
1. Analyze the video to understand its visual content.
2. Generate a narration script using VideoDB's text generation.
3. Generate a professional AI voiceover using VideoDB's voice generation.
4. Merge them instantly into a final video.
Setup
📦 Installing VideoDB
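The SDK is published on PyPI as `videodb` (the same package imported in the cells below), so a single install command in a notebook cell is enough:

```shell
pip install videodb
```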
🔑 API Keys
You only need your VideoDB API Key.
Get your API key from the VideoDB Console. (Free for the first 50 uploads; no credit card required.)
import videodb
import os
from getpass import getpass
# Prompt user for API key securely
api_key = getpass("Please enter your VideoDB API Key: ")
os.environ["VIDEO_DB_API_KEY"] = api_key
Implementation
🌐 Step 1: Connect to VideoDB
Connect to VideoDB using your API key to establish a session.
from videodb import connect
# Connect to VideoDB
conn = connect()
coll = conn.get_collection()
🎥 Step 2: Upload Video
We’ll upload the silent underwater footage directly from YouTube.
# Upload a video by URL
video = coll.upload(url='https://youtu.be/RcRjY5kzia8')
🔍 Step 3: Analyze Visuals
We need to know what is happening in the video to write a script for it. We’ll use index_scenes() to analyze the visual content.
video_scenes_id = video.index_scenes()
Let's view the description of the first scene in the video:
video_scenes = video.get_scene_index(video_scenes_id)
import json
print(json.dumps(video_scenes[0], indent=2))
Output:
{
  "description": "The scene immerses the viewer in a vibrant, fluid expanse dominated by myriad blue and aqua forms. These countless, somewhat irregular shapes are densely packed, giving the impression of an immense, teeming mass in constant, gentle motion. Each form possesses a darker core that gradually lightens towards its edges, creating a translucent, almost glowing effect, as if illuminated from within. The varying shades, ranging from deep sapphire to brilliant turquoise, blend and shift across the frame, conjuring the image of a vast underwater environment. It evokes a colossal school of luminous marine creatures, perhaps fish or jellyfish, drifting together in a mesmerizing, organic dance, filling the visual field with their shimmering presence and dynamic, watery energy.",
  "end": 15.033,
  "metadata": {},
  "scene_metadata": {},
  "start": 0.0
}
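Since the scene index is a plain list of dicts with `start`, `end`, and `description` keys, it is easy to post-process before prompting. As a sketch (using a made-up two-scene list in place of the real output), here is one way to print a compact timeline of the scenes:

```python
def summarize_scenes(scenes, max_chars=60):
    """Return one 'start-end: description' line per scene, truncating long text."""
    lines = []
    for scene in scenes:
        desc = scene["description"]
        if len(desc) > max_chars:
            desc = desc[:max_chars].rstrip() + "..."
        lines.append(f"{scene['start']:.1f}s-{scene['end']:.1f}s: {desc}")
    return lines

# Example with a made-up scene list mirroring the structure shown above
sample = [
    {"start": 0.0, "end": 15.033, "description": "A vibrant mass of blue and aqua forms."},
    {"start": 15.033, "end": 30.0, "description": "A school of fish drifts past coral."},
]
for line in summarize_scenes(sample):
    print(line)
```

With real data, pass `video_scenes` instead of `sample` to get a quick overview before building the prompt.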
📝 Step 4: Generate Script
Now, we use VideoDB’s generate_text method to write a voiceover script based on the scene descriptions we just retrieved.
# Construct a prompt with the scene context
scene_context = "\n".join([f"- {scene['description']}" for scene in video_scenes])
prompt = f"""
Here is a visual description of a video about the underwater world:
{scene_context}
Based on this, write a short, engaging voiceover script in the style of a nature documentary narrator (like David Attenborough).
Keep it synced to the flow of the visuals described.
Return ONLY the raw text of the narration, no stage directions or titles.
"""
# Generate the script using VideoDB
script_response = coll.generate_text(
    prompt=prompt,
    model_name="pro"
)
print("--- Generated Script ---")
print(script_response)
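Depending on the SDK version, the return value of `generate_text` may be a plain string or a dict-like response with an `output` key; the exact shape here is an assumption, so a small defensive helper keeps the next step version-agnostic:

```python
def extract_text(response):
    """Normalize a text-generation result to a plain string.

    Handles three hypothetical shapes: a bare string, a dict with
    an "output" key, or a response object with a .text attribute.
    """
    if isinstance(response, str):
        return response
    if isinstance(response, dict) and "output" in response:
        return response["output"]
    return getattr(response, "text", str(response))

# Works the same regardless of which shape the SDK returns
script_text = extract_text({"output": "Beneath the waves, life stirs."})
print(script_text)
```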
🎙️ Step 5: Generate Voiceover Audio
We can now turn that text into speech using generate_voice. This returns an Audio object directly, so we don’t need to save or upload files manually.
# Generate speech directly as a VideoDB Audio Asset
audio = coll.generate_voice(
    text=script_response,
    voice_name="Default"
)
print(f"Generated Audio Asset ID: {audio.id}")
🎬 Step 6: Compose the Video
We have the video and the generated voiceover. Now we merge them using the Timeline Editor.
from videodb.editor import Timeline, Track, Clip, VideoAsset, AudioAsset
# Create a timeline
timeline = Timeline(conn)
# 1. Create a Video Track
video_track = Track()
video_asset = VideoAsset(id=video.id)
# Add the video clip
video_clip = Clip(asset=video_asset, duration=float(video.length))
video_track.add_clip(0, video_clip)
# 2. Create an Audio Track for the voiceover
audio_track = Track()
# Use the audio object we generated in Step 5
audio_asset = AudioAsset(id=audio.id)
audio_clip = Clip(asset=audio_asset, duration=float(audio.length))
audio_track.add_clip(0, audio_clip)
# Add tracks to timeline
timeline.add_track(video_track)
timeline.add_track(audio_track)
🪄 Step 7: Review and Share
Generate the final stream URL and watch your AI-narrated video!
from videodb import play_stream
stream_url = timeline.generate_stream()
play_stream(stream_url)
🎉 Conclusion:
Congratulations! You have automated the creation of a personalized voiceover for raw video footage from a single prompt, entirely within VideoDB.
By combining scene analysis, text generation, and voice generation in one workflow, you can turn silent footage into an immersive story. Experiment with different prompts, narration styles, and scene-indexing settings to refine the quality and accuracy of the voiceover. Enjoy creating captivating narratives with AI-powered voiceovers using VideoDB!