Voiceovers are the secret sauce that turns silent footage into captivating stories. They add depth, emotion, and excitement, elevating the viewing experience.

Traditionally, this workflow required stitching together multiple tools: one for script writing (LLM), one for voice generation (TTS), and another for video editing.

VideoDB simplifies this by bringing everything under one roof. In this tutorial, we will:
Upload a silent video.
Analyze the video to understand its visual content.
Generate a narration script using VideoDB’s text generation.
Generate a professional AI voiceover using VideoDB’s voice generation.
{ "description": "The scene immerses the viewer in a vibrant, fluid expanse dominated by myriad blue and aqua forms. These countless, somewhat irregular shapes are densely packed, giving the impression of an immense, teeming mass in constant, gentle motion. Each form possesses a darker core that gradually lightens towards its edges, creating a translucent, almost glowing effect, as if illuminated from within. The varying shades, ranging from deep sapphire to brilliant turquoise, blend and shift across the frame, conjuring the image of a vast underwater environment. It evokes a colossal school of luminous marine creatures, perhaps fish or jellyfish, drifting together in a mesmerizing, organic dance, filling the visual field with their shimmering presence and dynamic, watery energy.", "end": 15.033, "metadata": {}, "scene_metadata": {}, "start": 0.0}
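A scene record like the one above is a plain dictionary with `description`, `start`, and `end` keys. As a minimal sketch, the record can be turned into a timestamped context line before it goes into a prompt. The helper names `scene_to_context_line` and `total_scene_duration` are hypothetical and not part of VideoDB, and the example description is shortened for readability.

```python
from typing import Any


def scene_to_context_line(scene: dict[str, Any]) -> str:
    """Format one scene record as '[start-end] description'."""
    start = float(scene["start"])
    end = float(scene["end"])
    description = str(scene["description"]).strip()
    return f"[{start:.1f}s-{end:.1f}s] {description}"


def total_scene_duration(scenes: list[dict[str, Any]]) -> float:
    """Sum the length of every scene, in seconds."""
    return sum(float(s["end"]) - float(s["start"]) for s in scenes)


# Example record using the same keys as the output above (description shortened).
example_scenes = [
    {
        "description": "Blue and aqua forms drift together.",
        "start": 0.0,
        "end": 15.033,
        "metadata": {},
        "scene_metadata": {},
    },
]

print(scene_to_context_line(example_scenes[0]))
print(round(total_scene_duration(example_scenes), 3))
```

Keeping the timestamps next to each description gives the script-writing step a sense of pacing, which helps when a video has more than one scene.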
Now, we use VideoDB’s generate_text method to write a voiceover script based on the scene descriptions we just retrieved.
# Construct a prompt with the scene context
scene_context = "\n".join([f"- {scene['description']}" for scene in video_scenes])

prompt = f"""Here is a visual description of a video about the underwater world:
{scene_context}

Based on this, write a short, engaging voiceover script in the style of a nature documentary narrator (like David Attenborough).
Keep it synced to the flow of the visuals described.
Return ONLY the raw text of the narration, no stage directions or titles.
"""

# Generate the script using VideoDB
script_response = coll.generate_text(
    prompt=prompt,
    model_name="pro"
)
script_text = script_response["output"]

print("--- Generated Script ---")
print(script_text)
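The prompt asks for a "short" script but gives the model no target length, so the narration may run longer than the footage. One possible refinement is to add an explicit word budget derived from the video duration. The sketch below assumes a speaking pace of 150 words per minute. That figure is an assumption for illustration, not a VideoDB setting, and the helper names are hypothetical.

```python
import math


def narration_word_budget(duration_seconds: float, words_per_minute: float = 150.0) -> int:
    """Rough number of words that fit in the duration at the given pace.

    The 150 words-per-minute default is an assumed narration pace.
    """
    if duration_seconds <= 0 or words_per_minute <= 0:
        return 0
    return math.floor(duration_seconds * words_per_minute / 60.0)


def with_length_hint(prompt: str, duration_seconds: float) -> str:
    """Append a word-limit instruction to an existing prompt."""
    budget = narration_word_budget(duration_seconds)
    return (
        f"{prompt}\n"
        f"Keep the narration under {budget} words so it fits in "
        f"{duration_seconds:.0f} seconds."
    )


# Example with the 15.033-second scene from the analysis step.
print(narration_word_budget(15.033))
print(with_length_hint("Write a voiceover script.", 15.033))
```

In this tutorial's flow, the result of `with_length_hint(prompt, float(video.length))` could be passed to `coll.generate_text` in place of the original prompt.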
We have the video and the generated voiceover. Now we merge them using the Timeline Editor.
from videodb.editor import Timeline, Track, Clip, VideoAsset, AudioAsset

# Create a timeline
timeline = Timeline(conn)

# 1. Create a Video Track
video_track = Track()
video_asset = VideoAsset(id=video.id)

# Add the video clip
video_clip = Clip(asset=video_asset, duration=float(video.length))
video_track.add_clip(0, video_clip)

# 2. Create an Audio Track for the voiceover
audio_track = Track()

# Use the audio object we generated in Step 5
audio_asset = AudioAsset(id=audio.id)
audio_clip = Clip(asset=audio_asset, duration=float(audio.length))
audio_track.add_clip(0, audio_clip)

# Add tracks to timeline
timeline.add_track(video_track)
timeline.add_track(audio_track)
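The code above gives the audio clip its full length, which may be longer or shorter than the video. The source does not say how the Timeline Editor handles that mismatch, so the sketch below only does the arithmetic: it clamps the voiceover to the video length and reports any silent tail. Both helper names are hypothetical and not part of VideoDB.

```python
def fit_voiceover_duration(video_length: float, audio_length: float) -> float:
    """Return an audio clip duration that never exceeds the video length."""
    if video_length < 0 or audio_length < 0:
        raise ValueError("lengths must be non-negative")
    return min(video_length, audio_length)


def silent_tail(video_length: float, audio_length: float) -> float:
    """Seconds of video left without narration once the voiceover ends."""
    return max(0.0, video_length - audio_length)


# Example: a 15.033-second video with a voiceover that runs long, then short.
print(fit_voiceover_duration(15.033, 20.0))
print(fit_voiceover_duration(15.033, 12.5))
print(round(silent_tail(15.033, 12.5), 3))
```

The clamped value could be used as the `duration` argument of the audio `Clip`. A large silent tail suggests regenerating the script with a longer target length.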
Congratulations! You have successfully automated the process of creating custom and personalized voiceovers based on a simple prompt and raw video footage using VideoDB.

By leveraging advanced AI technologies, you can enhance the storytelling and immersive experience of your video content. Experiment with different prompts and scene analysis techniques to further improve the quality and accuracy of the voiceovers. Enjoy creating captivating narratives with AI-powered voiceovers using VideoDB!