Generate Automated Video Outputs with Text Prompts | DALL-E + ElevenLabs + OpenAI + VideoDB


💬 Overview

Generative AI apps and tools are on the rise, but most of them still automate only a single chunk of a workflow, which can make the pieces time-consuming to stitch together later. The biggest blocker in video content generation is the tight duration constraint (owing to token limits and loads). But here’s a clever solution to remove these blockers and keep your creativity flowing.
VideoDB serves as a platform that can bring multiple generative AI outputs together using its multimodal capabilities. This tutorial demonstrates how VideoDB enables new-age generative AI apps and tools with multimodal inputs and outputs, using a 'storyboarding' tool as an example.
Crafting engaging video storyboards for app user flows is often laborious, requiring manual creation of assets like images and voiceovers. With VideoDB's integration with AI models like DALL-E, OpenAI, and ElevenLabs, you can automate this process entirely through a simple tool.
For this demo storyboarding tool, we require just two text inputs from the user: the app name and description, and the steps in their user flow. From this information alone, the goal is to generate a video walkthrough of the app's user flow, complete with AI-generated images and audio.

Setup

📦 Installing packages

%pip install openai
%pip install videodb

🔑 API Keys

Before proceeding, ensure you have access to OpenAI, ElevenLabs, and VideoDB API keys. If not, sign up for API access on the respective platforms.
Get your VideoDB API key from the VideoDB console. ( Free for first 50 uploads, no credit card required ) 🎉
import os

os.environ["OPENAI_API_KEY"] = ""
os.environ["ELEVEN_LABS_API_KEY"] = ""
os.environ["VIDEO_DB_API_KEY"] = ""
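It can help to fail fast if any key was left unset. This small helper is an addition to the tutorial (the variable names match the ones set above):

```python
import os

def missing_keys(required):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

# Example: report which of the three keys still need to be filled in.
print(missing_keys(["OPENAI_API_KEY", "ELEVEN_LABS_API_KEY", "VIDEO_DB_API_KEY"]))
```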

🎙️ ElevenLabs Voice ID

You will also need the ElevenLabs Voice ID of the voice you want to use.
For this demo, we will be using their default Voice ID l0CzJ3s4XFnGAHKDinPf , but this can be customised easily. ElevenLabs has a large variety of voices to choose from (browse them on the ElevenLabs website). Once finalized, copy the Voice ID from ElevenLabs and set it here.
voiceover_artist_id = "VOICEOVER_ARTIST_ID"
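If you'd like to pick a voice programmatically instead, ElevenLabs exposes a list-voices endpoint. The sketch below is an addition to the tutorial (assuming the `requests` library and the v1 `/voices` endpoint); it builds the request and prints voice names and IDs:

```python
import os
import requests

def voices_request(api_key):
    """Build the (url, headers) pair for the ElevenLabs list-voices endpoint."""
    return "https://api.elevenlabs.io/v1/voices", {"xi-api-key": api_key}

def list_voices():
    url, headers = voices_request(os.environ["ELEVEN_LABS_API_KEY"])
    res = requests.get(url, headers=headers)
    res.raise_for_status()
    for voice in res.json()["voices"]:
        print(voice["voice_id"], "-", voice["name"])
```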

Implementation


🌐 Step 1: Connect to VideoDB

Connect to VideoDB using your API key to establish a session for uploading and manipulating video files.
# Setup VideoDB Connection
from videodb import connect

conn = connect()
coll = conn.get_collection()

💬 Step 2: Set up the primary text inputs

In a real app, these input fields would be exposed to your users, and their input becomes the foundation for the rest of this workflow.
For the purpose of this tutorial, we are using the sample use case of a user requesting a storyboard for their meditation app via the storyboarding tool we’re building.

# Define Your App
app_description = "A meditation app for busy people with anxiety."
raw_steps = [
    "Set up profile",
    "Select preference for theme & music",
    "Set meditation session timing",
    "Start the session",
]
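Each raw step is later wrapped in a dict that accumulates the generated assets as the pipeline runs. A minimal sketch of that shape (the field names are taken from the code further below):

```python
def make_steps(raw_steps):
    """Wrap each raw step name in the dict the pipeline fills in later."""
    return [{"step": s} for s in raw_steps]

steps_preview = make_steps([
    "Set up profile",
    "Select preference for theme & music",
])
# After the pipeline runs, each dict also carries:
#   "step_description", "voiceover_script", "voiceover_filename", "image_url"
print(steps_preview)
```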


🕹️ Step 3: Generating Assets using other Generative AI tools

This step can be divided into 2 parts:
Step 3.1: Crafting prompts for OpenAI to generate step descriptions, which will ultimately inform the prompts for images and voiceover scripts
Step 3.2: Creating assets using these prompts:
Step descriptions, image prompts and voiceover scripts from OpenAI
Images from DALL-E
Voiceover audio from ElevenLabs
Processing all these assets to be ready for import

Step 3.1: Writing Prompts for Step Descriptions, Voiceover Scripts and Image Generation
First, we’ll set up a prompt structure to generate a step description for each step defined earlier. The step description sets the context for the image and voiceover script prompts.
Once that’s set up, we can focus on the prompts for the images and voiceover scripts for each step.
(Tip: a detailed prompt with clear specifications about tone, language, art style, colours and scene descriptions can result in better outputs. The example below illustrates creative ways to keep the outputs consistent across steps while maintaining quality.)
def prompt_voiceover_scripts(steps, app_description):
    prompt = f"Generate a structured response for an application that aims to: {app_description}. Here are the steps involved in the user journey. Elaborate each step, covering the specific actions required at that stage:"
    for step in steps:
        prompt += f"""\n-
        Create a concise description for the step '{step['step']}' in the user journey. This description should capture the essence of the action performed by the user during this step.
        Create a conversational and engaging script for an app where the user is {step['step']}.
        Keep it narrative-driven, within two sentences.
        """
    prompt += """Return a response in JSON format, with key 'steps', and value being a list of dicts, where each dict has two keys 'step_description' and 'voiceover_script'
    {
        "steps": [
            {
                "step_description": "A concise description for the step",
                "voiceover_script": "A conversational and engaging script for the step. Keep it narrative-driven, within two sentences. Add '-- -- --' at the very end."
            }
        ]
    }
    """
    return prompt


def prompt_image_generation(step, app_description):
    consistent_part = "Create a stippling black ballpoint pen illustration of a Nigerian woman with a tight afro, living in her minimalist New York apartment. Keep the illustration simple with minimal elements."
    variable_part = f"This illustration is a part of a storyboard to explain the user journey of an app built for {app_description}. This image will portray the '{step['step']}' stage in the app. Step description: {step['step_description']}. This illustration is meant for professional storyboarding, so understand the requirements accordingly and create a suitable illustration with the woman as a central character in the frame, but include other supporting props that can indicate that she's in the '{step['step']}' step in the user flow."
    prompt = f"{consistent_part}\n- {variable_part}"
    return prompt


Step 3.2: Creating assets using the prompts
Generating Voiceover Scripts & Step Descriptions with OpenAI
Generate voiceover scripts and step descriptions using OpenAI's language model based on the prompts above.
import openai
import json

def generate_voiceover_scripts(steps):
    print("\nGenerating Voiceover script and Step Description...")
    client = openai.OpenAI()
    prompt = prompt_voiceover_scripts(steps, app_description)
    openai_res = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": prompt}],
        response_format={"type": "json_object"},
    )
    openai_res = json.loads(openai_res.choices[0].message.content)
    return openai_res["steps"]
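LLM responses in JSON mode usually follow the requested schema, but it is worth validating the shape before using it downstream. This defensive check is an addition to the tutorial, not part of the original code:

```python
def validate_steps_payload(payload, expected_count):
    """Check the parsed OpenAI response has one well-formed dict per step."""
    steps = payload.get("steps", [])
    if len(steps) != expected_count:
        return False
    return all(
        isinstance(s, dict) and "step_description" in s and "voiceover_script" in s
        for s in steps
    )

ok = validate_steps_payload(
    {"steps": [{"step_description": "d", "voiceover_script": "v"}]}, 1
)
print(ok)  # → True
```

If validation fails, re-issuing the request is usually enough.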



Converting Voiceover Scripts to Audio with Eleven Labs
Convert voiceover scripts to audio using Eleven Labs' API.
import requests

def generate_voiceover_audio(script, file):
    print("\nConverting Voiceover Script to Audio...")
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voiceover_artist_id}"
    try:
        headers = {
            "xi-api-key": os.environ.get("ELEVEN_LABS_API_KEY"),
            "Content-Type": "application/json",
        }
        payload = {
            "model_id": "eleven_monolingual_v1",
            "text": script,
            "voice_settings": {"stability": 0.5, "similarity_boost": 0.5},
        }
        elevenlabs_res = requests.post(url, json=payload, headers=headers)
        elevenlabs_res.raise_for_status()
        # Save the audio file
        with open(file, "wb") as f:
            f.write(elevenlabs_res.content)
        print(f"Result : voiceover audio saved as {file}")
    except Exception as e:
        print("An error occurred while converting the voiceover script to audio:", e)




Generating Images with DALL-E
Generate images using DALL-E's powerful image generation capabilities.
def generate_image_dalle(step, app_description):
    print("\nGenerating Image...")
    prompt = prompt_image_generation(step, app_description)
    try:
        client = openai.OpenAI()
        response = client.images.generate(
            model="dall-e-3", prompt=prompt, n=1, size="1024x1024"
        )
        print("Result : ", response.data[0].url)
        return response.data[0].url
    except Exception as e:
        print(f"An error occurred while generating the image: {e}")
        return None
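DALL-E image URLs are temporary, so if you want to keep the generated frames beyond this session you can download them right away. A small sketch, added to the tutorial for convenience (`image_filename` and `save_image` are hypothetical helpers, not part of the original code):

```python
import requests

def image_filename(index):
    """Derive a local filename for the image of step `index`."""
    return f"image_{index}.png"

def save_image(url, path):
    """Download a generated image to disk before its URL expires."""
    res = requests.get(url)
    res.raise_for_status()
    with open(path, "wb") as f:
        f.write(res.content)
```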


Final step: Processing the User Journey and Creating Assets
Process the user journey and generate assets for the app's video walkthrough.
def process_user_journey(steps, app_description):
    print("App Description:", app_description)

    step_scripts = generate_voiceover_scripts(steps)
    for index, step in enumerate(step_scripts):
        steps[index]["step_description"] = step["step_description"]
        steps[index]["voiceover_script"] = step["voiceover_script"]

    for index, step in enumerate(steps):
        print(f"\n---------------------- \nProcessing step: {step['step']}")

        voiceover_script = step["voiceover_script"]
        if voiceover_script:
            voiceover_file_name = f"voiceover_{index}.mp3"
            step["voiceover_filename"] = voiceover_file_name
            generate_voiceover_audio(voiceover_script, voiceover_file_name)

        image_url = generate_image_dalle(step, app_description)
        if image_url:
            step["image_url"] = image_url


steps = [{"step": app_step} for app_step in raw_steps]
process_user_journey(steps, app_description)


🎥 Step 4: Combining Assets and Creating the Video Walkthrough

Uploading Generated Assets to VideoDB
Upload the generated assets to VideoDB for seamless integration.
# Upload Assets to VideoDB
from videodb import MediaType

for step in steps:
    print(f"\n----------------------\nProcessing step: {step['step']}")

    print("\nUploading Image...")
    image = coll.upload(url=step["image_url"], media_type=MediaType.image)
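The walkthrough also needs each step's voiceover audio in VideoDB. A sketch of collecting every step's assets and uploading the local audio files the same way (this assumes `coll.upload` accepts a local `file_path`, as the VideoDB SDK does for files on disk):

```python
def collect_assets(steps):
    """Pair each step's image URL with its local voiceover file."""
    return [
        (step.get("image_url"), step.get("voiceover_filename"))
        for step in steps
    ]

def upload_audio(coll, steps):
    # Assumption: the VideoDB SDK's `upload` supports `file_path` for local files.
    from videodb import MediaType
    for step in steps:
        print("\nUploading Audio...")
        step["audio"] = coll.upload(
            file_path=step["voiceover_filename"], media_type=MediaType.audio
        )
```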