Generate Automated Video Outputs with Text Prompts | DALL-E + ElevenLabs + OpenAI + VideoDB


💬 Overview

Generative AI apps and tools are on the rise, but most of them still automate only a single chunk of a workflow, which can be time-consuming to stitch together later. The biggest blocker in video content generation is the tight duration constraint (owing to token limits and compute loads). But here’s a clever solution to remove these blockers and keep your creativity flowing.
VideoDB is a platform that can bring multiple generative AI outputs together using its multimodal capabilities. This tutorial demonstrates how VideoDB enables new-age generative AI apps and tools with multimodal inputs and outputs, using a 'storyboarding' tool as an example.
Crafting engaging video storyboards for app user flows is often laborious, requiring manual creation of assets like images and voiceovers. With VideoDB's integration with AI models like DALL-E, OpenAI, and ElevenLabs, you can automate this process entirely through a simple tool.
This demo storyboarding tool requires just two text inputs from the user: the app's name and description, and the steps in its user flow. From this information alone, the goal is to generate a video walkthrough of the app's user flow, complete with AI-generated images and audio. Here’s what the storyboarding tool will look like when complete:

Setup

📦 Installing packages

%pip install openai
%pip install videodb
%pip install requests

🔑 API Keys

Before proceeding, ensure you have access to OpenAI, ElevenLabs, and VideoDB API keys. If not, sign up for API access on the respective platforms.
Get your API key from the VideoDB console. (Free for first 50 uploads, no credit card required) 🎉
import os

os.environ["OPENAI_API_KEY"] = ""
os.environ["ELEVEN_LABS_API_KEY"] = ""
os.environ["VIDEO_DB_API_KEY"] = ""

🎙️ ElevenLabs Voice ID

You will also need the ElevenLabs Voice ID of a voice that you want to use.
For this demo, we will be using their default Voice ID l0CzJ3s4XFnGAHKDinPf, but this can be customised easily. ElevenLabs has a large variety of voices to choose from (browse them in their voice library). Once finalized, copy the Voice ID from ElevenLabs and paste it here.
voiceover_artist_id = "l0CzJ3s4XFnGAHKDinPf"

Implementation


🌐 Step 1: Connect to VideoDB

Connect to VideoDB using your API key to establish a session for uploading and manipulating video files.
# Setup VideoDB Connection
from videodb import connect

conn = connect()
coll = conn.get_collection()
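connect() picks up VIDEO_DB_API_KEY from the environment variable set earlier; if you prefer, you can pass the key explicitly:

# Equivalent: pass the API key explicitly instead of relying on the env var
conn = connect(api_key=os.environ["VIDEO_DB_API_KEY"])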

💬 Step 2: Set up the primary text inputs

In a real app, these input fields would be exposed to your users, and their input then becomes the foundation for the rest of the workflow.
For the purpose of this tutorial, we’ll use the sample use case of a user requesting a storyboard for their meditation app via the storyboarding tool we’re building.

# Define Your App
app_description = "A meditation app for busy people with anxiety."
raw_steps = [
    "Set up profile",
    "Select preference for theme & music",
    "Set meditation session timing",
    "Start the session",
]


🕹️ Step 3: Generating Assets using other Generative AI tools

This step can be divided into 2 parts:
Step 3.1: Crafting a prompt for OpenAI to generate step descriptions, which will ultimately inform the prompts for images and voiceover scripts
Step 3.2: Creating assets using these prompts:
Step descriptions, image prompts and voiceover scripts from OpenAI
Images from DALL-E
Voiceover audio from ElevenLabs
Processing all these assets to be ready for import

Step 3.1: Writing Prompts for Step Descriptions, Voiceover Scripts and Image Generation
First, we’ll set up a prompt structure to generate step descriptions for each step defined earlier. Creating a step description helps set the context for the image and voiceover script prompts.
Once that’s set up, we can focus on the prompts for the images and voiceover scripts for each step.
(Tip: a detailed prompt with clear specifications about the tone, language, art style, colours and scene descriptions produces better outputs. The example below illustrates creative ways to ensure consistency in the outputs for each step while maintaining quality.)
def prompt_voiceover_scripts(steps, app_description):
    prompt = f"Generate a structured response for an app described as: {app_description}. Here are the steps involved in the user journey. Elaborate on each step, capturing the specific actions the user performs at that stage:"
    for step in steps:
        prompt += f"""\n-
        Create a concise description for the step '{step['step']}' in the user journey. This description should capture the essence of the action performed by the user during this step.
        Create a conversational and engaging script for an app where the user is {step['step']}.
        Keep it narrative-driven, within two sentences.
        """
    prompt += """Return a response in JSON format, with key 'steps', and value being a list of dicts, where each dict has two keys 'step_description' and 'voiceover_script'
    {
        'steps': [
            {
                'step_description': 'A concise description for the step',
                'voiceover_script': 'A conversational and engaging script for the step. Keep it narrative-driven, within two sentences. Add "-- -- --" at the very end.'
            }
        ]
    }
    """
    return prompt


def prompt_image_generation(step, app_description):
    consistent_part = "Create a stippling black ballpoint pen illustration of a Nigerian woman with a tight afro, living in her minimalist New York apartment. Keep the illustration simple with minimal elements."
    variable_part = f"This illustration is a part of a storyboard to explain the user journey of an app built for {app_description}. This image will portray the '{step['step']}' stage in the app. Step description: {step['step_description']}. This illustration is meant for professional storyboarding, so understand the requirements accordingly and create a suitable illustration with the woman as a central character in the frame, but include other supporting props that can indicate that she's in the '{step['step']}' step in the user flow."
    prompt = f"{consistent_part}\n- {variable_part}"
    return prompt
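Before spending API credits, you can preview the prompts these functions produce for the demo inputs. A quick sketch (the step_description below is hand-written for illustration; in the real flow it is generated by OpenAI in Step 3.2):

# Preview the generated prompts locally (no API calls are made here)
sample_steps = [{"step": s} for s in raw_steps]
print(prompt_voiceover_scripts(sample_steps, app_description))

# 'step_description' is hand-written here; the real one comes from OpenAI in Step 3.2
sample_step = {
    "step": "Set up profile",
    "step_description": "The user creates a profile and sets basic preferences.",
}
print(prompt_image_generation(sample_step, app_description))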


Step 3.2: Creating assets using the prompts
Generating Voiceover Scripts & Step Descriptions with OpenAI
Generate voiceover scripts and step descriptions using OpenAI's language model based on the prompts above.
import openai
import json

def generate_voiceover_scripts(steps):
    print("\nGenerating Voiceover script and Step Description...")
    client = openai.OpenAI()
    prompt = prompt_voiceover_scripts(steps, app_description)
    openai_res = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": prompt}],
        response_format={"type": "json_object"},
    )
    openai_res = json.loads(openai_res.choices[0].message.content)
    return openai_res["steps"]



Converting Voiceover Scripts to Audio with ElevenLabs
Convert the voiceover scripts to audio using the ElevenLabs API.
import requests

def generate_voiceover_audio(script, file):
    print("\nConverting Voiceover Script to Audio...")
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voiceover_artist_id}"
    try:
        headers = {
            "xi-api-key": os.environ.get("ELEVEN_LABS_API_KEY"),
            "Content-Type": "application/json",
        }
        payload = {
            "model_id": "eleven_monolingual_v1",
            "text": script,
            "voice_settings": {"stability": 0.5, "similarity_boost": 0.5},
        }
        elevenlabs_res = requests.post(url, json=payload, headers=headers)
        elevenlabs_res.raise_for_status()
        # Save the audio file
        with open(file, "wb") as f:
            f.write(elevenlabs_res.content)
        print(f"Result : voiceover audio saved as {file}")
    except Exception as e:
        print("An error occurred while converting the voiceover script to audio: ", e)




Generating Images with DALL-E
Generate images using DALL-E's powerful image generation capabilities.
def generate_image_dalle(step, app_description):
    print("\nGenerating Image...")
    prompt = prompt_image_generation(step, app_description)
    try:
        client = openai.OpenAI()
        response = client.images.generate(
            model="dall-e-3", prompt=prompt, n=1, size="1024x1024"
        )
        print("Result : ", response.data[0].url)
        return response.data[0].url
    except Exception as e:
        print(f"An error occurred while generating the image: {e}")
        return None


Final step: Processing the User Journey and Creating Assets
Process the user journey and generate assets for the app's video walkthrough.
def process_user_journey(steps, app_description):
    print("App Description:", app_description)

    step_scripts = generate_voiceover_scripts(steps)
    for index, step in enumerate(step_scripts):
        steps[index]["step_description"] = step["step_description"]
        steps[index]["voiceover_script"] = step["voiceover_script"]

    for index, step in enumerate(steps):
        print(f"\n---------------------- \nProcessing step: {step['step']}")

        voiceover_script = step["voiceover_script"]
        if voiceover_script:
            voiceover_file_name = f"voiceover_{index}.mp3"
            step["voiceover_filename"] = voiceover_file_name
            generate_voiceover_audio(voiceover_script, voiceover_file_name)

        image_url = generate_image_dalle(step, app_description)
        if image_url:
            step["image_url"] = image_url


steps = []
for app_step in raw_steps:
    steps.append({"step": app_step})

process_user_journey(steps, app_description)
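At this point, each dict in steps carries the generated fields. A quick way to verify everything before uploading (keys as populated by the functions above):

# Inspect the enriched steps before uploading to VideoDB
for step in steps:
    print(step["step"], "->", step.get("voiceover_filename"), step.get("image_url"))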


🎥 Step 4: Combining Assets and Creating the Video Walkthrough

Uploading Generated Assets to VideoDB
Upload the generated assets to VideoDB for seamless integration.
# Upload Assets to VideoDB
from videodb import MediaType

for step in steps:
    print(f"""\n----------------------\nProcessing step: {step['step']}""")

    print("\nUploading Image...")
    image = coll.upload(url=step["image_url"], media_type=MediaType.image)
    print("Uploaded Image")

    print("\nUploading Voiceover Audio...")
    audio = coll.upload(file_path=step["voiceover_filename"])
    print("Uploaded Voiceover Audio")

    step["image_id"] = image.id
    step["audio_id"] = audio.id


Creating Timeline for the Video Storyboard in VideoDB
Create a timeline to sequence the assets into a dynamic video walkthrough. A timeline needs a base video to build on, which also serves as the background for the generated images; we'll add one in the sketch after the code below, and you can swap in any video of your choice.
Bonus: Use Text Assets to display the step name in the user flow. They can be customised using Text Styling.

# Create Timeline in VideoDB
from videodb.asset import VideoAsset, ImageAsset, AudioAsset, TextAsset
from videodb.timeline import Timeline
from videodb import TextStyle

timeline = Timeline(conn)

seeker = 0

# Create Image, Audio and Text Assets for each step
for step in steps:
    audio = coll.get_audio(step["audio_id"])
    image = coll.get_image(step["image_id"])

    audio_duration = float(audio.length)

    image_asset = ImageAsset(
        image.id,
        duration=audio_duration,
        x="(main_w-overlay_w)/2",
        y="(main_h-overlay_h)/2",
        width="w=iw/3",
        height="h=ih/3",
    )
    audio_asset = AudioAsset(audio.id, disable_other_tracks=True)
    text_asset = TextAsset(
        step["step"],
        duration=audio_duration,
        style=TextStyle(
            x="(w-text_w)/2",
            y="(h-text_h*2)",
            font="League Spartan",
            fontsize="(h/20)",
            fontcolor="Snow",
            boxcolor="OrangeRed",
            boxborderw=10,
        ),
    )

    timeline.add_overlay(seeker, audio_asset)
    timeline.add_overlay(seeker, image_asset)
    timeline.add_overlay(seeker, text_asset)

    # Move the playhead forward by this step's duration
    seeker += audio_duration
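To finish, the timeline needs the base video on its inline track before it can be rendered. Here's a minimal sketch, assuming a placeholder BASE_VIDEO_URL (replace it with any video of your choice); generate_stream and play_stream are VideoDB's standard helpers for rendering and previewing the result.

# Add a base video as the inline track, then render and play the stream
# BASE_VIDEO_URL is a placeholder; replace it with any video of your choice
from videodb import play_stream

base_video = coll.upload(url="BASE_VIDEO_URL")

# The base video should span the combined duration of all voiceovers
total_duration = sum(float(coll.get_audio(s["audio_id"]).length) for s in steps)
video_asset = VideoAsset(asset_id=base_video.id, start=0, end=total_duration)
timeline.add_inline(video_asset)

stream_url = timeline.generate_stream()
play_stream(stream_url)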