Generate Automated Video Outputs with Text Prompts | DALL-E + ElevenLabs + OpenAI + VideoDB


💬 Overview

Generative AI apps and tools are on the rise, but most of them still automate only a single chunk of a workflow, which can make the pieces time-consuming to stitch together later. The biggest blocker in video content generation is the tight duration constraint (owing to token limits and loads). But here’s a clever solution to remove these blockers and keep your creativity flowing.
VideoDB serves as a platform that can bring multiple generative AI outputs together using its multimodal capabilities. This tutorial demonstrates how VideoDB enables new-age generative AI apps and tools with multimodal inputs and outputs, using a 'storyboarding' tool as an example.
Crafting engaging video storyboards for app user flows is often laborious, requiring manual creation of assets like images and voiceovers. With VideoDB's integration with AI models like DALL-E, OpenAI, and ElevenLabs, you can automate this process entirely through a simple tool.
For this demo storyboarding tool, we require just two text inputs from the user: the app name and description, and the steps in their user flow. From this information alone, the goal is to generate a video walkthrough of the app's user flow, complete with AI-generated images and audio.

Setup

📦 Installing packages

%pip install openai
%pip install videodb

🔑 API Keys

Before proceeding, ensure you have access to OpenAI, ElevenLabs, and VideoDB API keys. If not, sign up for API access on the respective platforms.
Get your VideoDB API key from the VideoDB console. ( Free for first 50 uploads, no credit card required ) 🎉
import os

os.environ["OPENAI_API_KEY"] = ""
os.environ["ELEVEN_LABS_API_KEY"] = ""
os.environ["VIDEO_DB_API_KEY"] = ""
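It can help to fail fast if any key was left unset. This small helper is an addition to the tutorial (the variable names match the ones set above):

```python
import os

def missing_keys(required):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

# Example: report which of the three keys still need to be filled in.
print(missing_keys(["OPENAI_API_KEY", "ELEVEN_LABS_API_KEY", "VIDEO_DB_API_KEY"]))
```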

🎙️ ElevenLabs Voice ID

You will also need the ElevenLabs Voice ID of the voice you want to use.
For this demo, we will be using their default Voice ID l0CzJ3s4XFnGAHKDinPf , but this can be customised easily. ElevenLabs has a large variety of voices to choose from (browse them on the ElevenLabs website). Once finalized, copy the Voice ID from ElevenLabs and set it here.
voiceover_artist_id = "VOICEOVER_ARTIST_ID"
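If you'd like to pick a voice programmatically instead, ElevenLabs exposes a list-voices endpoint. The sketch below is an addition to the tutorial (assuming the `requests` library and the v1 `/voices` endpoint); it builds the request and prints voice names and IDs:

```python
import os
import requests

def voices_request(api_key):
    """Build the (url, headers) pair for the ElevenLabs list-voices endpoint."""
    return "https://api.elevenlabs.io/v1/voices", {"xi-api-key": api_key}

def list_voices():
    url, headers = voices_request(os.environ["ELEVEN_LABS_API_KEY"])
    res = requests.get(url, headers=headers)
    res.raise_for_status()
    for voice in res.json()["voices"]:
        print(voice["voice_id"], "-", voice["name"])
```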

Implementation


🌐 Step 1: Connect to VideoDB

Connect to VideoDB using your API key to establish a session for uploading and manipulating video files.
# Setup VideoDB Connection
from videodb import connect

conn = connect()
coll = conn.get_collection()

💬 Step 2: Set up the primary text inputs

In a real app, these input fields would be exposed to your users, and their input becomes the foundation for the rest of this workflow.
For the purpose of this tutorial, we are using the sample use case of a user requesting a storyboard for their meditation app via the storyboarding tool we’re building.

# Define Your App
app_description = "A meditation app for busy people with anxiety."
raw_steps = [
    "Set up profile",
    "Select preference for theme & music",
    "Set meditation session timing",
    "Start the session",
]
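Each raw step is later wrapped in a dict that accumulates the generated assets as the pipeline runs. A minimal sketch of that shape (the field names are taken from the code further below):

```python
def make_steps(raw_steps):
    """Wrap each raw step name in the dict the pipeline fills in later."""
    return [{"step": s} for s in raw_steps]

steps_preview = make_steps([
    "Set up profile",
    "Select preference for theme & music",
])
# After the pipeline runs, each dict also carries:
#   "step_description", "voiceover_script", "voiceover_filename", "image_url"
print(steps_preview)
```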


🕹️ Step 3: Generating Assets using other Generative AI tools

This step can be divided into 2 parts:
Step 3.1: Crafting prompts for OpenAI to generate step descriptions, which will ultimately inform the prompts for images and voiceover scripts
Step 3.2: Creating assets using these prompts:
Step descriptions, image prompts and voiceover scripts from OpenAI
Images from DALL-E
Voiceover audio from ElevenLabs
Processing all these assets to be ready for import

Step 3.1: Writing Prompts for Step Descriptions, Voiceover Scripts and Image Generation
First, we’ll set up a prompt structure to generate a step description for each step defined earlier. The step description sets the context for the image and voiceover script prompts.
Once that’s set up, we can focus on the prompts for the images and voiceover scripts for each step.
(Tip: a detailed prompt with clear specifications about tone, language, art style, colours and scene descriptions can result in better outputs. The example below illustrates creative ways to keep the outputs consistent across steps while maintaining quality.)
def prompt_voiceover_scripts(steps, app_description):
    prompt = f"Generate a structured response for an application that aims to: {app_description}. Here are the steps involved in the user journey. Elaborate each step, covering the specific actions required at that stage:"
    for step in steps:
        prompt += f"""\n-
        Create a concise description for the step '{step['step']}' in the user journey. This description should capture the essence of the action performed by the user during this step.
        Create a conversational and engaging script for an app where the user is {step['step']}.
        Keep it narrative-driven, within two sentences.
        """
    prompt += """Return a response in JSON format, with key 'steps', and value being a list of dicts, where each dict has two keys 'step_description' and 'voiceover_script'
    {
        "steps": [
            {
                "step_description": "A concise description for the step",
                "voiceover_script": "A conversational and engaging script for the step. Keep it narrative-driven, within two sentences. Add '-- -- --' at the very end."
            }
        ]
    }
    """
    return prompt


def prompt_image_generation(step, app_description):
    consistent_part = "Create a stippling black ballpoint pen illustration of a Nigerian woman with a tight afro, living in her minimalist New York apartment. Keep the illustration simple with minimal elements."
    variable_part = f"This illustration is a part of a storyboard to explain the user journey of an app built for {app_description}. This image will portray the '{step['step']}' stage in the app. Step description: {step['step_description']}. This illustration is meant for professional storyboarding, so understand the requirements accordingly and create a suitable illustration with the woman as a central character in the frame, but include other supporting props that can indicate that she's in the '{step['step']}' step in the user flow."
    prompt = f"{consistent_part}\n- {variable_part}"
    return prompt


Step 3.2: Creating assets using the prompts
Generating Voiceover Scripts & Step Descriptions with OpenAI
Generate voiceover scripts and step descriptions using OpenAI's language model based on the prompts above.
import openai
import json

def generate_voiceover_scripts(steps):
    print("\nGenerating Voiceover script and Step Description...")
    client = openai.OpenAI()
    prompt = prompt_voiceover_scripts(steps, app_description)
    openai_res = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": prompt}],
        response_format={"type": "json_object"},
    )
    openai_res = json.loads(openai_res.choices[0].message.content)
    return openai_res["steps"]
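LLM responses in JSON mode usually follow the requested schema, but it is worth validating the shape before using it downstream. This defensive check is an addition to the tutorial, not part of the original code:

```python
def validate_steps_payload(payload, expected_count):
    """Check the parsed OpenAI response has one well-formed dict per step."""
    steps = payload.get("steps", [])
    if len(steps) != expected_count:
        return False
    return all(
        isinstance(s, dict) and "step_description" in s and "voiceover_script" in s
        for s in steps
    )

ok = validate_steps_payload(
    {"steps": [{"step_description": "d", "voiceover_script": "v"}]}, 1
)
print(ok)  # → True
```

If validation fails, re-issuing the request is usually enough.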



Converting Voiceover Scripts to Audio with Eleven Labs
Convert voiceover scripts to audio using Eleven Labs' API.
import requests

def generate_voiceover_audio(script, file):
    print("\nConverting Voiceover Script to Audio...")
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voiceover_artist_id}"
    try:
        headers = {
            "xi-api-key": os.environ.get("ELEVEN_LABS_API_KEY"),
            "Content-Type": "application/json",
        }
        payload = {
            "model_id": "eleven_monolingual_v1",
            "text": script,
            "voice_settings": {"stability": 0.5, "similarity_boost": 0.5},
        }
        elevenlabs_res = requests.post(url, json=payload, headers=headers)
        elevenlabs_res.raise_for_status()
        # Save the audio file
        with open(file, "wb") as f:
            f.write(elevenlabs_res.content)
        print(f"Result : voiceover audio saved as {file}")
    except Exception as e:
        print("An error occurred while converting the voiceover script to audio:", e)




Generating Images with DALL-E
Generate images using DALL-E's powerful image generation capabilities.
def generate_image_dalle(step, app_description):
    print("\nGenerating Image...")
    prompt = prompt_image_generation(step, app_description)
    try:
        client = openai.OpenAI()
        response = client.images.generate(
            model="dall-e-3", prompt=prompt, n=1, size="1024x1024"
        )
        print("Result : ", response.data[0].url)
        return response.data[0].url
    except Exception as e:
        print(f"An error occurred while generating the image: {e}")
        return None
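DALL-E image URLs are temporary, so if you want to keep the generated frames beyond this session you can download them right away. A small sketch, added to the tutorial for convenience (`image_filename` and `save_image` are hypothetical helpers, not part of the original code):

```python
import requests

def image_filename(index):
    """Derive a local filename for the image of step `index`."""
    return f"image_{index}.png"

def save_image(url, path):
    """Download a generated image to disk before its URL expires."""
    res = requests.get(url)
    res.raise_for_status()
    with open(path, "wb") as f:
        f.write(res.content)
```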


Final step: Processing the User Journey and Creating Assets
Process the user journey and generate assets for the app's video walkthrough.
def process_user_journey(steps, app_description):
    print("App Description:", app_description)

    step_scripts = generate_voiceover_scripts(steps)
    for index, step in enumerate(step_scripts):
        steps[index]["step_description"] = step["step_description"]
        steps[index]["voiceover_script"] = step["voiceover_script"]

    for index, step in enumerate(steps):
        print(f"\n---------------------- \nProcessing step: {step['step']}")

        voiceover_script = step["voiceover_script"]
        if voiceover_script:
            voiceover_file_name = f"voiceover_{index}.mp3"
            step["voiceover_filename"] = voiceover_file_name
            generate_voiceover_audio(voiceover_script, voiceover_file_name)

        image_url = generate_image_dalle(step, app_description)
        if image_url:
            step["image_url"] = image_url


steps = [{"step": app_step} for app_step in raw_steps]
process_user_journey(steps, app_description)


🎥 Step 4: Combining Assets and Creating the Video Walkthrough

Uploading Generated Assets to VideoDB
Upload the generated assets to VideoDB for seamless integration.
# Upload Assets to VideoDB
from videodb import MediaType

for step in steps:
    print(f"\n----------------------\nProcessing step: {step['step']}")

    print("\nUploading Image...")
    image = coll.upload(url=step["image_url"], media_type=MediaType.image)
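The walkthrough also needs each step's voiceover audio in VideoDB. A sketch of collecting every step's assets and uploading the local audio files the same way (this assumes `coll.upload` accepts a local `file_path`, as the VideoDB SDK does for files on disk):

```python
def collect_assets(steps):
    """Pair each step's image URL with its local voiceover file."""
    return [
        (step.get("image_url"), step.get("voiceover_filename"))
        for step in steps
    ]

def upload_audio(coll, steps):
    # Assumption: the VideoDB SDK's `upload` supports `file_path` for local files.
    from videodb import MediaType
    for step in steps:
        print("\nUploading Audio...")
        step["audio"] = coll.upload(
            file_path=step["voiceover_filename"], media_type=MediaType.audio
        )
```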