Generate Automated Video Outputs with Text Prompts | DALL-E + ElevenLabs + OpenAI + VideoDB
💬 Overview
Generative AI apps and tools are on the rise, but most of them still automate only a single chunk of a workflow, leaving the pieces to be stitched together later, which can be rather time-consuming. The biggest blocker in video content generation is the tight duration constraint (owing to token limits and loads). But here’s a clever solution to remove these blockers and keep your creativity flowing.
VideoDB serves as a platform that can bring multiple generative AI outputs together using its multimodal capabilities. This tutorial demonstrates how VideoDB can enable the creation of new-age generative AI apps and tools using multimodal inputs and outputs, with a 'storyboarding' tool as the example.
Crafting engaging video storyboards for app user flows is often laborious, requiring manual creation of assets like images and voiceovers. However, with VideoDB's integration with AI models like DALL-E, OpenAI, and ElevenLabs, you can automate this process entirely through a simple tool.
For this demo storyboarding tool, we require just 2 simple text inputs from the user: the app name and description, and the steps in their user flow. Based on this information alone, the goal is to generate a video walkthrough of the app's user flow, complete with AI-generated images and audio. Here’s a look at what the storyboarding tool will look like when complete:
(Free for the first 50 uploads, no credit card required.) 🎉
import os

# Paste your API keys between the quotes
os.environ["OPENAI_API_KEY"] = ""
os.environ["ELEVEN_LABS_API_KEY"] = ""
os.environ["VIDEO_DB_API_KEY"] = ""
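Since every later step depends on these three keys, it can help to fail early when one is empty. The following is a minimal sketch, not part of the original tutorial; the helper name `missing_keys` is hypothetical, and only the three variable names come from the setup above.

```python
import os

# The three environment variables used in this tutorial's setup
REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVEN_LABS_API_KEY", "VIDEO_DB_API_KEY")


def missing_keys(env=None):
    """Return the names of required keys that are unset or blank.

    `missing_keys` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


missing = missing_keys()
if missing:
    print("Set these keys before continuing:", ", ".join(missing))
```

Passing a plain dict instead of `os.environ` makes the check easy to try without touching the real environment.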
🎙️ ElevenLabs Voice ID
You will also need the ElevenLabs Voice ID of the voice that you want to use.
For this demo, we will be using their default Voice ID l0CzJ3s4XFnGAHKDinPf, but this can be customised easily. ElevenLabs has a large variety of voices to choose from. Once finalized, copy the Voice ID from ElevenLabs and set it below.
voiceover_artist_id = "VOICEOVER_ARTIST_ID"
Implementation
🌐 Step 1: Connect to VideoDB
Connect to VideoDB using your API key to establish a session for uploading and manipulating video files.
# Setup VideoDB Connection
from videodb import connect

# With no arguments, connect() picks up VIDEO_DB_API_KEY from the environment
conn = connect()
coll = conn.get_collection()
💬 Step 2: Set up the primary text inputs
While building an app, these input fields will be exposed to your users and this input will then become the foundation for the rest of this workflow.
For the purpose of this tutorial, we are using the sample use case of a user requesting a storyboard for their meditation app via the storyboarding tool that we’re building.
# Define Your App
app_description = "A meditation app for busy people with anxiety."
raw_steps = [
    "Set up profile",
    "Select preference for theme & music",
    "Set meditation session timing",
    "Start the session",
]
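The prompt code later in this tutorial reads each step as `step['step']`, so the plain strings in `raw_steps` need to become dictionaries at some point. The original conversion is not shown here; the sketch below is one way it could be done, assuming each step is a dict with a single `'step'` key to start with. The helper name `to_step_dicts` is hypothetical.

```python
raw_steps = [
    "Set up profile",
    "Select preference for theme & music",
    "Set meditation session timing",
    "Start the session",
]


def to_step_dicts(raw_steps):
    """Wrap each non-empty step string in a dict with a 'step' key.

    Hypothetical helper: the dict shape is inferred from the later
    use of step['step'], not taken from the original code.
    """
    return [{"step": text.strip()} for text in raw_steps if text.strip()]


steps = to_step_dicts(raw_steps)
print(steps[0])
```

Keeping each step as a dict leaves room to attach the generated description, voiceover script and asset references to the same record later on.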
🕹️ Step 3: Generating Assets using other Generative AI tools
This step can be divided into 2 parts:
Step 3.1: Crafting a prompt to generate step descriptions, which will ultimately inform the prompts for images and voiceover scripts
Step 3.2: Creating assets using these prompts:
Step descriptions, image prompts and voiceover scripts from OpenAI
Images from DALL-E
Voiceover audio from ElevenLabs
Processing all these assets to be ready for import
Step 3.1: Writing Prompts for Step Descriptions, Voiceover Scripts and Image Generation
First, we’ll set up a prompt structure to generate a description for each step defined earlier. The step description sets the context for the image and voiceover script prompts.
Once that’s set up, we can focus on the prompts for the images and voiceover scripts for each step.
(Tip: a detailed prompt along with clear specifications about the tone, language, art style, colours and scene descriptions can result in better outputs. The example below illustrates creative ways to ensure consistency in the outputs for each step, while maintaining the standard of quality.)
# `steps` is expected to be a list of dicts, each with a 'step' key
prompt = f"Generate a structured response for an app with this description: {app_description} Here are the steps involved in the user journey. Elaborate on each step and include the specific actions required at that stage:"
for step in steps:
    prompt += f"""\n-
    Create a concise description for the step '{step['step']}' in the user journey. This description should capture the essence of the action performed by the user during this step.
    Create a conversational and engaging script for an app where the user is {step['step']}.
    Keep it narrative-driven, within two sentences.
    """
prompt += """Return a response in JSON format, with key 'steps', and value being a list of dicts, where each dict has two keys 'step_description' and 'voiceover_script'
{
    steps: [
        {
            'step_description': 'A concise description for the step',
            'voiceover_script': 'A conversational and engaging script for the step. Keep it narrative-driven, within two sentences. Add "-- -- --" at the very end.'
        }
    ]
}
"""
# Function name assumed; the definition line is missing from this copy of the tutorial
def generate_image_prompt(step, app_description):
    # Shared across all steps so that every image keeps the same character and art style
    consistent_part = "Create a stippling black ballpoint pen illustration of a Nigerian woman with a tight afro, living in her minimalist New York apartment. Keep the illustration simple with minimal elements."
    # Changes per step: what the user is doing at this point in the flow
    variable_part = f"This illustration is a part of a storyboard to explain the user journey of an app built for {app_description}. This image will portray the '{step['step']}' stage in the app. Step description: {step['step_description']}. This illustration is meant for professional storyboarding, so understand the requirements accordingly and create a suitable illustration with the woman as a central character in the frame, but include other supporting props that can indicate that she's in the {step['step']} step in the user flow."
    prompt = f"{consistent_part}\n- {variable_part}"
    return prompt
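The voiceover prompt above asks the model for JSON with a top-level 'steps' list, where each entry carries 'step_description' and 'voiceover_script'. A reply in that shape could be merged back into the step list roughly as follows. This is a sketch under stated assumptions: the helper name `merge_generated_fields` is hypothetical, the sample reply is made up for illustration, and a real model reply may need extra validation.

```python
import json


def merge_generated_fields(steps, response_text):
    """Copy the generated description and script onto each step dict.

    Hypothetical helper: assumes the reply is valid JSON in the shape
    requested by the prompt and lists steps in the same order.
    """
    generated = json.loads(response_text)["steps"]
    if len(generated) != len(steps):
        raise ValueError(f"expected {len(steps)} steps, got {len(generated)}")
    merged = []
    for step, extra in zip(steps, generated):
        merged.append({
            **step,
            "step_description": extra["step_description"],
            "voiceover_script": extra["voiceover_script"],
        })
    return merged


# Made-up reply, used only to show the expected shape
sample_reply = json.dumps({
    "steps": [
        {
            "step_description": "The user creates a profile.",
            "voiceover_script": "Let's get you set up. -- -- --",
        }
    ]
})

merged = merge_generated_fields([{"step": "Set up profile"}], sample_reply)
print(merged[0]["voiceover_script"])
```

Checking the step count before merging catches replies where the model skipped or combined steps, which would otherwise pair scripts with the wrong images.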
Step 3.2: Creating assets using the prompts
Generating Voiceover Scripts & Step Descriptions with OpenAI
Generate voiceover scripts and step descriptions using OpenAI's language model based on the prompts above.
import openai
import json
def generate_voiceover_scripts(steps):
    print("\nGenerating Voiceover script and Step Description...")