Imagine watching or sharing your videos, but instead of the original low-quality audio, you hear your voice rendered in crystal-clear, studio-quality sound.
A cloned voice can breathe new life into your videos. In this blog, I’ll show you how to build a Voice Replacement Agent with Director in under an hour.
🎯 Game Plan for Voice Replacement Agent
The steps are fairly simple:
1. The user uploads voice samples (to clone) and a video (whose voice needs to be replaced).
2. Clone the voice from the provided samples.
3. Extract the transcript from the video.
4. Generate an audio clip in the cloned voice, narrating the transcript.
5. Overlay the new audio onto the video, replacing the original voice.
⭕️ Core Architecture
The Voice Replacement Agent is built on Director's extensible framework. It relies on Director's session management and state tracking and adds specialised voice replacement functionality on top. The flow from the user’s input to the video output looks like this:
Required Inputs
The agent needs the following parameters. To learn why an agent’s parameters matter, check out:
💡 The collection_id parameter refers to the ID of the VideoDB collection where your videos are stored and where the generated (synthesised) audio file can be saved for future use. To learn more, explore .
From ElevenLabs:
You’ll need the following two key methods from the ElevenLabs SDK:
clone: clones a voice from the provided audio samples.
generate: generates synthesised audio from text using the cloned voice.
You will need to wrap both methods in Director's ElevenLabs tool.
⚒️ Setup
VideoDB and Director Setup
Get your API key from .
Install the latest SDK.
Follow the instructions mentioned at
⚙️ Create ElevenLabs methods
For the voice cloning feature, you need to wrap the clone and generate methods provided by ElevenLabs. Director already ships an ElevenLabs tool in backend/director/tools. Open elevenlabs.py and define the required methods in the ElevenLabsTool class.
For cloning, create the clone_audio method:
def clone_audio(self, audio_files: list[str], name_of_voice: str, description: str):
    # Create an ElevenLabs voice from the local sample files
    voice = self.client.clone(
        name=name_of_voice,
        files=audio_files,
        description=description,
    )
    return voice
And for generating the audio, create the synthesise_text method:
from elevenlabs import Voice  # at the top of elevenlabs.py

def synthesise_text(self, voice: Voice, text_to_synthesis: str):
    # Turn the text into speech using the cloned voice
    audio = self.client.generate(text=text_to_synthesis, voice=voice, model="eleven_multilingual_v2")
    return audio
🤖 Building the Agent
1. Import the required components
Create a voice_replacement.py file inside backend/director/agents and add the required imports to it. They include:
2. Define the parameters for the agent
Referring to the Game Plan, create a JSON schema for these parameters.
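A minimal sketch of such a schema is below. The property names (sample_videos, video_ids, name_of_voice, description, is_authorized_to_clone_voice, collection_id) are assumptions inferred from the steps in this post, not necessarily the names Director's own agent uses.

```python
# Hypothetical parameter schema for the Voice Replacement Agent.
# Property names are illustrative assumptions, not Director's actual schema.
VOICE_REPLACEMENT_AGENT_PARAMETERS = {
    "type": "object",
    "properties": {
        "sample_videos": {
            "type": "array",
            "description": "Video segments whose audio is used as voice samples",
            "items": {
                "type": "object",
                "properties": {
                    "video_id": {"type": "string"},
                    "start_time": {"type": "number", "description": "Sample start (seconds)"},
                    "end_time": {"type": "number", "description": "Sample end (seconds)"},
                },
                "required": ["video_id", "start_time", "end_time"],
            },
        },
        "video_ids": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Videos whose voice should be replaced",
        },
        "name_of_voice": {"type": "string", "description": "Name for the cloned voice"},
        "description": {"type": "string", "description": "Short description of the voice"},
        "is_authorized_to_clone_voice": {
            "type": "boolean",
            "description": "User confirms they have the right to clone this voice",
        },
        "collection_id": {"type": "string", "description": "VideoDB collection ID"},
    },
    "required": [
        "sample_videos",
        "video_ids",
        "name_of_voice",
        "is_authorized_to_clone_voice",
        "collection_id",
    ],
}
```

Keeping the consent flag in the required list means the reasoning engine has to ask for it before the agent can run.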
3. Implement Agent Class
We will now create the agent class. The attributes set here (self.agent_name, self.description and self.parameters) determine how the agent interacts with the reasoning engine.
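A sketch of the class, assuming Director's agents subclass a base class that receives the session. The BaseAgent below is a hypothetical stand-in so the example runs on its own; in voice_replacement.py you would import Director's base class instead. The name and description strings are illustrative.

```python
class BaseAgent:
    """Hypothetical stand-in for Director's BaseAgent so this sketch runs on its own."""

    def __init__(self, session=None, **kwargs):
        self.session = session


class VoiceReplacementAgent(BaseAgent):
    def __init__(self, session=None, **kwargs):
        # These attributes are what the reasoning engine uses to pick and call the agent
        self.agent_name = "voice_replacement"
        self.description = (
            "Clones a voice from audio samples and replaces the original voice "
            "in one or more videos with the cloned voice."
        )
        # Abbreviated schema; use the full JSON schema from step 2 here
        self.parameters = {
            "type": "object",
            "properties": {
                "video_ids": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["video_ids"],
        }
        super().__init__(session=session, **kwargs)
```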
4. Implement the core logic of the voice replacement agent
1. Declare a run method.
Implement a run method in the agent class. It is the heart of the agent, since it is the method that runs when the agent is called.
Declare the agent's parameters as arguments of the run method; they are used throughout the rest of the implementation.
2. Check Authorisation: If the user isn't authorised, return an error response
3. Initialise Tools: Check for the ElevenLabs API key and initialise the required tools.
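Steps 2 and 3 are both early exits. The sketch below isolates the two checks in a standalone function so it can run on its own; the is_authorized_to_clone_voice flag and the ELEVENLABS_API_KEY environment variable name are assumptions, and inside the agent each non-None result would become an error AgentResponse.

```python
import os
from typing import Mapping, Optional


def preflight_error(
    is_authorized_to_clone_voice: bool, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return an error message if the agent should stop early, else None (hypothetical helper)."""
    # Step 2: refuse to clone a voice the user has not confirmed they may use
    if not is_authorized_to_clone_voice:
        return "You are not authorised to clone this voice. Please confirm you have permission."
    # Step 3: the ElevenLabs tool cannot be initialised without an API key (assumed env var name)
    if not env.get("ELEVENLABS_API_KEY"):
        return "ELEVENLABS_API_KEY is not set; add it to your .env file."
    return None
```

Passing env explicitly keeps the function easy to test without touching the real environment.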
4. Save the audio files locally: From the video stored in VideoDB, we need to extract the required audio file. This can be broken into the following steps:
1. Generate a stream of the video for the sample, based on the start and end time of the sample audio.
2. Download the video from that stream.
3. Extract the audio file from the video.
For generating the stream, we will use the existing get_video_stream method from the VideoDBTool:
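The stream for a sample only needs the slice between its start and end time. A small helper such as the hypothetical sample_timeline below can validate that range first; the output shape, a list of (start, end) tuples in seconds, assumes the timeline format accepted by the videodb SDK's video.generate_stream(timeline=...), so check how get_video_stream expects it.

```python
from typing import List, Tuple


def sample_timeline(start_time: float, end_time: float, video_length: float) -> List[Tuple[float, float]]:
    """Build a one-segment timeline for a voice sample, clamped to the video (hypothetical helper)."""
    start = max(0.0, float(start_time))
    end = min(float(end_time), float(video_length))
    # Reject empty or inverted ranges before any stream is requested
    if end <= start:
        raise ValueError(f"Invalid sample range: {start_time}-{end_time}s")
    return [(start, end)]
```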
To download the video, first get the download link for the stream generated above.
Next, write a _download_video_file method that downloads the video from the download_url and returns the path where it is saved.
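One way to write the download with only the standard library is sketched below. The generic name download_file is hypothetical: _download_video_file could call it with suffix=".mp4" and the later _download_audio_file with suffix=".mp3". Director's own helpers may use a different HTTP client.

```python
import os
import shutil
import tempfile
import urllib.request
from typing import Optional


def download_file(url: str, suffix: str = ".mp4", download_dir: Optional[str] = None) -> str:
    """Download `url` to a new local file and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=suffix, dir=download_dir)
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url, timeout=60) as response:
            # Copy in chunks so a large video is never held fully in memory
            shutil.copyfileobj(response, out)
    except Exception:
        # Do not leave a half-written file behind on failure
        os.remove(path)
        raise
    return path
```

Using mkstemp gives every download a unique file name, so processing several videos in one session cannot overwrite an earlier file.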
Now, let’s use the method to get the video_path
Next, extract the audio from the downloaded video using VideoDB.
Start by defining an _extract_audio_from_video method, which takes a video file path as input and extracts the audio from it.
Now, modify the existing get_audio method inside tools/videodb-tool.py to include a url field in the returned object. This field provides a direct link to the extracted audio file. Then create a _download_audio_file method that takes this URL and downloads the audio file.
Finally, use this method to download the audio file.
5. Clone the Voice: Call the ElevenLabsTool's clone_audio method to create the cloned voice.
6. Start processing all videos: With the cloned voice ready, we can now create the overlays. Process each video_id in the video_ids list.
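A sketch of that loop, assuming you want a failure in one video to be reported without stopping the others. process_one is a hypothetical callable standing in for steps 7 to 11 (transcript, synthesis, saving, upload, overlay) that returns a stream URL; recording a per-video status is one design choice, not necessarily what Director's agent does.

```python
from typing import Callable, Dict, List


def process_videos(video_ids: List[str], process_one: Callable[[str], str]) -> Dict[str, Dict[str, str]]:
    """Run `process_one` for each video and record a stream URL or an error per video."""
    results: Dict[str, Dict[str, str]] = {}
    for video_id in video_ids:
        try:
            results[video_id] = {"status": "success", "stream_url": process_one(video_id)}
        except Exception as exc:  # one failing video should not stop the rest
            results[video_id] = {"status": "error", "error": str(exc)}
    return results
```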
7. Extract transcript from video: To generate the new audio, we first need the video's transcript. Add a method to the agent that takes a video_id and returns its transcript.
And now, you can get the transcript from the video in the run method.
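If your transcript comes back as timestamped segments rather than plain text, it has to be flattened before synthesis. This sketch assumes each segment is a dict with a "text" key (check the shape your SDK version returns); transcript_to_text is a hypothetical helper.

```python
from typing import Iterable, Mapping


def transcript_to_text(segments: Iterable[Mapping[str, object]]) -> str:
    """Join timestamped transcript segments into one narration string (hypothetical helper)."""
    parts = []
    for segment in segments:
        # Assumed segment shape: {"start": ..., "end": ..., "text": ...}
        text = str(segment.get("text", "")).strip()
        if text:  # skip empty segments
            parts.append(text)
    return " ".join(parts)
```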
8. Synthesise Text: Use the ElevenLabsTool's synthesise_text method to generate audio from the transcript.
💡 To tell the user which step the agent is on, append a short message to self.output_message.actions and call self.output_message.push_update() to send the update to the client. This keeps the user informed about what the agent is doing at any given moment.
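The pattern is always the same two calls. To show it in isolation, the sketch uses a hypothetical OutputMessage stand-in; inside the agent you would use self.output_message directly, and the real push_update sends the state to the client rather than counting calls.

```python
from typing import List


class OutputMessage:
    """Hypothetical stand-in for Director's output message, for illustration only."""

    def __init__(self) -> None:
        self.actions: List[str] = []
        self.pushed = 0

    def push_update(self) -> None:
        # The real method sends the current state to the client; here we only count calls
        self.pushed += 1


def report(output_message: OutputMessage, action: str) -> None:
    """Record a human-readable step and immediately push it to the UI."""
    output_message.actions.append(action)
    output_message.push_update()
```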
9. Save the Audio File: Store the generated audio file locally
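Depending on the ElevenLabs SDK version and settings, the value returned by generate may be a single bytes object or an iterator of byte chunks, so the sketch below accepts either. The .mp3 extension assumes the default MP3 output format, and save_audio is a hypothetical helper.

```python
import os
import tempfile
import uuid
from typing import Iterable, Optional, Union


def save_audio(audio: Union[bytes, Iterable[bytes]], output_dir: Optional[str] = None) -> str:
    """Write synthesised audio to a uniquely named .mp3 file and return its path."""
    output_dir = output_dir or tempfile.gettempdir()
    path = os.path.join(output_dir, f"voice_replacement_{uuid.uuid4().hex}.mp3")
    # Treat a single bytes object as one chunk; otherwise iterate over the stream
    chunks = [audio] if isinstance(audio, (bytes, bytearray)) else audio
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    return path
```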
10. Upload to VideoDB: We will upload the generated audio file to VideoDB and retrieve its unique audio ID.
11. Overlay audio onto the video: We will use VideoDB’s timeline feature to overlay the cloned voice onto the video.
💡 To learn more about timelines and audio overlays, see our docs:
For this, write a method that adds the overlay using the video_id and audio_id and returns a stream link for the resulting video.
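A minimal sketch, assuming the videodb Python SDK's timeline API: Timeline, VideoAsset and AudioAsset, where AudioAsset's disable_other_tracks option mutes the video's original soundtrack so the cloned voice replaces it. Verify these names and arguments against the timeline docs for your SDK version; add_audio_overlay and its signature are illustrative. The imports sit inside the function only so the sketch can be loaded without the SDK installed.

```python
def add_audio_overlay(conn, video_id: str, audio_id: str) -> str:
    """Lay the cloned-voice audio over the video and return a playable stream URL."""
    # Assumed videodb SDK timeline API; check the names against your SDK version
    from videodb.asset import AudioAsset, VideoAsset
    from videodb.timeline import Timeline

    timeline = Timeline(conn)
    # The original video forms the base track
    timeline.add_inline(VideoAsset(asset_id=video_id))
    # Start the new narration at 0s and mute the video's own audio
    timeline.add_overlay(0, AudioAsset(asset_id=audio_id, disable_other_tracks=True))
    return timeline.generate_stream()
```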
In the run method, before adding the overlay, use VideoContent to show in Director’s UI that the audio overlay is in progress.
Then add the overlay and pass the generated stream link to video_content so the result can be watched in Director’s UI.
12. Publish the updates and return an AgentResponse to end the agent's process
Once all the videos are processed, we will use the publish method to save the messages as a final step.
You can also return information such as the cloned_voice_id and audio_id in the response, so that subsequent chat requests can reuse them.
13. Implement Error Handling
Ensure robust error handling to manage failures gracefully.
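A common shape is a single try/except around the body of run that logs the full traceback and returns a short, readable error. In Director the return values would be AgentResponse objects with a success or error status; the plain dictionaries and the run_safely name below are hypothetical stand-ins so the sketch runs on its own.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


def run_safely(step: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Run the agent's main logic and turn any exception into an error response."""
    try:
        return {"status": "success", **step()}
    except Exception as exc:
        # Log the full traceback for debugging, but give the user a short message
        logger.exception("Voice replacement failed")
        return {"status": "error", "message": f"Voice replacement failed: {exc}"}
```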
5. Register the Agent
To use the agent, open backend/director/handler.py, import the agent and add it to the self.agents list.
And that’s it! Writing an agent really is this straightforward. You can now try the agent in Director locally and explore ElevenLabs’ voice cloning together with VideoDB’s seamless video overlays to breathe new life into your videos!
🚀 Using the agent
Go to the frontend at and refresh the page; the agent will appear in the list of available agents.
💡 Conclusion
The Voice Replacement Agent demonstrates the power and flexibility of Director's agent framework. By combining VideoDB's video overlay feature and robust database system with ElevenLabs' voice cloning capabilities, we've created a secure, scalable solution for your voice cloning and video overlay needs.
Key takeaways:
Simple integration with Director's framework
Robust error handling and security measures
Scalable architecture for audio processing
Seamless way of adding audio overlays on videos
Ethical considerations built into the design
Creating an agent in VideoDB Director is incredibly easy, allowing you to build powerful, customised solutions quickly. What will you build? 🚀
📖 Resources
Ready to build your own AI agents? Join our and start exploring what's possible with Director! We encourage you to experiment with the framework and contribute to its growing ecosystem. Share your creations and insights with the community, and help shape the future of AI-powered video creation.
📚 - Comprehensive guides and API references
💡 - Real-world examples and implementations
💬 - Connect with other agent developers
Need help? Reach out to our community or open an issue on . Happy building! 🚀