Caption & Subtitles: Auto-Generated Speech Synchronization

CaptionAsset synchronizes text to audio timestamps, creating subtitles that move with spoken words.
Think of it like automatic subtitles that know exactly when each word is spoken - the text appears and animates in perfect sync with your video’s audio.
Unlike TextAsset, which displays static text overlays at fixed positions, CaptionAsset is built specifically for speech-driven content where timing matters.

CaptionAsset vs TextAsset

| Feature | TextAsset | CaptionAsset |
|---|---|---|
| Timeline Sync | No | Yes (word-level timestamps) |
| Data Source | Manual text input | Auto-generated from speech |
| Animation | Static only | reveal, karaoke, supersize, impact, color_highlight |
| Format | Font/Border/Shadow objects | ASS (Advanced SubStation Alpha) |
CaptionAsset uses ASS format for subtitle rendering, which enables time-synchronized animations and professional subtitle styling.

Auto-Caption Generation

CaptionAsset can automatically generate subtitles from speech in your video. This means you don’t need to manually type out transcripts or time-stamp each word - the system listens to your audio and creates perfectly synchronized captions for you.

Required: Video Indexing

Before using src="auto", you must index the video for spoken words:
```python
video.index_spoken_words()
```
This is a one-time operation that analyzes your video’s audio track and figures out when each word is spoken.
The indexing creates a timestamp map that tells the caption system exactly when to display each word. Without this indexing step, the auto-caption feature won’t have the timing data it needs to work.
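As a minimal end-to-end sketch of this step (the connect and upload calls below follow the standard VideoDB Python SDK; the source URL is a placeholder):
```python
import videodb

# Connect and upload a source video (API key read from the environment).
conn = videodb.connect()
video = conn.upload(url="https://example.com/talk.mp4")  # placeholder URL

# One-time operation: builds the word-level timestamp map used by src="auto".
video.index_spoken_words()
```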

Basic Usage

```python
from videodb.editor import CaptionAsset, Clip, Track

caption_clip = Clip(
    asset=CaptionAsset(src="auto"),
    duration=float(video.length)
)

track = Track()
track.add_clip(0, caption_clip)
```
The caption clip duration should match or exceed the video duration to ensure all words display.

Animation Types

CaptionAsset supports five animation modes that make your subtitles more dynamic:

| Animation | Effect |
|---|---|
| reveal | Words appear one by one as they're spoken |
| karaoke | Active word changes color (primary → secondary) while speaking |
| supersize | Active word scales up in size for emphasis |
| impact | Only the active word appears on the screen |
| color_highlight | Active word highlights with a distinct color for emphasis |
Code Example
```python
from videodb.editor import CaptionAnimation

caption_asset = CaptionAsset(
    src="auto",
    animation=CaptionAnimation.karaoke,
    primary_color="&H00FFFFFF",   # White
    secondary_color="&H0000FFFF"  # Yellow highlight
)
```
Example with CaptionAnimation.karaoke

ASS Color Format

ASS (Advanced SubStation Alpha) is a professional subtitle format that’s been used in video production for years.
It uses BGR (Blue-Green-Red) byte order with an alpha channel - which is backwards from the RGB format you might be used to from web colors.
This quirk exists for historical reasons in subtitle rendering systems.

Format Structure

```
&HAABBGGRR   or   &H00BBGGRR
```
- AA = Alpha (00 = opaque, FF = transparent)
- BB = Blue channel
- GG = Green channel
- RR = Red channel

HTML to ASS Conversion

To convert HTML colors to ASS format:
1. HTML #RRGGBB → extract the RGB bytes
2. Reverse to BGR order
3. Add the prefix &H00 (opaque) or &HAA (with transparency)

Example: HTML #FF6600 (orange)
- RGB: Red=FF, Green=66, Blue=00
- BGR: 00-66-FF
- ASS: &H000066FF
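The same steps in code (a small helper sketch; the function name is ours, and it assumes fully opaque output):
```python
def html_to_ass(html_color: str) -> str:
    """Convert an HTML color like '#FF6600' to an opaque ASS color '&H000066FF'."""
    rgb = html_color.lstrip("#")
    rr, gg, bb = rgb[0:2], rgb[2:4], rgb[4:6]
    return f"&H00{bb}{gg}{rr}".upper()  # alpha 00 = opaque, BGR byte order

print(html_to_ass("#FF6600"))  # &H000066FF
```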

Common Colors

| HTML | ASS | Color |
|---|---|---|
| #FFFFFF | &H00FFFFFF | White |
| #000000 | &H00000000 | Black |
| #FF0000 | &H000000FF | Red |
| #FFFF00 | &H0000FFFF | Yellow |
| #00FF00 | &H0000FF00 | Green |

Styling Parameters

CaptionAsset styling is organized into three parameter groups: FontStyling, Positioning, and BorderAndShadow.

FontStyling

Controls how your subtitle text looks - the font face, size, and whether it’s bold or italic. Think of this as the basic typography settings for making your captions readable and on-brand.
```python
from videodb.editor import FontStyling

FontStyling(
    size=36,       # Font size in points
    bold=True,     # Bold weight
    italic=False,  # Italic style
    name="Arial"   # Font family
)
```
| Parameter | Type | Description |
|---|---|---|
| size | int | Font size in points (not pixels) |
| bold | bool | Bold weight (True) or normal (False) |
| italic | bool | Italic style |
| name | str | Font family name (must be available on the server) |

Positioning

Controls where on the screen your captions appear and how much spacing you want from the edges. You can place captions at the bottom like traditional subtitles, or anywhere else on screen with precise margin control.
```python
from videodb.editor import Positioning, CaptionAlignment

Positioning(
    alignment=CaptionAlignment.bottom_center,
    margin_v=100,  # Vertical margin in pixels
    margin_l=20,   # Left margin in pixels
    margin_r=20    # Right margin in pixels
)
```
| Parameter | Type | Description |
|---|---|---|
| alignment | CaptionAlignment | Where on screen the captions appear (see alignment options below) |
| margin_v | int | Vertical margin in pixels from the top or bottom edge |
| margin_l | int | Left margin in pixels from the left edge |
| margin_r | int | Right margin in pixels from the right edge |
```python
# Corners
CaptionAlignment.top_left
CaptionAlignment.top_right
CaptionAlignment.bottom_left
CaptionAlignment.bottom_right

# Edges
CaptionAlignment.top
CaptionAlignment.bottom
CaptionAlignment.left
CaptionAlignment.right
CaptionAlignment.center

# Center positions
CaptionAlignment.middle_center
CaptionAlignment.bottom_center
```
Example with bottom-center positioning and a custom font:
```python
position=Positioning(
    alignment=CaptionAlignment.bottom_center,
    margin_v=50  # 50px from bottom
),
font=FontStyling(
    size=48,
    bold=True,
    name="Clear Sans"
)
```

BorderAndShadow

Controls outlines and shadows that make your text readable over any background.
These parameters are crucial because subtitles need to be legible whether they’re over bright skies, dark scenes, or complex imagery - borders and shadows ensure the text always stands out.
```python
from videodb.editor import BorderAndShadow, CaptionBorderStyle

BorderAndShadow(
    style=CaptionBorderStyle.outline_and_shadow,
    outline=3.0,                 # Outline width in pixels
    shadow=2.0,                  # Shadow depth in pixels
    outline_color="&H00000000",  # Black outline (ASS format)
    shadow_color="&H80000000"    # Semi-transparent black shadow
)
```
| Parameter | Type | Description |
|---|---|---|
| style | CaptionBorderStyle | How the border/background is rendered |
| outline | float | Outline width in pixels around each letter |
| shadow | float | Shadow depth in pixels for the drop shadow effect |
| outline_color | str | Outline color in ASS format |
| shadow_color | str | Shadow color in ASS format |
CaptionBorderStyle options:
- CaptionBorderStyle.outline_and_shadow - outline + drop shadow
- CaptionBorderStyle.opaque_box - solid background box
Example:
```python
border=BorderAndShadow(
    style=CaptionBorderStyle.outline_and_shadow,
    outline=5,
    outline_color="&H00000000",  # Black outline
    shadow=3
)
```

Complete Example

From the notebook, here’s a complete CaptionAsset with all styling parameters:
```python
from videodb.editor import (
    CaptionAsset, CaptionAnimation, Positioning,
    CaptionAlignment, FontStyling, BorderAndShadow
)
```
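The remainder of the notebook snippet is cut off on this page. The sketch below assembles the parameters documented in the sections above; treat the values as illustrative rather than verbatim from the notebook:
```python
from videodb.editor import Clip, Track, CaptionBorderStyle

caption_asset = CaptionAsset(
    src="auto",                              # auto-generate from indexed speech
    animation=CaptionAnimation.karaoke,
    primary_color="&H00FFFFFF",              # White
    secondary_color="&H0000FFFF",            # Yellow highlight
    font=FontStyling(size=48, bold=True, name="Arial"),
    position=Positioning(
        alignment=CaptionAlignment.bottom_center,
        margin_v=50                          # 50px from bottom
    ),
    border=BorderAndShadow(
        style=CaptionBorderStyle.outline_and_shadow,
        outline=3.0,
        outline_color="&H00000000",          # Black outline
        shadow=2.0
    )
)

# Place the captions on their own track, covering the full video duration.
caption_clip = Clip(asset=caption_asset, duration=float(video.length))
track = Track()
track.add_clip(0, caption_clip)
```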