Indexes turn raw video into structured, searchable data. Create a spoken word index for dialogue and narration, or a scene index for visual content.
Quick Example
import videodb
conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("m-xxx")
# Index spoken content (dialogue, narration)
video.index_spoken_words()
# Index visual content (scenes, objects, actions)
scene_index_id = video.index_scenes(
    prompt="Describe what's happening in the scene"
)
# Search both
results = video.search("car chase through the city")
results.play()
Spoken Word Index
Transcribes audio into timestamped text using automatic speech recognition (ASR).
video.index_spoken_words()
What it captures:
Dialogue and conversations
Narration and voiceovers
Lectures and presentations
Interviews and podcasts
Language Support
Major languages are auto-detected. For others, pass the language code:
# Auto-detect (English, Spanish, French, German, Italian, Portuguese, Dutch)
video.index_spoken_words()
# Explicit language code
video.index_spoken_words(language_code="hi")  # Hindi
video.index_spoken_words(language_code="ja")  # Japanese
video.index_spoken_words(language_code="zh")  # Chinese
Language | Code
English (Global) | en
English (US/UK/AU) | en_us, en_uk, en_au
Spanish | es
French | fr
German | de
Hindi | hi
Japanese | ja
Chinese | zh
Korean | ko
Russian | ru
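The auto-detect rule above can be wrapped in a small helper. This is a minimal sketch, not part of the SDK: `index_spoken` and `AUTO_DETECTED` are hypothetical names, and the codes `it`, `pt`, and `nl` are the standard ISO 639-1 codes for Italian, Portuguese, and Dutch, assumed here because the table does not list them. The only SDK call used is `index_spoken_words`, with the `language_code` parameter shown earlier.

```python
# Codes for the languages the text above lists as auto-detected.
# "it", "pt" and "nl" are ISO 639-1 codes assumed for illustration.
AUTO_DETECTED = {"en", "es", "fr", "de", "it", "pt", "nl"}


def index_spoken(video, language_code=None):
    """Index spoken words, passing language_code only when it is needed.

    `video` is any object exposing index_spoken_words(), such as the
    video object from the Quick Example.
    """
    if language_code is None or language_code in AUTO_DETECTED:
        # Rely on auto-detection for the major languages.
        return video.index_spoken_words()
    # Anything else is passed explicitly, e.g. "hi", "ja", "zh".
    return video.index_spoken_words(language_code=language_code)
```

For example, `index_spoken(video, "hi")` forwards the code, while `index_spoken(video, "es")` leaves detection to the service.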
Scene Index
Analyzes video frames using vision models to describe visual content.
scene_index_id = video.index_scenes(
    prompt="Describe the scene in detail"
)
What it captures:
Objects and people
Actions and activities
Environments and settings
Visual transitions
Prompt Shapes the Index
The prompt you provide determines what gets indexed:
# Focus on people
video.index_scenes(prompt="Describe the people and their actions")
# Focus on environment
video.index_scenes(prompt="Describe the location and setting")
# Focus on specific objects
video.index_scenes(prompt="Identify all vehicles and their colors")
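Because each prompt produces its own index, it can help to keep the returned IDs under readable labels. The sketch below assumes, as in the examples above, that `index_scenes` returns the new index ID; `build_scene_indexes` and `PROMPTS` are hypothetical names introduced for illustration.

```python
def build_scene_indexes(video, prompts):
    """Create one scene index per prompt and return {label: scene_index_id}."""
    index_ids = {}
    for label, prompt in prompts.items():
        # Assumes index_scenes returns the ID of the index it creates.
        index_ids[label] = video.index_scenes(prompt=prompt)
    return index_ids


# Labels are arbitrary; the prompts are the ones shown above.
PROMPTS = {
    "people": "Describe the people and their actions",
    "setting": "Describe the location and setting",
    "vehicles": "Identify all vehicles and their colors",
}
```

Calling `build_scene_indexes(video, PROMPTS)` would then give a dictionary whose values can be passed to `get_scene_index` or `delete_scene_index` later.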
Control how frames are sampled by choosing between time-based extraction (frames at regular intervals) and shot-based extraction (frames at automatically detected visual transitions):
from videodb import SceneExtractionType
# Time-based: every N seconds
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 10, "frame_count": 2},
    prompt="Describe the scene"
)
[Figure: Time-based extraction example showing consistent frame sampling at regular intervals]
# Shot-based: detect visual transitions
video.index_scenes(
    extraction_type=SceneExtractionType.shot_based,
    extraction_config={"threshold": 20, "frame_count": 1},
    prompt="Describe the scene"
)
Method | Best For
Time-based | Consistent sampling, dynamic content
Shot-based | Edited videos with clear scene changes
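To get a rough feel for how much work a time-based configuration implies, the frame count can be estimated with simple arithmetic. This sketch assumes that `"time"` is the length of each sampling interval in seconds and that `"frame_count"` frames are taken per interval; the function name is hypothetical and the result is an estimate, not a figure reported by the service.

```python
import math


def estimate_sampled_frames(duration_s, interval_s, frame_count):
    """Estimate frames sampled by time-based extraction.

    Assumes one segment per `interval_s` seconds (the last one may be
    shorter) and `frame_count` frames taken from each segment.
    """
    if duration_s < 0 or interval_s <= 0 or frame_count < 0:
        raise ValueError("duration and frame_count must be >= 0, interval > 0")
    segments = math.ceil(duration_s / interval_s)
    return segments * frame_count


# A 10-minute video with {"time": 10, "frame_count": 2}:
print(estimate_sampled_frames(600, 10, 2))  # 120
```

Halving the interval doubles the estimate, which is a quick way to compare configurations before indexing a long video.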
Managing Indexes
List All Scene Indexes
indexes = video.list_scene_index()
for idx in indexes:
    print(f"{idx.id}: {idx.name} - {idx.status}")
Get Index Details
scene_index = video.get_scene_index(scene_index_id)
for scene in scene_index:
    print(f"{scene.start} - {scene.end}: {scene.description}")
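Once the scene records are loaded, they can be filtered locally without another search call. The helper below is a minimal sketch that assumes each record exposes `start`, `end`, and `description` attributes as in the loop above; `find_scenes` is a hypothetical name, not an SDK function.

```python
def find_scenes(scenes, keyword):
    """Return (start, end, description) for scenes mentioning `keyword`.

    Matching is a case-insensitive substring test on the description.
    """
    needle = keyword.lower()
    return [
        (scene.start, scene.end, scene.description)
        for scene in scenes
        # Treat a missing description as empty text.
        if needle in (scene.description or "").lower()
    ]
```

For instance, `find_scenes(scene_index, "car")` would list the time ranges whose generated description contains "car".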
Delete an Index
video.delete_scene_index(scene_index_id)
Async Processing with Callbacks
For long videos, use callbacks to get notified when indexing completes:
scene_index_id = video.index_scenes(
    prompt="Describe the scene",
    callback_url="https://your-backend.com/webhooks/index-complete"
)
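On the receiving side, the callback URL needs an endpoint that accepts the notification. The sketch below uses only the Python standard library and assumes the notification arrives as an HTTP POST with a JSON object body; the payload fields are not described here, so the handler simply logs whatever it receives. `parse_callback`, `IndexCallbackHandler`, and `serve` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_callback(body):
    """Decode a callback body as JSON; return None unless it is a JSON object."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    return payload if isinstance(payload, dict) else None


class IndexCallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = parse_callback(self.rfile.read(length))
        if payload is None:
            self.send_response(400)
        else:
            # Payload shape is an assumption, so only log it here.
            print("index callback received:", payload)
            self.send_response(200)
        self.end_headers()


def serve(port=8000):
    """Block and serve callbacks on the given port."""
    HTTPServer(("", port), IndexCallbackHandler).serve_forever()
```

Running `serve()` behind a publicly reachable URL and passing that URL as `callback_url` would complete the loop; a production endpoint would also want authentication and retries.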
What You Can Build
Keyword Search Compilation: Index spoken words, then search to create highlight reels
Multimodal Search: Combine spoken word and scene indexes for powerful queries
Baby Crib Monitoring: Scene indexing enables real-time infant monitoring
Intrusion Detection: Index camera feeds to detect unauthorized access
Next Steps
Multimodal Indexing: Extraction strategies for video + audio
Multiple Indexes: Layer different perspectives on the same media