Create an Index

Indexes turn raw video into structured, searchable data. Create a spoken word index for dialogue and narration, or a scene index for visual content.

Quick Example

import videodb

conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("m-xxx")

# Index spoken content (dialogue, narration)
video.index_spoken_words()

# Index visual content (scenes, objects, actions)
scene_index_id = video.index_scenes(
    prompt="Describe what's happening in the scene"
)

# Search both
results = video.search("car chase through the city")
results.play()

Spoken Word Index

Transcribes audio into timestamped text using automatic speech recognition (ASR).

video.index_spoken_words()

What it captures:

Dialogue and conversations
Narration and voiceovers
Lectures and presentations
Interviews and podcasts

Language Support

Major languages are auto-detected. For others, pass the language code:

# Auto-detect (English, Spanish, French, German, Italian, Portuguese, Dutch)
video.index_spoken_words()

# Explicit language code
video.index_spoken_words(language_code="hi")  # Hindi
video.index_spoken_words(language_code="ja")  # Japanese
video.index_spoken_words(language_code="zh")  # Chinese

| Language | Code |
| --- | --- |
| English (Global) | `en` |
| English (US/UK/AU) | `en_us`, `en_uk`, `en_au` |
| Spanish | `es` |
| French | `fr` |
| German | `de` |
| Hindi | `hi` |
| Japanese | `ja` |
| Chinese | `zh` |
| Korean | `ko` |
| Russian | `ru` |

Scene Index

Analyzes video frames using vision models to describe visual content.

scene_index_id = video.index_scenes(
    prompt="Describe the scene in detail"
)

What it captures:

Objects and people
Actions and activities
Environments and settings
Visual transitions

Prompt Shapes the Index

The prompt you provide determines what gets indexed:

# Focus on people
video.index_scenes(prompt="Describe the people and their actions")

# Focus on environment
video.index_scenes(prompt="Describe the location and setting")

# Focus on specific objects
video.index_scenes(prompt="Identify all vehicles and their colors")

Extraction Configuration

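Control how frames are sampled by choosing between time-based extraction (frames at regular intervals, also called frame segmentation) and shot-based extraction (frames at detected visual transitions, also called scene segmentation):

Figure: Comparison of frame segmentation and scene segmentation extraction types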

from videodb import SceneExtractionType

# Time-based: every N seconds
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 10, "frame_count": 2},
    prompt="Describe the scene"
)

Figure: Time-based extraction example showing consistent frame sampling at regular intervals

# Shot-based: detect visual transitions
video.index_scenes(
    extraction_type=SceneExtractionType.shot_based,
    extraction_config={"threshold": 20, "frame_count": 1},
    prompt="Describe the scene"
)

| Method | Best For |
| --- | --- |
| Time-based | Consistent sampling, dynamic content |
| Shot-based | Edited videos with clear scene changes |
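
To get a feel for how the time-based settings scale with video length, the arithmetic can be sketched in plain Python. This assumes that `time` is the segment length in seconds and `frame_count` is the number of frames sampled per segment, which is how the time-based example above reads; the exact behavior is not confirmed by this page. `estimate_time_based_frames` is a hypothetical helper, not part of the videodb SDK.

```python
import math


def estimate_time_based_frames(duration_s: float, time: int, frame_count: int) -> dict:
    """Rough estimate of segments and sampled frames for a time-based config.

    Assumption: the video is cut into consecutive `time`-second segments
    and `frame_count` frames are sampled from each one.
    """
    if duration_s <= 0 or time <= 0 or frame_count <= 0:
        raise ValueError("duration_s, time and frame_count must be positive")
    segments = math.ceil(duration_s / time)
    return {"segments": segments, "frames": segments * frame_count}


# A 95-second clip with the config from the time-based example
estimate = estimate_time_based_frames(95, time=10, frame_count=2)
print(estimate)
```

Under these assumptions, halving `time` doubles the number of segments, so shorter intervals trade more processing for finer temporal detail.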

Managing Indexes

List All Scene Indexes

indexes = video.list_scene_index()
for idx in indexes:
    print(f"{idx.id}: {idx.name} - {idx.status}")

Figure: List of scene indexes showing id, name, and status

Get Index Details

scene_index = video.get_scene_index(scene_index_id)
for scene in scene_index:
    print(f"{scene.start}-{scene.end}: {scene.description}")
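
Once you have the scene records, ordinary Python is enough to filter them. The sketch below assumes each record exposes `start`, `end`, and `description` as in the loop above; `SceneRecord` is a hypothetical stand-in used so the example runs without an API key, and `find_scenes` is an illustrative helper rather than an SDK function.

```python
from dataclasses import dataclass


@dataclass
class SceneRecord:
    """Hypothetical stand-in for one entry of a scene index."""
    start: float
    end: float
    description: str


def find_scenes(scenes, keyword: str):
    """Return (start, end) pairs for scenes whose description mentions keyword."""
    needle = keyword.lower()
    return [(s.start, s.end) for s in scenes if needle in s.description.lower()]


sample = [
    SceneRecord(0.0, 10.0, "A red car parked on a quiet street"),
    SceneRecord(10.0, 20.0, "Two people talking inside a cafe"),
    SceneRecord(20.0, 30.0, "The car speeds through an intersection"),
]
matches = find_scenes(sample, "car")
print(matches)
```

This is a plain substring match on the stored descriptions, so it is only a quick local filter; `video.search()` remains the way to run semantic queries against the index.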

Delete an Index

video.delete_scene_index(scene_index_id)

Async Processing with Callbacks

For long videos, use callbacks to get notified when indexing completes:

scene_index_id = video.index_scenes(
    prompt="Describe the scene",
    callback_url="https://your-backend.com/webhooks/index-complete"
)
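
On the receiving side, the callback URL needs an endpoint that accepts the notification. The following is a minimal sketch using only the Python standard library. It assumes the notification arrives as an HTTP POST with a JSON body; the payload fields are not specified on this page, so the handler stores whatever JSON object it receives. `parse_callback_body`, `IndexCallbackHandler`, and `make_server` are hypothetical names for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Parsed callback payloads, kept in memory for illustration only
received_events = []


def parse_callback_body(raw: bytes) -> dict:
    """Decode a JSON callback body; return {} for empty, invalid, or non-object input."""
    try:
        data = json.loads(raw.decode("utf-8")) if raw else {}
    except (UnicodeDecodeError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


class IndexCallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the notification is a POST with a JSON body
        length = int(self.headers.get("Content-Length", 0))
        received_events.append(parse_callback_body(self.rfile.read(length)))
        # Acknowledge quickly; do any heavy work outside the request cycle
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port: int = 8000) -> HTTPServer:
    """Build (but do not start) the receiver; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), IndexCallbackHandler)
```

To try it locally, call `make_server().serve_forever()` and expose the port through your usual tunnel or reverse proxy. A production endpoint would also need authentication and persistent storage, which this sketch leaves out.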

What You Can Build

Keyword Search Compilation

Index spoken words, then search to create highlight reels

Multimodal Search

Combine spoken word and scene indexes for powerful queries

Baby Crib Monitoring

Scene indexing enables real-time infant monitoring

Intrusion Detection

Index camera feeds to detect unauthorized access

Next Steps

Multimodal Indexing

Extraction strategies for video + audio

Multiple Indexes

Layer different perspectives on the same media
