Indexes turn raw video into structured, searchable data. Create a spoken word index for dialogue and narration, or a scene index for visual content.

Quick Example

import videodb

conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("m-xxx")

# Index spoken content (dialogue, narration)
video.index_spoken_words()

# Index visual content (scenes, objects, actions)
scene_index_id = video.index_scenes(
    prompt="Describe what's happening in the scene"
)

# Search both
results = video.search("car chase through the city")
results.play()

Spoken Word Index

Transcribes audio into timestamped text using automatic speech recognition (ASR).
video.index_spoken_words()
What it captures:
  • Dialogue and conversations
  • Narration and voiceovers
  • Lectures and presentations
  • Interviews and podcasts

Language Support

Major languages are auto-detected. For others, pass the language code:
# Auto-detect (English, Spanish, French, German, Italian, Portuguese, Dutch)
video.index_spoken_words()

# Explicit language code
video.index_spoken_words(language_code="hi")  # Hindi
video.index_spoken_words(language_code="ja")  # Japanese
video.index_spoken_words(language_code="zh")  # Chinese
Language              Code
English (Global)      en
English (US/UK/AU)    en_us, en_uk, en_au
Spanish               es
French                fr
German                de
Hindi                 hi
Japanese              ja
Chinese               zh
Korean                ko
Russian               ru
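
If your application stores language names rather than codes, a small lookup can translate them before indexing. This is a minimal sketch built from the table above; `LANGUAGE_CODES` and `spoken_index_kwargs` are hypothetical helpers, not part of the videodb SDK.

```python
# Hypothetical lookup built from the language table; not part of the videodb SDK.
LANGUAGE_CODES = {
    "english": "en",
    "spanish": "es",
    "french": "fr",
    "german": "de",
    "hindi": "hi",
    "japanese": "ja",
    "chinese": "zh",
    "korean": "ko",
    "russian": "ru",
}


def spoken_index_kwargs(language=None):
    """Return keyword arguments for video.index_spoken_words().

    None means "rely on auto-detection", so no language_code is passed.
    """
    if language is None:
        return {}
    key = language.strip().lower()
    if key not in LANGUAGE_CODES:
        raise ValueError(f"No language code listed for {language!r}")
    return {"language_code": LANGUAGE_CODES[key]}


print(spoken_index_kwargs("Hindi"))
print(spoken_index_kwargs())
```

With a video object in hand, the result could be passed straight through, for example `video.index_spoken_words(**spoken_index_kwargs("Japanese"))`.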

Scene Index

Analyzes video frames using vision models to describe visual content.
scene_index_id = video.index_scenes(
    prompt="Describe the scene in detail"
)
What it captures:
  • Objects and people
  • Actions and activities
  • Environments and settings
  • Visual transitions

Prompt Shapes the Index

The prompt you provide determines what gets indexed:
# Focus on people
video.index_scenes(prompt="Describe the people and their actions")

# Focus on environment
video.index_scenes(prompt="Describe the location and setting")

# Focus on specific objects
video.index_scenes(prompt="Identify all vehicles and their colors")
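
Because each prompt produces its own index, it can help to keep the returned IDs under a readable label. The sketch below only relies on `index_scenes(prompt=...)` returning an ID, as in the examples above; the helper name `index_with_prompts` and the labels are illustrative assumptions.

```python
def index_with_prompts(video, prompts):
    """Call video.index_scenes once per prompt and return {label: scene_index_id}."""
    return {
        label: video.index_scenes(prompt=prompt)
        for label, prompt in prompts.items()
    }


# Labels are arbitrary; the prompts are the ones shown above.
PROMPTS = {
    "people": "Describe the people and their actions",
    "environment": "Describe the location and setting",
    "vehicles": "Identify all vehicles and their colors",
}

# With a real video object:
# index_ids = index_with_prompts(video, PROMPTS)
```

Keeping the IDs by label makes it easier to fetch or delete a specific index later.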

Extraction Configuration

Control how frames are sampled. Time-based extraction (frame segmentation) samples at regular intervals, while shot-based extraction (scene segmentation) splits the video at automatically detected visual transitions:
from videodb import SceneExtractionType

# Time-based: every N seconds
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 10, "frame_count": 2},
    prompt="Describe the scene"
)

# Shot-based: detect visual transitions
video.index_scenes(
    extraction_type=SceneExtractionType.shot_based,
    extraction_config={"threshold": 20, "frame_count": 1},
    prompt="Describe the scene"
)
Method        Best For
Time-based    Consistent sampling, dynamic content
Shot-based    Edited videos with clear scene changes
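
To get a feel for what a time-based configuration such as `{"time": 10, "frame_count": 2}` implies, the following sketch splits a video into fixed-length windows and picks frame timestamps in each. It is an illustration only: the function `time_based_windows` is hypothetical, and the evenly spaced frame selection is an assumption, since the service's actual selection logic is not described here.

```python
def time_based_windows(duration, time, frame_count):
    """Split [0, duration) into windows of `time` seconds and pick
    `frame_count` evenly spaced timestamps inside each window.

    Assumption: frames are evenly spaced from the window start; the real
    service may choose frames differently.
    """
    if time <= 0 or frame_count < 1:
        raise ValueError("time must be positive and frame_count at least 1")
    duration = float(duration)
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + time, duration)
        step = (end - start) / frame_count
        frames = [round(start + i * step, 2) for i in range(frame_count)]
        windows.append({"start": start, "end": end, "frames": frames})
        start = end
    return windows


for window in time_based_windows(duration=25, time=10, frame_count=2):
    print(window)
```

Under these assumptions a 25-second video yields three windows, the last one shorter than the rest, which is a quick way to estimate how many scenes a given `time` value will produce.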

Managing Indexes

List All Scene Indexes

indexes = video.list_scene_index()
for idx in indexes:
    print(f"{idx.id}: {idx.name} - {idx.status}")

Get Index Details

scene_index = video.get_scene_index(scene_index_id)
for scene in scene_index:
    print(f"{scene.start}-{scene.end}: {scene.description}")
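
Once the scene records are loaded, they can be filtered locally. The sketch below assumes each record exposes `start`, `end`, and `description` as in the loop above; whether records arrive as objects or plain dicts may depend on the SDK version, so the hypothetical `_field` helper accepts both.

```python
from types import SimpleNamespace


def _field(record, name):
    """Read a field from either a dict-style or attribute-style record."""
    if isinstance(record, dict):
        return record[name]
    return getattr(record, name)


def scenes_mentioning(scene_index, keyword):
    """Return (start, end, description) for scenes whose description contains keyword."""
    needle = keyword.lower()
    return [
        (_field(s, "start"), _field(s, "end"), _field(s, "description"))
        for s in scene_index
        if needle in _field(s, "description").lower()
    ]


# Stand-in records so the sketch runs without a VideoDB account.
sample = [
    SimpleNamespace(start=0, end=10, description="A red car leaves the garage"),
    {"start": 10, "end": 20, "description": "Two people talk in a kitchen"},
]
print(scenes_mentioning(sample, "car"))
```

In practice, `sample` would be replaced by the result of `video.get_scene_index(scene_index_id)`.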

Delete an Index

video.delete_scene_index(scene_index_id)

Async Processing with Callbacks

For long videos, use callbacks to get notified when indexing completes:
scene_index_id = video.index_scenes(
    prompt="Describe the scene",
    callback_url="https://your-backend.com/webhooks/index-complete"
)
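
On the receiving side, the callback URL needs an endpoint that accepts the notification. The following is a minimal standard-library sketch, assuming the callback arrives as an HTTP POST with a JSON body; the payload's fields are not documented here, so the handler only decodes and logs it. The names `decode_callback`, `IndexCallbackHandler`, and `serve` are illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def decode_callback(body):
    """Decode a JSON callback body; return None if it is not valid JSON."""
    try:
        return json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None


class IndexCallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the notification is sent as a POST with a JSON body.
        length = int(self.headers.get("Content-Length", 0))
        payload = decode_callback(self.rfile.read(length))
        if payload is None:
            self.send_response(400)
        else:
            print("Indexing callback received:", payload)
            self.send_response(200)
        self.end_headers()


def serve(port=8000):
    """Block and handle callbacks on the given port."""
    HTTPServer(("", port), IndexCallbackHandler).serve_forever()
```

Calling `serve()` starts the listener; a production backend would more likely register an equivalent route in its existing web framework and serve it over HTTPS.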

What You Can Build

Keyword Search Compilation

Index spoken words, then search to create highlight reels

Multimodal Search

Combine spoken word and scene indexes for powerful queries

Baby Crib Monitoring

Scene indexing enables real-time infant monitoring

Intrusion Detection

Index camera feeds to detect unauthorized access

Next Steps

Multimodal Indexing

Extraction strategies for video + audio

Multiple Indexes

Layer different perspectives on the same media