Quick Example

from videodb import SceneExtractionType

# Index spoken content
video.index_spoken_words()

# Index visual content with extraction strategy
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 5, "frame_count": 3},
    prompt="Describe the scene, people, and any visible text"
)

Extraction Strategies

Time-Based Extraction

Split the video into fixed-length intervals. Simple and predictable.
from videodb import SceneExtractionType

video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={
        "time": 10,           # Scene length in seconds
        "frame_count": 2      # Frames to analyze per scene
    },
    prompt="Describe what's happening"
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| time | int | 10 | Scene interval in seconds |
| frame_count | int | 1 | Frames analyzed per scene |
| select_frames | list | ["first"] | Which frames to use: "first", "middle", "last" |
Use either frame_count or select_frames, not both.
Best for:
  • Surveillance and monitoring
  • Live streams
  • Content with no clear scene boundaries
  • Consistent sampling across long videos
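
Because time-based extraction is deterministic, the indexing workload is easy to estimate up front. A minimal sketch (the helper name is ours, not part of the SDK):

```python
import math

def estimate_time_based_scenes(duration_s: float, time: int = 10, frame_count: int = 1):
    """Estimate how many scenes and frames time-based extraction produces."""
    scenes = math.ceil(duration_s / time)   # one scene per fixed interval
    frames = scenes * frame_count           # frames sent to the vision model
    return scenes, frames

# A 10-minute video at 10-second intervals, 2 frames per scene:
scenes, frames = estimate_time_based_scenes(600, time=10, frame_count=2)
# → 60 scenes, 120 frames analyzed
```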

Shot-Based Extraction

Detect visual transitions (cuts, fades) to identify natural scene boundaries.
from videodb import SceneExtractionType

video.index_scenes(
    extraction_type=SceneExtractionType.shot_based,
    extraction_config={
        "threshold": 20,      # Sensitivity (lower = more sensitive)
        "frame_count": 1      # Frames per detected shot
    },
    prompt="Describe the scene"
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| threshold | int | 20 | Detection sensitivity (lower = more sensitive) |
| frame_count | int | 1 | Frames analyzed per detected shot |
Best for:
  • Movies and TV shows
  • Edited content with clear cuts
  • Music videos
  • Commercials
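
To build intuition for the threshold parameter: shot detection conceptually compares consecutive frames and places a boundary wherever the difference exceeds the threshold. A toy illustration over scalar frame differences (not VideoDB's actual detection algorithm):

```python
def detect_cuts(frame_diffs, threshold=20):
    """Return indices where the inter-frame difference exceeds the threshold.
    Lower thresholds flag smaller changes, so they produce more cuts."""
    return [i for i, diff in enumerate(frame_diffs) if diff > threshold]

diffs = [3, 5, 42, 4, 2, 61, 7]   # e.g. mean pixel difference per frame pair
detect_cuts(diffs, threshold=20)  # → [2, 5]: two shot boundaries
detect_cuts(diffs, threshold=4)   # more sensitive: also picks up small changes
```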

Prompt Engineering

The prompt shapes what gets extracted. Think of it as telling the vision model what to look for.

Basic Prompts

# General description
prompt = "Describe what's happening in this scene"

# Object-focused
prompt = "Identify all objects and people visible"

# Action-focused
prompt = "Describe the activities and movements"

Domain-Specific Prompts

# Retail / E-commerce
video.index_scenes(
    prompt="Identify products, brands, and pricing visible on screen"
)

# Sports
video.index_scenes(
    prompt="Describe the play, players involved, and outcome"
)

# Security
video.index_scenes(
    prompt="Identify people, vehicles, and any unusual activity"
)

# Education
video.index_scenes(
    prompt="Describe the topic being taught and any diagrams or text shown"
)

Structured Output Prompts

Guide the model to produce consistent, parseable output:
prompt = """
Describe this scene with the following structure:
- Setting: Where is this taking place?
- People: Who is present and what are they doing?
- Objects: What notable items are visible?
- Action: What is happening?
"""

Frame Selection Strategy

More frames = more detail but higher cost. Choose based on your content.

Static Content (1 frame)

For content where a single frame captures the scene:
# One frame is enough for static shots
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 10, "frame_count": 1},
    prompt="Describe the scene"
)

Motion and Activity (3-5 frames)

For understanding movement and temporal changes:
# Multiple frames to capture motion
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 5, "frame_count": 5},
    prompt="Describe the activity and how it progresses"
)

Key Moment Selection

Select specific frames within each scene:
# First and last frames only
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 10, "select_frames": ["first", "last"]},
    prompt="Describe how the scene changes from start to end"
)

Combining Modalities

Index both spoken and visual content, then search across both:
from videodb import IndexType

# Index both modalities
video.index_spoken_words()
video.index_scenes(prompt="Describe the visual content")

# Search spoken content
spoken_results = video.search(
    query="discusses climate change",
    index_type=IndexType.spoken_word
)

# Search visual content
visual_results = video.search(
    query="shows melting glaciers",
    index_type=IndexType.scene
)
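
One way to combine the two result sets is to intersect their time ranges, keeping only the moments where the narration and the visuals agree. A sketch over plain (start, end) tuples in seconds; extracting those tuples from the SDK's result objects is assumed, not shown:

```python
def overlapping_ranges(spoken, visual):
    """Intersect two lists of (start, end) time ranges in seconds."""
    overlaps = []
    for s_start, s_end in spoken:
        for v_start, v_end in visual:
            start, end = max(s_start, v_start), min(s_end, v_end)
            if start < end:  # ranges actually overlap
                overlaps.append((start, end))
    return overlaps

spoken = [(10, 40), (90, 120)]   # e.g. matches for "discusses climate change"
visual = [(30, 60), (100, 110)]  # e.g. matches for "shows melting glaciers"
overlapping_ranges(spoken, visual)  # → [(30, 40), (100, 110)]
```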

Extraction Examples

Traffic Monitoring

# Detect vehicle colors (single frame sufficient)
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 1, "frame_count": 1},
    prompt="Identify the color and type of each vehicle"
)

# Detect stopped vehicles (need multiple frames)
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 4, "frame_count": 5},
    prompt="Identify if any vehicle has stopped or is moving slowly"
)
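
The second config needs multiple frames because "stopped" is a property of change over time, not of any single frame. A toy illustration of the idea using per-frame positions (not what the vision model does internally):

```python
def is_stopped(positions, tolerance=2.0):
    """A vehicle is 'stopped' if its position barely changes across sampled frames."""
    return max(positions) - min(positions) <= tolerance

is_stopped([100.0, 100.5, 101.0, 100.8, 100.3])  # → True: barely moved
is_stopped([100.0, 112.0, 124.5, 137.0, 149.0])  # → False: moving steadily
```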

Educational Content

# Combine visual and spoken indexing
video.index_spoken_words()

video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 30, "select_frames": ["first", "middle", "last"]},
    prompt="Describe diagrams, equations, or visual aids shown"
)

Next Steps