> ## Documentation Index
> Fetch the complete documentation index at: https://docs.videodb.io/llms.txt
> Use this file to discover all available pages before exploring further.

> Video carries information in multiple modalities: what's said, what's shown, and how it changes over time. Multimodal indexing extracts meaning from all of these through extraction strategies for combining visual and spoken content analysis.

# Frame Processing Primitives

## Quick Example

<CodeGroup>
  ```python Python theme={null}
  from videodb import SceneExtractionType

  # Index spoken content
  video.index_spoken_words()

  # Index visual content with extraction strategy
  video.index_scenes(
      extraction_type=SceneExtractionType.time_based,
      extraction_config={"time": 5, "frame_count": 3},
      prompt="Describe the scene, people, and any visible text"
  )
  ```

  ```javascript Node.js theme={null}
  // Index spoken content
  await video.indexSpokenWords();

  // Index visual content with extraction strategy
  await video.indexScenes({
      extractionType: 'time',
      extractionConfig: { time: 5, frame_count: 3 },
      prompt: "Describe the scene, people, and any visible text"
  });
  ```
</CodeGroup>

***

## Extraction Strategies

### Time-Based Extraction

Split video into fixed intervals. Simple and predictable.

<img src="https://mintcdn.com/videodb/6KL5X6-sIPSRpEUt/assets/indexing/time-based-extraction.avif?fit=max&auto=format&n=6KL5X6-sIPSRpEUt&q=85&s=ad5e70511eed02ed94e367e473a35a43" style={{width: "auto", height: "auto"}} alt="Time-based extraction example showing consistent frame sampling at regular intervals" width="1866" height="1082" data-path="assets/indexing/time-based-extraction.avif" />

<CodeGroup>
  ```python Python theme={null}
  from videodb import SceneExtractionType

  video.index_scenes(
      extraction_type=SceneExtractionType.time_based,
      extraction_config={
          "time": 10,           # Scene length in seconds
          "frame_count": 2      # Frames to analyze per scene
      },
      prompt="Describe what's happening"
  )
  ```

  ```javascript Node.js theme={null}
  await video.indexScenes({
      extractionType: 'time',
      extractionConfig: {
          time: 10,           // Scene length in seconds
          frame_count: 2      // Frames to analyze per scene
      },
      prompt: "Describe what's happening"
  });
  ```
</CodeGroup>

| Parameter       | Type | Default     | Description                                   |
| :-------------- | :--- | :---------- | :-------------------------------------------- |
| `time`          | int  | 10          | Interval in seconds                           |
| `frame_count`   | int  | 1           | Frames per scene                              |
| `select_frames` | list | `["first"]` | Which frames: `"first"`, `"middle"`, `"last"` |

<Note>
  Use either `frame_count` or `select_frames`, not both.
</Note>

**Best for:**

* Surveillance and monitoring
* Live streams
* Content with no clear scene boundaries
* Consistent sampling across long videos

### Shot-Based Extraction

Detect visual transitions (cuts, fades) to identify natural scene boundaries.

<img src="https://mintcdn.com/videodb/6KL5X6-sIPSRpEUt/assets/indexing/shot-based-extraction.avif?fit=max&auto=format&n=6KL5X6-sIPSRpEUt&q=85&s=f007bdd7bd6afdc2946a1cade7718860" style={{width: "auto", height: "auto"}} alt="Shot-based extraction example showing automatic detection of visual transitions and scene boundaries" width="1618" height="940" data-path="assets/indexing/shot-based-extraction.avif" />

<CodeGroup>
  ```python Python theme={null}
  from videodb import SceneExtractionType

  video.index_scenes(
      extraction_type=SceneExtractionType.shot_based,
      extraction_config={
          "threshold": 20,      # Sensitivity (lower = more sensitive)
          "frame_count": 1      # Frames per detected shot
      },
      prompt="Describe the scene"
  )
  ```

  ```javascript Node.js theme={null}
  await video.indexScenes({
      extractionType: 'shot',
      extractionConfig: {
          threshold: 20,      // Sensitivity (lower = more sensitive)
          frame_count: 1      // Frames per detected shot
      },
      prompt: "Describe the scene"
  });
  ```
</CodeGroup>

| Parameter     | Type | Default | Description           |
| :------------ | :--- | :------ | :-------------------- |
| `threshold`   | int  | 20      | Detection sensitivity |
| `frame_count` | int  | 1       | Frames per shot       |

**Best for:**

* Movies and TV shows
* Edited content with clear cuts
* Music videos
* Commercials

***

## Prompt Engineering

The prompt shapes what gets extracted. Think of it as telling the vision model what to look for.

### Basic Prompts

```python theme={null}
# General description
prompt = "Describe what's happening in this scene"

# Object-focused
prompt = "Identify all objects and people visible"

# Action-focused
prompt = "Describe the activities and movements"
```

### Domain-Specific Prompts

<CodeGroup>
  ```python Python theme={null}
  # Retail / E-commerce
  video.index_scenes(
      prompt="Identify products, brands, and pricing visible on screen"
  )

  # Sports
  video.index_scenes(
      prompt="Describe the play, players involved, and outcome"
  )

  # Security
  video.index_scenes(
      prompt="Identify people, vehicles, and any unusual activity"
  )

  # Education
  video.index_scenes(
      prompt="Describe the topic being taught and any diagrams or text shown"
  )
  ```

  ```javascript Node.js theme={null}
  // Retail / E-commerce
  await video.indexScenes({
      prompt: "Identify products, brands, and pricing visible on screen"
  });

  // Sports
  await video.indexScenes({
      prompt: "Describe the play, players involved, and outcome"
  });

  // Security
  await video.indexScenes({
      prompt: "Identify people, vehicles, and any unusual activity"
  });

  // Education
  await video.indexScenes({
      prompt: "Describe the topic being taught and any diagrams or text shown"
  });
  ```
</CodeGroup>

### Structured Output Prompts

Guide the model to produce consistent, parseable output:

```python theme={null}
prompt = """
Describe this scene with the following structure:
- Setting: Where is this taking place?
- People: Who is present and what are they doing?
- Objects: What notable items are visible?
- Action: What is happening?
"""
```

***

## Frame Selection Strategy

More frames = more detail but higher cost. Choose based on your content.

### Static Content (1 frame)

For content where a single frame captures the scene:

<CodeGroup>
  ```python Python theme={null}
  # One frame is enough for static shots
  video.index_scenes(
      extraction_type=SceneExtractionType.time_based,
      extraction_config={"time": 10, "frame_count": 1},
      prompt="Describe the scene"
  )
  ```

  ```javascript Node.js theme={null}
  await video.indexScenes({
      extractionType: 'time',
      extractionConfig: { time: 10, frame_count: 1 },
      prompt: "Describe the scene"
  });
  ```
</CodeGroup>

### Motion and Activity (3-5 frames)

For understanding movement and temporal changes:

<CodeGroup>
  ```python Python theme={null}
  # Multiple frames to capture motion
  video.index_scenes(
      extraction_type=SceneExtractionType.time_based,
      extraction_config={"time": 5, "frame_count": 5},
      prompt="Describe the activity and how it progresses"
  )
  ```

  ```javascript Node.js theme={null}
  await video.indexScenes({
      extractionType: 'time',
      extractionConfig: { time: 5, frame_count: 5 },
      prompt: "Describe the activity and how it progresses"
  });
  ```
</CodeGroup>

### Key Moment Selection

Select specific frames within each scene:

<CodeGroup>
  ```python Python theme={null}
  # First and last frames only
  video.index_scenes(
      extraction_type=SceneExtractionType.time_based,
      extraction_config={"time": 10, "select_frames": ["first", "last"]},
      prompt="Describe how the scene changes from start to end"
  )
  ```

  ```javascript Node.js theme={null}
  await video.indexScenes({
      extractionType: 'time',
      extractionConfig: { time: 10, select_frames: ["first", "last"] },
      prompt: "Describe how the scene changes from start to end"
  });
  ```
</CodeGroup>

***

## Combining Modalities

Index both spoken and visual content, then search across both:

<CodeGroup>
  ```python Python theme={null}
  from videodb import IndexType, SearchType

  # Index both modalities
  video.index_spoken_words()
  video.index_scenes(prompt="Describe the visual content")

  # Search spoken content
  spoken_results = video.search(
      query="discusses climate change",
      index_type=IndexType.spoken_word
  )

  # Search visual content
  visual_results = video.search(
      query="shows melting glaciers",
      index_type=IndexType.scene
  )
  ```

  ```javascript Node.js theme={null}
  import { IndexTypeValues, SearchTypeValues } from 'videodb';

  // Index both modalities
  await video.indexSpokenWords();
  await video.indexScenes({ prompt: "Describe the visual content" });

  // Search spoken content
  const spokenResults = await video.search(
      "discusses climate change",
      SearchTypeValues.semantic,
      IndexTypeValues.spoken
  );

  // Search visual content
  const visualResults = await video.search(
      "shows melting glaciers",
      SearchTypeValues.semantic,
      IndexTypeValues.scene
  );
  ```
</CodeGroup>

***

## Extraction Examples

### Traffic Monitoring

<CodeGroup>
  ```python Python theme={null}
  # Detect vehicle colors (single frame sufficient)
  video.index_scenes(
      extraction_type=SceneExtractionType.time_based,
      extraction_config={"time": 1, "frame_count": 1},
      prompt="Identify the color and type of each vehicle"
  )

  # Detect stopped vehicles (need multiple frames)
  video.index_scenes(
      extraction_type=SceneExtractionType.time_based,
      extraction_config={"time": 4, "frame_count": 5},
      prompt="Identify if any vehicle has stopped or is moving slowly"
  )
  ```

  ```javascript Node.js theme={null}
  // Detect vehicle colors (single frame sufficient)
  await video.indexScenes({
      extractionType: 'time',
      extractionConfig: { time: 1, frame_count: 1 },
      prompt: "Identify the color and type of each vehicle"
  });

  // Detect stopped vehicles (need multiple frames)
  await video.indexScenes({
      extractionType: 'time',
      extractionConfig: { time: 4, frame_count: 5 },
      prompt: "Identify if any vehicle has stopped or is moving slowly"
  });
  ```
</CodeGroup>

### Educational Content

<CodeGroup>
  ```python Python theme={null}
  # Combine visual and spoken indexing
  video.index_spoken_words()

  video.index_scenes(
      extraction_type=SceneExtractionType.time_based,
      extraction_config={"time": 30, "select_frames": ["first", "middle", "last"]},
      prompt="Describe diagrams, equations, or visual aids shown"
  )
  ```

  ```javascript Node.js theme={null}
  // Combine visual and spoken indexing
  await video.indexSpokenWords();

  await video.indexScenes({
      extractionType: 'time',
      extractionConfig: { time: 30, select_frames: ["first", "middle", "last"] },
      prompt: "Describe diagrams, equations, or visual aids shown"
  });
  ```
</CodeGroup>

***

## Next Steps

<CardGroup cols={2}>
  <Card icon="search" title="Multiple Indexes" href="/pages/understand/indexing-pipelines/multiple-indexes">
    Layer different perspectives on the same media
  </Card>

  <Card icon="target" title="Accuracy Tips" href="/pages/understand/quality-and-evaluation/accuracy-tips">
    Improve precision and recall
  </Card>
</CardGroup>
