Video carries information in multiple modalities: what's said, what's shown, and how it changes over time. Multimodal indexing extracts meaning from all of these by combining spoken-word transcription with visual scene analysis, each driven by a configurable extraction strategy.
```python
from videodb import SceneExtractionType

# Index spoken content
video.index_spoken_words()

# Index visual content with an extraction strategy
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 5, "frame_count": 3},
    prompt="Describe the scene, people, and any visible text",
)
```
Time-based extraction splits the video into fixed-length intervals. Simple and predictable.
```python
from videodb import SceneExtractionType

video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={
        "time": 10,        # Scene length in seconds
        "frame_count": 2,  # Frames to analyze per scene
    },
    prompt="Describe what's happening",
)
```
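Conceptually, time-based extraction partitions the timeline into fixed windows. A minimal sketch of that partitioning (plain Python for illustration; `time_based_windows` is not part of the SDK):

```python
# Illustrative only: how a timeline is split into fixed-length scene windows.
def time_based_windows(duration: float, time: float = 10.0) -> list[tuple[float, float]]:
    """Split [0, duration] into consecutive windows of `time` seconds."""
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + time, duration)  # final window may be shorter
        windows.append((start, end))
        start = end
    return windows

print(time_based_windows(35.0, time=10))
# → [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0), (30.0, 35.0)]
```

Each window then becomes one scene, from which `frame_count` frames are sampled for analysis.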
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `time` | int | `10` | Interval in seconds |
| `frame_count` | int | `1` | Frames per scene |
| `select_frames` | list | `["first"]` | Which frames to use: `"first"`, `"middle"`, `"last"` |
Use either `frame_count` or `select_frames`, not both.
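The mutual-exclusivity rule can be expressed as a small local guard. This is only an illustrative sketch of the constraint (`check_extraction_config` is a hypothetical helper, not part of the videodb SDK):

```python
# Hypothetical guard for the rule above: frame_count and select_frames
# must not appear in the same extraction_config.
def check_extraction_config(config: dict) -> dict:
    if "frame_count" in config and "select_frames" in config:
        raise ValueError("Use either 'frame_count' or 'select_frames', not both")
    return config

check_extraction_config({"time": 10, "frame_count": 2})            # OK
check_extraction_config({"time": 10, "select_frames": ["first"]})  # OK
```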
```python
# General description
prompt = "Describe what's happening in this scene"

# Object-focused
prompt = "Identify all objects and people visible"

# Action-focused
prompt = "Describe the activities and movements"
```
```python
# Retail / E-commerce
video.index_scenes(
    prompt="Identify products, brands, and pricing visible on screen"
)

# Sports
video.index_scenes(
    prompt="Describe the play, players involved, and outcome"
)

# Security
video.index_scenes(
    prompt="Identify people, vehicles, and any unusual activity"
)

# Education
video.index_scenes(
    prompt="Describe the topic being taught and any diagrams or text shown"
)
```
Guide the model to produce consistent, parseable output:
```python
prompt = """Describe this scene with the following structure:
- Setting: Where is this taking place?
- People: Who is present and what are they doing?
- Objects: What notable items are visible?
- Action: What is happening?"""
```
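Descriptions produced in this structure are easy to post-process downstream. A minimal parsing sketch (plain Python; `parse_scene_description` and the sample description are illustrative, not SDK output):

```python
# Parse a "- Field: value" structured description into a dict.
# Assumes the model followed the format requested by the prompt above.
def parse_scene_description(text: str) -> dict[str, str]:
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("- ") and ":" in line:
            key, _, value = line[2:].partition(":")
            fields[key.strip()] = value.strip()
    return fields

# Hypothetical model output for one scene
description = """- Setting: A busy intersection at dusk
- People: Two pedestrians crossing the street
- Objects: Traffic lights, a delivery van
- Action: Vehicles waiting at a red light"""

print(parse_scene_description(description))
```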
For content where a single frame captures the scene:
```python
# One frame is enough for static shots
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 10, "frame_count": 1},
    prompt="Describe the scene",
)
```
```python
# Multiple frames to capture motion
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 5, "frame_count": 5},
    prompt="Describe the activity and how it progresses",
)
```
```python
# First and last frames only
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 10, "select_frames": ["first", "last"]},
    prompt="Describe how the scene changes from start to end",
)
```
```python
# Detect vehicle colors (single frame sufficient)
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 1, "frame_count": 1},
    prompt="Identify the color and type of each vehicle",
)

# Detect stopped vehicles (need multiple frames)
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 4, "frame_count": 5},
    prompt="Identify if any vehicle has stopped or is moving slowly",
)
```
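The SDK does not specify exactly which timestamps are sampled when `frame_count` frames are taken per window; one plausible spacing, sketched in plain Python (`frame_times` is illustrative, not the service's actual sampling logic):

```python
# Illustrative: evenly spaced sample times for frame_count frames
# within one scene window [start, end].
def frame_times(start: float, end: float, frame_count: int) -> list[float]:
    if frame_count == 1:
        return [start]  # a single frame: just take the window start
    step = (end - start) / (frame_count - 1)
    return [start + i * step for i in range(frame_count)]

print(frame_times(0.0, 4.0, 5))
# → [0.0, 1.0, 2.0, 3.0, 4.0]
```

With `{"time": 4, "frame_count": 5}` as in the stopped-vehicle example, this spacing yields roughly one frame per second, enough to distinguish stationary from slow-moving vehicles.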