VideoDB Documentation

Scene Extraction Algorithms

SceneExtractionType

A video is a series of images called frames. These frames can be processed using multimodal models or computer vision pipelines, and there are many ways to identify how concepts change over time in a video.


SceneExtractionType and extraction_config control how scenes are identified. Both can be passed as arguments to two functions:

index_scenes()

extract_scenes()

Check out Advanced Visual Search Pipelines for Scene and Frame object details.


Time-based extraction is a simple way to break a video into scenes. You define the interval at which the video is split; for example, you may treat every 10 seconds as one scene. This method is useful when you have no information about the nature of the video, or when the video is random and dynamic. You can even create scenes at a 1-second interval.
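To make the idea of fixed-interval splitting concrete, here is a minimal sketch in plain Python. It is not VideoDB's implementation: the function name `time_based_segments` is hypothetical, and keeping a shorter final segment when the duration is not a multiple of the interval is an assumption.

```python
import math


def time_based_segments(duration, time=10.0):
    """Split a video of `duration` seconds into consecutive (start, end) segments.

    Illustrative only: the handling of the trailing partial segment is an
    assumption, not documented VideoDB behaviour.
    """
    duration = float(duration)
    time = float(time)
    if time <= 0:
        raise ValueError("time must be positive")
    if duration <= 0:
        return []
    # Number of segments, counting a shorter final segment if one remains.
    count = math.ceil(duration / time)
    return [(i * time, min((i + 1) * time, duration)) for i in range(count)]


# A 25-second video split every 10 seconds gives two full segments and one short one.
print(time_based_segments(25, time=10))
```

With a 1-second interval, a 60-second video would yield 60 segments under the same assumptions.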

This method has the following extraction_config parameters:

time: The interval (in seconds) at which scenes are segmented. The default value is 10, which means every 10-second segment is a scene.

frame_count: The number of frames to extract per scene. Increasing it collects more frames per scene for more context. The default value is 1.

select_frames: A list of frames to select from each segment. The list can contain any of the strings "first", "middle", and "last", which select the respective frames. The default value is ["first"].

Note: You can use either the select_frames or the frame_count strategy to extract frames for a scene.
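The parameters above can be sketched as a small config resolver plus a frame picker. This is an illustration in plain Python, not the VideoDB SDK: the helper names `resolve_time_config` and `named_frame_indices` are hypothetical, rejecting a config that sets both strategies is an assumption drawn from the note, and so is falling back to select_frames when neither is given.

```python
def resolve_time_config(extraction_config):
    """Fill in the documented defaults and keep exactly one frame strategy."""
    has_count = "frame_count" in extraction_config
    has_select = "select_frames" in extraction_config
    if has_count and has_select:
        # Assumption: the two strategies are treated as mutually exclusive.
        raise ValueError("use either select_frames or frame_count, not both")
    resolved = {"time": extraction_config.get("time", 10)}
    if has_count:
        resolved["frame_count"] = extraction_config["frame_count"]
    else:
        # Assumption: with no strategy given, fall back to select_frames=["first"].
        resolved["select_frames"] = list(extraction_config.get("select_frames", ["first"]))
    return resolved


def named_frame_indices(first_index, last_index, select_frames):
    """Map "first" / "middle" / "last" to frame indices inside one segment."""
    positions = {
        "first": first_index,
        "middle": (first_index + last_index) // 2,
        "last": last_index,
    }
    unknown = [name for name in select_frames if name not in positions]
    if unknown:
        raise ValueError(f"unsupported select_frames values: {unknown}")
    return [positions[name] for name in select_frames]


print(resolve_time_config({"time": 4, "frame_count": 5}))
print(named_frame_indices(0, 99, ["first", "middle", "last"]))
```

Frame indices are used here instead of timestamps so the sketch does not depend on the video's frame rate.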

from videodb import SceneExtractionType

# traffic_video is a Video object that has already been uploaded or fetched.
wait_index = traffic_video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 4, "frame_count": 5},
    prompt="Identify when multiple cars are slowing down or waiting. Mention that cars are waiting or stopping and also specify the lane as left, middle, or right. For example, you can say `cars in the middle lanes are waiting`.",
    name="wait_index",
)

To use select_frames instead of frame_count, pass, for example:

extraction_type=SceneExtractionType.time_based,
extraction_config={"time": 10, "select_frames": ["first"]},


Videos share context between timestamps. A scene is a logical segment of a video that completes a concept. Shot-based extraction identifies scene changes from the visual content of the video.

Changes are calculated from significant shifts in the visual content, such as transitions, lighting, and movement.
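As a rough illustration of content-based scene detection, the sketch below starts a new shot whenever the difference between consecutive frames exceeds a threshold. It is a simplified stand-in, not VideoDB's detector: the function name `detect_shots` is hypothetical, each frame is reduced to a single number (for example, mean brightness), and the threshold scale here need not match the one VideoDB uses.

```python
def detect_shots(frame_scores, threshold=20.0):
    """Group frames into shots, returned as (first_index, last_index) inclusive.

    frame_scores[i] is one summary value per frame (assumed here to be
    something like mean brightness). A cut is placed before frame i when the
    score jumps by more than `threshold` relative to frame i - 1.
    """
    if not frame_scores:
        return []
    shots = []
    shot_start = 0
    for i in range(1, len(frame_scores)):
        if abs(frame_scores[i] - frame_scores[i - 1]) > threshold:
            shots.append((shot_start, i - 1))  # close the current shot
            shot_start = i                     # frame i opens the next shot
    shots.append((shot_start, len(frame_scores) - 1))
    return shots


scores = [10, 12, 11, 90, 92, 91, 30, 31]
print(detect_shots(scores, threshold=20))
print(detect_shots(scores, threshold=70))
```

In this sketch, a lower threshold reacts to smaller visual changes and produces more shots, while a higher threshold merges them.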

This method has the following extraction_config parameters:

threshold: Determines the sensitivity of the model to scene changes within the video. The default value is 20, which is known to work well for detecting camera shot changes in a video.

frame_count: The number of frames to pick from each shot. The default value is 1. Increasing this number selects more frames from each shot, which can provide a more detailed analysis of the scene.
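One way to picture frame_count is as a number of evenly spaced samples taken from each shot. The sketch below is an assumption about the sampling strategy, which the documentation does not specify; the function name `pick_frames` is hypothetical.

```python
def pick_frames(first_index, last_index, frame_count=1):
    """Pick up to `frame_count` evenly spaced frame indices from one shot.

    Assumption: the shot is cut into `frame_count` equal slices and the
    centre frame of each slice is taken. Short shots yield fewer frames.
    """
    if frame_count < 1:
        raise ValueError("frame_count must be at least 1")
    length = last_index - first_index + 1
    if length < 1:
        raise ValueError("last_index must not be smaller than first_index")
    count = min(frame_count, length)  # cannot pick more frames than the shot has
    return [first_index + (2 * k + 1) * length // (2 * count) for k in range(count)]


print(pick_frames(0, 99, frame_count=4))
print(pick_frames(3, 5, frame_count=4))
```

A larger frame_count spreads the samples across the shot, which is what gives a downstream model more context per scene.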

extraction_type=SceneExtractionType.shot_based,
extraction_config={"threshold": 20, "frame_count": 4},
