A video is a series of images called frames. These frames can be processed with multimodal models or computer vision pipelines. There are many ways to identify how the concepts in a video change over time.
SceneExtractionType and extraction_config are the parameters used for scene identification. They can be passed as arguments to two functions:
They can be passed as arguments to the index_scenes() function.
They can also be passed as arguments to the extract_scenes() function.
Time-based extraction is a simple way to break a video into scenes. You define a fixed interval at which to split the video; for example, you may treat every 10 seconds as one scene. This method is useful when you know nothing about the nature of the video, or when the video is random and dynamic. You can even create scenes with a 1-second interval.
This method accepts the following extraction_config keys:
time: The interval (in seconds) at which the video is segmented into scenes. Default value is 10, meaning every 10-second segment is a scene.
select_frames: A list of frames to select from each segment. The list can contain any of the strings "first", "middle", and "last", which select the respective frames. Default value is ["first"].
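How time and select_frames carve up a video can be illustrated with a small standalone sketch. It is not the VideoDB implementation: it works on frame indices, assumes a constant frame rate, and defines "middle" as the midpoint index of each segment. The function name time_based_scenes is hypothetical.

```python
from typing import Dict, List, Sequence


def time_based_scenes(
    total_frames: int,
    fps: float,
    time: float = 10,
    select_frames: Sequence[str] = ("first",),
) -> List[Dict[str, object]]:
    """Split a video of total_frames frames into fixed-length scenes.

    Hypothetical illustration of time-based extraction; assumes a constant fps.
    """
    if time <= 0 or fps <= 0:
        raise ValueError("time and fps must be positive")
    frames_per_scene = max(1, round(time * fps))  # frames in one full interval
    scenes = []
    for start in range(0, total_frames, frames_per_scene):
        # Exclusive end index; the final scene may be shorter than the interval.
        end = min(start + frames_per_scene, total_frames)
        # Map each requested position to a frame index inside [start, end).
        positions = {"first": start, "middle": (start + end - 1) // 2, "last": end - 1}
        scenes.append({
            "start_sec": start / fps,
            "end_sec": end / fps,
            "frames": [positions[name] for name in select_frames],
        })
    return scenes
```

For example, a 25-second clip at 2 fps (50 frames) with time=10 yields three scenes; with select_frames=["first", "last"] they contribute frames 0 and 19, 20 and 39, and 40 and 49, the last scene covering only 20 to 25 seconds.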
Videos share context across timestamps. A scene is a logical segment of a video that completes a concept, so another approach is to identify scene changes from the visual content of the video itself (shot-based extraction).
The key factors in detecting a change are significant shifts in the visual content, such as transitions, lighting, and movement.
This method accepts the following extraction_config keys:
threshold: Determines the sensitivity of the model to scene changes within the video. Default value is 20, which is known to work well for detecting camera shot changes.
frame_count: The number of frames to pick from each shot. Default value is 1. A higher value selects more frames per shot, which can provide a more detailed analysis of the scene.
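To make threshold and frame_count concrete, here is a simplified shot-detection sketch. It is not VideoDB's algorithm. As assumptions, each frame is a flat list of grayscale pixel values (0 to 255), the change score is the mean absolute pixel difference between consecutive frames, and a new shot starts whenever that score exceeds threshold. Real detectors typically use richer features such as color histograms, so threshold values in this sketch are not interchangeable with VideoDB's. The names frame_difference, detect_shots, and pick_frames are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[int]  # assumption: flattened grayscale pixels, 0-255


def frame_difference(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_shots(frames: Sequence[Frame], threshold: float = 20) -> List[range]:
    """Split frame indices into shots wherever the change score exceeds threshold."""
    if not frames:
        return []
    shots: List[range] = []
    start = 0
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            shots.append(range(start, i))  # close the current shot before the cut
            start = i
    shots.append(range(start, len(frames)))  # the final shot runs to the end
    return shots


def pick_frames(shot: range, frame_count: int = 1) -> List[int]:
    """Pick up to frame_count evenly spaced frame indices from one shot."""
    n = min(frame_count, len(shot))
    if n <= 0:
        return []
    # Take the centre of each of n equal sub-intervals of the shot.
    return [shot[(2 * k + 1) * len(shot) // (2 * n)] for k in range(n)]
```

In this sketch, lowering threshold splits the video on smaller visual changes and therefore produces more shots, while raising frame_count samples more frames from each shot.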