
Index Scenes

Scene indexing opens up many ways to find visual information in videos. Vision models can now extract useful information from video, and VideoDB makes that information easy to index.
You can now build RAG for queries like:
Show me where birds are flying near a castle.
Show me when the person took out the gun.
Show where people are running towards the sea.

index_id = video.index_scenes()
In just one command, the index_scenes function can index visual information in your video.


Optional Parameters

The index_scenes() function accepts a few optional parameters.
You can use different extraction algorithms to select scenes and frames.
You can also supply a prompt so that a vision model describes these scenes and frames.

from videodb import IndexType, SceneExtractionType

# callback_url must already be defined: the URL to notify when indexing completes
index_id = video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 10, "select_frames": ["first"]},
    prompt="describe the image in 100 words",
    callback_url=callback_url,
)

# Wait for indexing to finish, then fetch the indexed scenes
scene_index = video.get_scene_index(index_id)
print(scene_index)

# Search a specific scene index by passing its index_id.
# Default case (no index_id): all indexes are searched.
res = video.search(
    query="religious gathering",
    index_type=IndexType.scene,
    index_id=index_id,
)
res.play()

extraction_type - Chooses the scene extraction algorithm.
extraction_config - Configures the scene extraction algorithm.
prompt - Prompt used to describe each scene in text.
callback_url - URL to notify when the job is done.

Let’s look at each parameter in detail:

extraction_type

Visually, a video is a series of images on a timeline. A 60 fps video, for instance, shows 60 frames per second and feels higher in quality than a 30 fps video. Use the extraction_type parameter to experiment with scene extraction algorithms and, in turn, to choose the frames that best capture the details you want described. See the scene extraction documentation for details.
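To make the time-based configuration used earlier ({"time": 10, "select_frames": ['first']}) more concrete, here is a minimal sketch of the idea: cut the timeline into fixed-length scenes and pick frame timestamps from each one. This is an illustration only, not VideoDB's implementation. The function name is hypothetical, and the "middle" and "last" options are assumptions added for illustration (the example above only uses 'first').

```python
import math


def time_based_segments(duration, interval, select_frames=("first",)):
    """Split [0, duration) into fixed-length scenes and pick frame timestamps.

    Illustrative only: this mirrors the idea of time-based extraction,
    not the actual server-side algorithm.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")

    segments = []
    for i in range(math.ceil(duration / interval)):
        start = i * interval
        end = min(start + interval, duration)  # the last scene may be shorter
        # "middle" and "last" are assumed options, shown for illustration
        picks = {"first": start, "middle": (start + end) / 2, "last": end}
        frames = [picks[name] for name in select_frames]
        segments.append({"start": start, "end": end, "frames": frames})
    return segments


if __name__ == "__main__":
    for segment in time_based_segments(duration=25, interval=10):
        print(segment)
```

A shorter interval yields more scenes and therefore more descriptions to index, at the cost of more vision-model calls.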

prompt

The prompt tells the vision model the context and the kind of output you want. For example, to identify running activity, you can use the following prompt to describe each scene:
“Describe clearly what is happening in the video. Add running_detected if you see a person running.”
Experimenting with your own model and prompts is covered separately in the documentation.
Scene indexes are currently best suited to semantic search, so design your prompts to produce well-written prose that can be indexed for semantic search.
😎 Support for JSON and SQL data extraction and indexing is coming soon.
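If a prompt asks the model to add a marker such as running_detected, the marker can also be picked out of the scene descriptions directly. The sketch below assumes each scene is a dictionary with start, end, and description keys, matching the fields that get_scene_index is described as returning; the exact shape of the SDK's return value is an assumption, and the helper name is hypothetical.

```python
def find_tagged_scenes(scenes, tag="running_detected"):
    """Return (start, end) pairs for scenes whose description contains the tag."""
    matches = []
    for scene in scenes:
        description = scene.get("description") or ""
        # Case-insensitive match, since model output casing can vary
        if tag.lower() in description.lower():
            matches.append((scene["start"], scene["end"]))
    return matches


# Hand-written sample data in the assumed shape (not real API output)
sample_scenes = [
    {"start": 0, "end": 10, "description": "A man walks along a beach."},
    {"start": 10, "end": 20, "description": "A man sprints to the sea. running_detected"},
]

if __name__ == "__main__":
    print(find_tagged_scenes(sample_scenes))
```

Plain substring matching like this complements semantic search: the tag gives an exact filter, while the prose description remains searchable by meaning.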

callback_url

The URL that is notified when the scene indexing process completes.
See the callback documentation for details 👀
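A callback URL needs something listening on the other end. The following is a minimal receiver sketch using only the Python standard library. It assumes the notification arrives as an HTTP POST with a JSON body; both the method and the payload fields are assumptions here, so the handler simply prints whatever it receives. All names in the block are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_callback(body: bytes) -> dict:
    """Decode a callback body as JSON; return {} if empty or not a JSON object."""
    if not body:
        return {}
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return {}
    return payload if isinstance(payload, dict) else {}


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):  # assumption: the notification is sent as a POST
        length = int(self.headers.get("Content-Length", 0))
        payload = parse_callback(self.rfile.read(length))
        print("callback received:", payload)
        self.send_response(200)
        self.end_headers()


def serve(port=8000):
    """Block forever, handling callbacks on the given port."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Calling serve() starts the listener; the machine must be reachable from the internet at the address passed as callback_url.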

Managing Indexes

List all the scene indexes created for a video:
scene_indexes = video.list_scene_index()
This function returns a list of the available scene indexes, each with its id, name, and status.

Get a specific index:
scene_index = video.get_scene_index(scene_index_id)
This function returns a list of indexed scenes, each with its start, end, and description.

Delete an index:
video.delete_scene_index(index_id)
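Because list_scene_index reports a status for each index, it can be used to wait for an index to become ready before fetching its scenes. The helper below is a sketch under several assumptions: that each entry is a dictionary, that the id is stored under a key such as "scene_index_id", and that the status strings are "done" and "failed". None of these are confirmed by this page, so they are exposed as parameters to adjust.

```python
import time


def wait_for_scene_index(video, index_id, done_status="done", failed_status="failed",
                         id_key="scene_index_id", poll_seconds=5, timeout_seconds=600):
    """Poll video.list_scene_index() until index_id is ready, then return its scenes.

    The key name and status strings are assumptions; adjust them to match
    what list_scene_index() actually returns.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        for entry in video.list_scene_index():
            if entry.get(id_key) != index_id:
                continue
            status = entry.get("status")
            if status == done_status:
                return video.get_scene_index(index_id)
            if status == failed_status:
                raise RuntimeError(f"scene index {index_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"scene index {index_id} not ready in time")
        time.sleep(poll_seconds)
```

Polling is a simple alternative when hosting a callback endpoint is not practical.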

Create multiple indexes for one video

You can create multiple scene indexes for a video.
Use these indexes to search different layers of topics and concepts within a single video.
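One way to organize several indexes is to build one per topic, each with its own prompt, and keep a mapping from topic to index id. The sketch below uses only the index_scenes and search calls shown earlier on this page; the helper names and the example topics are hypothetical, and index_type is passed in by the caller (for example IndexType.scene) so the block stays independent of the SDK import.

```python
def build_topic_indexes(video, prompts):
    """Create one scene index per prompt and return {topic: index_id}."""
    return {topic: video.index_scenes(prompt=prompt) for topic, prompt in prompts.items()}


def search_topic(video, index_ids, topic, query, index_type):
    """Search only the index built for the given topic.

    Pass IndexType.scene (from videodb) as index_type.
    """
    return video.search(query=query, index_type=index_type, index_id=index_ids[topic])


# Example prompts: each one produces a different "layer" of description
TOPIC_PROMPTS = {
    "people": "Describe the people in the scene and what they are doing.",
    "places": "Describe the location and surroundings in the scene.",
}
```

With a real video object, build_topic_indexes(video, TOPIC_PROMPTS) would be called once, and search_topic(...) used for each query against the relevant layer.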

Deep Dive

Pass your metadata for search filters
If you want to bring your own scene descriptions and annotations, explore the Pipeline.
Experiment with extraction algorithms, prompts, and search using the
Check out our open and flexible
In our upcoming releases, we are introducing integration with numerous metadata stores. This will allow you to extract not just plain text, but also JSON or tabular information from videos. You can then index this data using the database of your choice. Currently, we only offer vector indexing, but we plan to expand this to include more methods for finding information, such as filters, searches, and queries.
Additionally, we will introduce integration with vision models of your choice.

