Documentation Index
Fetch the complete documentation index at: https://docs.videodb.io/llms.txt
Use this file to discover all available pages before exploring further.
The loop: See (Ingest) → Process → Understand (Indexes) → Remember → Retrieve (Search) → Act
import videodb

conn = videodb.connect()
coll = conn.get_collection()

# SEE: Ingest from any source
video = coll.upload(url="https://example.com/video.mp4")

# UNDERSTAND: Create an index
index_id = video.index_visuals(prompt="Extract key moments")

# RETRIEVE: Search with natural language
results = video.search("important announcement", index_id=index_id)

# ACT: Generate outputs, trigger actions
for shot in results.shots:
    print(f"{shot.start}s - {shot.end}s: {shot.text}")
    shot.play()  # Playable evidence
See (Ingest)
Get video and audio from anywhere into VideoDB.
Source | Method
File URL | coll.upload(url="https://...")
Local file | coll.upload(file_path="./video.mp4")
RTSP stream | coll.connect_rtstream(url="rtsp://...")
Desktop capture | Capture SDK (screen, mic, camera)
# File-based
video = coll.upload(url="https://example.com/meeting.mp4")

# Live stream
rtstream = coll.connect_rtstream(
    name="Security Camera",
    url="rtsp://user:pass@host:554/stream"
)
Process
Built-in primitives convert raw media into processable units. This happens automatically when you create indexes.
Scene segmentation - Time-based, shot-based, or prompt-guided
Frame sampling - Control which frames to analyze
Audio chunking - Word, sentence, or time-based segments
# Time-based: every 10 seconds
video.index_scenes(
    extraction_type="time_based",
    extraction_config={"time": 10}
)

# For RTStream: batch config
rtstream.index_visuals(
    batch_config={"type": "time", "value": 5, "frame_count": 2}
)
This is where cost control happens - sampling policies trade compute for recall.
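For instance, widening the sampling interval cuts compute at the cost of recall, while shot-based segmentation analyzes more frames per scene. A minimal sketch reusing index_scenes from above; the shot-based config keys (threshold, frame_count) are assumptions to verify against the Indexes reference:

# Sparse and cheap: one sample per minute, lower recall
video.index_scenes(
    extraction_type="time_based",
    extraction_config={"time": 60}
)

# Denser and costlier: segment on shot changes, analyze more frames
# (config keys below are assumed, not confirmed by this page)
video.index_scenes(
    extraction_type="shot_based",
    extraction_config={"threshold": 20, "frame_count": 4}
)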
Understand (Indexes)
Indexes are programmable interpretation layers. You define what to extract with prompts.
Prompt-driven - Natural language instructions
Model-orchestrated - LLMs and VLMs do the work
Additive - Multiple indexes on same media
Multimodal - Visual and spoken
# Visual understanding
visual_index = video.index_scenes(
    prompt="Identify key moments and describe activities"
)

# Spoken understanding
transcript = video.index_spoken_words()

# Multiple indexes = multiple perspectives
safety_index = video.index_scenes(prompt="Identify safety issues")
summary_index = video.index_scenes(prompt="Summarize each segment")
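Because indexes are additive, a search can target one interpretation at a time. A minimal sketch, assuming index_scenes returns an id accepted by search's index_id parameter (the quickstart uses the same pattern with index_visuals); the query string is hypothetical:

# Query only the safety perspective
safety_hits = video.search("blocked fire exit", index_id=safety_index)
for shot in safety_hits.shots:
    print(f"{shot.start}s: {shot.text}")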
Remember
Indexes are stored as episodic memory. This is automatic by default.
What gets stored:
Transcripts and embeddings
Scene descriptions and tags
Structured metadata
Retrieval structures
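Because persistence is automatic, a later session can read back stored understanding without re-processing. A minimal sketch; treat the accessor names (get_video, get_transcript_text) as assumptions to verify against the Data Model reference, and the video id as hypothetical:

import videodb

conn = videodb.connect()
video = conn.get_collection().get_video("m-xxxx")  # hypothetical video id
text = video.get_transcript_text()  # assumes spoken words were indexed earlier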
Ephemeral mode - For live sessions, you can choose not to persist:
rtstream.index_visuals(
    prompt="...",
    ephemeral=True  # Process but don't persist
)
Retrieve (Search)
Search across indexed content with natural language. Results include playable evidence.
# Single video
results = video.search("product demo")

# Single stream
results = rtstream.search("intrusion")

# Collection-wide
results = coll.search("quarterly results", index_type="scene")
Results include:
Timestamps - Exact start/end times
Text - What was detected
Score - Relevance ranking
Stream URL - Playable link
for shot in results.shots:
    print(f"{shot.start}s: {shot.text} (score: {shot.search_score})")
    shot.play()  # Verify the result
Act
Go from understanding to automation and outputs.
Event Detection
React to conditions in real-time:
event_id = conn.create_event(
    event_prompt="Detect intruder",
    label="security_alert"
)

# Attach the alert to a real-time index (e.g., one returned by rtstream.index_visuals)
index.create_alert(
    event_id=event_id,
    callback_url="https://your-backend.com/alerts"
)
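On the receiving end, callback_url should point at an HTTP endpoint you control. A minimal Flask sketch; the payload shape (including the label field) and the notify_on_call_team helper are assumptions, so log the raw body first and adapt:

from flask import Flask, request

app = Flask(__name__)

def notify_on_call_team(payload):
    # Hypothetical escalation hook; replace with your paging/notification logic
    print("escalating:", payload)

@app.route("/alerts", methods=["POST"])
def handle_alert():
    payload = request.get_json(force=True)
    print("alert received:", payload)  # log the raw body to learn the real schema
    # "label" is an assumed payload field; confirm against a live alert
    if payload.get("label") == "security_alert":
        notify_on_call_team(payload)
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)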
Programmable Editing
Compose outputs using the 4-layer editor architecture:
from videodb.editor import Timeline, Track, Clip, VideoAsset

video_asset = VideoAsset(id=video.id, start=10)
clip = Clip(asset=video_asset, duration=20)

track = Track()
track.add_clip(0, clip)

timeline = Timeline(conn)
timeline.add_track(track)

output = timeline.generate_stream()
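The generated output is a streamable link you can preview immediately. A short usage sketch, assuming the SDK's play_stream helper exists and that generate_stream returns a playable URL:

from videodb import play_stream

play_stream(output)  # assumes output is a stream URL; opens it in the browser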
Architecture Patterns
The loop applies to different use cases:
Use Case | See | Understand | Act
Video Search | Upload files | Index with domain prompts | Search + retrieve
Monitoring | Connect RTSP | Real-time indexing | Alerts + webhooks
Desktop Agent | Capture SDK | Index screen/mic | Context for LLM
Media Automation | Upload + transcode | Index for editing | Timeline + export
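As an end-to-end illustration, the Monitoring row composes directly from calls shown on this page. A minimal sketch; it assumes rtstream.index_visuals returns the index object that create_alert hangs off (implied but not stated above), and the stream name and prompt are hypothetical:

# Monitoring: See -> Understand -> Act
rtstream = coll.connect_rtstream(
    name="Loading Dock",
    url="rtsp://user:pass@host:554/stream"
)

index = rtstream.index_visuals(
    prompt="Flag intruders or unattended packages",
    batch_config={"type": "time", "value": 5, "frame_count": 2}
)

event_id = conn.create_event(
    event_prompt="Detect intruder",
    label="security_alert"
)

index.create_alert(
    event_id=event_id,
    callback_url="https://your-backend.com/alerts"
)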
Next Steps
Data Model - Collections, Videos, RTStreams, and other core objects
Indexes - Turn media into searchable knowledge
Search & Retrieval - How search returns playable evidence
Events & Alerts - Real-time detection and automation