The Problem

AI agents can reason about text brilliantly. But show them a 30-minute meeting recording and ask “what did the client say about pricing?” - they fail. Video files are opaque blobs. Your agent can’t query them, can’t search them, can’t get timestamped answers from them.

The Platform Loop

Every VideoDB workflow follows the same pattern:
See → Understand → Act
| Stage | What Happens | Returns |
|---|---|---|
| See | Ingest from files, streams, or desktop capture | Video, RTStream, or CaptureSession |
| Understand | Create indexes. Search with natural language. | Timestamped moments with playable evidence |
| Act | Trigger alerts. Compose edits. Export streams. | Webhooks, playable URLs, downloadable files |

Quick Example

import videodb

conn = videodb.connect()

# SEE: Ingest
coll = conn.get_collection()
video = coll.upload(url="https://example.com/meeting.mp4")

# UNDERSTAND: Index and search
video.index_spoken_words()
results = video.search("pricing discussion")

# ACT: Use the results
for shot in results.shots:
    print(f"{shot.start}s - {shot.end}s: {shot.text}")
    shot.play()  # Playable proof

See: Three Input Types

| Source | Method | Returns |
|---|---|---|
| Files | coll.upload(url="...") | Video |
| Live streams | conn.connect_rtstream(url="...") | RTStream |
| Desktop capture | conn.create_capture_session(...) | CaptureSession (RTStream) |

# Files
coll = conn.get_collection()
video = coll.upload(url="https://youtube.com/watch?v=...")

# Live RTSP
rtstream = conn.connect_rtstream(url="rtsp://camera.local/stream")

# Desktop capture
cap = conn.create_capture_session(end_user_id="user_123")

Same APIs work downstream. Index a Video or an RTStream the same way.
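
As a sketch of that symmetry (assuming search is also available on RTStream, which isn't shown above), the same two calls apply to either object:

# Hedged sketch: identical index + search calls on a file-backed Video
# and a live RTStream. Search on an RTStream is an assumption here.
for media in (video, rtstream):
    media.index_visuals(prompt="Describe key activities and events")
    hits = media.search("pricing discussion")
    print(type(media).__name__, len(hits.shots), "matches")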

Understand: Indexes Are Everything

Indexes are what transform opaque media into searchable knowledge. You create them with prompts.

Spoken Index

Transcribes audio and makes it searchable:
video.index_spoken_words()
# or for live:
rtstream.start_transcript()

Visual Index

Understands what’s happening on screen:
video.index_visuals(prompt="Describe key activities and events")
# or for live:
rtstream.index_visuals(prompt="Describe what user is doing")

Multiple Indexes

Create different perspectives on the same media:
# Same video, different questions
safety_index = video.index_visuals(prompt="Identify safety violations")
summary_index = video.index_visuals(prompt="Summarize each segment")

Indexes are additive. Add new ones without reprocessing. Remove old ones without affecting others.
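
A hedged sketch of that independence in practice (list_indexes() and delete() are assumed names for illustration, not confirmed SDK methods):

# Hypothetical sketch: each index is managed on its own.
# list_indexes() and delete() are assumed method names.
for index in video.list_indexes():
    print(index.id, index.prompt)

safety_index.delete()   # summary_index is untouched and still searchable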

Search Returns Evidence

Search returns timestamps and playable links - not just “found” but verifiable.
results = video.search("product demo")

for shot in results.shots:
    print(f"{shot.start}s - {shot.end}s")  # Timestamps
    print(f"Content: {shot.text}")         # What was found
    print(f"Score: {shot.search_score}")   # Relevance
    shot.play()                            # Play it to verify

Every result maps to a playable moment. Your agent can cite its sources.
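
For example, an agent can turn those shots into a cited answer using only the fields shown above, with no extra SDK calls assumed:

# Build a cited answer from search results, using only fields shown above.
results = video.search("pricing discussion")

citations = [f"{shot.start}s-{shot.end}s: {shot.text}" for shot in results.shots]
print("Pricing came up at these moments:")
print("\n".join(citations))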

Act: Events, Alerts, Editing

Trigger on conditions

# Create a reusable event
event_id = conn.create_event(
    event_prompt="Detect when someone mentions 'budget'",
    label="budget_mention"
)

# Wire it to an index
index.create_alert(event_id=event_id, callback_url="https://...")
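
On the receiving end, the callback is a plain webhook POST. A minimal handler might look like this (the payload field names below are assumptions; check the actual alert schema):

# Minimal webhook receiver for alert callbacks (Flask).
# The payload field names are assumptions, not a documented schema.
from flask import Flask, request

app = Flask(__name__)

@app.route("/videodb-alert", methods=["POST"])
def handle_alert():
    payload = request.get_json(force=True)
    label = payload.get("label")      # e.g. "budget_mention" (assumed field)
    start = payload.get("start")      # assumed field
    print(f"Alert {label} fired at {start}")
    return "", 204

if __name__ == "__main__":
    app.run(port=8000)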

Compose with code

from videodb.editor import Timeline, Track, Clip, VideoAsset

timeline = Timeline(conn)
track = Track()
track.add_clip(0, Clip(asset=VideoAsset(id=video.id), duration=30))
timeline.add_track(track)

stream_url = timeline.generate_stream()
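
The returned URL is a playable stream. As a hedged follow-up (the positional start-offset argument simply mirrors the add_clip call above; nothing else is confirmed), you can keep composing and regenerate:

# Hedged sketch: append a second clip starting at 30 seconds, then regenerate.
track.add_clip(30, Clip(asset=VideoAsset(id=video.id), duration=15))
stream_url = timeline.generate_stream()
print(stream_url)   # playable URL to embed or share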

Objects at a Glance

| Object | What It Represents |
|---|---|
| Connection | Your authenticated session |
| Collection | Container for organizing media |
| Video | Uploaded video |
| RTStream | Live stream (RTSP or capture) |
| Index | Searchable interpretation layer |
| SearchResult | Query results with shots |
| Shot | Single timestamped match |
| Event | Reusable detection rule |
| Alert | Event + delivery config |
| Timeline | Programmatic edit composition |

Next Steps