The Problem
AI agents can reason about text brilliantly. But show them a 30-minute meeting recording and ask “what did the client say about pricing?” - they fail. Video files are opaque blobs. Your agent can’t query them, can’t search them, can’t get timestamped answers from them.
The Platform Loop
Every VideoDB workflow follows the same pattern:
| Stage | What Happens | Returns |
|---|---|---|
| See | Ingest from files, streams, or desktop capture | Video, RTStream, or CaptureSession |
| Understand | Create indexes. Search with natural language. | Timestamped moments with playable evidence |
| Act | Trigger alerts. Compose edits. Export streams. | Webhooks, playable URLs, downloadable files |
Quick Example
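A minimal sketch of the loop in Python, assuming the SDK’s connect(), get_collection(), index_spoken_words(), and search() helpers; the API key and URL are placeholders:

```python
import videodb

# See: connect and ingest a file
conn = videodb.connect(api_key="YOUR_API_KEY")
coll = conn.get_collection()
video = coll.upload(url="https://example.com/meeting-recording.mp4")

# Understand: index the audio, then query it in natural language
video.index_spoken_words()
result = video.search(query="what did the client say about pricing?")

# Act: every match is timestamped, playable evidence
for shot in result.get_shots():
    print(shot.start, shot.end, shot.text)
```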
See: Three Input Types
| Source | Method | Returns |
|---|---|---|
| Files | coll.upload(url="...") | Video |
| Live streams | conn.connect_rtstream(url="...") | RTStream |
| Desktop capture | conn.create_capture_session(...) | CaptureSession → RTStream |
You index and search a Video or an RTStream the same way.
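All three paths sketched with the method names from the table, reusing the connection from the quick example; the RTSP URL is a placeholder and the capture-session argument is illustrative, not a confirmed parameter:

```python
# Files -> Video
video = coll.upload(url="https://example.com/demo.mp4")

# Live streams -> RTStream (URL is a placeholder)
stream = conn.connect_rtstream(url="rtsp://camera.local:554/feed")

# Desktop capture -> CaptureSession wrapping an RTStream
# (the name argument is illustrative)
session = conn.create_capture_session(name="my-desktop")
```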
Understand: Indexes Are Everything
Indexes are what transform opaque media into searchable knowledge. You create them with prompts.
Spoken Index
Transcribes audio and makes it searchable:
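A sketch, reusing the video from the quick example; the query text is illustrative:

```python
# Transcribe the audio track and make it searchable
video.index_spoken_words()

# Query the transcript in natural language
result = video.search(query="budget concerns")
```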
Visual Index
Understands what’s happening on screen:
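A sketch assuming scene indexing takes a natural-language prompt, as in index_scenes(prompt=...); the prompt, query, and IndexType usage are assumptions worth verifying against the SDK:

```python
from videodb import IndexType

# The prompt tells the index what to pay attention to (wording is illustrative)
scene_index_id = video.index_scenes(
    prompt="Describe any slides, charts, or screen shares in view."
)

# Search against the visual index rather than the transcript
result = video.search(
    query="pricing slide on screen",
    index_type=IndexType.scene,
    index_id=scene_index_id,
)
```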
Multiple Indexes
Create different perspectives on the same media:
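For example, one recording can carry a spoken index plus several differently prompted scene indexes (prompts are illustrative):

```python
# Same video, three perspectives
video.index_spoken_words()
slides_index = video.index_scenes(prompt="Note when slides or charts are shown.")
people_index = video.index_scenes(prompt="Note who is on camera and what they are doing.")
```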
Search Returns Evidence
Search returns timestamps and playable links - not just “found” but verifiable.
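A sketch of what comes back; treating shot.play() as a per-moment playable URL is an assumption:

```python
result = video.search(query="client pushback on pricing")

# Each Shot is a timestamped match you can verify by watching it
for shot in result.get_shots():
    print(f"{shot.start:.1f}s - {shot.end:.1f}s")
    print(shot.play())  # playable link for just this moment
```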
Act: Events, Alerts, Editing
Trigger on conditions
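A sketch of the event-then-alert flow implied by the object table (Event = reusable rule, Alert = event + delivery); the create_event and create_alert signatures are assumptions, and the webhook URL is a placeholder:

```python
# Index the live stream so there is something to detect against
scene_index = stream.index_scenes(
    prompt="Flag any person entering the frame."  # prompt is illustrative
)

# Event: a reusable detection rule (signature is an assumption)
event_id = conn.create_event(
    event_prompt="A person enters the frame",
    label="person_entry",
)

# Alert: the event plus a delivery target
scene_index.create_alert(event_id, callback_url="https://example.com/webhook")
```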
Compose with code
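A sketch assuming the SDK’s Timeline and VideoAsset helpers; the timestamps are illustrative:

```python
from videodb.timeline import Timeline
from videodb.asset import VideoAsset

# Stitch two matched moments into a single playable cut
timeline = Timeline(conn)
timeline.add_inline(VideoAsset(asset_id=video.id, start=120, end=150))
timeline.add_inline(VideoAsset(asset_id=video.id, start=300, end=330))

stream_url = timeline.generate_stream()  # playable URL for the edit
```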
Objects at a Glance
| Object | What It Represents |
|---|---|
| Connection | Your authenticated session |
| Collection | Container for organizing media |
| Video | Uploaded video |
| RTStream | Live stream (RTSP or capture) |
| Index | Searchable interpretation layer |
| SearchResult | Query results with shots |
| Shot | Single timestamped match |
| Event | Reusable detection rule |
| Alert | Event + delivery config |
| Timeline | Programmatic edit composition |