The Platform Loop
See (Ingest)
Get video and audio from anywhere into VideoDB (a connection sketch follows the table).

| Source | Method |
|---|---|
| File URL | coll.upload(url="https://...") |
| Local file | coll.upload(file_path="./video.mp4") |
| RTSP stream | coll.connect_rtstream(url="rtsp://...") |
| Desktop capture | Capture SDK (screen, mic, camera) |
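
A minimal sketch of the file-based paths, assuming the Python SDK's connect() and get_collection() helpers; the URL and file path are placeholders:

```python
import videodb

# connect() falls back to the VIDEO_DB_API_KEY environment
# variable when no api_key argument is passed.
conn = videodb.connect()
coll = conn.get_collection()

# Ingest from a public URL or from local disk; each returns a Video object.
video = coll.upload(url="https://example.com/talk.mp4")
local = coll.upload(file_path="./video.mp4")
print(video.id)
```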
Process
Built-in primitives convert raw media into processable units. This happens automatically when you create indexes; the sketch after this list shows how the knobs surface in code.

- Scene segmentation - Time-based, shot-based, or prompt-guided
- Frame sampling - Control which frames to analyze
- Audio chunking - Word, sentence, or time-based segments
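
How segmentation and frame sampling are controlled, as a sketch: this assumes index_scenes accepts an extraction_type plus an extraction_config dict, as in the Python SDK's scene-index API; the config values are illustrative.

```python
from videodb import SceneExtractionType

# Time-based segmentation: one scene every 10 seconds, sampling
# only the first frame of each segment for analysis.
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 10, "select_frames": ["first"]},
    prompt="Describe what is happening in this scene.",
)

# Shot-based segmentation: cut on visual change instead of the clock.
video.index_scenes(
    extraction_type=SceneExtractionType.shot_based,
    extraction_config={"threshold": 20, "frame_count": 2},
    prompt="Describe what is happening in this scene.",
)
```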
Understand (Indexes)
Indexes are programmable interpretation layers. You define what to extract with prompts (see the sketch after this list).

- Prompt-driven - Natural language instructions
- Model-orchestrated - LLMs and VLMs do the work
- Additive - Multiple indexes on the same media
- Multimodal - Covers both visual and spoken content
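
Two additive indexes on the same video, one spoken and one visual, as a sketch using the SDK's index_spoken_words and index_scenes; the prompt text is a placeholder:

```python
# Spoken index: transcribe and index the audio track.
video.index_spoken_words()

# Visual index: prompt-driven interpretation of the scenes. Indexes are
# additive, so this coexists with the spoken index above.
scene_index_id = video.index_scenes(
    prompt="Flag any scene that shows a product demo, and name the product."
)
```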
Remember
Indexes are stored as episodic memory. This is automatic by default; a retrieval sketch follows the list. What gets stored:

- Transcripts and embeddings
- Scene descriptions and tags
- Structured metadata
- Retrieval structures
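
Stored artifacts stay queryable after indexing. A sketch, assuming the SDK's get_transcript_text and get_scene_index accessors and the scene_index_id returned above:

```python
# The transcript persisted by index_spoken_words().
text = video.get_transcript_text()

# Scene descriptions persisted by index_scenes(), keyed by the index id.
scenes = video.get_scene_index(scene_index_id)
for scene in scenes:
    print(scene["start"], scene["end"], scene["description"])
```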
Retrieve (Search)
Search across indexed content with natural language. Results include playable evidence (see the sketch after this list).

- Timestamps - Exact start/end times
- Text - What was detected
- Score - Relevance ranking
- Stream URL - Playable link
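
A search sketch, assuming the Python SDK's search() returning shots with start/end times, detected text, a relevance score, and a playable stream; the query is a placeholder:

```python
results = video.search(query="moments where pricing is discussed")

for shot in results.get_shots():
    # Timestamps, detected text, and relevance score per result.
    print(shot.start, shot.end, shot.text, shot.search_score)

# Playable evidence: compiles the matching segments into a stream.
stream_url = results.play()
```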
Act
Go from understanding to automation and outputs.

Event Detection
React to conditions in real-time:
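
A sketch of the alerting flow, under the assumption that the real-time API exposes create_event on the connection and create_alert on a live scene index, with matches delivered to a webhook; the stream name, prompts, and callback URL are placeholders:

```python
# Index the live stream in near-real time.
rtstream = coll.connect_rtstream(url="rtsp://...", name="dock-cam")
live_index = rtstream.index_scenes(
    prompt="Note any person entering the restricted area."
)

# Define the condition to watch for, then attach a webhook alert.
event_id = conn.create_event(
    event_prompt="A person enters the restricted area",
    label="intrusion",
)
live_index.create_alert(event_id, callback_url="https://example.com/hooks/alert")
```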
Programmable Editing
Compose outputs using the 4-layer editor architecture:
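
A compile sketch assuming the Python SDK's timeline module; it exercises only two of the layers (an inline base track and an audio overlay), and the asset ids and trim points are illustrative:

```python
from videodb import MediaType
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, AudioAsset

# Background music uploaded as an audio asset.
music = coll.upload(url="https://example.com/bgm.mp3", media_type=MediaType.audio)

timeline = Timeline(conn)

# Base layer: two trimmed clips played back to back.
timeline.add_inline(VideoAsset(asset_id=video.id, start=5, end=20))
timeline.add_inline(VideoAsset(asset_id=local.id, start=0, end=10))

# Overlay layer: audio starting at second 0 of the timeline.
timeline.add_overlay(0, AudioAsset(asset_id=music.id, disable_other_tracks=False))

# Render to a playable stream URL.
stream_url = timeline.generate_stream()
```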
Architecture Patterns
The loop applies to different use cases (an end-to-end sketch of the first pattern follows the table):

| Use Case | See | Understand | Act |
|---|---|---|---|
| Video RAG | Upload files | Index with domain prompts | Search + retrieve |
| Monitoring | Connect RTSP | Real-time indexing | Alerts + webhooks |
| Desktop Agent | Capture SDK | Index screen/mic | Context for LLM |
| Media Automation | Upload + transcode | Index for editing | Timeline + export |
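
The Video RAG row, end to end, using the same assumed SDK calls as the sketches above; the domain prompt and query are placeholders:

```python
# Video RAG: upload, index with a domain prompt, then search and retrieve.
video = coll.upload(url="https://example.com/earnings-call.mp4")
video.index_spoken_words()
video.index_scenes(prompt="Describe slides, charts, and figures on screen.")

hits = video.search(query="what guidance was given for next quarter?")
context = [(s.start, s.end, s.text) for s in hits.get_shots()]
# Feed `context` to an LLM as grounded, timestamped evidence.
```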