Give Your AI Agents Eyes and Ears

Your agents read and generate text. But the world isn’t text: it’s video calls, security feeds, screen recordings, and live streams. VideoDB is the perception layer that lets agents see, hear, remember, and act on continuous media.

Quickstart

Give your agent perception in 5 minutes

Core Concepts

Understand the platform architecture

What You Can Build

Desktop Agents

Stream screen, mic, and camera. Get real-time context about what the user is doing and saying.
Sales Copilot →

Video RAG

Search across hours of meetings, lectures, or archives. Get timestamped moments with playable evidence.
Multimodal Search →

Real-time Monitoring

Connect RTSP cameras and drones. Detect events as they happen. Trigger alerts and automations.
Intrusion Detection →

Media Automation

Compose videos with code. Generate voice, music, and images. Export to any format.
Faceless Video Creator →

Browse All Examples

Explore 30+ examples across AI Copilots, Video RAG, Live Intelligence, Content Factory, and more

The Platform Loop

Every workflow follows the same pattern:
See → Understand → Act

| Stage | What Happens |
| --- | --- |
| See | Ingest from files, live streams, or desktop capture |
| Understand | Index with prompts. Search with natural language. Get timestamped moments. |
| Act | Trigger alerts, compose edits, export streams |

import videodb

conn = videodb.connect()

# See: Get an active stream (from desktop capture or RTSP)
rtstream = conn.get_rtstream("rts-abc123")

# Understand: Create indexes on the live stream
visual_index = rtstream.index_visuals(prompt="Describe what the user is doing")
audio_index = rtstream.index_audio(prompt="Extract key decisions and action items")

# Act: Create an event and attach an alert
event = conn.create_event(
    event_prompt="Detect when someone mentions a deadline or due date"
)
alert = audio_index.create_alert(
    webhook_url="https://your-backend.com/webhooks/deadline-mentioned"
)

# Real-time events arrive via WebSocket or webhook, for example:
# {
#   "channel": "alert",
#   "timestamp": "2026-02-11T12:18:00.968810+00:00",
#   "rtstream_id": "rts-xxx",
#   "rtstream_name": "Meeting",
#   "data": {
#     "event_id": "event-77aae6b981970542",
#     "label": "objection",
#     "triggered": true,
#     "confidence": 0.9,
#     "start": 1770812246.3445818,
#     "end": 1770812277.3488276
#   }
# }
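
On the receiving side, your backend has to decide what to do with each alert message. The sketch below uses only the Python standard library and the payload shape shown in the sample event above. The names `AlertEvent`, `parse_alert`, and `min_confidence` are hypothetical and not part of the VideoDB SDK, and treating `start` and `end` as Unix timestamps in seconds is an assumption based on the sample values.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertEvent:
    """Hypothetical container for the fields of one alert message."""
    rtstream_id: str
    event_id: str
    label: str
    confidence: float
    duration_seconds: float  # assumes start/end are Unix seconds


def parse_alert(raw: str, min_confidence: float = 0.5) -> Optional[AlertEvent]:
    """Return an AlertEvent for a triggered alert, or None if it should be ignored."""
    message = json.loads(raw)
    if message.get("channel") != "alert":
        return None  # not an alert message
    data = message.get("data", {})
    if not data.get("triggered"):
        return None  # event was evaluated but did not fire
    if data.get("confidence", 0.0) < min_confidence:
        return None  # below the (hypothetical) confidence threshold
    return AlertEvent(
        rtstream_id=message["rtstream_id"],
        event_id=data["event_id"],
        label=data["label"],
        confidence=data["confidence"],
        duration_seconds=data["end"] - data["start"],
    )


# The sample payload from the example above, as it might arrive in a request body.
SAMPLE = json.dumps({
    "channel": "alert",
    "timestamp": "2026-02-11T12:18:00.968810+00:00",
    "rtstream_id": "rts-xxx",
    "rtstream_name": "Meeting",
    "data": {
        "event_id": "event-77aae6b981970542",
        "label": "objection",
        "triggered": True,
        "confidence": 0.9,
        "start": 1770812246.3445818,
        "end": 1770812277.3488276,
    },
})

event = parse_alert(SAMPLE)
if event is not None:
    print(f"{event.label} on {event.rtstream_id} ({event.confidence:.0%} confidence)")
```

Filtering on `triggered` and a confidence threshold before acting keeps downstream automations from firing on weak detections; tune the threshold to your own tolerance for false positives.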

Install the SDK

pip install videodb

Python SDK

GitHub, PyPI, and setup guide

Node.js SDK

npm, TypeScript, and setup guide

Philosophy

Why perception is the next frontier for AI agents.

Why AI Agents Are Blind Today

The gap between human perception and agent perception

Perception Is the Missing Layer

The stack that gives agents eyes and ears

MP4 Is the Wrong Primitive

Why video files don’t work for AI

What Episodic Memory Means for Agents

Remember experiences, not just facts