Give Your AI Agents Eyes and Ears
Your agents read text. They generate text. But the world isn’t text: it’s video calls, security feeds, screen recordings, and live streams. VideoDB is the perception layer that lets agents see, hear, remember, and act on continuous media.

What You Can Build
Desktop Agents
Stream screen, mic, and camera. Get real-time context about what the user is doing and saying.

Sales Copilot →
Video RAG
Search across hours of meetings, lectures, or archives. Get timestamped moments with playable evidence.

Multimodal Search →
Real-time Monitoring
Connect RTSP cameras and drones. Detect events as they happen. Trigger alerts and automations.

Intrusion Detection →
Media Automation
Compose videos with code. Generate voice, music, and images. Export to any format.

Faceless Video Creator →
Browse All Examples
Explore 30+ examples across AI Copilots, Video RAG, Live Intelligence, Content Factory, and more
The Platform Loop
Every workflow follows the same pattern:

| Stage | What Happens |
|---|---|
| See | Ingest from files, live streams, or desktop capture. |
| Understand | Index with prompts. Search with natural language. Get timestamped moments. |
| Act | Trigger alerts. Compose edits. Export streams. |
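In SDK terms, the loop might look like the sketch below (once the SDK is installed; see the next section). This is illustrative rather than definitive: the calls follow the Python SDK’s quickstart pattern, and the API key and video URL are placeholders.

```python
# Sketch of the See -> Understand -> Act loop with the VideoDB Python SDK.
# Method names follow the SDK's quickstart; verify exact signatures in the docs.
from videodb import connect

conn = connect(api_key="YOUR_VIDEODB_API_KEY")  # placeholder key

# See: ingest a recording (live streams and desktop capture follow the same loop)
video = conn.upload(url="https://example.com/meeting.mp4")  # placeholder URL

# Understand: index the spoken words, then search in natural language
video.index_spoken_words()
results = video.search("when did we agree on the launch date?")

# Act: each result is a timestamped moment with playable evidence
for shot in results.get_shots():
    print(shot.start, shot.end, shot.text)
```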
Install the SDK
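The Python SDK is available on PyPI:

```bash
pip install videodb
```

Grab an API key from the VideoDB console and you’re ready to run the loop above.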
Philosophy
Why perception is the next frontier for AI agents.

Why AI Agents Are Blind Today
The gap between human perception and agent perception
Perception Is the Missing Layer
The stack that gives agents eyes and ears
MP4 Is the Wrong Primitive
Why video files don’t work for AI
What Episodic Memory Means for Agents
Remember experiences, not just facts