> ## Documentation Index
> Fetch the complete documentation index at: https://docs.videodb.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Welcome to VideoDB

> The perception, memory, and action for AI agents

Your agents can read text and static images. But the real world is live, continuous, and always changing. To operate with real context, your agent needs real-time access to video calls, camera feeds, screen recordings, and live internet streams.

VideoDB is the perception layer that lets agents see, hear, remember, and act on continuous media.

Most AI development focuses on text and static images, but video remains a significant hurdle because of its density and lack of structure. VideoDB turns raw pixel data into structured context that agents can query, reason about, and act upon in real time.

For agents to move beyond text boxes and interact with the physical or digital world via screens and cameras, they need a way to parse continuous visual and auditory data. VideoDB provides this through a specialized database that indexes video at the scene level - making it possible for an agent to "recall" specific events or "see" real-time occurrences without excessive compute costs.

- Give your agent perception in 5 minutes
- Understand the platform architecture

### How It Works

The platform operates through three stages: **See**, **Understand**, and **Act**.

| Stage          | What Happens                                                                               |
| :------------- | :----------------------------------------------------------------------------------------- |
| **See**        | Capture SDK or live stream integration takes in media from files, desktops, or cameras     |
| **Understand** | Build specialized indexes for transcripts, visual scenes, or custom prompts                |
| **Act**        | Query, search, edit, and export - agents can generate summaries or clips based on findings |

Rather than merely storing video files, the platform indexes frames and audio to support semantic retrieval.
This allows an agent to ask for a specific moment in a continuous stream without downloading or processing the entire file.

The architecture sits above transport protocols and below the reasoning engine. This separation means you can use VideoDB with any Large Language Model or Large Video Model. By consolidating transcription, frame extraction, vector indexing, and video playback into a single platform, VideoDB addresses the high total cost of ownership typically associated with video AI.

### Skills: Native Agent Experiences

Since VideoDB handles server-side video processing, indexing, and retrieval, developers can use [skills](/pages/getting-started/agent-skills) to create agent workflows that feel native to their environment. Skills give agents like Claude Code and Codex structured perception primitives - capture, search, edit, stream - without writing infrastructure code.

```bash
npx skills add video-db/skills
```

***

## What You Can Build

- Stream screen, mic, and camera. Get real-time context about what the user is doing and saying. [Call.md →](/examples-and-tutorials/ai-copilots/call-md)
- Search across hours of meetings, lectures, or archives. Get timestamped moments with playable evidence. [Multimodal Search →](/examples-and-tutorials/video-rag/multimodal-search)
- Connect RTSP cameras and drones. Detect events as they happen. Trigger alerts and automations. [Intrusion Detection →](/examples-and-tutorials/live-intelligence/intrusion-detection)
- Compose videos with code. Generate voice, music, and images. Export to any format. [Faceless Video Creator →](/examples-and-tutorials/content-factory/faceless-video-creator)
- Add real-time perception to coding assistants and autonomous agents. Screen capture, audio indexing, and searchable context.
[Agent Skills →](/pages/getting-started/agent-skills)

Explore examples across AI Copilots, Video Search, Live Intelligence, Content Factory, and more

***

## Example: Real-time Alerting

```python Python
import videodb

conn = videodb.connect()

# See: Get an active stream (from desktop capture or RTSP)
rtstream = conn.get_rtstream("rts-abc123")

# Understand: Create indexes on the live stream
visual_index = rtstream.index_visuals(prompt="Describe what the user is doing")
audio_index = rtstream.index_audio(prompt="Extract key decisions and action items")

# Act: Create an event and attach an alert
event = conn.create_event(
    event_prompt="Detect when someone mentions a deadline or due date"
)
alert = audio_index.create_alert(
    webhook_url="https://your-backend.com/webhooks/deadline-mentioned"
)

# Real-time events arrive via WebSocket or webhook
# {
#   "channel": "alert",
#   "timestamp": "2026-02-11T12:18:00.968810+00:00",
#   "rtstream_id": "rts-xxx",
#   "rtstream_name": "Meeting",
#   "data": {
#     "event_id": "event-77aae6b981970542",
#     "label": "objection",
#     "triggered": true,
#     "confidence": 0.9,
#     "start": 1770812246.3445818,
#     "end": 1770812277.3488276
#   }
# }
```

```javascript Node.js
import { connect } from 'videodb';

const conn = connect();

// See: Get an active stream (from desktop capture or RTSP)
const rtstream = await conn.getRtstream("rts-abc123");

// Understand: Create indexes on the live stream
const visualIndex = await rtstream.indexVisuals({
  prompt: "Describe what the user is doing"
});
const audioIndex = await rtstream.indexAudio({
  prompt: "Extract key decisions and action items"
});

// Act: Create an event and attach an alert
const event = await conn.createEvent({
  eventPrompt: "Detect when someone mentions a deadline or due date"
});
const alert = await audioIndex.createAlert({
  webhookUrl: "https://your-backend.com/webhooks/deadline-mentioned"
});

// Real-time events arrive via WebSocket or webhook
// {
//   "channel": "alert",
//   "timestamp": "2026-02-11T12:18:00.968810+00:00",
//   "rtstream_id": "rts-xxx",
//   "rtstream_name": "Meeting",
//   "data": {
//     "event_id": "event-77aae6b981970542",
//     "label": "objection",
//     "triggered": true,
//     "confidence": 0.9,
//     "start": 1770812246.3445818,
//     "end": 1770812277.3488276
//   }
// }
```

***

## Install the SDK

```bash Python
pip install videodb
```

```bash Node.js
npm install videodb
```

- GitHub, PyPI, and setup guide
- npm, TypeScript, and setup guide

***

## Philosophy

Why perception is the next frontier for AI agents.

- The gap between human perception and agent perception
- The stack that gives agents eyes and ears
- Why video files don't work for AI
- Remember experiences, not just facts

***