
# Welcome to VideoDB

> Perception, memory, and action for AI agents

Your agents can read text and static images. But the real world is live, continuous, and always changing. To operate with real context, your agent needs real-time access to video calls, camera feeds, screen recordings, and live internet streams.

VideoDB is the perception layer that lets agents see, hear, remember, and act on continuous media. Most AI development focuses on text and static images, but video remains a significant hurdle because of its density and lack of structure. VideoDB turns raw pixel data into structured context that agents can query, reason about, and act upon in real time.

For agents to move beyond text boxes and interact with the physical or digital world through screens and cameras, they need a way to parse continuous visual and auditory data. VideoDB provides this through a specialized database that indexes video at the scene level, so an agent can "recall" specific events or "see" real-time occurrences without reprocessing the full video for each query.

<CardGroup cols={2}>
  <Card icon="rocket" title="Quickstart" href="/pages/getting-started/quickstart">
    Give your agent perception in 5 minutes
  </Card>

  <Card icon="book-open" title="Core Concepts" href="/pages/core-concepts">
    Understand the platform architecture
  </Card>
</CardGroup>

### How It Works

The platform operates through three stages: **See**, **Understand**, and **Act**.

| Stage          | What Happens                                                                               |
| :------------- | :----------------------------------------------------------------------------------------- |
| **See**        | Capture SDK or live stream integration takes in media from files, desktops, or cameras     |
| **Understand** | Build specialized indexes for transcripts, visual scenes, or custom prompts                |
| **Act**        | Query, search, edit, and export - agents can generate summaries or clips based on findings |

Rather than merely storing video files, the platform indexes frames and audio to support semantic retrieval. This allows an agent to ask for a specific moment in a continuous stream without downloading or processing the entire file.
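
To make the idea of moment-level retrieval concrete, here is a toy sketch in plain Python. It is not VideoDB's implementation and does not use the SDK: the `Scene` class, the `search` function, and the word-overlap scoring are all hypothetical stand-ins for a real semantic index. It only illustrates that once a stream is reduced to timestamped descriptions, a query can return a time range instead of a whole file.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical index entry: a time span plus a text description of it."""
    start: float  # seconds from the start of the stream
    end: float
    description: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return {word.strip(".,!?").lower() for word in text.split()}


def search(index: list[Scene], query: str, top_k: int = 1) -> list[Scene]:
    """Rank scenes by word overlap with the query (a crude stand-in for semantic search)."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(s.description)), s) for s in index]
    scored = [(score, s) for score, s in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]


# Made-up descriptions, as an indexing step might produce them.
index = [
    Scene(0.0, 12.5, "Presenter opens the quarterly report slide"),
    Scene(12.5, 40.0, "Team discusses the launch deadline"),
    Scene(40.0, 55.0, "Screen share of the bug tracker"),
]

hits = search(index, "when was the deadline discussed")
for scene in hits:
    print(f"{scene.start:.1f}s-{scene.end:.1f}s: {scene.description}")
```

The query returns only the 12.5s to 40.0s span, which is the kind of timestamped result an agent can act on without touching the rest of the media.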

The architecture sits above transport protocols and below the reasoning engine. This separation means you can use VideoDB with any Large Language Model or Large Video Model. By consolidating transcription, frame extraction, vector indexing, and video playback into a single platform, VideoDB addresses the high total cost of ownership typically associated with video AI.

### Skills: Native Agent Experiences

Because VideoDB handles video processing, indexing, and retrieval server-side, developers can use [skills](/pages/getting-started/agent-skills) to create agent workflows that feel native to their environment. Skills give agents like Claude Code and Codex structured perception primitives (capture, search, edit, and stream), so you don't have to write infrastructure code.

```bash theme={null}
npx skills add video-db/skills
```

***

## What You Can Build

<CardGroup cols={2}>
  <Card icon="monitor" title="Desktop Agents">
    Stream screen, mic, and camera. Get real-time context about what the user is doing and saying.

    [Call.md →](/examples-and-tutorials/ai-copilots/call-md)
  </Card>

  <Card icon="search" title="Video Search">
    Search across hours of meetings, lectures, or archives. Get timestamped moments with playable evidence.

    [Multimodal Search →](/examples-and-tutorials/video-rag/multimodal-search)
  </Card>

  <Card icon="bell" title="Real-time Monitoring">
    Connect RTSP cameras and drones. Detect events as they happen. Trigger alerts and automations.

    [Intrusion Detection →](/examples-and-tutorials/live-intelligence/intrusion-detection)
  </Card>

  <Card icon="clapperboard" title="Media Automation">
    Compose videos with code. Generate voice, music, and images. Export to any format.

    [Faceless Video Creator →](/examples-and-tutorials/content-factory/faceless-video-creator)
  </Card>

  <Card icon="robot" title="Agent Skills">
    Add real-time perception to coding assistants and autonomous agents. Screen capture, audio indexing, and searchable context.

    [Agent Skills →](/pages/getting-started/agent-skills)
  </Card>
</CardGroup>

<Card icon="folder" title="Browse All Examples" href="/examples-and-tutorials">
  Explore examples across AI Copilots, Video Search, Live Intelligence, Content Factory, and more
</Card>

***

## Example: Real-time Alerting

<CodeGroup>
  ```python Python theme={null}
  import videodb

  conn = videodb.connect()

  # See: Get an active stream (from desktop capture or RTSP)
  rtstream = conn.get_rtstream("rts-abc123")

  # Understand: Create indexes on the live stream
  visual_index = rtstream.index_visuals(prompt="Describe what the user is doing")
  audio_index = rtstream.index_audio(prompt="Extract key decisions and action items")

  # Act: Create an event and attach an alert
  event = conn.create_event(
      event_prompt="Detect when someone mentions a deadline or due date"
  )
  alert = audio_index.create_alert(
      webhook_url="https://your-backend.com/webhooks/deadline-mentioned"
  )

  # Real-time events arrive via WebSocket or webhook, for example:
  # {
  #   "channel": "alert",
  #   "timestamp": "2026-02-11T12:18:00.968810+00:00",
  #   "rtstream_id": "rts-xxx",
  #   "rtstream_name": "Meeting",
  #   "data": {
  #     "event_id": "event-77aae6b981970542",
  #     "label": "objection",
  #     "triggered": true,
  #     "confidence": 0.9,
  #     "start": 1770812246.3445818,
  #     "end": 1770812277.3488276
  #   }
  # }
  ```

  ```javascript Node.js theme={null}
  import { connect } from 'videodb';

  const conn = connect();

  // See: Get an active stream (from desktop capture or RTSP)
  const rtstream = await conn.getRtstream("rts-abc123");

  // Understand: Create indexes on the live stream
  const visualIndex = await rtstream.indexVisuals({ prompt: "Describe what the user is doing" });
  const audioIndex = await rtstream.indexAudio({ prompt: "Extract key decisions and action items" });

  // Act: Create an event and attach an alert
  const event = await conn.createEvent({
      eventPrompt: "Detect when someone mentions a deadline or due date"
  });
  const alert = await audioIndex.createAlert({
      webhookUrl: "https://your-backend.com/webhooks/deadline-mentioned"
  });

  // Real-time events arrive via WebSocket or webhook, for example:
  // {
  //   "channel": "alert",
  //   "timestamp": "2026-02-11T12:18:00.968810+00:00",
  //   "rtstream_id": "rts-xxx",
  //   "rtstream_name": "Meeting",
  //   "data": {
  //     "event_id": "event-77aae6b981970542",
  //     "label": "objection",
  //     "triggered": true,
  //     "confidence": 0.9,
  //     "start": 1770812246.3445818,
  //     "end": 1770812277.3488276
  //   }
  // }
  ```
</CodeGroup>
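
On the receiving side, your backend needs to parse the alert payload. The sketch below uses only the Python standard library and the sample payload shown in the example above. The `summarize_alert` helper is hypothetical, and it assumes that `start` and `end` are Unix epoch seconds. That assumption is consistent with the sample `timestamp` field but is not stated on this page, so check the API reference before relying on it.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Sample payload copied from the example above.
SAMPLE_ALERT = """
{ "channel": "alert", "timestamp": "2026-02-11T12:18:00.968810+00:00",
  "rtstream_id": "rts-xxx", "rtstream_name": "Meeting",
  "data": { "event_id": "event-77aae6b981970542", "label": "objection",
            "triggered": true, "confidence": 0.9,
            "start": 1770812246.3445818, "end": 1770812277.3488276 } }
"""


def summarize_alert(raw: str) -> Optional[dict]:
    """Return the fields a handler typically needs, or None if nothing fired."""
    message = json.loads(raw)
    if message.get("channel") != "alert":
        return None  # not an alert message
    data = message["data"]
    if not data.get("triggered"):
        return None  # event was evaluated but did not trigger
    # Assumption: start/end are Unix epoch seconds (not confirmed by this page).
    start = datetime.fromtimestamp(data["start"], tz=timezone.utc)
    end = datetime.fromtimestamp(data["end"], tz=timezone.utc)
    return {
        "stream": message["rtstream_name"],
        "label": data["label"],
        "confidence": data["confidence"],
        "start": start,
        "end": end,
        "duration_seconds": round(data["end"] - data["start"], 2),
    }


summary = summarize_alert(SAMPLE_ALERT)
if summary is not None:
    print(f"{summary['label']} on {summary['stream']} "
          f"({summary['duration_seconds']}s, confidence {summary['confidence']})")
```

In a real webhook you would call a function like this from your HTTP framework's request handler and route on `label` or `event_id`.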

***

## Install the SDK

<CodeGroup>
  ```bash Python theme={null}
  pip install videodb
  ```

  ```bash Node.js theme={null}
  npm install videodb
  ```
</CodeGroup>

<CardGroup cols={2}>
  <Card icon="file-code" title="Python SDK" href="/pages/getting-started/python">
    GitHub, PyPI, and setup guide
  </Card>

  <Card icon="file-json" title="Node.js SDK" href="/pages/getting-started/node">
    npm, TypeScript, and setup guide
  </Card>
</CardGroup>

***

## Philosophy

Why perception is the next frontier for AI agents.

<CardGroup cols={2}>
  <Card icon="bot" title="Why AI Agents Are Blind Today" href="/pages/philosophy/why-agents-are-blind">
    The gap between human perception and agent perception
  </Card>

  <Card icon="eye" title="Perception Is the Missing Layer" href="/pages/philosophy/perception-is-the-missing-layer">
    The stack that gives agents eyes and ears
  </Card>

  <Card icon="file-x" title="MP4 Is the Wrong Primitive" href="/pages/philosophy/mp4-is-wrong-primitive">
    Why video files don't work for AI
  </Card>

  <Card icon="brain" title="What Episodic Memory Means for Agents" href="/pages/philosophy/episodic-memory-for-agents">
    Remember experiences, not just facts
  </Card>
</CardGroup>

***
