# AI Agent Skills

> Add video and audio perception to your AI agents: capture, upload, search, edit, and stream.

AI agents already write code and automate tasks well, but most cannot work with video and audio: capturing screens, searching recordings, editing clips, or streaming results.

VideoDB Skills gives agents such as Claude Code, Codex, and Cursor the ability to run server-side video workflows, turning text-only agents into multimodal collaborators.

***

## Install VideoDB Skills

Add video and audio perception to your agent with either install method:

<Tabs>
  <Tab title="NPX (Recommended)" icon="terminal">
    ```bash theme={null}
    npx skills add video-db/skills
    ```
  </Tab>

  <Tab title="Claude Code Plugin" icon="store">
    Run these slash commands inside a Claude Code session, not in your shell:

    ```text theme={null}
    /plugin marketplace add video-db/skills
    /plugin install videodb@videodb-skills
    ```
  </Tab>
</Tabs>

Then run `/videodb setup` in your agent to configure your API key and verify connectivity. You will need the API key described under Prerequisites.

<Card title="VideoDB Skills on GitHub" icon="github" href="https://github.com/video-db/skills">
  Complete source code, installation guide, and configuration examples
</Card>

***

## Prerequisites

<Steps>
  <Step title="VideoDB API Key">
    Get a free API key from [console.videodb.io](https://console.videodb.io).

    No credit card is required, and the free tier includes 50 uploads.
  </Step>

  <Step title="System Requirements">
    * **Python 3.9+**
    * **Platform**: macOS, Linux, Windows (PowerShell)
  </Step>

  <Step title="Set Your API Key">
    Export your API key in your shell:

    ```bash theme={null}
    export VIDEO_DB_API_KEY=your-key-here
    ```

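    In Windows PowerShell, use `$env:VIDEO_DB_API_KEY = "your-key-here"` instead. Alternatively, add `VIDEO_DB_API_KEY=your-key-here` to a `.env` file in your project root.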
  </Step>
</Steps>
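
If you want to confirm the key is visible before running a workflow, the lookup order described above (exported variable first, then `.env`) can be sketched in a few lines of standard-library Python. This is only an illustration: how the skill itself resolves the key is not specified here, the plain `KEY=VALUE` file format is an assumption, and `read_dotenv_value` and `resolve_api_key` are hypothetical helper names.

```python
import os
from pathlib import Path
from typing import Optional


def read_dotenv_value(path: Path, name: str) -> Optional[str]:
    """Return the value of `name` from a simple KEY=VALUE file, or None."""
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        if key == name:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("'\"")
    return None


def resolve_api_key(env_file: Path = Path(".env")) -> Optional[str]:
    """Prefer the exported variable, then fall back to the .env file (assumed order)."""
    return os.environ.get("VIDEO_DB_API_KEY") or read_dotenv_value(
        env_file, "VIDEO_DB_API_KEY"
    )
```

Calling `resolve_api_key()` from your project root returns the key as a string, or `None` if it is set in neither place.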

***

## What It Does

VideoDB Skills packages perception for video and audio as an API built around three stages: **See → Understand → Act**. It lets agents such as Claude Code, Codex, and Cursor run video workflows server-side.

One interface covers all three stages:

* **See** - Capture desktop screens, microphone/system audio, RTSP streams, and ingest files, URLs, and YouTube content
* **Understand** - Analyze visuals, transcribe speech, index content, and search for moments returned as playable clips
* **Act** - Stream results, trigger alerts, edit timelines, generate subtitles/overlays, and export clips

### Why Use It

<Tabs>
  <Tab title="Video Workflows" icon="video">
    Execute video operations without local ffmpeg installation:

    * Upload from YouTube, URLs, or local files
    * Trim, merge, clip, overlay text/images/audio
    * Transcode, reframe, adjust resolution and aspect ratio
    * Get instant playable HLS links via built-in CDN
  </Tab>

  <Tab title="Real-Time Perception" icon="eye">
    Capture and analyze live feeds:

    * Desktop screen, microphone, and system audio recording
    * Monitor RTSP camera feeds with event detection
    * Generate structured context from desktop streams
    * Log timestamped alerts for events such as person detection
  </Tab>

  <Tab title="Search & Intelligence" icon="search">
    Find moments by speech, scenes, or metadata:

    * "Identify all scenes showing 'phone close-up'"
    * "Capture my screen and report activities with insights"
    * Timestamped transcripts and subtitles
    * Playable evidence clips with exact timestamps
  </Tab>
</Tabs>
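
To make the search workflow concrete, here is a minimal sketch of what an agent might run for a spoken-word search using the VideoDB Python SDK. It is not the skill's actual implementation. It assumes the `videodb` package is installed, that `VIDEO_DB_API_KEY` is set, and that `connect`, `upload`, `index_spoken_words`, `search`, and `get_shots` behave as in the SDK's public documentation. `fmt_ts`, `summarize_shots`, and `find_moments` are hypothetical helper names.

```python
from typing import Any, Dict, Iterable, List


def fmt_ts(seconds: float) -> str:
    """Format a time offset in seconds as M:SS for display."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def summarize_shots(shots: Iterable[Any]) -> List[Dict[str, str]]:
    """Turn shot-like objects (with .start, .end, .text) into printable rows."""
    return [
        {"span": f"{fmt_ts(s.start)}-{fmt_ts(s.end)}", "text": s.text}
        for s in shots
    ]


def find_moments(video_url: str, query: str) -> List[Dict[str, str]]:
    """Upload a video, index its speech, and list the moments matching `query`."""
    import videodb  # assumed installed: pip install videodb

    conn = videodb.connect()        # reads VIDEO_DB_API_KEY from the environment
    video = conn.upload(url=video_url)
    video.index_spoken_words()      # build the spoken-word index before searching
    result = video.search(query)
    return summarize_shots(result.get_shots())
```

Each returned row pairs a timestamp span with the matched text, which is the kind of "playable evidence with exact timestamps" an agent can report back.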

### Quick Start

Ask your agent to execute video tasks:

```text theme={null}
Upload [YouTube URL] and provide a shareable stream link
```

```text theme={null}
Extract clips from 10s-30s and 45s-60s and merge them
```

```text theme={null}
Generate background music and add to this clip
```

```text theme={null}
Add subtitles to the original video as white text on a black background
```

```text theme={null}
Capture my screen for two minutes and report my activities with insights
```

```text theme={null}
Monitor my IP camera's RTSP stream and log person detection alerts with timestamps
```
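
Behind a prompt like the clip-and-merge request above, the agent ends up writing a short script. The sketch below shows one plausible shape for it, under stated assumptions: the `videodb` package is installed, `VIDEO_DB_API_KEY` is set, and `Video.generate_stream` accepts a `timeline` of `(start, end)` pairs in seconds as in the SDK's public documentation. `parse_ranges` and `merged_clip_stream` are hypothetical names, and the code the skill actually generates may differ.

```python
import re
from typing import List, Tuple

# Matches spans written like "10s-30s" (seconds only, for illustration).
_RANGE = re.compile(r"(\d+)s?\s*-\s*(\d+)s")


def parse_ranges(prompt: str) -> List[Tuple[int, int]]:
    """Extract (start, end) second pairs such as '10s-30s' from a prompt."""
    ranges = [(int(a), int(b)) for a, b in _RANGE.findall(prompt)]
    for start, end in ranges:
        if start >= end:
            raise ValueError(f"empty or reversed range: {start}s-{end}s")
    return ranges


def merged_clip_stream(url: str, ranges: List[Tuple[int, int]]) -> str:
    """Upload a video and return one stream URL that plays only `ranges`, in order."""
    import videodb  # assumed installed: pip install videodb

    conn = videodb.connect()  # reads VIDEO_DB_API_KEY from the environment
    video = conn.upload(url=url)
    return video.generate_stream(timeline=ranges)


print(parse_ranges("Extract clips from 10s-30s and 45s-60s and merge them"))
# prints [(10, 30), (45, 60)]
```

Passing the parsed ranges to `merged_clip_stream` with a video URL would return a single stream link covering both segments, with no local ffmpeg step.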

### Capabilities

| Capability            | What It Does                                                          |
| --------------------- | --------------------------------------------------------------------- |
| **Capture**           | Desktop screen, microphone, and system audio for real-time processing |
| **Upload**            | Ingest from YouTube, URLs, or local files                             |
| **Context**           | Generate structured context from RTSP feeds or desktop streams        |
| **Search**            | Locate moments by speech, scenes, or metadata with playable evidence  |
| **Transcripts**       | Generate timestamped transcripts                                      |
| **Subtitles**         | Auto-generate, style, and burn-in subtitles                           |
| **Edit**              | Trim, merge, clip, overlay text/images/audio; add dubbing/translation |
| **AI Generate**       | Create images, video, music, sound effects, voiceovers                |
| **Transcode/Reframe** | Adjust resolution, quality, aspect ratio, social crops server-side    |
| **Stream**            | Obtain instant playable HLS links via built-in CDN                    |
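
The **Transcripts** and **Subtitles** rows both rest on timestamped text segments. As an illustration of what "timestamped" means in practice, the sketch below converts such segments into SubRip (`.srt`) text. The segment shape (dicts with `start`, `end` in seconds, and `text`) is an assumption made for this example, not a documented VideoDB response format, and `srt_timestamp` and `to_srt` are hypothetical helpers.

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[float, str]]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used by SubRip files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render timestamped segments as numbered SubRip cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(float(seg["start"]))
        end = srt_timestamp(float(seg["end"]))
        text = str(seg["text"]).strip()
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


print(to_srt([{"start": 0, "end": 1.5, "text": "Hello"}]))
```

In normal use the skill generates and burns in subtitles server-side, so a conversion like this is only needed if you want to export transcript text for another tool.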

***

## Example: OpenClaw Monitoring

VideoDB Skills powers [OpenClaw Monitoring](https://github.com/video-db/openclaw-monitoring), a "CCTV for AI agents" that monitors, records, and audits autonomous agent sessions. Every agent run becomes a live stream, a replayable recording, and a searchable archive.

<Card title="OpenClaw Monitoring on GitHub" icon="github" href="https://github.com/video-db/openclaw-monitoring">
  See how VideoDB Skills enables visual observability for autonomous agents
</Card>

***

## Next Steps

<CardGroup cols={2}>
  <Card icon="camera" title="Capture SDK Overview" href="/pages/ingest/capture-sdks/overview">
    Deep dive: channels, permissions, client code, and event handling
  </Card>

  <Card icon="search" title="Real-time Context" href="/pages/ingest/capture-sdks/realtime-context">
    How real-time indexing and search work
  </Card>

  <Card icon="folder" title="AI Copilot Examples" href="/examples-and-tutorials/ai-copilots">
    Explore more AI copilot projects and use cases
  </Card>

  <Card icon="rocket" title="Quickstart" href="/pages/getting-started/quickstart">
    Try desktop perception with a hosted OpenClaw agent
  </Card>
</CardGroup>
