# AI Agent Skills

> Add video and audio perception to your AI agents: capture, upload, search, edit, and stream.

Your AI agents can write code and automate tasks brilliantly, but they are missing one critical capability: working with video and audio, such as capturing screens, searching through recordings, editing clips, and streaming results. VideoDB Skills gives agents like Claude Code and Codex the power to execute server-side video workflows, turning text-only agents into multimodal collaborators.

***

## Install VideoDB Skills

Add video and audio perception to your agent with `npx`:

```bash
npx skills add video-db/skills
```

Or, in Claude Code, add the plugin marketplace and install the plugin:

```text
/plugin marketplace add video-db/skills
/plugin install videodb@videodb-skills
```

Then run `/videodb setup` to configure your API key and verify connectivity.

The complete source code, installation guide, and configuration examples are in the `video-db/skills` GitHub repository.

***

## Prerequisites

Get a free API key from [console.videodb.io](https://console.videodb.io). No credit card is required, and the free tier includes 50 uploads.

* **Python 3.9+**
* **Platform**: macOS, Linux, Windows (PowerShell)

Export your API key in your shell:

```bash
export VIDEO_DB_API_KEY=your-key-here
```

Or add `VIDEO_DB_API_KEY=your-key-here` to a `.env` file in your project root.

***

## What It Does

VideoDB Skills is a perception capability that enables **See → Understand → Act, as an API, for video and audio**. It lets agents such as Claude Code, Codex, and Cursor execute video workflows server-side.
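These server-side workflows authenticate with the `VIDEO_DB_API_KEY` from the Prerequisites. You can supply the key as an exported shell variable or in a project-root `.env` file. The Python sketch below uses only the standard library (Python 3.9+). It shows one way to confirm that a key is available from either source before you hand a task to your agent.

`resolve_api_key` and `parse_env_file` are hypothetical helpers and are not part of VideoDB Skills. `/videodb setup` may load and validate the key differently. The `.env` parser here handles only simple `KEY=VALUE` lines.

```python
# Hypothetical helper (not part of VideoDB Skills): find VIDEO_DB_API_KEY
# in the environment first, then in a .env file in the project root.
import os
from pathlib import Path
from typing import Dict, Optional

ENV_VAR = "VIDEO_DB_API_KEY"


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines (no escapes, multiline values, or interpolation)."""
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()  # allow shell-style `export KEY=...`
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def resolve_api_key(project_root: str = ".") -> Optional[str]:
    """Return the key from the environment, else from <project_root>/.env, else None."""
    key = os.environ.get(ENV_VAR, "").strip()
    if key:
        return key  # an exported variable takes precedence over .env
    env_path = Path(project_root) / ".env"
    if env_path.is_file():
        key = parse_env_file(env_path).get(ENV_VAR, "").strip()
        if key:
            return key
    return None
```

Call `resolve_api_key()` from your project root. A `None` result means neither source provides a key. Checking the exported variable first matches python-dotenv's default behavior: `load_dotenv` does not override variables that are already set in the environment.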
VideoDB Skills provides one unified interface for:

* **See**: Capture desktop screens, microphone and system audio, and RTSP streams; ingest files, URLs, and YouTube content
* **Understand**: Visual analysis, transcription, indexing, and searching for moments with playable clips
* **Act**: Stream results, trigger alerts, edit timelines, generate subtitles and overlays, and export clips

### Why Use It

Execute video operations without installing ffmpeg locally:

* Upload from YouTube, URLs, or local files
* Trim, merge, and clip video, and overlay text, images, or audio
* Transcode, reframe, and adjust resolution and aspect ratio
* Get instant playable HLS links via the built-in CDN

Capture and analyze live feeds:

* Record the desktop screen, microphone, and system audio
* Monitor RTSP camera feeds with event detection
* Generate structured context from desktop streams
* Log timestamped person-detection alerts

Find moments by speech, scenes, or metadata:

* "Identify all scenes showing 'phone close-up'"
* "Capture my screen and report activities with insights"
* Timestamped transcripts and subtitles
* Playable evidence clips with exact timestamps

### Quick Start

Ask your agent to execute video tasks:

```text
Upload [YouTube URL] and provide a shareable stream link
```

```text
Extract clips from 10s-30s and 45s-60s and merge them
```

```text
Generate background music and add it to this clip
```

```text
Add subtitles to the original video in white text on a black background
```

```text
Capture my screen for two minutes and report my activities with insights
```

```text
Monitor my IP camera's RTSP stream and log person-detection alerts with timestamps
```

### Capabilities

| Capability            | What It Does                                                                  |
| --------------------- | ----------------------------------------------------------------------------- |
| **Capture**           | Capture desktop screen, microphone, and system audio for real-time processing |
| **Upload**            | Ingest from YouTube, URLs, or local files                                     |
| **Context**           | Generate structured context from RTSP feeds or desktop streams                |
| **Search**            | Locate moments by speech, scenes, or metadata with playable evidence          |
| **Transcripts**       | Generate timestamped transcripts                                              |
| **Subtitles**         | Auto-generate, style, and burn in subtitles                                   |
| **Edit**              | Trim, merge, clip, overlay text/images/audio; add dubbing and translation     |
| **AI Generate**       | Create images, video, music, sound effects, and voiceovers                    |
| **Transcode/Reframe** | Adjust resolution, quality, aspect ratio, and social crops server-side        |
| **Stream**            | Get instant playable HLS links via the built-in CDN                           |

***

## Example: OpenClaw Monitoring

VideoDB Skills powers [OpenClaw Monitoring](https://github.com/video-db/openclaw-monitoring), a "CCTV for AI agents" that monitors, records, and audits autonomous agent sessions. Every agent run becomes a live stream, a replayable recording, and a searchable archive. The project shows how VideoDB Skills enables visual observability for autonomous agents.

***

## Next Steps

* Deep dive: channels, permissions, client code, and event handling
* How real-time indexing and search work
* Explore more AI copilot projects and use cases
* Try desktop perception with a hosted OpenClaw agent