Your AI agents can write code and automate tasks brilliantly. But they’re missing one critical capability: the ability to work with video and audio - capturing screens, searching through recordings, editing clips, and streaming results. VideoDB Skills give agents like Claude Code and Codex the power to execute server-side video workflows, turning text-only agents into multimodal collaborators.

Install VideoDB Skills

Add video and audio perception to your agent with a single install command (see the installation guide in the GitHub repository below). Then run /videodb setup to configure your API key and verify connectivity.

VideoDB Skills on GitHub

Complete source code, installation guide, and configuration examples

Prerequisites

1. VideoDB API Key

Get a free API key from console.videodb.io. No credit card required. Free tier includes 50 uploads.

2. System Requirements

  • Python 3.9+
  • Platform: macOS, Linux, Windows (PowerShell)

3. Set Your API Key

Export your API key in your shell (macOS/Linux syntax shown; in PowerShell, use $env:VIDEO_DB_API_KEY = "your-key-here" instead):
export VIDEO_DB_API_KEY=your-key-here
Or add it to a .env file in your project root.
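
The prerequisites above can be checked before running setup. The following is a minimal sketch, not part of VideoDB Skills: the helper names (read_dotenv, find_api_key, preflight) are hypothetical, and the precedence (shell environment first, then .env) is an assumption rather than documented behavior. Only the variable name VIDEO_DB_API_KEY and the Python 3.9+ requirement come from this page.

```python
import os
import sys
from pathlib import Path
from typing import Dict, List, Optional

ENV_VAR = "VIDEO_DB_API_KEY"  # name taken from the export line above


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):  # tolerate "export KEY=value" lines
            key = key[len("export "):].strip()
        values[key] = value.strip().strip("'\"")
    return values


def find_api_key(project_root: Path = Path(".")) -> Optional[str]:
    """Assumed precedence: shell environment first, then .env in the root."""
    key = os.environ.get(ENV_VAR)
    if key:
        return key
    return read_dotenv(project_root / ".env").get(ENV_VAR) or None


def preflight(project_root: Path = Path(".")) -> List[str]:
    """Return human-readable problems; an empty list means ready."""
    problems: List[str] = []
    if sys.version_info < (3, 9):
        problems.append("Python 3.9+ is required")
    if find_api_key(project_root) is None:
        problems.append(f"{ENV_VAR} is not set and no .env entry was found")
    return problems
```

Calling preflight() from the project root returns an empty list when both requirements are met. It does not contact VideoDB, so it cannot confirm that the key is valid; that is what /videodb setup verifies.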

What It Does

VideoDB Skills is a perception capability for video and audio that exposes a See → Understand → Act loop as an API. It gives agents like Claude Code, Codex, and Cursor the ability to execute server-side video workflows through one unified interface:
  • See - Capture desktop screens, microphone/system audio, RTSP streams, and ingest files, URLs, and YouTube content
  • Understand - Visual analysis, transcription, indexing, and searching moments with playable clips
  • Act - Stream results, trigger alerts, edit timelines, generate subtitles/overlays, and export clips

Why Use It

Execute video operations server-side, without installing ffmpeg locally:
  • Upload from YouTube, URLs, or local files
  • Trim, merge, clip, overlay text/images/audio
  • Transcode, reframe, adjust resolution and aspect ratio
  • Get instant playable HLS links via built-in CDN

Quick Start

Ask your agent to execute video tasks:
Upload [YouTube URL] and provide a shareable stream link
Extract clips from 10s-30s and 45s-60s and merge them
Generate background music and add to this clip
Add white text on black background subtitles to the original video
Capture my screen for two minutes and report my activities with insights
Monitor my IP Camera RTSP stream and log person detection alerts with timestamps
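
To make the clip-and-merge prompt above concrete, here is a hedged sketch of the kind of structured input such a request reduces to: an ordered list of (start, end) segments in seconds. How the agent or VideoDB actually interprets the prompt is not described on this page; parse_ranges and merged_duration are hypothetical helpers for illustration only.

```python
import re
from typing import List, Tuple

# Matches ranges written like "10s-30s" or "45.5s - 60s".
RANGE_RE = re.compile(r"(\d+(?:\.\d+)?)s\s*-\s*(\d+(?:\.\d+)?)s")


def parse_ranges(prompt: str) -> List[Tuple[float, float]]:
    """Extract (start, end) pairs in seconds, in the order they appear."""
    ranges: List[Tuple[float, float]] = []
    for start_text, end_text in RANGE_RE.findall(prompt):
        start, end = float(start_text), float(end_text)
        if end <= start:
            raise ValueError(f"range {start_text}s-{end_text}s ends before it starts")
        ranges.append((start, end))
    return ranges


def merged_duration(ranges: List[Tuple[float, float]]) -> float:
    """Length in seconds of the clip produced by joining the segments."""
    return sum(end - start for start, end in ranges)


segments = parse_ranges("Extract clips from 10s-30s and 45s-60s and merge them")
print(segments)                   # [(10.0, 30.0), (45.0, 60.0)]
print(merged_duration(segments))  # 35.0
```

The two segments in the example prompt produce a 35-second merged clip. In practice the agent handles this step for you; the sketch only shows why explicit timestamps in a prompt make the request unambiguous.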

Capabilities

Capability | What It Does
Capture | Desktop screen, microphone, and system audio for real-time processing
Upload | Ingest from YouTube, URLs, or local files
Context | Generate structured context from RTSP feeds or desktop streams
Search | Locate moments by speech, scenes, or metadata with playable evidence
Transcripts | Generate timestamped transcripts
Subtitles | Auto-generate, style, and burn-in subtitles
Edit | Trim, merge, clip, overlay text/images/audio; add dubbing/translation
AI Generate | Create images, video, music, sound effects, voiceovers
Transcode/Reframe | Adjust resolution, quality, aspect ratio, social crops server-side
Stream | Obtain instant playable HLS links via built-in CDN
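
The Transcripts and Subtitles capabilities are related: a timestamped transcript carries the information a subtitle file needs. The sketch below illustrates that relationship by converting transcript segments into the standard SRT subtitle format. The segment shape (dicts with start, end, and text keys, times in seconds) is an assumption made for illustration and is not confirmed by this page, and VideoDB performs subtitle generation server-side rather than through code like this.

```python
from typing import Dict, Iterable, Union

Segment = Dict[str, Union[float, str]]  # assumed shape: start, end, text


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp layout SRT uses."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(float(seg["start"]))
        end = srt_timestamp(float(seg["end"]))
        text = str(seg["text"]).strip()
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


example = [
    {"start": 0.0, "end": 2.5, "text": "Welcome to the demo."},
    {"start": 2.5, "end": 5.0, "text": "Let's capture the screen."},
]
print(to_srt(example))
```

Each cue consists of an index, a time range, and the spoken text, which is why timestamp accuracy in the transcript directly determines subtitle sync.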

Example: OpenClaw Monitoring

VideoDB Skills powers OpenClaw Monitoring, a “CCTV for AI agents” that monitors, records, and audits autonomous agent sessions. Every agent run becomes a live stream, a replayable recording, and a searchable archive.

OpenClaw Monitoring on GitHub

See how VideoDB Skills enables visual observability for autonomous agents

Next Steps

Capture SDK Overview

Deep dive: channels, permissions, client code, and event handling

Real-time Context

How real-time indexing and search works

AI Copilot Examples

Explore more AI copilot projects and use cases

Quickstart

Try desktop perception with a hosted OpenClaw agent