Desktop capture currently supports macOS and Windows.

1. Backend Setup

Install

pip install videodb

Create a Capture Session

Your backend creates a session and generates a short-lived token for the desktop client:
import videodb

conn = videodb.connect()

# Create session for a user
cap = conn.create_capture_session(
    end_user_id="user_abc",
    callback_url="https://your-backend.com/webhooks/videodb",
    metadata={"app": "my-ai-copilot"}
)

# Generate token for desktop client (never share API key)
token = conn.generate_client_token(expires_in=600)

# Send session ID and token to desktop client
print(f"Session: {cap.id}, Token: {token}")

2. Client Setup

Install

pip install "videodb[capture]"

Start Capture

The desktop client uses the token to stream screen and audio:
import asyncio
from videodb.capture import CaptureClient

async def capture(capture_session_id: str, client_token: str):
    client = CaptureClient(client_token=client_token)

    # Request permissions
    await client.request_permission("microphone")
    await client.request_permission("screen_capture")

    # Discover available sources
    channels = await client.list_channels()
    mic = channels.mics.default
    display = channels.displays.primary or channels.displays[1]
    system_audio = channels.system_audio.default
    selected = [c for c in [mic, display, system_audio] if c]

    # Start capture
    await client.start_session(
        capture_session_id=capture_session_id,
        channels=selected,
        primary_video_channel_id=display.name if display else None
    )

    # Listen for events
    async for ev in client.events():
        print(f"{ev.event}: {ev.payload}")
        if ev.event in ("recording-complete", "error"):
            break

    await client.stop_session()
    await client.shutdown()

# Run the capture
if __name__ == "__main__":
    asyncio.run(capture(
        capture_session_id="cap-xxx",  # From backend
        client_token="token-xxx"        # From backend
    ))

3. Backend Starts AI

When capture begins, your backend receives a webhook and starts AI processing:
def on_webhook(payload: dict):
    if payload["event"] == "capture_session.active":
        cap_id = payload["capture_session_id"]
        cap = conn.get_capture_session(cap_id)

        # Get RTStreams (one per channel)
        mics = cap.get_rtstream("mic")
        displays = cap.get_rtstream("display")

        # Start real-time AI processing
        if mics:
            mic = mics[0]
            mic.start_transcript()
            mic.index_audio(prompt="Extract key decisions and action items")

        if displays:
            display = displays[0]
            display.index_visuals(prompt="Describe what the user is doing")
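
VideoDB calls the callback_url you registered in step 1, so the handler above needs to be exposed over HTTP. A minimal sketch using Flask (any web framework works; only the route path comes from the callback_url above):
Python
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhooks/videodb", methods=["POST"])
def videodb_webhook():
    # Hand the parsed JSON payload to the on_webhook handler defined above
    on_webhook(request.get_json())
    # Acknowledge quickly; offload heavy work to a queue in production
    return "", 204

if __name__ == "__main__":
    app.run(port=8000)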

4. What You Get

Your backend receives AI-ready events in real-time:
{"type": "transcript", "text": "Let's schedule the meeting for Thursday", "is_final": true}
{"type": "index", "index_type": "visual", "text": "User is viewing a Slack conversation with 3 unread messages"}
{"type": "index", "index_type": "audio", "text": "Discussion about scheduling a team meeting"}
{"type": "alert", "label": "sensitive_content", "triggered": true, "confidence": 0.92}
Build with these:
  • Screen-aware AI agents
  • Live meeting copilots
  • In-call assistance
  • Semantic search and replay

Architecture

[Diagram: system architecture]
  1. Backend creates a CaptureSession and mints a short-lived token
  2. Desktop client uses the token to stream screen + audio (never sees API key)
  3. VideoDB creates RTStreams (one per channel) when capture starts
  4. Backend receives webhook, starts transcript and indexing on RTStreams
  5. AI events flow back via WebSocket (real-time) or can be polled
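
For step 5, a sketch of consuming the live event stream over WebSocket using the third-party websockets package (the endpoint URL and auth scheme here are hypothetical; see Real-time Context for the actual interface):
Python
import asyncio
import json
import websockets  # pip install websockets

async def consume_events(client_token: str):
    # Hypothetical endpoint and token auth; check "Real-time Context" for the real one
    url = f"wss://api.videodb.io/realtime?token={client_token}"
    async with websockets.connect(url) as ws:
        async for raw in ws:
            event = json.loads(raw)
            print(event.get("type"), event.get("text"))

# asyncio.run(consume_events("token-xxx"))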

Two Runtimes

Backend                Desktop Client
Holds API key          Receives session token
Creates sessions       Captures media
Runs AI pipelines      Streams to VideoDB
Receives events        Emits local UX events
Rule of thumb: use webhooks for correctness (durable, at-least-once delivery) and the WebSocket for live UI (best-effort).
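
Because delivery is at-least-once, the same event may arrive more than once, so webhook handlers should be idempotent. A sketch of one way to deduplicate, assuming each delivery carries a unique identifier (the event_id field is hypothetical; check the actual payload schema):
Python
seen_events: set[str] = set()  # use a persistent store (e.g. Redis) in production

def on_webhook_once(payload: dict):
    # event_id is a hypothetical unique delivery id; adapt to the real schema
    event_id = payload.get("event_id")
    if event_id in seen_events:
        return  # duplicate delivery, already handled
    seen_events.add(event_id)
    on_webhook(payload)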

5. Example Applications

Claude Pair Programmer

AI coding assistant with screen and audio context

Bloom

Local-first screen recorder with AI indexing

Focusd

AI-powered productivity tracking

Call.md

Real-time meeting intelligence

6. Core Concepts

CaptureSession (cap-xxx)

The lifecycle container for one capture run. Created by backend, activated by desktop client. States: created → starting → active → stopping → stopped → exported
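
A sketch of waiting for a session to reach a given state by polling (the status attribute is an assumption; consult the SDK reference for the real field name):
Python
import time

def wait_for_state(conn, cap_id: str, target: str, interval: float = 2.0):
    # Re-fetch the session until it reaches the target lifecycle state.
    # `status` is assumed here; the actual attribute name may differ.
    while True:
        cap = conn.get_capture_session(cap_id)
        if getattr(cap, "status", None) == target:
            return cap
        time.sleep(interval)

# e.g. wait_for_state(conn, "cap-xxx", "active")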

RTStream (rts-xxx)

A real-time media stream, one per captured channel. This is where you run AI:
rtstream.start_transcript()
rtstream.index_audio(prompt="Extract key decisions")
rtstream.index_visuals(prompt="Describe what user is doing")
rtstream.search("budget discussion")

Channel

A recordable source on the desktop:
Channel                 Description
mic:default             Default microphone
system_audio:default    System audio output
display:1, display:2    Connected displays

Multi-Screen Capture

When multiple monitors are connected, each appears as a separate display:N channel. Use cap.displays on the backend to inspect available video channels:
Python
cap = conn.get_capture_session("cap-xxx")

# List all video (display) channels
for d in cap.displays:
    print(f"{d.channel_id}  primary={d.is_primary}")
# display:1  primary=True
# display:2  primary=False
cap.displays returns a list of video channel objects. Each object includes an is_primary field indicating which display was set as the primary video channel when capture started (via primary_video_channel_id).

To capture multiple screens, pass all desired display channels to the desktop client:
Python
channels = await client.list_channels()

# Audio sources, selected as in the quickstart above
mic = channels.mics.default
system_audio = channels.system_audio.default

# Select both displays
display1 = channels.displays[1]   # display:1
display2 = channels.displays[2]   # display:2

await client.start_session(
    capture_session_id=cap_id,
    channels=[
        mic,
        display1,
        display2,
        system_audio,
    ],
    primary_video_channel_id=display1.name,
)
Each display produces its own RTStream on the backend. The primary display is used for the default muxed export video; non-primary displays are available as raw channel assets or can be exported separately (see Storage & Search).

Explore More

View All Examples on GitHub

Complete source code with quickstart guides, example apps, and implementation patterns

Real-time Context

Events you receive from capture

Storage & Search

Optional persistence and semantic search