
The Platform Loop

See (Ingest) → Process → Understand (Indexes) → Remember → Retrieve (Search) → Act
import videodb

conn = videodb.connect()
coll = conn.get_collection()

# SEE: Ingest from any source
video = coll.upload(url="https://example.com/video.mp4")

# UNDERSTAND: Create an index
index_id = video.index_scenes(prompt="Extract key moments")

# RETRIEVE: Search with natural language
results = video.search("important announcement", index_id=index_id)

# ACT: Generate outputs, trigger actions
for shot in results.shots:
    print(f"{shot.start}s - {shot.end}s: {shot.text}")
    shot.play()  # Playable evidence

See (Ingest)

Get video and audio from anywhere into VideoDB.
  • File URL - coll.upload(url="https://...")
  • Local file - coll.upload(file_path="./video.mp4")
  • RTSP stream - coll.connect_rtstream(url="rtsp://...")
  • Desktop capture - Capture SDK (screen, mic, camera)
# File-based
video = coll.upload(url="https://example.com/meeting.mp4")

# Live stream
rtstream = coll.connect_rtstream(
    name="Security Camera",
    url="rtsp://user:pass@host:554/stream"
)

Process

Built-in primitives convert raw media into processable units. This happens automatically when you create indexes.
  • Scene segmentation - Time-based, shot-based, or prompt-guided
  • Frame sampling - Control which frames to analyze
  • Audio chunking - Word, sentence, or time-based segments
# Time-based: every 10 seconds
video.index_scenes(
    extraction_type="time_based",
    extraction_config={"time": 10}
)

# For RTStream: batch config
rtstream.index_visuals(
    batch_config={"type": "time", "value": 5, "frame_count": 2}
)
This is where cost control happens - sampling policies trade compute for recall.
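The compute-for-recall trade can be made concrete with a little arithmetic. The sketch below is illustrative only (it is plain Python, not a VideoDB API call): it estimates how many frames an hour of footage sends to the vision model under a dense versus a sparse sampling policy.

```python
# Illustrative arithmetic only - not a VideoDB API call.
def frames_analyzed(duration_s: float, interval_s: float, frames_per_segment: int) -> int:
    """Segments are cut every `interval_s` seconds; `frames_per_segment`
    frames from each segment are sent to the vision model."""
    segments = max(1, int(duration_s // interval_s))
    return segments * frames_per_segment

hour = 3600
dense = frames_analyzed(hour, interval_s=5, frames_per_segment=2)    # 1440 frames
sparse = frames_analyzed(hour, interval_s=30, frames_per_segment=1)  # 120 frames
print(dense, sparse)  # dense costs 12x the sparse policy
```

A 12x cost difference per hour of footage is why the extraction_config you choose matters more at scale than any other knob.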

Understand (Indexes)

Indexes are programmable interpretation layers. You define what to extract with prompts.
  • Prompt-driven - Natural language instructions
  • Model-orchestrated - LLMs and VLMs do the work
  • Additive - Multiple indexes on same media
  • Multimodal - Visual and spoken
# Visual understanding
visual_index = video.index_scenes(
    prompt="Identify key moments and describe activities"
)

# Spoken understanding
transcript = video.index_spoken_words()

# Multiple indexes = multiple perspectives
safety_index = video.index_scenes(prompt="Identify safety issues")
summary_index = video.index_scenes(prompt="Summarize each segment")

Remember

Indexes are stored as episodic memory. This is automatic by default. What gets stored:
  • Transcripts and embeddings
  • Scene descriptions and tags
  • Structured metadata
  • Retrieval structures
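To make the list above concrete, here is a minimal data model of one stored scene record. This is a sketch for intuition only: VideoDB manages this storage internally, and the class and field names here are assumptions, not the real schema.

```python
from dataclasses import dataclass, field

# Illustrative model of one episodic memory entry - field names are
# assumptions for the sketch, not VideoDB's actual storage schema.
@dataclass
class SceneRecord:
    start: float                  # seconds into the media
    end: float
    description: str              # model-written scene description
    tags: list = field(default_factory=list)
    embedding: list = field(default_factory=list)   # retrieval vector
    metadata: dict = field(default_factory=dict)    # structured extras

rec = SceneRecord(12.0, 22.0, "Presenter shows the quarterly chart",
                  tags=["presentation"], metadata={"camera": "main"})
print(rec.end - rec.start)  # segment length in seconds
```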
Ephemeral mode - For live sessions, you can choose not to persist:
rtstream.index_visuals(
    prompt="...",
    ephemeral=True  # Process but don't persist
)

Retrieve (Search)

Search across indexed content with natural language. Results include playable evidence.
# Single video
results = video.search("product demo")

# Single stream
results = rtstream.search("intrusion")

# Collection-wide
results = coll.search("quarterly results", index_type="scene")
Results include:
  • Timestamps - Exact start/end times
  • Text - What was detected
  • Score - Relevance ranking
  • Stream URL - Playable link
for shot in results.shots:
    print(f"{shot.start}s: {shot.text} (score: {shot.search_score})")
    shot.play()  # Verify the result

Act

Go from understanding to automation and outputs.

Event Detection

React to conditions in real-time:
# Alerts attach to a real-time index
index = rtstream.index_visuals(prompt="Monitor the entrance")

event_id = conn.create_event(
    event_prompt="Detect intruder",
    label="security_alert"
)

index.create_alert(
    event_id=event_id,
    callback_url="https://your-backend.com/alerts"
)
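On the receiving end, your backend at callback_url gets the alert and decides what to do. The handler below is a stdlib-only sketch; the payload fields (label, timestamp) are assumptions for illustration, so check the alert delivery docs for the real schema.

```python
import json

# Sketch of a callback handler for the alert webhook above.
# The payload shape (label, timestamp) is an assumption for illustration.
def handle_alert(raw_body: bytes) -> str:
    payload = json.loads(raw_body)
    label = payload.get("label", "unknown")
    ts = payload.get("timestamp")
    if label == "security_alert":
        # e.g. page the on-call, keep the stream URL as playable evidence
        return f"PAGE on-call: {label} at {ts}"
    return f"LOG: {label} at {ts}"

body = json.dumps({"label": "security_alert", "timestamp": 1712.5}).encode()
print(handle_alert(body))
```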

Programmable Editing

Compose outputs using the 4-layer editor architecture:
from videodb.editor import Timeline, Track, Clip, VideoAsset

video_asset = VideoAsset(id=video.id, start=10)
clip = Clip(asset=video_asset, duration=20)

track = Track()
track.add_clip(0, clip)

timeline = Timeline(conn)
timeline.add_track(track)
output = timeline.generate_stream()

Architecture Patterns

The loop applies to different use cases:
  • Video RAG - See: upload files → Understand: index with domain prompts → Act: search + retrieve
  • Monitoring - See: connect RTSP → Understand: real-time indexing → Act: alerts + webhooks
  • Desktop Agent - See: Capture SDK → Understand: index screen/mic → Act: context for LLM
  • Media Automation - See: upload + transcode → Understand: index for editing → Act: timeline + export
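The "context for LLM" step shared by the Video RAG and Desktop Agent patterns is just formatting: turn search shots (start, end, text, as shown earlier) into a grounded prompt block. The tuples below are stand-ins for results.shots, used so the sketch is self-contained.

```python
# Sketch: format search shots into grounded context for an LLM prompt.
# The tuples stand in for results.shots (start, end, text) from a search.
def shots_to_context(shots) -> str:
    lines = [f"[{start:.0f}s-{end:.0f}s] {text}" for start, end, text in shots]
    return "Video evidence:\n" + "\n".join(lines)

shots = [(12.0, 20.0, "CEO announces quarterly results"),
         (95.0, 110.0, "Revenue chart shown on screen")]
print(shots_to_context(shots))
```

Because each line carries exact timestamps, the LLM's answer can always be traced back to a playable moment.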

Next Steps