# Core Concepts Overview

> VideoDB is the perception, memory, and action layer for AI agents operating on video and audio. Every workflow follows the same loop, whether you're processing files, live streams, or desktop capture.

## The Platform Loop

```
See (Ingest) → Process → Understand (Indexes) → Remember → Retrieve (Search) → Act
```

```python
import videodb

conn = videodb.connect()
coll = conn.get_collection()

# SEE: Ingest from any source
video = coll.upload(url="https://example.com/video.mp4")

# UNDERSTAND: Create an index
index_id = video.index_visuals(prompt="Extract key moments")

# RETRIEVE: Search with natural language
results = video.search("important announcement", index_id=index_id)

# ACT: Generate outputs, trigger actions
for shot in results.shots:
    print(f"{shot.start}s - {shot.end}s: {shot.text}")
    shot.play()  # Playable evidence
```

***

## See (Ingest)

Bring video and audio into VideoDB from files, live streams, or desktop capture.

| Source          | Method                                    |
| :-------------- | :---------------------------------------- |
| File URL        | `coll.upload(url="https://...")`          |
| Local file      | `coll.upload(file_path="./video.mp4")`    |
| RTSP stream     | `coll.connect_rtstream(url="rtsp://...")` |
| Desktop capture | Capture SDK (screen, mic, camera)         |

```python
# File-based
video = coll.upload(url="https://example.com/meeting.mp4")

# Live stream
rtstream = coll.connect_rtstream(
    name="Security Camera",
    url="rtsp://user:pass@host:554/stream"
)
```

***

## Process

Built-in primitives convert raw media into processable units. This happens automatically when you create indexes.

* **Scene segmentation** - Time-based, shot-based, or prompt-guided
* **Frame sampling** - Control which frames to analyze
* **Audio chunking** - Word, sentence, or time-based segments

```python
# Time-based: every 10 seconds
video.index_scenes(
    extraction_type="time_based",
    extraction_config={"time": 10}
)

# For RTStream: batch config
rtstream.index_visuals(
    batch_config={"type": "time", "value": 5, "frame_count": 2}
)
```

This is where cost control happens: sampling fewer frames or using longer segments lowers compute, but short events that fall between samples can be missed.
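
To make the trade-off concrete, the sketch below estimates how many frames a time-based policy would send for analysis. The helper `estimate_sampled_frames` and its parameter names are hypothetical, and it assumes every segment contributes a fixed number of sampled frames. It only illustrates the arithmetic and says nothing about how VideoDB actually meters usage.

```python
import math


def estimate_sampled_frames(duration_s: float, segment_s: float, frames_per_segment: int) -> int:
    """Estimate frames analyzed when media is cut into fixed-length segments.

    Hypothetical helper for illustration. Assumes every segment, including a
    shorter final one, contributes `frames_per_segment` frames.
    """
    if segment_s <= 0:
        raise ValueError("segment_s must be positive")
    segments = math.ceil(duration_s / segment_s)
    return segments * frames_per_segment


one_hour = 3600
# 5-second segments with 2 frames each vs. 10-second segments with 1 frame each
dense = estimate_sampled_frames(one_hour, segment_s=5, frames_per_segment=2)
sparse = estimate_sampled_frames(one_hour, segment_s=10, frames_per_segment=1)
print(dense, sparse)  # 1440 360
```

Under these assumptions the sparser policy analyzes a quarter of the frames, at the risk of missing anything that starts and ends between two samples.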

***

## Understand (Indexes)

Indexes are programmable interpretation layers. You define what to extract with prompts.

* **Prompt-driven** - Natural language instructions
* **Model-orchestrated** - LLMs and VLMs do the work
* **Additive** - Multiple indexes on the same media
* **Multimodal** - Visual and spoken

```python
# Visual understanding
visual_index = video.index_scenes(
    prompt="Identify key moments and describe activities"
)

# Spoken understanding (indexes speech for search; the return value is not used here)
video.index_spoken_words()

# Multiple indexes = multiple perspectives
safety_index = video.index_scenes(prompt="Identify safety issues")
summary_index = video.index_scenes(prompt="Summarize each segment")
```

***

## Remember

Indexes are stored as episodic memory. Persistence is on by default.

**What gets stored:**

* Transcripts and embeddings
* Scene descriptions and tags
* Structured metadata
* Retrieval structures

**Ephemeral mode** - For live sessions, you can choose not to persist:

```python
rtstream.index_visuals(
    prompt="...",
    ephemeral=True  # Process but don't persist
)
```
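
The difference between the two modes can be pictured with a toy model. This is a conceptual sketch only: `ToyIndex`, `describe`, and the `memory` list are hypothetical stand-ins, not VideoDB classes, and the sketch assumes that ephemeral results are still handed back to the caller at processing time.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyIndex:
    """Hypothetical stand-in for an index; not a VideoDB class."""

    describe: Callable[[str], str]
    ephemeral: bool = False
    memory: List[str] = field(default_factory=list)

    def process(self, segment: str) -> str:
        description = self.describe(segment)
        if not self.ephemeral:
            # Persisted descriptions remain available for later retrieval.
            self.memory.append(description)
        # The description is returned either way, so the caller can act on it now.
        return description


def describe(segment: str) -> str:
    return f"description of {segment}"


persistent = ToyIndex(describe)
live_only = ToyIndex(describe, ephemeral=True)

for segment in ["segment-1", "segment-2"]:
    persistent.process(segment)
    live_only.process(segment)

print(len(persistent.memory), len(live_only.memory))  # 2 0
```

In this model an ephemeral index can still drive immediate reactions, but nothing it produced can be searched afterwards.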

***

## Retrieve (Search)

Search across indexed content with natural language. Results include playable evidence.

```python
# Single video
results = video.search("product demo")

# Single stream
results = rtstream.search("intrusion")

# Collection-wide
results = coll.search("quarterly results", index_type="scene")
```

**Results include:**

* **Timestamps** - Exact start/end times
* **Text** - What was detected
* **Score** - Relevance ranking
* **Stream URL** - Playable link

```python
for shot in results.shots:
    print(f"{shot.start}s: {shot.text} (score: {shot.search_score})")
    shot.play()  # Verify the result
```
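
Before acting on results, it is common to drop weak matches and join neighbouring hits into longer ranges. The sketch below does this on a stand-in `ShotStub` dataclass that carries only the fields used above (`start`, `end`, `text`, `search_score`). `merge_relevant_ranges`, the sample scores, and the threshold are all hypothetical, and the score scale is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ShotStub:
    """Stand-in with the fields used above; not the SDK's shot class."""

    start: float
    end: float
    text: str
    search_score: float


def merge_relevant_ranges(
    shots: List[ShotStub], min_score: float, max_gap: float = 0.0
) -> List[Tuple[float, float]]:
    """Keep shots scoring at least `min_score`, then merge ranges that overlap
    or sit within `max_gap` seconds of each other."""
    kept = sorted((s for s in shots if s.search_score >= min_score), key=lambda s: s.start)
    merged: List[Tuple[float, float]] = []
    for shot in kept:
        if merged and shot.start - merged[-1][1] <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, shot.end))
        else:
            merged.append((shot.start, shot.end))
    return merged


shots = [
    ShotStub(0.0, 10.0, "intro", 0.82),
    ShotStub(10.0, 20.0, "demo begins", 0.91),
    ShotStub(20.0, 30.0, "unrelated", 0.30),
    ShotStub(45.0, 55.0, "demo recap", 0.77),
]
print(merge_relevant_ranges(shots, min_score=0.5))  # [(0.0, 20.0), (45.0, 55.0)]
```

Raising `max_gap` trades precision for continuity: with `max_gap=25.0` the same input collapses into a single range.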

***

## Act

Go from understanding to automation and outputs.

### Event Detection

React to conditions in real time:

```python
event_id = conn.create_event(
    event_prompt="Detect intruder",
    label="security_alert"
)

# `index` is assumed to be an index object created earlier on the stream
index.create_alert(
    event_id=event_id,
    callback_url="https://your-backend.com/alerts"
)
```
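
The `callback_url` has to point at an HTTP endpoint you run. The sketch below is a minimal standard-library receiver. The payload shape is an assumption (a JSON object that may carry a `label` key mirroring the event label above), so check a real alert payload before relying on any field. `parse_alert`, `AlertHandler`, and `serve` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_alert(body: bytes) -> dict:
    """Decode an alert body. ASSUMPTION: a JSON object with an optional `label` key."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return {"label": payload.get("label", "unknown"), "raw": payload}


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            alert = parse_alert(self.rfile.read(length))
        except (ValueError, UnicodeDecodeError):
            # Malformed body: reject without processing.
            self.send_response(400)
            self.end_headers()
            return
        print(f"alert received: {alert['label']}")
        # Acknowledge quickly; do slow work outside the request handler.
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("", port), AlertHandler).serve_forever()
```

Calling `serve()` starts the listener. Keeping the parsing in `parse_alert` makes it easy to adjust once the real payload fields are known.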

### Programmable Editing

Compose outputs using the four-layer editor architecture (asset, clip, track, timeline):

```python
from videodb.editor import Timeline, Track, Clip, VideoAsset

video_asset = VideoAsset(id=video.id, start=10)
clip = Clip(asset=video_asset, duration=20)

track = Track()
track.add_clip(0, clip)

timeline = Timeline(conn)
timeline.add_track(track)
output = timeline.generate_stream()
```

***

## Architecture Patterns

The loop applies to different use cases:

| Use Case         | See                | Understand                | Act               |
| :--------------- | :----------------- | :------------------------ | :---------------- |
| Video Search     | Upload files       | Index with domain prompts | Search + retrieve |
| Monitoring       | Connect RTSP       | Real-time indexing        | Alerts + webhooks |
| Desktop Agent    | Capture SDK        | Index screen/mic          | Context for LLM   |
| Media Automation | Upload + transcode | Index for editing         | Timeline + export |

***

## Next Steps

<CardGroup cols={2}>
  <Card icon="database" title="Data Model" href="/pages/core-concepts/data-model">
    Collections, Videos, RTStreams, and other core objects
  </Card>

  <Card icon="search" title="Indexes" href="/pages/core-concepts/indexes-and-search">
    Turn media into searchable knowledge
  </Card>

  <Card icon="search" title="Search & Retrieval" href="/pages/core-concepts/indexes-and-search">
    How search returns playable evidence
  </Card>

  <Card icon="bell" title="Events & Alerts" href="/pages/core-concepts/events-and-realtime">
    Real-time detection and automation
  </Card>
</CardGroup>
