# AI Video Copyright Detection

> Production-ready plagiarism detection system using semantic similarity and scene analysis

<a href="https://colab.research.google.com/github/video-db/videodb-cookbook/blob/main/editor/creative/automated_video_copyright_detection.ipynb" target="_blank">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" noZoom />
</a>

## The Problem

Your original video content gets stolen, re-uploaded, and monetized by others. By the time you notice, it's already viral. Traditional manual checking doesn't scale — you need AI.

This guide shows you how to build a production-ready system that detects when your videos are stolen, even with edits, and generates evidence for DMCA takedowns.

## What You'll Build

Build a system that:

* Indexes your video portfolio as searchable "fingerprints"
* Detects visual similarity in suspect videos (even with edits)
* Generates side-by-side comparison clips for DMCA evidence
* Identifies sequential matching (stronger proof of plagiarism)
* Creates comprehensive reports with confidence scores

All powered by **VideoDB's Editor SDK** and semantic search.

## Setup

### Install Dependencies

```bash theme={null}
pip install videodb
```

### Connect to VideoDB

```python theme={null}
import videodb

# Connect to VideoDB
api_key = "your_api_key"
conn = videodb.connect(api_key=api_key)
coll = conn.get_collection()
```

## Implementation

### Step 1: Index Your Portfolio Videos

```python theme={null}
from videodb import SceneExtractionType, IndexType

# Upload your original videos
portfolio_videos = []
portfolio_videos.append(coll.upload(url="https://example.com/original-video-1.mp4"))
portfolio_videos.append(coll.upload(url="https://example.com/original-video-2.mp4"))

# Create scene indexes for all portfolio videos
portfolio_indexes = {}
for video in portfolio_videos:
    index_id = video.index_scenes(
        extraction_type=SceneExtractionType.shot_based,
        extraction_config={"threshold": 20}
    )
    portfolio_indexes[video.id] = index_id
```

### Step 2: Upload Suspect Video for Analysis

```python theme={null}
# Upload the video you suspect is plagiarized
suspect_video = coll.upload(url="https://example.com/suspect-video.mp4")

# Create scene index for suspect video (shot-based for better detection)
suspect_index_id = suspect_video.index_scenes(
    extraction_type=SceneExtractionType.shot_based,
    extraction_config={"threshold": 20}
)
```

### Step 3: Perform Similarity Comparison

```python theme={null}
from videodb import IndexType

matches = []
similarity_threshold = 0.70

# Get suspect video scenes
suspect_scenes = suspect_video.get_scene_index(suspect_index_id)

for portfolio_vid_id, portfolio_index_id in portfolio_indexes.items():
    portfolio_video = coll.get_video(portfolio_vid_id)

    # Compare each suspect scene against portfolio
    for suspect_scene in suspect_scenes:
        # Use VideoDB's semantic search
        results = portfolio_video.search(
            query=suspect_scene['description'],
            search_type="semantic",
            index_type=IndexType.scene,
            index_id=portfolio_index_id
        )

        # Process results
        for shot in results.shots:
            if shot.search_score > similarity_threshold:
                matches.append({
                    "suspect_time": suspect_scene['start'],
                    "portfolio_video": portfolio_vid_id,
                    "portfolio_time": shot.start,
                    "similarity": shot.search_score
                })

# Sort by highest similarity
matches = sorted(matches, key=lambda x: x["similarity"], reverse=True)
```
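The matching loop above depends on live VideoDB calls. One way to test its thresholding and sorting offline is to pass the search step in as a function. The sketch below assumes scene dicts with the `start` and `description` keys used above, and represents each search hit as a `(start, score)` tuple standing in for `shot.start` and `shot.search_score`. `collect_matches` and `fake_search` are hypothetical names and are not part of the VideoDB SDK.

```python theme={null}
from typing import Callable, Dict, List, Tuple

# A search hit is (portfolio_start_seconds, similarity_score); this stands in
# for shot.start / shot.search_score from a real VideoDB search result.
Hit = Tuple[float, float]


def collect_matches(
    suspect_scenes: List[Dict],
    portfolio_ids: List[str],
    search: Callable[[str, str], List[Hit]],
    threshold: float = 0.70,
) -> List[Dict]:
    """Run search(portfolio_id, description) for every suspect scene and
    keep hits above threshold, sorted by descending similarity."""
    matches = []
    for portfolio_id in portfolio_ids:
        for scene in suspect_scenes:
            for start, score in search(portfolio_id, scene["description"]):
                if score > threshold:
                    matches.append({
                        "suspect_time": scene["start"],
                        "portfolio_video": portfolio_id,
                        "portfolio_time": start,
                        "similarity": score,
                    })
    return sorted(matches, key=lambda m: m["similarity"], reverse=True)


# Hypothetical stand-in for portfolio_video.search(...), for offline testing
def fake_search(portfolio_id: str, query: str) -> List[Hit]:
    canned = {
        "a red car drives past": [(12.0, 0.91), (40.0, 0.65)],
        "a dog runs on a beach": [(55.0, 0.78)],
    }
    return canned.get(query, [])


scenes = [
    {"start": 0.0, "description": "a red car drives past"},
    {"start": 6.5, "description": "a dog runs on a beach"},
]
result = collect_matches(scenes, ["vid_1"], fake_search)
```

In a real run, `search` would wrap the `portfolio_video.search(...)` call from Step 3 and return `[(shot.start, shot.search_score) for shot in results.shots]`.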

### Step 4: Detect Sequential Patterns

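```python theme={null}
# Check for sequential matching (stronger evidence): runs of consecutive
# suspect scenes that map, in order, onto the same portfolio video
MIN_RUN_LENGTH = 3  # 3+ consecutive matches = strong evidence

# Keep the best match per suspect scene (matches is sorted by similarity,
# so the first one seen is the best), then order by suspect-video time
best_per_scene = {}
for match in matches:
    if match["similarity"] > 0.80:
        best_per_scene.setdefault(match["suspect_time"], match)
strong_matches = sorted(best_per_scene.values(), key=lambda m: m["suspect_time"])

sequential_matches = []
current_run = []
for match in strong_matches:
    if current_run and (
        match["portfolio_video"] == current_run[-1]["portfolio_video"]
        and match["portfolio_time"] >= current_run[-1]["portfolio_time"]
    ):
        current_run.append(match)  # extends the current in-order run
    else:
        if len(current_run) >= MIN_RUN_LENGTH:
            sequential_matches.extend(current_run)
        current_run = [match]  # start a new run
if len(current_run) >= MIN_RUN_LENGTH:  # flush the final run
    sequential_matches.extend(current_run)

plagiarism_confidence = min(1.0, len(sequential_matches) / 10)
```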
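Because `matches` is sorted by similarity rather than by time, sequential detection has to re-order matches by their position in the suspect video first. The sketch below is a standalone, testable version of that idea that returns each run separately, so a report can cite the time span of every run. It assumes the match-dict shape from Step 3. `find_sequential_runs` is a hypothetical helper, and treating a run as "same portfolio video, non-decreasing portfolio timestamps" is one reasonable definition of sequential, not a VideoDB feature.

```python theme={null}
from typing import Dict, List


def find_sequential_runs(
    matches: List[Dict], min_similarity: float = 0.80, min_run: int = 3
) -> List[List[Dict]]:
    """Group high-similarity matches into runs where consecutive suspect
    scenes map, in order, onto the same portfolio video."""
    # Keep the best-scoring match per suspect scene, ordered by suspect time
    best: Dict[float, Dict] = {}
    for m in matches:
        if m["similarity"] > min_similarity:
            t = m["suspect_time"]
            if t not in best or m["similarity"] > best[t]["similarity"]:
                best[t] = m
    ordered = sorted(best.values(), key=lambda m: m["suspect_time"])

    runs: List[List[Dict]] = []
    current: List[Dict] = []
    for m in ordered:
        if current and (
            m["portfolio_video"] == current[-1]["portfolio_video"]
            and m["portfolio_time"] >= current[-1]["portfolio_time"]
        ):
            current.append(m)  # continues the in-order run
        else:
            if len(current) >= min_run:
                runs.append(current)
            current = [m]  # start a new candidate run
    if len(current) >= min_run:
        runs.append(current)
    return runs


sample = [
    {"suspect_time": 0.0, "portfolio_video": "A", "portfolio_time": 10.0, "similarity": 0.90},
    {"suspect_time": 5.0, "portfolio_video": "A", "portfolio_time": 15.0, "similarity": 0.88},
    {"suspect_time": 9.0, "portfolio_video": "A", "portfolio_time": 21.0, "similarity": 0.93},
    {"suspect_time": 14.0, "portfolio_video": "B", "portfolio_time": 3.0, "similarity": 0.85},
    {"suspect_time": 20.0, "portfolio_video": "A", "portfolio_time": 2.0, "similarity": 0.60},
]
runs = find_sequential_runs(sample)
for run in runs:
    print(f"{run[0]['suspect_time']}s-{run[-1]['suspect_time']}s -> {run[0]['portfolio_video']}")
```

Flattening the runs (`[m for run in runs for m in run]`) gives a flat list of sequential matches, the quantity Step 4 counts for `plagiarism_confidence`.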

### Step 5: Generate Evidence Clips

```python theme={null}
from videodb.editor import Timeline, Track, Clip, VideoAsset, Position, Fit

CLIP_DURATION = 5  # seconds of footage per evidence segment

# Create side-by-side comparison timeline: one track per side, so each
# portfolio/suspect pair plays simultaneously and pairs play back-to-back
timeline = Timeline(conn)
portfolio_track = Track()  # left half: original footage
suspect_track = Track()  # right half: suspect footage

# Top 5 matches as evidence, placed one after another on the timeline
for i, match in enumerate(matches[:5]):
    start_at = i * CLIP_DURATION

    # Portfolio video (left side)
    portfolio_asset = VideoAsset(
        id=match["portfolio_video"],
        start=match["portfolio_time"]
    )
    portfolio_clip = Clip(
        asset=portfolio_asset,
        duration=CLIP_DURATION,
        position=Position.left,
        fit=Fit.crop,
        scale=0.5
    )
    portfolio_track.add_clip(start_at, portfolio_clip)

    # Suspect video (right side)
    suspect_asset = VideoAsset(
        id=suspect_video.id,
        start=match["suspect_time"]
    )
    suspect_clip = Clip(
        asset=suspect_asset,
        duration=CLIP_DURATION,
        position=Position.right,
        fit=Fit.crop,
        scale=0.5
    )
    suspect_track.add_clip(start_at, suspect_clip)

timeline.add_track(portfolio_track)
timeline.add_track(suspect_track)

# Generate evidence video
evidence_stream_url = timeline.generate_stream()
```
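Taking `matches[:5]` can pick the same suspect scene several times when it matches more than one portfolio shot, which makes the evidence reel repetitive. A minimal sketch of one alternative: walk the similarity-sorted list and keep only the first (best) match for each suspect timestamp until `n` are chosen. `select_evidence` is a hypothetical helper that assumes the match dicts from Step 3, already sorted by descending similarity; its result could stand in for `matches[:5]` when building the timeline.

```python theme={null}
from typing import Dict, List


def select_evidence(matches: List[Dict], n: int = 5) -> List[Dict]:
    """Return up to n matches, each from a different suspect scene,
    preserving the input order (highest similarity first)."""
    seen_scenes = set()
    selected = []
    for m in matches:
        if m["suspect_time"] in seen_scenes:
            continue  # a better match for this scene was already chosen
        seen_scenes.add(m["suspect_time"])
        selected.append(m)
        if len(selected) == n:
            break
    return selected


ranked = [
    {"suspect_time": 4.0, "portfolio_video": "A", "portfolio_time": 30.0, "similarity": 0.97},
    {"suspect_time": 4.0, "portfolio_video": "B", "portfolio_time": 8.0, "similarity": 0.95},
    {"suspect_time": 11.0, "portfolio_video": "A", "portfolio_time": 37.0, "similarity": 0.92},
]
evidence = select_evidence(ranked, n=5)
```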

### Step 6: Generate Plagiarism Report

```python theme={null}
from datetime import datetime, timezone

# Create comprehensive report
report = {
    "suspect_video_id": suspect_video.id,
    "total_matches": len(matches),
    "sequential_matches": len(sequential_matches),
    "plagiarism_confidence": plagiarism_confidence,
    "high_confidence_matches": len([m for m in matches if m["similarity"] > 0.95]),
    "medium_confidence_matches": len([m for m in matches if 0.85 < m["similarity"] <= 0.95]),
    "evidence_video_url": evidence_stream_url,
    # Record when the analysis actually ran (UTC, ISO 8601)
    "timestamp": datetime.now(timezone.utc).isoformat()
}

# If confidence > 0.80, recommend DMCA takedown
if plagiarism_confidence > 0.80:
    report["recommendation"] = "STRONG PLAGIARISM DETECTED - Ready for DMCA takedown"
    report["action"] = "prepare_dmca_evidence"
```
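To hand the report to a legal team or attach it to a takedown notice, it helps to persist it in a portable format. A minimal sketch using Python's standard `json` module; `save_report` and the `copyright_report.json` filename are hypothetical choices, not something VideoDB provides.

```python theme={null}
import json
from pathlib import Path
from typing import Dict


def save_report(report: Dict, path: str = "copyright_report.json") -> Path:
    """Write the report dict as pretty-printed JSON and return the file path."""
    out = Path(path)
    # default=str turns non-JSON values (e.g. datetime objects) into strings
    out.write_text(json.dumps(report, indent=2, default=str), encoding="utf-8")
    return out


example_report = {
    "suspect_video_id": "m-example",  # hypothetical ID for illustration
    "total_matches": 12,
    "sequential_matches": 4,
    "plagiarism_confidence": 0.4,
}
saved_path = save_report(example_report)
print(saved_path.read_text(encoding="utf-8"))
```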

## What You Get

A production-ready detection system with:

* Scene-by-scene visual fingerprinting
* Semantic similarity matching (catches edits/crops)
* Sequential pattern detection (strengthens evidence)
* Side-by-side comparison clips
* Confidence scoring for legal action
* Automated DMCA-ready reports

Here's the side-by-side evidence video:

<iframe className="w-full aspect-video rounded-xl" src="https://console.videodb.io/player?url=https://play.videodb.io/v1/a779fe53-bcc7-457e-a04f-4d818c031b26.m3u8" title="Copyright Detection Evidence Video" allow="accelerometer; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowFullScreen />

## How It Works

1. **Portfolio Indexing** - Convert your original videos into searchable scene embeddings
2. **Suspect Upload** - Upload suspected plagiarized video
3. **Similarity Scan** - Compare each scene using semantic similarity (catches edits)
4. **Sequential Detection** - Look for multiple consecutive matches (stronger evidence)
5. **Evidence Generation** - Create side-by-side comparison clips
6. **DMCA Ready** - Generate professional report for takedown

## Similarity Thresholds

* **0.95+** = Nearly identical (very likely plagiarism)
* **0.85-0.95** = High similarity (suspicious)
* **0.70-0.85** = Medium similarity (may be coincidence)
* **\<0.70** = Low similarity (likely not plagiarism)

Adjust thresholds based on your tolerance for false positives.
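The bands above can be encoded as a small lookup so reports and dashboards label scores consistently. A minimal sketch; `classify_similarity` and `SIMILARITY_BANDS` are hypothetical names, the cut-offs mirror the list above, and because adjacent ranges share endpoints, a score exactly on a boundary (for example 0.85) is assigned to the higher band here, which is one choice among several.

```python theme={null}
# Lower bound of each band, checked from highest to lowest
SIMILARITY_BANDS = [
    (0.95, "nearly identical"),
    (0.85, "high"),
    (0.70, "medium"),
]


def classify_similarity(score: float) -> str:
    """Map a similarity score in [0, 1] to the bands listed above."""
    for lower_bound, label in SIMILARITY_BANDS:
        if score >= lower_bound:
            return label
    return "low"


for s in (0.97, 0.90, 0.72, 0.40):
    print(s, classify_similarity(s))
```

Raising a band's lower bound (or the `similarity_threshold` in Step 3) reduces false positives at the cost of missing more heavily edited copies.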

## The Result

With this system, you can:

* Protect your intellectual property at scale
* Detect plagiarism even with edits and filters
* Generate professional DMCA evidence automatically
* Monitor multiple suspect videos efficiently
* Respond to theft quickly with automated reports

Your content is your property. Protect it with AI.

<Card title="Explore the Full Notebook" icon="notebook" href="https://colab.research.google.com/github/video-db/videodb-cookbook/blob/main/editor/creative/automated_video_copyright_detection.ipynb">
  Open the complete implementation with advanced embedding techniques, batch processing, and database management.
</Card>

## Related Tutorials

<CardGroup cols={2}>
  <Card title="Profanity Beeper" icon="volume-x" href="/examples-and-tutorials/safety-compliance/beep-profanity">
    Auto-detect and beep curse words in audio
  </Card>

  <Card title="Content Removal" icon="eye-off" href="/examples-and-tutorials/safety-compliance/remove-content">
    Skip inappropriate visual content in streams
  </Card>
</CardGroup>
