The Problem

Your original video content gets stolen, re-uploaded, and monetized by others. By the time you notice, the copy may already have gone viral. Manual checking doesn't scale, so detection has to be automated. This guide shows you how to build a production-ready system that detects when your videos are stolen, even with edits, and generates evidence for DMCA takedowns.

What You’ll Build

Build a system that:
  • Indexes your video portfolio as searchable “fingerprints”
  • Detects visual similarity in suspect videos (even with edits)
  • Generates side-by-side comparison clips for DMCA evidence
  • Identifies sequential matching (stronger proof of plagiarism)
  • Creates comprehensive reports with confidence scores
All powered by VideoDB’s Editor SDK and semantic search.

Setup

Install Dependencies

pip install videodb

Connect to VideoDB

import videodb

# Connect to VideoDB
api_key = "your_api_key"
conn = videodb.connect(api_key=api_key)
coll = conn.get_collection()
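
Hard-coding the key works for a quick test, but in a shared notebook it is safer to read it from the environment. A minimal sketch, assuming the key is stored in a variable named VIDEO_DB_API_KEY (the name is an assumption here); `load_api_key` is a hypothetical helper, not part of the VideoDB SDK.

```python
import os


def load_api_key(env=os.environ, var_name="VIDEO_DB_API_KEY"):
    """Return the API key from the environment, failing loudly if it is missing.

    `var_name` is an assumed variable name; change it to match your setup.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before connecting to VideoDB")
    return key
```

The result can then be passed to the connect call shown above, for example `videodb.connect(api_key=load_api_key())`.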

Implementation

Step 1: Index Your Portfolio Videos

from videodb import SceneExtractionType, IndexType

# Upload your original videos
portfolio_videos = []
portfolio_videos.append(coll.upload(url="https://example.com/original-video-1.mp4"))
portfolio_videos.append(coll.upload(url="https://example.com/original-video-2.mp4"))

# Create scene indexes for all portfolio videos
portfolio_indexes = {}
for video in portfolio_videos:
    index_id = video.index_scenes(
        extraction_type=SceneExtractionType.shot_based,
        extraction_config={"threshold": 20}
    )
    portfolio_indexes[video.id] = index_id

Step 2: Upload Suspect Video for Analysis

# Upload the video you suspect is plagiarized
suspect_video = coll.upload(url="https://example.com/suspect-video.mp4")

# Create scene index for suspect video (shot-based for better detection)
suspect_index_id = suspect_video.index_scenes(
    extraction_type=SceneExtractionType.shot_based,
    extraction_config={"threshold": 20}
)

Step 3: Perform Similarity Comparison

from videodb import IndexType

matches = []
similarity_threshold = 0.70

# Get suspect video scenes
suspect_scenes = suspect_video.get_scene_index(suspect_index_id)

for portfolio_vid_id, portfolio_index_id in portfolio_indexes.items():
    portfolio_video = coll.get_video(portfolio_vid_id)

    # Compare each suspect scene against portfolio
    for suspect_scene in suspect_scenes:
        # Use VideoDB's semantic search
        results = portfolio_video.search(
            query=suspect_scene['description'],
            search_type="semantic",
            index_type=IndexType.scene,
            index_id=portfolio_index_id
        )

        # Process results
        for shot in results.shots:
            if shot.search_score > similarity_threshold:
                matches.append({
                    "suspect_time": suspect_scene['start'],
                    "portfolio_video": portfolio_vid_id,
                    "portfolio_time": shot.start,
                    "similarity": shot.search_score
                })

# Sort by highest similarity
matches = sorted(matches, key=lambda x: x["similarity"], reverse=True)
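
Because every suspect scene is searched against every portfolio video, one suspect scene can produce several matches. The sketch below keeps only the strongest match per suspect timestamp. It works on the plain match dictionaries built above; `best_match_per_scene` is a hypothetical helper introduced for illustration.

```python
def best_match_per_scene(matches):
    """Keep only the highest-similarity match for each suspect timestamp."""
    best = {}
    for m in matches:
        key = m["suspect_time"]
        if key not in best or m["similarity"] > best[key]["similarity"]:
            best[key] = m
    # Return the survivors in playback order of the suspect video
    return sorted(best.values(), key=lambda m: m["suspect_time"])
```

The result is ordered by suspect time rather than by similarity, so re-sort it by `"similarity"` if you need the strongest matches first.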

Step 4: Detect Sequential Patterns

# Check for sequential matching (stronger evidence).
# `matches` is sorted by similarity, so re-order it by position in the
# suspect video before looking for consecutive runs.
timeline_matches = sorted(matches, key=lambda m: m["suspect_time"])

sequential_matches = []
current_run = []

for match in timeline_matches:
    if match["similarity"] > 0.80:
        current_run.append(match)
        continue
    if len(current_run) >= 3:  # 3+ consecutive = strong evidence
        sequential_matches.extend(current_run)
    current_run = []

# Flush the final run, which has no weaker match after it to close it
if len(current_run) >= 3:
    sequential_matches.extend(current_run)

plagiarism_confidence = min(1.0, len(sequential_matches) / 10)
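
A stricter reading of "sequential" also requires the matched portfolio timestamps to advance in the same order as the suspect video. The sketch below measures the longest such run. It is an illustrative variant rather than the method used above, `longest_aligned_run` is a hypothetical helper, and it assumes at most one match per suspect scene.

```python
def longest_aligned_run(matches, min_similarity=0.80):
    """Length of the longest run of strong matches, ordered by suspect time,
    that stay in one portfolio video with increasing portfolio timestamps."""
    ordered = sorted(matches, key=lambda m: m["suspect_time"])
    best = run = 0
    prev = None  # previous strong match, or None if the run was broken
    for m in ordered:
        strong = m["similarity"] > min_similarity
        continues = (
            prev is not None
            and m["portfolio_video"] == prev["portfolio_video"]
            and m["portfolio_time"] > prev["portfolio_time"]
        )
        if strong and continues:
            run += 1
        elif strong:
            run = 1  # strong match that starts a new run
        else:
            run = 0  # weak match breaks the run
        prev = m if strong else None
        best = max(best, run)
    return best
```

A long aligned run is harder to explain by coincidence than the same number of scattered matches, which is why it is worth reporting separately.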

Step 5: Generate Evidence Clips

from videodb.editor import Timeline, Track, Clip, VideoAsset, Position, Fit

# Create side-by-side comparison timeline
timeline = Timeline(conn)

# One track per side, so the left and right clips play at the same time
portfolio_track = Track()
suspect_track = Track()
clip_duration = 5  # seconds of evidence per match

# Add matching segments side-by-side, one pair after another
for i, match in enumerate(matches[:5]):  # Top 5 matches as evidence
    offset = i * clip_duration  # timeline position of this pair, so pairs don't overlap

    # Portfolio video (left side)
    portfolio_asset = VideoAsset(
        id=match["portfolio_video"],
        start=match["portfolio_time"]
    )
    portfolio_clip = Clip(
        asset=portfolio_asset,
        duration=clip_duration,
        position=Position.left,
        fit=Fit.crop,
        scale=0.5
    )
    portfolio_track.add_clip(offset, portfolio_clip)

    # Suspect video (right side)
    suspect_asset = VideoAsset(
        id=suspect_video.id,
        start=match["suspect_time"]
    )
    suspect_clip = Clip(
        asset=suspect_asset,
        duration=clip_duration,
        position=Position.right,
        fit=Fit.crop,
        scale=0.5
    )
    suspect_track.add_clip(offset, suspect_clip)

timeline.add_track(portfolio_track)
timeline.add_track(suspect_track)

# Generate evidence video
evidence_stream_url = timeline.generate_stream()

Step 6: Generate Plagiarism Report

from datetime import datetime, timezone

# Create comprehensive report
report = {
    "suspect_video_id": suspect_video.id,
    "total_matches": len(matches),
    "sequential_matches": len(sequential_matches),
    "plagiarism_confidence": plagiarism_confidence,
    "high_confidence_matches": len([m for m in matches if m["similarity"] > 0.95]),
    "medium_confidence_matches": len([m for m in matches if 0.85 < m["similarity"] <= 0.95]),
    "evidence_video_url": evidence_stream_url,
    # Record when the report was actually generated (UTC, ISO 8601)
    "timestamp": datetime.now(timezone.utc).isoformat()
}

# If confidence > 0.80, recommend DMCA takedown
if plagiarism_confidence > 0.80:
    report["recommendation"] = "STRONG PLAGIARISM DETECTED - Ready for DMCA takedown"
    report["action"] = "prepare_dmca_evidence"
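
A report is only useful as evidence if it is kept somewhere. The sketch below writes the report dictionary to a JSON file using the standard library. `save_report` and the file path are hypothetical, and it assumes every value in the report is JSON-serializable (strings, numbers, lists, dicts).

```python
import json
from pathlib import Path


def save_report(report, path):
    """Write the report dict to `path` as pretty-printed JSON and return the path."""
    path = Path(path)
    path.write_text(json.dumps(report, indent=2, sort_keys=True), encoding="utf-8")
    return path
```

For example, `save_report(report, "plagiarism_report.json")` stores the report next to the notebook so it can be attached to a takedown request later.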

What You Get

A production-ready detection system with:
  • Scene-by-scene visual fingerprinting
  • Semantic similarity matching (catches edits/crops)
  • Sequential pattern detection (strengthens evidence)
  • Side-by-side comparison clips
  • Confidence scoring for legal action
  • Automated DMCA-ready reports
Here’s the side-by-side evidence video:

How It Works

  1. Portfolio Indexing - Convert your original videos into searchable scene embeddings
  2. Suspect Upload - Upload the suspect video and index it the same way
  3. Similarity Scan - Compare each scene using semantic similarity (catches edits)
  4. Sequential Detection - Look for multiple consecutive matches (stronger evidence)
  5. Evidence Generation - Create side-by-side comparison clips
  6. DMCA Ready - Generate professional report for takedown

Similarity Thresholds

  • 0.95+ = Nearly identical (very likely plagiarism)
  • 0.85-0.95 = High similarity (suspicious)
  • 0.70-0.85 = Medium similarity (may be coincidence)
  • <0.70 = Low similarity (likely not plagiarism)
Adjust thresholds based on your tolerance for false positives.
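
The bands above can be written as a small lookup so that every part of the pipeline labels scores the same way. This is a minimal sketch: `classify_similarity` is a hypothetical helper, and treating each boundary as a strict "greater than" (so exactly 0.95 counts as high, not nearly identical) is an assumption chosen to mirror the comparisons in Steps 3 and 6.

```python
def classify_similarity(score):
    """Map a similarity score to one of the four bands listed above."""
    if score > 0.95:
        return "nearly identical"
    if score > 0.85:
        return "high"
    if score > 0.70:
        return "medium"
    return "low"
```

Keeping the cutoffs in one function means a change in your tolerance for false positives only has to be made in one place.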

The Result

With this system, you can:
  • Protect your intellectual property at scale
  • Detect plagiarism even with edits and filters
  • Generate professional DMCA evidence automatically
  • Monitor multiple suspect videos efficiently
  • Respond to theft quickly with automated reports
Your content is your property. Protect it with AI.

Explore the Full Notebook

Open the complete implementation with advanced embedding techniques, batch processing, and database management.