The Problem
Your original video content gets stolen, re-uploaded, and monetized by others. By the time you notice, it’s already viral. Traditional manual checking doesn’t scale — you need AI.
This guide shows you how to build a production-ready system that detects when your videos are stolen, even with edits, and generates evidence for DMCA takedowns.
What You’ll Build
Build a system that:
Indexes your video portfolio as searchable “fingerprints”
Detects visual similarity in suspect videos (even with edits)
Generates side-by-side comparison clips for DMCA evidence
Identifies sequential matching (stronger proof of plagiarism)
Creates comprehensive reports with confidence scores
All powered by VideoDB’s Editor SDK and semantic search.
Setup
Install Dependencies
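The examples below assume the VideoDB Python SDK is installed (package name as published by VideoDB):

```shell
pip install videodb
```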
Connect to VideoDB
```python
import videodb

# Connect to VideoDB
api_key = "your_api_key"
conn = videodb.connect(api_key=api_key)
coll = conn.get_collection()
```
Implementation
Step 1: Index Your Portfolio Videos
```python
from videodb import SceneExtractionType

# Upload your original videos
portfolio_videos = [
    coll.upload(url="https://example.com/original-video-1.mp4"),
    coll.upload(url="https://example.com/original-video-2.mp4"),
]

# Create a scene index for every portfolio video
portfolio_indexes = {}
for video in portfolio_videos:
    index_id = video.index_scenes(
        extraction_type=SceneExtractionType.shot_based,
        extraction_config={"threshold": 20},
    )
    portfolio_indexes[video.id] = index_id
```
Step 2: Upload Suspect Video for Analysis
```python
# Upload the video you suspect is plagiarized
suspect_video = coll.upload(url="https://example.com/suspect-video.mp4")

# Create a scene index for the suspect video (shot-based for better detection)
suspect_index_id = suspect_video.index_scenes(
    extraction_type=SceneExtractionType.shot_based,
    extraction_config={"threshold": 20},
)
```
Step 3: Scan for Similar Scenes
```python
from videodb import IndexType

matches = []
similarity_threshold = 0.70

# Get the suspect video's scenes
suspect_scenes = suspect_video.get_scene_index(suspect_index_id)

for portfolio_vid_id, portfolio_index_id in portfolio_indexes.items():
    portfolio_video = coll.get_video(portfolio_vid_id)

    # Compare each suspect scene against the portfolio via semantic search
    for suspect_scene in suspect_scenes:
        results = portfolio_video.search(
            query=suspect_scene["description"],
            search_type="semantic",
            index_type=IndexType.scene,
            index_id=portfolio_index_id,
        )

        # Keep only shots that clear the similarity threshold
        for shot in results.shots:
            if shot.search_score > similarity_threshold:
                matches.append({
                    "suspect_time": suspect_scene["start"],
                    "portfolio_video": portfolio_vid_id,
                    "portfolio_time": shot.start,
                    "similarity": shot.search_score,
                })

# Sort by highest similarity
matches = sorted(matches, key=lambda x: x["similarity"], reverse=True)
```
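Because every suspect scene is compared against every portfolio video, one suspect moment can match several portfolio shots. A small helper (our own illustration, not part of the VideoDB SDK) can collapse the raw match list to the best hit per suspect timestamp:

```python
def best_match_per_scene(matches):
    """Keep only the highest-similarity match for each suspect timestamp."""
    best = {}
    for m in matches:
        t = m["suspect_time"]
        if t not in best or m["similarity"] > best[t]["similarity"]:
            best[t] = m
    # Return matches in chronological order of the suspect video
    return [best[t] for t in sorted(best)]

# Sample data, invented for illustration
raw = [
    {"suspect_time": 4.0, "portfolio_video": "a", "portfolio_time": 10.0, "similarity": 0.91},
    {"suspect_time": 4.0, "portfolio_video": "b", "portfolio_time": 3.0, "similarity": 0.74},
    {"suspect_time": 9.5, "portfolio_video": "a", "portfolio_time": 15.5, "similarity": 0.88},
]
deduped = best_match_per_scene(raw)
```

Deduplicating this way keeps the later sequential-pattern check from double-counting a single suspect scene.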
Step 4: Detect Sequential Patterns
```python
# Check for sequential matching (stronger evidence): runs of
# high-similarity matches in the suspect video's chronological order
sequential_matches = []
current_run = []

for match in sorted(matches, key=lambda x: x["suspect_time"]):
    if match["similarity"] > 0.80:
        current_run.append(match)
    else:
        if len(current_run) >= 3:  # 3+ consecutive = strong evidence
            sequential_matches.extend(current_run)
        current_run = []
if len(current_run) >= 3:
    sequential_matches.extend(current_run)

plagiarism_confidence = min(1.0, len(sequential_matches) / 10)
```
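The run-detection idea depends only on plain dictionaries, so it can be exercised without the SDK. A self-contained sketch on invented sample data:

```python
def find_sequential_matches(matches, min_similarity=0.80, min_run=3):
    """Return matches belonging to runs of >= min_run consecutive
    high-similarity scenes, in chronological order."""
    sequential, run = [], []
    for m in sorted(matches, key=lambda x: x["suspect_time"]):
        if m["similarity"] > min_similarity:
            run.append(m)
        else:
            if len(run) >= min_run:
                sequential.extend(run)
            run = []
    if len(run) >= min_run:
        sequential.extend(run)
    return sequential

# Five scenes: three consecutive strong matches, one weak, one isolated strong
sample = [
    {"suspect_time": t, "similarity": s}
    for t, s in [(0, 0.95), (5, 0.88), (10, 0.91), (15, 0.40), (20, 0.97)]
]
runs = find_sequential_matches(sample)  # only the first three scenes qualify
```

The isolated strong match at t=20 is excluded: a lone high score can be coincidence, while an unbroken run of them rarely is.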
Step 5: Generate Evidence Clips
```python
from videodb.editor import Timeline, Track, Clip, VideoAsset, Position, Fit

# Create a side-by-side comparison timeline
timeline = Timeline(conn)

# Add matching segments side by side
for match in matches[:5]:  # Top 5 matches as evidence
    track = Track()

    # Portfolio video (left side)
    portfolio_asset = VideoAsset(
        id=match["portfolio_video"],
        start=match["portfolio_time"],
    )
    portfolio_clip = Clip(
        asset=portfolio_asset,
        duration=5,
        position=Position.left,
        fit=Fit.crop,
        scale=0.5,
    )
    track.add_clip(0, portfolio_clip)

    # Suspect video (right side)
    suspect_asset = VideoAsset(
        id=suspect_video.id,
        start=match["suspect_time"],
    )
    suspect_clip = Clip(
        asset=suspect_asset,
        duration=5,
        position=Position.right,
        fit=Fit.crop,
        scale=0.5,
    )
    track.add_clip(0, suspect_clip)

    timeline.add_track(track)

# Generate the evidence video
evidence_stream_url = timeline.generate_stream()
```
Step 6: Generate Plagiarism Report
```python
from datetime import datetime, timezone

# Create a comprehensive report
report = {
    "suspect_video_id": suspect_video.id,
    "total_matches": len(matches),
    "sequential_matches": len(sequential_matches),
    "plagiarism_confidence": plagiarism_confidence,
    "high_confidence_matches": len([m for m in matches if m["similarity"] > 0.95]),
    "medium_confidence_matches": len([m for m in matches if 0.85 < m["similarity"] <= 0.95]),
    "evidence_video_url": evidence_stream_url,
    "timestamp": datetime.now(timezone.utc).isoformat(),
}

# If confidence > 0.80, recommend a DMCA takedown
if plagiarism_confidence > 0.80:
    report["recommendation"] = "STRONG PLAGIARISM DETECTED - Ready for DMCA takedown"
    report["action"] = "prepare_dmca_evidence"
```
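Since the report is a plain dictionary, it can be persisted as JSON for your records or attached to a takedown request. A minimal sketch using only the standard library (the filename and placeholder values are our own choices):

```python
import json

# Placeholder report values for illustration
report = {
    "suspect_video_id": "vid-123",
    "total_matches": 12,
    "plagiarism_confidence": 0.8,
    "recommendation": "STRONG PLAGIARISM DETECTED - Ready for DMCA takedown",
}

# Write the report to disk as formatted JSON
with open("plagiarism_report.json", "w") as f:
    json.dump(report, f, indent=2)
```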
What You Get
A production-ready detection system with:
Scene-by-scene visual fingerprinting
Semantic similarity matching (catches edits/crops)
Sequential pattern detection (strengthens evidence)
Side-by-side comparison clips
Confidence scoring for legal action
Automated DMCA-ready reports
How It Works
Portfolio Indexing - Convert your original videos into searchable scene embeddings
Suspect Upload - Upload suspected plagiarized video
Similarity Scan - Compare each scene using semantic similarity (catches edits)
Sequential Detection - Look for multiple consecutive matches (stronger evidence)
Evidence Generation - Create side-by-side comparison clips
DMCA Ready - Generate professional report for takedown
Similarity Thresholds
0.95+ = Nearly identical (very likely plagiarism)
0.85-0.95 = High similarity (suspicious)
0.70-0.85 = Medium similarity (may be coincidence)
<0.70 = Low similarity (likely not plagiarism)
Adjust thresholds based on your tolerance for false positives.
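These bands can be captured in a small helper so reports use consistent labels. The function below mirrors the table above (the label strings are our own shorthand):

```python
def classify_similarity(score):
    """Map a similarity score to the interpretation bands above."""
    if score >= 0.95:
        return "nearly identical"
    if score >= 0.85:
        return "high similarity"
    if score >= 0.70:
        return "medium similarity"
    return "low similarity"

label = classify_similarity(0.97)  # "nearly identical"
```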
The Result
With this system, you can:
Protect your intellectual property at scale
Detect plagiarism even with edits and filters
Generate professional DMCA evidence automatically
Monitor multiple suspect videos efficiently
Respond to theft quickly with automated reports
Your content is your property. Protect it with AI.
Explore the Full Notebook
Open the complete implementation, with advanced embedding techniques, batch processing, and database management.