# AI Video Copyright Detection

> Production-ready plagiarism detection system using semantic similarity and scene analysis

## The Problem

Your original video content gets stolen, re-uploaded, and monetized by others. By the time you notice, it's already viral. Manual checking doesn't scale; you need AI.

This guide shows you how to build a production-ready system that detects when your videos are stolen, even with edits, and generates evidence for DMCA takedowns.

## What You'll Build

Build a system that:

* Indexes your video portfolio as searchable "fingerprints"
* Detects visual similarity in suspect videos (even with edits)
* Generates side-by-side comparison clips for DMCA evidence
* Identifies sequential matching (stronger proof of plagiarism)
* Creates comprehensive reports with confidence scores

All powered by **VideoDB's Editor SDK** and semantic search.

## Setup

### Install Dependencies

```bash
pip install videodb
```

### Connect to VideoDB

```python
import videodb

# Connect to VideoDB
api_key = "your_api_key"
conn = videodb.connect(api_key=api_key)
coll = conn.get_collection()
```

## Implementation

### Step 1: Index Your Portfolio Videos

```python
from videodb import SceneExtractionType, IndexType

# Upload your original videos
portfolio_videos = []
portfolio_videos.append(coll.upload(url="https://example.com/original-video-1.mp4"))
portfolio_videos.append(coll.upload(url="https://example.com/original-video-2.mp4"))

# Create scene indexes for all portfolio videos
portfolio_indexes = {}
for video in portfolio_videos:
    index_id = video.index_scenes(
        extraction_type=SceneExtractionType.shot_based,
        extraction_config={"threshold": 20}
    )
    # Map each video ID to its scene index ID for the comparison step
    portfolio_indexes[video.id] = index_id
```

### Step 2: Upload Suspect Video for Analysis

```python
# Upload the video you suspect is plagiarized
suspect_video = coll.upload(url="https://example.com/suspect-video.mp4")

# Create scene index for suspect video (shot-based for better detection)
suspect_index_id = suspect_video.index_scenes(
    extraction_type=SceneExtractionType.shot_based,
    extraction_config={"threshold": 20}
)
```

### Step 3: Perform Similarity Comparison

```python
from videodb import IndexType

matches = []
similarity_threshold = 0.70

# Get suspect video scenes
suspect_scenes = suspect_video.get_scene_index(suspect_index_id)

for portfolio_vid_id, portfolio_index_id in portfolio_indexes.items():
    portfolio_video = coll.get_video(portfolio_vid_id)

    # Compare each suspect scene against portfolio
    for suspect_scene in suspect_scenes:
        # Use VideoDB's semantic search
        results = portfolio_video.search(
            query=suspect_scene['description'],
            search_type="semantic",
            index_type=IndexType.scene,
            index_id=portfolio_index_id
        )

        # Process results
        for shot in results.shots:
            if shot.search_score > similarity_threshold:
                matches.append({
                    "suspect_time": suspect_scene['start'],
                    "portfolio_video": portfolio_vid_id,
                    "portfolio_time": shot.start,
                    "similarity": shot.search_score
                })

# Sort by highest similarity
matches = sorted(matches, key=lambda x: x["similarity"], reverse=True)
```

### Step 4: Detect Sequential Patterns

```python
# Check for sequential matching (stronger evidence).
# `matches` is sorted by similarity, so re-order by suspect timestamp
# before looking for runs of consecutive high-similarity matches.
ordered_matches = sorted(matches, key=lambda m: m["suspect_time"])

sequential_matches = []
current_run = []

for match in ordered_matches:
    if match["similarity"] > 0.80:
        current_run.append(match)
    else:
        if len(current_run) >= 3:  # 3+ consecutive = strong evidence
            sequential_matches.extend(current_run)
        current_run = []

# Keep a qualifying run that reaches the end of the list
if len(current_run) >= 3:
    sequential_matches.extend(current_run)

plagiarism_confidence = min(1.0, len(sequential_matches) / 10)
```

### Step 5: Generate Evidence Clips

```python
from videodb.editor import Timeline, Track, Clip, VideoAsset, Position, Fit

# Create side-by-side comparison timeline
timeline = Timeline(conn)
clip_duration = 5  # seconds of footage shown per match

# One track per side, so both clips of a match play at the same time
portfolio_track = Track()
suspect_track = Track()

# Add matching segments side-by-side, one match after another
for i, match in enumerate(matches[:5]):  # Top 5 matches as evidence
    # Assumes the first argument of add_clip is the start time on the timeline
    timeline_start = i * clip_duration

    # Portfolio video (left side)
    portfolio_asset = VideoAsset(
        id=match["portfolio_video"],
        start=match["portfolio_time"]
    )
    portfolio_clip = Clip(
        asset=portfolio_asset,
        duration=clip_duration,
        position=Position.left,
        fit=Fit.crop,
        scale=0.5
    )
    portfolio_track.add_clip(timeline_start, portfolio_clip)

    # Suspect video (right side)
    suspect_asset = VideoAsset(
        id=suspect_video.id,
        start=match["suspect_time"]
    )
    suspect_clip = Clip(
        asset=suspect_asset,
        duration=clip_duration,
        position=Position.right,
        fit=Fit.crop,
        scale=0.5
    )
    suspect_track.add_clip(timeline_start, suspect_clip)

timeline.add_track(portfolio_track)
timeline.add_track(suspect_track)

# Generate evidence video
evidence_stream_url = timeline.generate_stream()
```

### Step 6: Generate Plagiarism Report

```python
from datetime import datetime, timezone

# Create comprehensive report
report = {
    "suspect_video_id": suspect_video.id,
    "total_matches": len(matches),
    "sequential_matches": len(sequential_matches),
    "plagiarism_confidence": plagiarism_confidence,
    "high_confidence_matches": len([m for m in matches if m["similarity"] > 0.95]),
    "medium_confidence_matches": len([m for m in matches if 0.85 < m["similarity"] <= 0.95]),
    "evidence_video_url": evidence_stream_url,
    # Report generation time in UTC, ISO 8601
    "timestamp": datetime.now(timezone.utc).isoformat()
}

# If confidence > 0.80, recommend DMCA takedown
if plagiarism_confidence > 0.80:
    report["recommendation"] = "STRONG PLAGIARISM DETECTED - Ready for DMCA takedown"
    report["action"] = "prepare_dmca_evidence"
```

## What You Get

A production-ready detection system with:

* Scene-by-scene visual fingerprinting
* Semantic similarity matching (catches edits/crops)
* Sequential pattern detection (strengthens evidence)
* Side-by-side comparison clips
* Confidence scoring for legal action
* Automated DMCA-ready reports

The side-by-side evidence video is available at `evidence_stream_url`, the stream URL generated in Step 5.
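
Step 4 treats consecutive high-similarity matches as stronger evidence. A stricter variant can be sketched in plain Python, with no API calls, so the logic can be checked on synthetic data first. The helper name `find_sequential_runs`, the video IDs `m-1` and `m-2`, and the sample timestamps are hypothetical. The extra rule that a run must stay within one portfolio video and move forward in it is an assumption added here, not something the guide specifies. The `0.80` and `3` defaults mirror the values used in Step 4.

```python
from typing import Any, Dict, List

Match = Dict[str, Any]


def find_sequential_runs(
    matches: List[Match],
    min_similarity: float = 0.80,
    min_run_length: int = 3,
) -> List[List[Match]]:
    """Group matches into runs that advance in step through one portfolio video."""
    # Walk the matches in the order they appear in the suspect video.
    ordered = sorted(matches, key=lambda m: m["suspect_time"])
    runs: List[List[Match]] = []
    current: List[Match] = []

    for match in ordered:
        strong = match["similarity"] > min_similarity
        # A match continues the run if it stays in the same portfolio
        # video and its portfolio timestamp moves forward.
        continues = (
            bool(current)
            and match["portfolio_video"] == current[-1]["portfolio_video"]
            and match["portfolio_time"] > current[-1]["portfolio_time"]
        )
        if strong and (not current or continues):
            current.append(match)
            continue
        # The run is broken: keep it only if it is long enough.
        if len(current) >= min_run_length:
            runs.append(current)
        current = [match] if strong else []

    # Keep a qualifying run that reaches the end of the list.
    if len(current) >= min_run_length:
        runs.append(current)
    return runs


# Synthetic matches in the same shape as the dictionaries built in Step 3.
sample_matches = [
    {"suspect_time": 0.0, "portfolio_video": "m-1", "portfolio_time": 30.0, "similarity": 0.91},
    {"suspect_time": 5.0, "portfolio_video": "m-1", "portfolio_time": 35.0, "similarity": 0.88},
    {"suspect_time": 10.0, "portfolio_video": "m-1", "portfolio_time": 40.0, "similarity": 0.93},
    {"suspect_time": 15.0, "portfolio_video": "m-2", "portfolio_time": 2.0, "similarity": 0.75},
    {"suspect_time": 20.0, "portfolio_video": "m-1", "portfolio_time": 12.0, "similarity": 0.86},
]

runs = find_sequential_runs(sample_matches)
for run in runs:
    print(f"{len(run)} sequential matches starting at {run[0]['suspect_time']}s")
```

In this sample, the first three matches form a single run, the weak fourth match breaks it, and the isolated fifth match is too short to count. Requiring forward movement in both videos may reduce false positives from unrelated scenes that happen to share a description, at the cost of missing copies that were reordered.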