VideoDB Documentation

AWS Rekognition and VideoDB - Effortlessly Remove Inappropriate Content from Video

Video content moderation is complex. While today's computer vision and AI have reduced the manual burden, editing video streams instantly and intelligently to remove flagged content remains challenging.
VideoDB's next-gen tech lets application developers skip the tedious steps of conventional video editing, saving both cost and time.
Key components of this blog are:
AWS Rekognition API: Leveraging Content Moderation features for video analysis.
VideoDB: Storing videos in a database tailored for video content, thus enabling the generation of dynamic streams by instantly removing unsafe content.


Install required packages:
boto3: AWS SDK for Python
pytube: Download YouTube videos
videodb: VideoDB Python SDK
!pip install boto3 pytube requests videodb

Helper Functions

We've prepared a handy download_video_yt function to download YouTube videos in high resolution.
import pytube
import os
import time

# Downloads a YouTube video to output_file
def download_video_yt(youtube_url, output_file="video.mp4"):
    youtube_object = pytube.YouTube(youtube_url)
    video_stream = youtube_object.streams.get_highest_resolution()
    # Save the selected stream under the requested file name
    video_stream.download(filename=output_file)
    print(f"Downloaded video to: {output_file}")
    return output_file

Downloading Media

Let’s take this clip from the TV show "Breaking Bad".
video_url_yt = ""
video_output = "video_breaking_bad.mp4"
download_video_yt(video_url_yt, video_output)


We need to configure both AWS and VideoDB.

AWS Configuration

AWS Rekognition is a paid API, so choose your YouTube video carefully. A longer video may incur additional charges.
AWS secrets: aws_access_key_id, aws_secret_access_key and region_name
Ensure your AWS user has the necessary policies attached:
AmazonRekognitionFullAccess and AmazonS3FullAccess
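import boto3

aws_access_key_id = os.environ.get("AWS_KEY_ID", "")
aws_secret_access_key = os.environ.get("AWS_KEY_SECRET", "")
region_name = os.environ.get("AWS_REGION", "")

bucket_name = "videorekog"

# Both clients share the same credentials and region
aws_credentials = {
    "aws_access_key_id": aws_access_key_id,
    "aws_secret_access_key": aws_secret_access_key,
    "region_name": region_name,
}
rekognition_client = boto3.client("rekognition", **aws_credentials)
s3 = boto3.client("s3", **aws_credentials)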
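Because the settings above fall back to empty strings, a missing environment variable only shows up later as an authentication error. The sketch below is one way to fail early. `missing_settings` is a hypothetical helper, not part of boto3, and the list of names passed to it is an assumption to adapt to your own setup (add your region variable as well).

```python
import os

def missing_settings(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    # Default to the real process environment; a plain dict can be passed for testing
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Check the credential variables read in the configuration step
missing = missing_settings(["AWS_KEY_ID", "AWS_KEY_SECRET"])
if missing:
    print("Set these environment variables before continuing:", ", ".join(missing))
```

Running this before creating the clients makes a configuration problem visible before any paid API call is made.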

Analyzing the Video for Inappropriate Content

Using the Rekognition API

Upload the video to an S3 bucket and start content moderation using the Rekognition API.
# Start a content moderation job for a video stored in S3
def start_content_moderation(video_path, bucket_name):
    response = rekognition_client.start_content_moderation(
        Video={"S3Object": {"Bucket": bucket_name, "Name": video_path}}
    )
    return response["JobId"]

# Poll the job and collect moderation labels across all result pages
def get_content_moderation(job_id):
    wait_for = 5  # seconds to wait between polls while the job is running
    pagination_finished = False
    next_token = ""
    response = {
        "ModerationLabels": []
    }
    while not pagination_finished:
        moderation_res = rekognition_client.get_content_moderation(JobId=job_id, NextToken=next_token)
        status = moderation_res["JobStatus"]
        next_token = moderation_res.get("NextToken", "")
        if status == "IN_PROGRESS":
            time.sleep(wait_for)
        elif status == "SUCCEEDED":
            response["ModerationLabels"].extend(moderation_res["ModerationLabels"])
            if not next_token:
                pagination_finished = True
        else:
            # Any other status (e.g. FAILED) would otherwise loop forever
            raise RuntimeError(f"Content moderation job ended with status {status}")
    return response

# Upload the target video to the S3 bucket
s3.upload_file(video_output, bucket_name, video_output)

# Start content moderation using the Rekognition API
job_id = start_content_moderation(video_output, bucket_name)
moderation_res = get_content_moderation(job_id)
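To make the shape of the results concrete, here is a minimal sketch that groups flagged timestamps by label name. The sample data is made up for illustration, and `flagged_seconds` and `min_confidence` are hypothetical names, not part of boto3. The field names (`Timestamp`, `ModerationLabel`, `Name`, `Confidence`) follow the response format of the content moderation results.

```python
from collections import defaultdict

def flagged_seconds(moderation_labels, min_confidence=50.0):
    """Group flagged timestamps (converted to seconds) by moderation label name."""
    by_label = defaultdict(list)
    for item in moderation_labels:
        label = item["ModerationLabel"]
        # Skip detections below the chosen confidence cut-off
        if label["Confidence"] >= min_confidence:
            # Timestamps are reported in milliseconds from the start of the video
            by_label[label["Name"]].append(item["Timestamp"] / 1000)
    return dict(by_label)

# Made-up sample in the same shape as moderation_res["ModerationLabels"]
sample_labels = [
    {"Timestamp": 4000, "ModerationLabel": {"Name": "Violence", "ParentName": "", "Confidence": 91.0}},
    {"Timestamp": 4500, "ModerationLabel": {"Name": "Violence", "ParentName": "", "Confidence": 88.5}},
    {"Timestamp": 9000, "ModerationLabel": {"Name": "Smoking", "ParentName": "Tobacco", "Confidence": 42.0}},
]

print(flagged_seconds(sample_labels))
```

A summary like this can help when choosing which labels and confidence levels should actually lead to a cut.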

Preparing Clips Timestamps

The Rekognition API flags moments in a video that are inappropriate, unwanted, or offensive by providing timestamps. Our objective is to consolidate timestamps that belong to the same sequence.
Though Rekognition offers a method for this, we will employ a more straightforward strategy.
If the gap between two consecutive timestamps is within a specific threshold, they will be combined into a single continuous scene.
To ensure thorough coverage, we'll also add padding on both sides of each scene.
Then we take the complement of the inappropriate clips within the video to get the appropriate, safe clips. Feel free to adjust the threshold and padding settings to optimize the results.
timestamps = []
threshold = 1
padding = 1

for label in moderation_res["ModerationLabels"]:
    # Rekognition reports timestamps in milliseconds; convert to seconds
    timestamp = label["Timestamp"]/1000
    timestamps.append(timestamp)

def merge_timestamps(numbers, threshold, padding):
    grouped_numbers = []
    end_last_segment = 0
    current_group = [numbers[0]]

    for i in range(1, len(numbers)):
        # if timestamp is within threshold of previous timestamp, consolidate them under same group
        if numbers[i] - numbers[i-1] <= threshold:
            current_group.append(numbers[i])
        # else put last group's end and this group's start in result clips
        else:
            start_segment = current_group[0] - padding
            end_segment = current_group[-1] + padding
            grouped_numbers.append([end_last_segment, start_segment])
            end_last_segment = end_segment
            current_group = [numbers[i]]

    # the final safe clip ends where the last flagged group (with padding) begins
    grouped_numbers.append([end_last_segment, current_group[0] - padding])
    return grouped_numbers

shots = merge_timestamps(timestamps, threshold=threshold, padding=padding)
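As a self-contained illustration of the merge, pad, and complement steps described above, here is a sketch of a variant that also keeps the part of the video after the last flagged moment and never produces negative or overlapping clips. `safe_clips` and its `video_length` parameter are assumptions introduced for this example; it assumes you know the video duration in seconds and that all flagged timestamps fall inside the video.

```python
def safe_clips(flagged, video_length, threshold=1.0, padding=1.0):
    """Return [start, end] spans (in seconds) that contain no flagged timestamp."""
    # Step 1: group nearby timestamps into padded unsafe intervals
    unsafe = []
    previous = None
    for t in sorted(flagged):
        if previous is not None and t - previous <= threshold:
            unsafe[-1][1] = t + padding  # extend the current group
        else:
            unsafe.append([t - padding, t + padding])  # open a new group
        previous = t

    # Step 2: take the complement of the unsafe intervals within [0, video_length]
    clips = []
    cursor = 0.0  # furthest point already covered by an unsafe interval
    for start, end in unsafe:
        if start > cursor:
            clips.append([cursor, min(start, video_length)])
        cursor = max(cursor, end)
    if cursor < video_length:
        clips.append([cursor, video_length])
    return clips

# Flags at 10 s and 11 s form one group; the flag at 30 s stands alone
example = safe_clips([10, 11, 30], video_length=60)
print(example)
```

With the default threshold and padding of one second, the example leaves three safe clips: 0 to 9, 12 to 29, and 31 to 60 seconds.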

Removing inappropriate content from video using VideoDB

The idea behind VideoDB is straightforward: it functions as a database built specifically for videos. Just as you upload tables or JSON data to a standard database, you can upload your videos to VideoDB.
You can also retrieve your videos through queries, much like accessing regular data from a database.
VideoDB lets you create clips from your videos ⚡️ fast, just like retrieving text data from a database.
Next, we will compile a master clip from the smaller segments that contain only appropriate content (i.e., we filter out the clips with inappropriate content identified earlier).
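import videodb

# connect to VideoDB using your API key
conn = videodb.connect(api_key=os.environ.get("VIDEO_DB_API_KEY", ""))

# upload the video to db
video_url_yt = ""
video = conn.upload(url=video_url_yt)

# generate a stream link of the safe shots by passing them as the timeline
stream_link = video.generate_stream(timeline=shots)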