Imagine building videos the way you write code - declarative, composable, and infinitely reusable.
VideoDB Editor lets you create videos programmatically using code instead of clicking timelines. You define what you want (assets, effects, timing), and the engine handles the rendering.
This guide is your complete conceptual introduction. By the end, you’ll understand how to compose anything from simple clips to complex multi-layer productions - all through code.
Why Code-First Video Editing?
Traditional video editors are built for one-off productions. But what if you need to:
Generate 100 personalized videos from a template
Build a TikTok content pipeline that runs daily
Create video variations for A/B testing
Automate highlight reels from live streams
Code changes everything:
Reusability – One video asset, infinite variations
Scalability – Loop over data to generate hundreds of videos
Version control – Git-track your compositions
Automation – Integrate with AI, databases, APIs
The 4-Layer Architecture
VideoDB Editor uses a hierarchy where each layer has one job. Understanding this structure is the key to mastering composition:
Asset → Clip → Track → Timeline
Let’s walk through each layer using the simplest possible example: one video asset playing for 10 seconds. This is the “Hello World” of Editor - understanding this foundation lets you build anything.
Installing VideoDB in your environment
VideoDB is available as a Python package on PyPI.
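A minimal setup sketch, assuming you install the SDK from PyPI and authenticate with your VideoDB API key:
# Install the SDK first: pip install videodb
import videodb

# Connect with your API key and grab your default collection
conn = videodb.connect(api_key="YOUR_API_KEY")
coll = conn.get_collection()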
Layer 1: Assets – Your Raw Materials
Assets are your content library. They reference media that exists in your VideoDB collection but don’t define how or when it plays.
VideoAsset
Your main video content. Each VideoAsset points to a video file via its ID.
Key parameters:
id (required) – The VideoDB media ID
start (optional) – Trim point in seconds (e.g., start=10 skips the first 10s of the source)
volume (optional) – Audio level: 0.0 (muted) to 2.0 (200%), default 1.0
Real example:
from videodb.editor import Timeline, Track, Clip, VideoAsset

# Create a VideoAsset pointing to a video file in your collection
video_asset = VideoAsset(
    id=video.id,
    start=0,
    volume=1
)
# Ready to use in a Clip
This says: “Use the video from your VideoDB collection, start from the beginning (start=0), and keep original volume (volume=1).”
Important distinction: VideoAsset.start trims the source file. Where it appears on the timeline is controlled later at the Track layer. This “double start” concept is critical - we’ll explore it more in Layer 3 (Tracks).
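A quick sketch of a trimmed source (same import as above):
# Skip the first 10 seconds of the source file;
# where this appears on the timeline is decided later, at the Track layer
trimmed_asset = VideoAsset(
    id=video.id,
    start=10
)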
AudioAsset
Background music, voiceovers, or sound effects. Works exactly like VideoAsset.
Key parameters:
id (required) – The VideoDB audio file ID
start (optional) – Same trim behavior as VideoAsset
volume (optional) – 0.0-2.0 range (0.2 = 20% volume)
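Real example (a sketch - assuming AudioAsset is imported from videodb.editor and audio.id points to an audio file in your collection):
from videodb.editor import AudioAsset

# Background music: skip the first 5 seconds of the track and play it at 20% volume
music_asset = AudioAsset(
    id=audio.id,
    start=5,
    volume=0.2
)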
ImageAsset
Logos, watermarks, title cards, or static backgrounds.
Key parameters:
id (required) – The VideoDB image ID
crop (optional) – Rarely used; trims edges before rendering. Crop amounts are relative values between 0 and 1: a left crop of 0.5 removes half of the asset from the left, and a top crop of 0.25 removes the top quarter.
Images are static by nature - duration, position, and size are controlled at the Clip layer.
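Real example (a sketch - assuming ImageAsset is imported from videodb.editor and image.id points to an image in your collection):
from videodb.editor import ImageAsset

# A static logo; its duration, position, and size are set later on the Clip
logo = ImageAsset(id=image.id)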
TextAsset
Custom text overlays with full typography control.
Key parameters:
text (required) – The string to display
font (optional) – Font object with family, size, color
border, shadow, background (optional) – Styling objects
Color format: ASS-style &HAABBGGRR in hex (e.g., &H00FFFFFF = white)
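Real example (a sketch - assuming TextAsset and a Font styling object are importable from videodb.editor; the Font parameter names shown here are illustrative):
from videodb.editor import TextAsset, Font

# A title overlay in white, using the ASS-style color format described above
title_asset = TextAsset(
    text="Welcome to VideoDB",
    font=Font(size=48, color="&H00FFFFFF")
)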
CaptionAsset
Auto-generated subtitles synced to speech. This is where VideoDB gets magical.
Important: CaptionAsset is a separate asset type from TextAsset. While TextAsset is for custom text overlays you write yourself, CaptionAsset automatically generates subtitles from video speech.
Key parameters:
src (required) – Set to "auto" to generate captions from video speech
animation (optional) – How words appear: reveal, karaoke, supersize, box_highlight
primary_color, secondary_color (optional) – ASS-style colors
font, positioning, border, shadow styling (optional)
Critical requirement: Before using CaptionAsset(src="auto"), you must call video.index_spoken_words() on the source video. This indexes the speech for auto-caption generation. Without it, captions won’t generate.
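Real example (a sketch - assuming CaptionAsset is importable from videodb.editor; the animation value is shown here as a plain string):
from videodb.editor import CaptionAsset

# Required first: index the speech so captions can be generated automatically
video.index_spoken_words()

# Auto-generate captions with a karaoke-style word animation
captions = CaptionAsset(
    src="auto",
    animation="karaoke"
)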
Supported Fonts for Text and Caption Assets:
Supported Indic fonts:
Noto Sans Kannada
Noto Sans Devanagari
Noto Sans Gujarati
Noto Sans Gurmukhi
Recap: Assets answer “What content exists?” They don’t yet define timing, size, position, or effects - that’s the Clip layer’s job.
Layer 2: Clips – The Presentation Engine
Clips wrap Assets and define how and how long they appear. This is your effects layer.
Every Clip must have an asset and a duration. Everything else is optional.
Duration – How Long It Plays
duration is a float in seconds. It defines how long the clip plays on the timeline.
Real example:
from videodb.editor import Clip

clip = Clip(
    asset=video_asset,
    duration=10
)
“Play this VideoAsset for 10 seconds.”
Key insight: Duration is independent of the source file’s length. If your source is 2 minutes but you set duration=10, only 10 seconds play (starting from VideoAsset.start).
You’ll get an error if the clip duration is greater than the source video/audio length.
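One way to avoid that - a sketch assuming video.length holds the source duration in seconds:
# Clamp the clip duration so it never exceeds the source length
safe_duration = min(10, video.length)

clip = Clip(
    asset=video_asset,
    duration=safe_duration
)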
Fit – How It Scales to Canvas
When your asset’s aspect ratio doesn’t match the timeline’s, fit controls scaling behavior.
Four modes:
Fit.crop (most common) – Fills the canvas completely, cropping edges if needed
Use when: Filling the frame is the priority, cropping is acceptable
Example: 16:9 video on a 9:16 (vertical) timeline
Fit.contain – Fits the entire asset inside the canvas, adding bars if needed
Use when: Showing all content is the priority, bars are acceptable
Example: Preserving widescreen footage in a square format
Fit.cover – Stretches to fill the canvas (distortion possible)
Use when: Artistic effect or abstract content
Fit.none – Uses native pixel dimensions (no scaling)
Use when: Precise pixel control is needed (e.g., 1:1 pixel mapping)
Real example:
from videodb.editor import Fit

clip = Clip(
    asset=video_asset,
    duration=10,
    fit=Fit.crop
)
“Fill the canvas completely, crop edges if aspect ratios don’t match.”
Position – Where It Appears
Position uses a 9-zone grid system:
top_left top top_right
center_left center center_right
bottom_left bottom bottom_right
Real example:
from videodb.editor import Position

logo_clip = Clip(
    asset=logo,
    duration=30,
    position=Position.top_right
)
“Place the logo in the top-right corner.”
Offset – For fine-tuned positioning
from videodb.editor import Offset

clip = Clip(
    asset=logo,
    duration=30,
    position=Position.center,
    offset=Offset(x=0.3, y=-0.2)
)
This shifts the logo 30% right, 20% up from center.
Scale – Size Adjustment
scale is a multiplier applied after fit. Default is 1.0.
Real example:
# overlay_video is another VideoAsset to shrink for picture-in-picture
pip_clip = Clip(
    asset=overlay_video,
    duration=15,
    scale=0.3
)
“Shrink this video to 30% of its fitted size” (perfect for picture-in-picture).
Opacity – Transparency
opacity ranges from 0.0 (invisible) to 1.0 (opaque).
Real example:
watermark_clip = Clip(
    asset=logo,
    duration=30,
    opacity=0.6
)
“Make the logo 60% opaque (semi-transparent).”
Filter – Visual Effects
Apply color/blur effects:
from videodb.editor import Filter

clip = Clip(
    asset=VideoAsset(id=video.id),
    duration=10,
    filter=Filter.greyscale
)
Available filters: greyscale, blur, boost (saturation), contrast, darken, lighten, muted, negative.
Transition – Fades
Fade in/out at clip start/end:
from videodb.editor import Transition

clip = Clip(
    asset=VideoAsset(id=video.id),
    duration=10,
    transition=Transition(
        in_="fade",
        out="fade",
        duration=2
    )
)
“Fade in over 2 seconds at the start, fade out over 2 seconds at the end.”
Recap: A Clip wraps an Asset and defines how long it plays (duration) and how it appears (fit, position, scale, opacity, filter, transition). Now let’s see how to place clips on the timeline.
Layer 3: Tracks – Sequencing and Layering
Tracks are timeline lanes. They control when clips play and how they stack.
The Track Object
A Track is a container you add clips to: