When you index your data and retrieve it with certain parameters, how do you measure the effectiveness of your search? This is where search evaluation comes in. By using test data, queries, and their results, you can assess the performance of indexes, search parameters, and other related factors. This evaluation helps you understand how well your search system is working and identify areas for improvement.
We can think of the information indexed from a video as documents of the form “timestamp + some textual description of the visuals” (there is no audio in this video).
We can use the following structure:
timestamp: (start, end), description: "string"
So, if we use the index_scenes function, the indexed documents look like this:
At (1, 2) - 29 seconds is displayed
At (2, 3) - 28 seconds is displayed
...
This continues until:
At (29, 30) - 1 second is displayed
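The document structure above can be sketched in plain Python. The data below is illustrative, mirroring the scene descriptions just listed (the real descriptions would come from scene indexing):

```python
# Each scene is a (start, end) timestamp pair plus a textual description
# of the visuals. Illustrative data only.
documents = [
    {"timestamp": (start, start + 1),
     "description": f"{30 - start} second{'s' if 30 - start != 1 else ''} is displayed"}
    for start in range(1, 30)
]

print(documents[0])   # first scene: (1, 2), "29 seconds is displayed"
print(documents[-1])  # last scene: (29, 30), "1 second is displayed"
```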
Ground Truth
It is the ideal expected result. To evaluate search performance, we need some test queries and their expected results.
Let's say for the query "Six" the expected result documents are at the following timestamps:
We will call this list of timestamps our ground truth for the query "Six."
Evaluation Metrics
To evaluate the effectiveness of our search functionality, we can experiment with our query "Six" using various search parameters. 📊
The search results can be categorized as follows:
Retrieved Documents 🔍:
Retrieved Relevant Documents: Matches our ground truth ✅
Retrieved Irrelevant Documents: Don't match our ground truth ❌
Non-Retrieved Documents 🚫:
Non-Retrieved Relevant Documents: In our ground truth but not in results 😕
Non-Retrieved Irrelevant Documents: Neither in ground truth nor results 👍
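These four categories can be made concrete with a few lines of Python. The ground-truth and retrieved timestamps below are hypothetical, chosen only to illustrate the classification:

```python
# Hypothetical data: ground-truth timestamps for a query and the timestamps
# a search run returned. All values are illustrative.
ground_truth = {(5, 6), (24, 25)}                   # assumed relevant documents
retrieved    = {(5, 6), (14, 15)}                   # assumed search results
all_docs     = {(s, s + 1) for s in range(1, 30)}   # every indexed document

true_positives  = retrieved & ground_truth             # retrieved relevant
false_positives = retrieved - ground_truth             # retrieved irrelevant
false_negatives = ground_truth - retrieved             # non-retrieved relevant
true_negatives  = all_docs - retrieved - ground_truth  # non-retrieved irrelevant
```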
We can further classify these categories in terms of search accuracy:
💡 This classification helps us assess the precision and recall of our search algorithm, enabling further optimization.
Accuracy
Accuracy measures how well our search algorithm retrieves relevant documents while excluding irrelevant ones. It can be calculated as follows:

Accuracy = (Retrieved Relevant + Non-Retrieved Irrelevant) / Total Documents
In other words, accuracy is the ratio of correctly classified documents (both retrieved relevant and non-retrieved irrelevant) to the total number of documents. 📊
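As a quick sketch, using hypothetical counts (1 retrieved relevant, 1 retrieved irrelevant, 1 missed relevant, and 26 correctly ignored, out of 29 documents):

```python
# Accuracy = correctly classified documents / total documents.
# Counts are illustrative, not measured results.
tp, fp, fn, tn = 1, 1, 1, 26
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(round(accuracy, 3))  # 0.931
```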
To get a more comprehensive evaluation of search performance, it's crucial to consider other metrics such as precision, recall, and F1-score in addition to accuracy. 💡🔬
Precision and Recall
Precision is the percentage of relevant retrieved documents out of all retrieved documents. It answers the question: "Of the documents our search returned, how many were actually relevant?"
Recall indicates the percentage of relevant documents that were successfully retrieved. It addresses the question: "Out of all the relevant documents, how many did our search find?" 🔍
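Using the same style of hypothetical counts, precision, recall, and the F1-score (their harmonic mean) can be computed directly:

```python
# Illustrative counts: 1 retrieved relevant, 1 retrieved irrelevant,
# 1 relevant document missed.
tp, fp, fn = 1, 1, 1
precision = tp / (tp + fp)  # of retrieved docs, how many were relevant
recall    = tp / (tp + fn)  # of relevant docs, how many were retrieved
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, round(f1, 2))  # 0.5 0.5 0.5
```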
The Precision-Recall Trade-off
These metrics often have an inverse relationship, leading to a trade-off:
Recall 📈:
Measures the model's ability to find all relevant cases in a dataset.
Increases or remains constant as more documents are retrieved.
Never decreases with an increase in retrieved documents.
Precision 📉:
Refers to the proportion of correct positive identifications.
Typically decreases as more documents are retrieved.
Drops due to increased likelihood of including false positives.
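The trade-off can be demonstrated with a tiny simulation over a hypothetical ranked result list: as k (the number of retrieved documents) grows, recall never decreases, while precision tends to drop once false positives creep in:

```python
# Hypothetical ranked search results and ground truth, for illustration only.
ranked = [(5, 6), (14, 15), (24, 25), (8, 9), (19, 20)]
relevant = {(5, 6), (24, 25)}

for k in range(1, len(ranked) + 1):
    top_k = set(ranked[:k])
    hits = len(top_k & relevant)
    print(f"k={k}  precision={hits / k:.2f}  recall={hits / len(relevant):.2f}")
```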
Search in VideoDB
Let’s understand the search interface provided by VideoDB and measure results with the metrics above.
This function performs a search on video content with various customizable parameters:
query: The search query string.
search_type: Determines the search method.
SearchType.keyword: Keyword search at the single-video level; returns all documents that match the query text.
SearchType.semantic (default): For question-answering queries; works across thousands of videos in a collection.
This interface allows for flexible and precise searching of video content, with options to fine-tune result filtering based on relevance scores and dynamic thresholds.
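To make the keyword mode concrete, here is a minimal mock of keyword-style matching over the indexed documents. This is illustrative only; it is not the VideoDB SDK, just a sketch of what "match the query text against each document" means:

```python
# Mock keyword search: return every document whose description contains
# the query as a whole word. Documents are hypothetical.
def keyword_search(query, documents):
    return [d for d in documents
            if query.lower() in d["description"].lower().split()]

documents = [
    {"timestamp": (5, 6),   "description": "six seconds is displayed"},
    {"timestamp": (14, 15), "description": "sixteen seconds is displayed"},
    {"timestamp": (20, 21), "description": "ten seconds is displayed"},
]

print(keyword_search("six", documents))  # only the (5, 6) document matches
```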
Experiment
Follow this notebook to explore experiments on fine-tuning search results and gain a deeper understanding of the methods involved.
Here’s a basic outcome of the default settings for both search types on the query "six" for the above video:
1. Semantic Search Default:
2. Keyword Search:
Outcome
As you can see, keyword search is best suited for queries like "teen" and "six." However, if the queries are in natural language, such as "find me a 6," then semantic search is more appropriate; keyword search would struggle to find relevant results for such queries.
Search + LLM
For complex queries like "Find me all the numbers greater than six," a basic search will not work effectively, since it merely matches the query with documents in vector space and returns the matching documents.
In such cases, you can apply a loose filter to get all the documents that match the query. However, you will need to add an additional layer of intelligence using a Large Language Model (LLM). The matched documents can then be passed to the LLM to curate a response that accurately answers the query.
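This pattern can be sketched as a retrieval step followed by prompt construction; the LLM call itself is left out, and the helper name and document format below are hypothetical:

```python
# Sketch of the Search + LLM pattern: take loosely matched documents and
# build a prompt for an LLM to reason over. build_llm_prompt is a
# hypothetical helper, not part of any SDK.
def build_llm_prompt(query, matched_docs):
    context = "\n".join(
        f"- ({d['timestamp'][0]}s-{d['timestamp'][1]}s): {d['description']}"
        for d in matched_docs
    )
    return (
        f"Using only the video scenes below, answer the query.\n"
        f"Query: {query}\n"
        f"Scenes:\n{context}"
    )

docs = [
    {"timestamp": (22, 23), "description": "seven seconds is displayed"},
    {"timestamp": (21, 22), "description": "eight seconds is displayed"},
]
prompt = build_llm_prompt("Find me all the numbers greater than six", docs)
print(prompt)  # this string would be sent to the LLM of your choice
```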