⚡️ Query transformation, or query processing, is a crucial step in enhancing RAG pipelines, especially when dealing with multimodal information. By breaking a query down into its spoken and visual components, you can create more targeted and efficient search capabilities. ⚡️
While a manual breakdown is a good starting point, automating this process with an LLM greatly improves scalability and accuracy, making your systems more powerful and user-friendly.
# Manual query breakdown
spoken_query = "Show me where the narrator discusses the formation of the solar system"
visual_query = "Visualize the Milky Way galaxy"
# Using an LLM to transform the query
from openai import OpenAI
transformation_prompt = """
Divide the following query into two distinct parts: one for spoken content and one for visual content.
The spoken content should refer to any narration, dialogue, or verbal explanations.
The visual content should refer to any images, videos, or graphical representations.
Format the response strictly as:
Spoken: <spoken_query>
Visual: <visual_query>

Query: {query}
"""
# Initialize OpenAI client
client = OpenAI()
def divide_query(query):
    # Use the OpenAI client to create a chat completion with the structured prompt
    # ("gpt-4o-mini" is an assumed model name; swap in whichever chat model you use)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": transformation_prompt.format(query=query)}],
    )
    content = response.choices[0].message.content
    # Split the reply back into its two parts (assumes the model follows the strict format)
    spoken_query = content.split("Spoken:")[1].split("Visual:")[0].strip()
    visual_query = content.split("Visual:")[1].strip()
    return spoken_query, visual_query
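To see the transformation end to end, here is a minimal usage sketch. The query below is a hypothetical example, it assumes a valid `OPENAI_API_KEY` is set in the environment, and the exact wording the model returns will vary:

# A hypothetical combined query to split into its two components
query = ("Show me where the narrator discusses the formation of the "
         "solar system while visualizing the Milky Way galaxy")
spoken_query, visual_query = divide_query(query)
print("Spoken:", spoken_query)  # e.g. narration about the solar system's formation
print("Visual:", visual_query)  # e.g. imagery of the Milky Way galaxy

Each part can then be routed to the matching index, the spoken query against the transcript embeddings and the visual query against the frame or image embeddings.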