By "multi-modal database" we mean that we treat multi-modal data, such as images, audio, and video, as first-class citizens and provide natural language search over them. That is, you can ingest multi-modal content into DataBridge the same way you'd ingest structured data into a database. You can perform updates over this information, extract metadata, or define custom parsing/processing rules (e.g., redact any PII).
Your search queries go through a planner which - depending on the kind of data being retrieved - calls the appropriate tools to extract information from the data and respond to your query.
For instance, this could be function calling over object-tracking data if your query relates to object movements in a video. It could be a call to ColQwen if we're looking for particular features within a diagram-heavy PDF. It could also be a simple semantic search if that's what the planner deems most useful.
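To make the routing idea concrete, here's a rough sketch in Python. It is not DataBridge's actual code: the names (`Query`, `plan`, `execute`, and the three tool functions) are made up for illustration, the tools are stubs, and the keyword rules are only a stand-in for however the real planner decides.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Query:
    text: str
    modality: str  # hypothetical labels: "video", "pdf", "text"


# Stub tools. In a real system these would call object-tracking data,
# a visual document retriever (e.g. ColQwen), or a vector index.
def object_tracking_tool(q: Query) -> str:
    return f"object-tracking lookup for: {q.text}"


def visual_document_tool(q: Query) -> str:
    return f"visual document retrieval for: {q.text}"


def semantic_search_tool(q: Query) -> str:
    return f"semantic search for: {q.text}"


TOOLS: Dict[str, Callable[[Query], str]] = {
    "object_tracking": object_tracking_tool,
    "visual_document": visual_document_tool,
    "semantic_search": semantic_search_tool,
}


def plan(query: Query) -> str:
    """Pick a tool name. Keyword rules are an illustrative assumption."""
    text = query.text.lower()
    motion_words = ("move", "moving", "movement", "track")
    if query.modality == "video" and any(w in text for w in motion_words):
        return "object_tracking"
    if query.modality == "pdf" and "diagram" in text:
        return "visual_document"
    # Fall back to plain semantic search when nothing more specific applies.
    return "semantic_search"


def execute(query: Query) -> str:
    """Run the planned tool, like a query engine running a chosen plan."""
    return TOOLS[plan(query)](query)
```

The point of splitting `plan` from `execute` is that the user only ever writes the query; which tool runs is the system's decision.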
The idea is that traditional databases work the same way - query planning systems figure out the best path to execute the user's query and pass that plan to the query execution engine. We think a lot of this complexity can be abstracted away from the user - as long as we can provide strong retrieval guarantees (the same way databases have SLAs).
Let me know if something is unclear here!