MQL – Client and server to query your db in natural language

(github.com)

59 points · akashkahlon · 2y ago · 31 comments

28 comments · 9 top-level

mritchie712 · 2y ago · 7 in thread

text-to-sql is a dead end. There's no way for a model to correctly interpret the meaning of every column in a real world database using the `information_schema` alone. Most cloud warehouses (e.g. Snowflake) don't use foreign keys, so you don't even know the joins.
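The foreign-key point can be made concrete with a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for a warehouse, and the `customers`, `subscriptions`, and `subscriptions_fk` tables and the `cust_ref` column are invented for illustration. When no foreign key is declared, schema introspection returns nothing a model could infer the join from:

```python
import sqlite3

def declared_foreign_keys(conn, table):
    """Return (from_column, referenced_table, referenced_column) tuples for `table`."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    # PRAGMA result columns: id, seq, table, from, to, on_update, on_delete, match
    return [(r[3], r[2], r[4]) for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, channel TEXT);
    -- Hypothetical warehouse-style table: cust_ref points at customers.id,
    -- but nothing in the schema says so.
    CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, cust_ref INTEGER, mrr REAL);
    -- Same table with the relationship declared, for contrast.
    CREATE TABLE subscriptions_fk (
        id INTEGER PRIMARY KEY,
        cust_ref INTEGER REFERENCES customers(id),
        mrr REAL
    );
""")

undeclared = declared_foreign_keys(conn, "subscriptions")
declared = declared_foreign_keys(conn, "subscriptions_fk")
print(undeclared)  # no join information available
print(declared)    # the join is discoverable only when it is declared
```

A human reading `cust_ref` might guess the join; the schema itself does not state it.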

Imagine you hire a highly skilled data analyst (e.g. 9 out of 10 proficiency in SQL) and start asking them questions about your database. They won't answer them, they'll ask you more questions. The conversation would go something like:

you: what is our churn rate by channel?

new analyst: where do we store "channel"? what do we use to process payments? where is that data stored? do we include discounts in MRR / churn? etc.

If a human can't do it, an LLM can't either. An LLM can't write the SQL from scratch and get the right answers without a ton of additional context. We're working on an approach using a semantic layer at https://www.definite.app/ if you're interested in this sort of thing.

toddmorey · 2y ago

Agreed, but perhaps more semantic meaning could be expressed in metadata for tables and columns, extending beyond what's typically found in information_schema. (This may be the semantic layer you are talking about.)

Here it seems MQL isn't a query language as much as it's a text-to-SQL translator and you're right... without a bit more understanding of the data's role and purpose and intent it's a hard job for anyone, human or AI.

It strikes me that as I write an sql statement I'm not only using knowledge of sql but also knowledge of domain and database structure that I don't even think about until I need to show someone else how to do the query.

dragonwriter · 2y ago

> There's no way for a model to correctly interpret the meaning of every column in a real world database using the `information_schema` alone.

Why would text-to-sql be limited to information_schema alone? Human analysts would use additional documentation, why wouldn't an LLM-based text-to-sql system?

mritchie712 · 2y ago

I should have clarified. There's a large number of apps that are:

1. taking info strictly from SQL (e.g. information_schema, query history)

2. taking a user input / question

3. writing SQL to answer that question

An app like this is what I call "text-to-sql". Totally agree a better system would pull in additional documentation (which is what we're doing), but I'd no longer consider it "text-to-sql". In our case, we're not even directly writing SQL, but rather generating semantic layer queries (i.e. https://cube.dev/).
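The three steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not how MQL or any other tool is implemented: SQLite's `sqlite_master` stands in for `information_schema`, the model call is a hypothetical callable passed in by the caller, and `fake_model` is a hard-coded stub so the example runs offline.

```python
import sqlite3
from typing import Callable

def read_schema(conn: sqlite3.Connection) -> str:
    """Step 1: collect table definitions using only what the database exposes."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return "\n".join(r[0] for r in rows)

def text_to_sql(conn: sqlite3.Connection, question: str,
                generate_sql: Callable[[str], str]):
    """Steps 2 and 3: combine schema and question, ask the model for SQL, run it."""
    prompt = f"Schema:\n{read_schema(conn)}\n\nQuestion: {question}\nSQL:"
    sql = generate_sql(prompt)  # hypothetical model call, injected by the caller
    return sql, conn.execute(sql).fetchall()

# Stand-in for an LLM so the sketch runs offline; a real model may return anything.
def fake_model(prompt: str) -> str:
    return "SELECT channel, COUNT(*) FROM customers GROUP BY channel ORDER BY channel"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, channel TEXT)")
conn.executemany("INSERT INTO customers (channel) VALUES (?)",
                 [("ads",), ("organic",), ("ads",)])

sql, rows = text_to_sql(conn, "How many customers per channel?", fake_model)
print(rows)
```

Nothing in this loop carries business definitions, which is the gap a documentation or semantic layer is meant to fill.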

boredemployee · 2y ago

Yes. And also, don't forget that different stakeholders ask in different ways, using different words, which turns the situation into a nightmare. But I think it's possible to make it work with mid-size databases.

tosh · 2y ago

providing some context about the data, the schema + samples from the entries works quite well, definitely room for improvement but already quite usable imho
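A minimal sketch of what "the schema + samples from the entries" could look like as prompt context, assuming SQLite and a made-up `orders` table. The `build_context` helper is hypothetical and only renders text; how that text is sent to a model is left out.

```python
import sqlite3

def build_context(conn: sqlite3.Connection, sample_rows: int = 2) -> str:
    """Render each table's DDL followed by a few sample rows as prompt context."""
    parts = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for name, ddl in tables:
        cursor = conn.execute(f'SELECT * FROM "{name}" LIMIT ?', (sample_rows,))
        columns = [d[0] for d in cursor.description]
        parts.append(ddl)
        parts.append("-- sample rows: " + ", ".join(columns))
        parts.extend("-- " + repr(row) for row in cursor.fetchall())
    return "\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, status TEXT)")
conn.executemany("INSERT INTO orders (sku, status) VALUES (?, ?)",
                 [("A1", "fulfilled"), ("A1", "cancelled"), ("B2", "fulfilled")])

context = build_context(conn)
print(context)
```

Sample rows show a model the actual values in a column (here, the spelling of `status` values), which the DDL alone does not reveal.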

mritchie712 · 2y ago

Agreed, very usable if you know SQL and iterate from whatever the LLM spits out.

sjtly16 · 2y ago

agree, with familiarity with SQL one can use it as a reference for generating the first draft or even the final query

brudgers · 2y ago · 3 in thread

My experience with this kind of tool is that it is at least as hard to learn the tool as it is to learn the technology it abstracts over.

I think that's because thinking about the problem I am trying to solve is always the hardest part and I have to learn a syntax and semantics no matter what. And the syntax and semantics of SQL is mathematically linked to the mathematics of relational databases. Natural language isn't.

Furthermore there's decades of good technical documentation for SQL written by diverse authors for diverse levels of technical experience. Natural language projects are one off and writing documentation is usually a lower priority than making code go.

listenallyall · 2y ago

Agreed. Dozens of companies have built and sold "business intelligence" tools and report builders and visual query interfaces, all promising to ease the interface between man and data and make information easily accessible.

And then every one of these tools turns out to only be usable (barely) by some "data analysts" and never by the executives to whom the system was originally sold.

claytongulick · 2y ago

I think this boils down to fundamental complexity and information theory.

Meaning that there's a certain amount of complexity involved in solving any problem. While abstractions are great and useful, they reduce (by their nature) specificity (and consequently, functionality).

We see this issue over and over again with "no code" and "low code" platforms, which are great for to-do apps, but as soon as you get into real-world application requirements, the platform needs to become so complex it's easier to just use a programming language to solve the problem (bubble is a good example).

I think the same issue applies to data querying, but perhaps more-so.

The problem domain is different. Most of the time accuracy is the most important constraint with data queries. For example, if I need to get a list of patients to notify about a drug recall, "mostly correct" isn't going to cut it.

So then the problem becomes developing a language that's specific and can accurately describe and model the problem. Spoken languages aren't great at that. By the time you contort a language like English into a form that can accurately and consistently describe the query, it's probably easier to just use a language that was designed for querying, like SQL or PRQL, etc.

In fact, spoken languages are so terrible at describing problems that an entire industry of business analysts, project managers, UX experts and others exists just for the purpose of translating what people need into what's delivered.

I doubt ML models are going to ever replace that. They're sure to provide assistance, but a statistical model is just that - no matter how many of them you chain together, how big it is, or how you weight the model.

ashok1998 · 2y ago

IMO, this tool is way simpler than SQL. Once setup is done it is very easy to use for non-tech people. In SQL you have to be 100% correct with the syntax, which is not the case here.

dragon96 · 2y ago · 3 in thread

Genuine question: does anyone here actually want to query their database with natural language?

kadomony · 2y ago

It's really helpful with MongoDB Query Language (also MQL). Document models without a rigid schema and a less intuitive API are where this stuff comes in real handy. MongoDB's GUI Compass already shipped a feature to generate queries and aggregation pipelines from natural language.

dragonwriter · 2y ago

The people that hire data analysts do.

pylua · 2y ago

Is this to be trusted with things that have to be accurate, such as a subpoena?

Besides, I feel like a data analyst should be able to know what questions to ask, not just how to translate business requests to sql.

roydivision · 2y ago · 3 in thread

Or one could, you know, learn SQL.

kadomony · 2y ago

Most people would rather work in languages they already know. Natural language processing will allow programming languages to become as niche as assembly is, essentially. You won't need to interface with it much because the models will get that good.

akashkahlon (OP) · 2y ago

:D We have been working with many non-tech founders and business people who are genuinely interested in data but they cannot learn SQL, due to different constraints.

getravi · 2y ago

What are those constraints? Really usable SQL for business people can be learnt in a day-long workshop or less. If they can do Excel, they can do SQL too.

zainhoda · 2y ago · 1 in thread

Nice job getting something released! How does this compare to the other similar open source solutions like Vanna AI and DataHerald?

akashkahlon (OP) · 2y ago

Thank you. We have not done that comparison yet, but we will check these two out to learn more. We calculated the accuracy with a test data set which is part of the repo; we will see how we can compare this with others.

jhoechtl · 2y ago · 1 in thread

> As of the current version, MQL is designed to work exclusively with PostgreSQL

akashkahlon (OP) · 2y ago

Yes, we are working on adding MySQL support as well. Would you suggest any other integrations before or after MySQL? Happy to learn.

kelvinjps · 2y ago · 1 in thread

Isn't SQL already a way to query your DB with natural language?

dragonwriter · 2y ago

No, SQL is not natural language.

kshitijb · 2y ago

For the majority of people from non-tech business functions, the ability to ask for insights from data is liberating. Tools like this can unlock their potential to make more informed decisions. Imagine a store manager of a hyperlocal grocery startup managing a dark store. What if they could ask questions like "What is the fulfilment rate of a certain SKU between 12-3 pm in their store for the past 7 days?"
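To show how much that one question has to pin down before it becomes a query, here is a rough sketch in Python with `sqlite3`. Everything in it is an assumption for illustration: the `order_lines` table, its columns, the choice to define fulfilment rate as the share of order lines marked fulfilled, treating "12-3 pm" as 12:00 up to but not including 15:00, and passing the reference date in as `as_of` rather than using the current time.

```python
import sqlite3

# Hypothetical schema: one row per order line, with an ISO-8601 timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines (store_id TEXT, sku TEXT, ordered_at TEXT, fulfilled INTEGER)"
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?, ?)",
    [
        ("S1", "MILK-1L", "2024-03-10 12:30:00", 1),
        ("S1", "MILK-1L", "2024-03-11 14:05:00", 0),
        ("S1", "MILK-1L", "2024-03-11 16:00:00", 1),  # outside 12-3 pm
        ("S1", "MILK-1L", "2024-03-01 13:00:00", 1),  # older than 7 days
        ("S2", "MILK-1L", "2024-03-11 12:10:00", 1),  # different store
    ],
)

# Assumed definition: fulfilment rate = fulfilled lines / all lines in the window.
FULFILMENT_RATE = """
    SELECT AVG(fulfilled)
    FROM order_lines
    WHERE store_id = :store
      AND sku = :sku
      AND ordered_at >= datetime(:as_of, '-7 days')
      AND ordered_at < :as_of
      AND time(ordered_at) >= '12:00:00'
      AND time(ordered_at) < '15:00:00'
"""

rate = conn.execute(
    FULFILMENT_RATE,
    {"store": "S1", "sku": "MILK-1L", "as_of": "2024-03-12 00:00:00"},
).fetchone()[0]
print(rate)  # 0.5: two lines fall in the window, one of them fulfilled
```

Each `WHERE` clause corresponds to a decision the natural-language question leaves implicit, and a different reading of any of them changes the answer.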

Log_out_ · 2y ago

That "natural language will magic away complexity" mindset has done so much damage.
