I think that's because thinking about the problem I'm trying to solve is always the hardest part, and I have to learn some syntax and semantics no matter what. The syntax and semantics of SQL are grounded in the mathematics of relational databases. Natural language isn't.
Furthermore, there are decades of good technical documentation for SQL, written by diverse authors for readers at every level of experience. Natural-language query projects are one-offs, and writing documentation is usually a lower priority than making the code work.
And then every one of these tools turns out to be usable (barely) only by a few "data analysts", and never by the executives to whom the system was originally sold.
Meaning, there's a certain amount of complexity involved in solving any problem. Abstractions are great and useful, but by their nature they reduce specificity (and, consequently, functionality).
We see this issue over and over again with "no code" and "low code" platforms. They're great for to-do apps, but as soon as you hit real-world application requirements, the platform has to become so complex that it's easier to just solve the problem in a programming language (Bubble is a good example).
I think the same issue applies to data querying, perhaps even more so.
The problem domain is different: with data queries, accuracy is usually the most important constraint. For example, if I need a list of patients to notify about a drug recall, "mostly correct" isn't going to cut it.
So the problem becomes developing a language that is specific enough to accurately describe and model the problem. Spoken languages aren't great at that. By the time you contort a language like English into a form that can accurately and consistently describe the query, it's probably easier to just use a language designed for querying, like SQL or PRQL.
In fact, spoken languages are so bad at describing problems that an entire industry of business analysts, project managers, UX experts and others exists just to translate what people need into what gets delivered.
I doubt ML models will ever replace that. They're sure to provide assistance, but a statistical model is just that, no matter how many of them you chain together, how big it is, or how you weight it.
Imagine you hire a highly skilled data analyst (e.g. 9 out of 10 proficiency in SQL) and start asking them questions about your database. They won't answer them; they'll ask you more questions. The conversation would go something like:
you: what is our churn rate by channel?
new analyst: where do we store "channel"? what do we use to process payments? where is that data stored? do we include discounts in MRR / churn? etc.
If a human can't do it, an LLM can't either. An LLM can't write the SQL from scratch and get the right answers without a ton of additional context. We're working on an approach using a semantic layer at https://www.definite.app/ if you're interested in this sort of thing.
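To make the semantic-layer idea concrete, here is a minimal Python sketch. It is a hypothetical toy, not Definite's or Cube's actual design or API: the answers to the new analyst's questions (where "channel" lives, what counts as churn) are written down once as metric and dimension definitions, and every query is compiled from them instead of being guessed from scratch. The table names (subscriptions, customers), the columns, and the simplistic churn definition are all assumptions for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Metric:
    name: str
    sql_expr: str    # aggregate expression over the base table alias `s`
    base_table: str

@dataclass(frozen=True)
class Dimension:
    name: str
    sql_expr: str
    join: Optional[str] = None  # join needed to reach the column, if any

# Hypothetical definitions: "what counts as churn?" and "where is channel
# stored?" are decided once here, not re-derived for every question.
METRICS = {
    "churn_rate": Metric(
        name="churn_rate",
        # Illustrative definition only: share of subscriptions that were canceled.
        sql_expr="AVG(CASE WHEN s.canceled_at IS NOT NULL THEN 1.0 ELSE 0.0 END)",
        base_table="subscriptions s",
    ),
}
DIMENSIONS = {
    "channel": Dimension(
        name="channel",
        sql_expr="c.acquisition_channel",
        join="JOIN customers c ON c.id = s.customer_id",
    ),
}

def compile_query(metric: str, by: str) -> str:
    """Turn a (metric, dimension) request into SQL using the fixed definitions."""
    m, d = METRICS[metric], DIMENSIONS[by]
    join = f" {d.join}" if d.join else ""
    return (
        f"SELECT {d.sql_expr} AS {d.name}, {m.sql_expr} AS {m.name} "
        f"FROM {m.base_table}{join} "
        f"GROUP BY {d.sql_expr} ORDER BY {d.sql_expr}"
    )

def demo() -> list:
    """Run "churn rate by channel" against a tiny in-memory example database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, acquisition_channel TEXT);
        CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, customer_id INTEGER, canceled_at TEXT);
        INSERT INTO customers VALUES (1, 'ads'), (2, 'ads'), (3, 'organic');
        INSERT INTO subscriptions VALUES (1, 1, '2024-01-05'), (2, 2, NULL), (3, 3, NULL);
    """)
    rows = conn.execute(compile_query("churn_rate", "channel")).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(demo())  # [('ads', 0.5), ('organic', 0.0)]
```

The point of the design is that a question like "churn rate by channel" only has to be mapped onto names ("churn_rate", "channel") that someone already defined precisely, rather than onto raw tables and columns.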
Here it seems MQL isn't so much a query language as a text-to-SQL translator, and you're right: without a bit more understanding of the data's role, purpose, and intent, it's a hard job for anyone, human or AI.
It strikes me that when I write a SQL statement, I'm using not only knowledge of SQL but also knowledge of the domain and the database structure, knowledge I don't even think about until I need to show someone else how to do the query.
Why would text-to-SQL be limited to information_schema alone? Human analysts would use additional documentation; why wouldn't an LLM-based text-to-SQL system?
1. taking info strictly from SQL (e.g. information_schema, query history)
2. taking a user input / question
3. writing SQL to answer that question
An app like this is what I call "text-to-SQL". I totally agree that a better system would pull in additional documentation (which is what we're doing), but I'd no longer consider it "text-to-SQL". In our case, we're not even writing SQL directly, but rather generating semantic layer queries (i.e. https://cube.dev/).
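The distinction above can be sketched in Python. This is a rough illustration under stated assumptions, not anyone's actual product: SQLite stands in for the database, and because SQLite has no information_schema, sqlite_master and PRAGMA table_info play that role. The functions schema_context and build_prompt are hypothetical names; the sketch stops at building the prompt and deliberately leaves out the model call and execution of step 3.

```python
import sqlite3
from typing import Dict, List, Optional

def schema_context(conn: sqlite3.Connection) -> List[str]:
    """Step 1: describe tables and columns using only the database catalog."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        col_desc = ", ".join(f"{name} {ctype}" for _, name, ctype, *_ in cols)
        lines.append(f"{table}({col_desc})")
    return lines

def build_prompt(
    conn: sqlite3.Connection,
    question: str,
    docs: Optional[Dict[str, str]] = None,
) -> str:
    """Steps 1-2: combine catalog info (and optional docs) with the question."""
    parts = ["Schema:", *schema_context(conn)]
    if docs:
        # The "better system": extra business documentation the catalog lacks.
        parts.append("Business definitions:")
        parts.extend(f"- {term}: {meaning}" for term, meaning in docs.items())
    parts += ["Question:", question, "Write one SQL query that answers the question."]
    return "\n".join(parts)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, acquisition_channel TEXT)")
    print(build_prompt(conn, "what is our churn rate by channel?",
                       {"channel": "customers.acquisition_channel"}))
```

Calling build_prompt without docs gives the strict "text-to-SQL" variant; passing docs is the step that, by the definition above, turns it into something else.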
Besides, I feel like a data analyst should know what questions to ask, not just how to translate business requests into SQL.