But I don’t buy that this should be connected to a proprietary service. Or that the alternative query language should be a proprietary language. For something long-term or critical, it seems like a pretty large business risk. We don’t have to use any imagination to see how this plays out — look at Oracle. Though it could be worse — imagine you build a successful business on it and they go belly up in ten years.
I think I’d like to see one of the various “XQL”s out there emerge as the modern query language and start getting adopted as a native alternative in existing engines.
“SQL is particularly inflexible in terms of control over response: A query result is always a set of tuples. A query is incapable of returning a rich structure that is aligned with how the application needs to consume the result. ORMs solve this partially at best at the cost of hiding the capabilities of the underlying database. They also obscure the structure of generated queries, making it difficult to understand the resulting performance implications.”
ORMs, and the later-mentioned GraphQL, are not the only approaches to solving the object-relational impedance mismatch.
SQL is perfectly capable of serializing sets of tuples to XML (part of the SQL standard), and most SQL RDBMS implementations now support working with JSON data.
Serializing to XML (and, in the past couple of years, to JSON) directly in SQL, then deserializing that XML/JSON into an object model in the programming language of choice, is something I’ve seen used fairly often in my career.
Heck, I’ve even seen the entire business logic of a very complex system implemented via thousands of stored procedures that always returned XML. Prior to 2010, this was not the blasphemy it is made out to be today…
So, to reiterate my point: SQL is not inherently as inflexible as it is made out to be here, so neither ORMs nor GraphQL are a necessity for dealing with SQL output inflexibility (though both can be very useful tools, depending, as always, on the context).
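As a minimal sketch of that serialize-in-the-database idea, here is a query that returns a nested JSON structure directly, using SQLite's built-in JSON functions in place of XML (the tables and data are made up for illustration):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (id INTEGER PRIMARY KEY, user_id INT, item TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO purchases VALUES (1, 1, 'keyboard'), (2, 1, 'mouse'), (3, 2, 'monitor');
""")

# One row per user, with purchases folded into a JSON array: no ORM needed to
# rebuild the object graph on the application side. The json() wrapper marks
# the subquery's text result as JSON so it isn't re-escaped as a string.
rows = conn.execute("""
    SELECT json_object(
        'name', u.name,
        'purchases', json((SELECT json_group_array(p.item)
                           FROM purchases p WHERE p.user_id = u.id))
    )
    FROM users u ORDER BY u.id
""").fetchall()

result = [json.loads(r[0]) for r in rows]
print(result)  # e.g. [{'name': 'alice', 'purchases': [...]}, ...]
```

The same shape comes back from Postgres with json_agg / json_build_object; only the function names differ.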
Fauna's advantage here is that this way of structuring queries is deeply integrated with the language (and underlying wire protocol) itself. For example, Fauna's response format supports returning independently iterable result sets (supported by cursor-based pagination under the hood), allowing you to lazily populate the result graph in your app based on further user interaction.
It's not lossy if your application can guarantee a JSON <-> datatype roundtrip and the JSON is validated with a JSON Schema (generated by your application).
In Rust it's something like this
https://serde.rs/ to do the data type <-> json mapping
https://docs.rs/schemars/latest/schemars/ to generate jsonschema from your types
https://github.com/supabase/pg_jsonschema to validate against the JSON Schema in your database (Postgres).
With this setup it's interesting (but not required) to also use https://docs.rs/jsonschema/latest/jsonschema/ to validate against the schema in your application as well.
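A stdlib-only Python sketch of the same datatype <-> JSON roundtrip idea (standing in for the serde mapping; the schemars/pg_jsonschema validation steps are elided here, and the type is made up):

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class Purchase:
    user_id: int
    item: str

def to_json(p: Purchase) -> str:
    # Serialize the datatype to JSON (the serde side of the mapping).
    return json.dumps(asdict(p))

def from_json(s: str) -> Purchase:
    # Deserialize back; with validation in place, this cannot silently drop fields.
    return Purchase(**json.loads(s))

original = Purchase(user_id=1, item="keyboard")
assert from_json(to_json(original)) == original  # the lossless roundtrip guarantee
```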
This is really hilarious framing to me.
The entire point is to make the whole process opaque to the developer (unless they insist upon explaining a query plan). SQL is about maximizing productivity and focusing on the information of the business. Being able to write a SELECT statement that has never been seen by any other computer on earth and have it execute close to optimally 95%+ of the time is pretty close to pure magic.
Why would you throw away 20+ years’ worth of experience embodied in an existing SQL query planner/optimizer, et al.? Wouldn’t you rather build & ship a working product?
Until CloudFormation/Terraform/infrastructure-as-code came around, it was basically the only declarative language most developers would come across (unless you count markup), and I think that was the cause of some of the resistance.
I agree, but the resistance, I feel, is due to really poor tooling.
No compilation step (so how do you know you have an error? You run it). You run your query on test, which has mostly live data, and everything works; then you run it on production and suddenly that one query has abysmal performance, and it's hard to determine that the cause is a missing index. The query planner's EXPLAIN output is often poorly understood by devs, because they spend so little time with SQL itself, due to using ORMs.
There's no debugger. Hell, sometimes even the printf-equivalent is a poor substitute. You cannot step through a large stored procedure, you cannot set breakpoints, you cannot add watches, etc.
Unfortunately, the "tooling" for most devs are ORMs, which make SQL even more opaque and harder to debug, and reduce the pressure on popular DBMSs to develop decent debuggers.
I feel that if teams/projects insisted on "only parameterized SQL queries are allowed in the program", SQL would actually be less maligned as people got more familiar with it.
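A minimal sketch of that rule in practice, using Python's sqlite3 placeholders (the table and data are illustrative): user input is always bound as a parameter, never spliced into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# Hostile input is just data here; the ? placeholder prevents any injection.
user_input = "alice'; DROP TABLE users; --"
rows = conn.execute("SELECT id FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)  # no match, and the table is untouched
```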
When a developer first gets that vision of a beautiful declarative notation to handle his case, he often tries to make it the only way to use the system. But a more flexible way to design a system is to always allow an imperative handler to be supplied (a callback, a subclass, an interface, etc.), and then present the elegant declarative option as one such handler: built-in, but replaceable, one of many.
With SQL the situation is more complex; it is not the elegance of a declarative notation, but also the complexity of the underlying task. The idea was to give a kind of relational calculus plus make it distributed, atomic, durable, and, most of all, multi-user with an illusion that each user is the sole client of the system. It is not only that the user is not interested in setting record locks manually: even if he is, there must be no such option. It is very much like an operating system running multiple programs where each program runs under an illusion it is the only one. I would say that it is the requirement of independent concurrency that made SQL what it is.
But although this is an important use case, it is not the only one. Sometimes there is no concurrency or all of it is under control. E.g. it can be a multi-threaded program that works with a complex model of something. Can we model that something relationally? I would love to. But in our case there is no independent concurrency and no ad-hoc queries. Given that simplification, do we still need SQL? Could we maybe arrange the same relational operations differently? Maybe imperatively? Could we gain something as a result? At the very least these are interesting questions.
I do understand the value of SQL + the planner for adhoc querying. But so many times I find myself reworking SQL to hint at the planner to use certain indexes, or to add "spurious" filters to make sure an index is used (spurious for application logic reasons). For applications with a relatively low cardinality of queries, some extra tedium might be worth stronger performance property guarantees.
Added advantage of the tedium is it would make a lot of "accidentally quadratic" stuff much clearer. There is no magic in the planner after all!
We can write inline assembler in C or Rust; it feels like it would make sense to offer something similar in SQL. The big problem is how to offer something that isn't too tedious.
Anyways FQL's index model seems to align with my idea, just unsure if I want the rest of it.
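For what it's worth, SQLite already ships a limited form of this "inline assembler" with its INDEXED BY clause, which pins a query to a named index (and fails to prepare if the planner can't honor it). A small sketch with made-up table and index names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, user_id INT, client_id INT, body TEXT);
    CREATE INDEX idx_docs_user_client ON documents (user_id, client_id);
""")

# INDEXED BY makes the index choice explicit instead of leaving it to the planner.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM documents INDEXED BY idx_docs_user_client
    WHERE user_id = 1 AND client_id = 2
""").fetchall()
print(plan)  # the plan detail names idx_docs_user_client explicitly
```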
Classic example: Multitenant DB. I have a user table. I have a document table. I have a client table. documents and clients are key'd to user. In particular, there's a property of the system that document.client.user == document.user.
I do a "naive" query to count the number of documents per client, like:
- select count(*), client_id from document where document.text ilike '%searchstring%' and client_id in (select id from client where user_id = 1) group by client_id;
Now, it turns out that since I have this multitenant data, I have indexes on things like (user_id, client_id) on document. I also want to support text search, so I have a full-text-search index on (user_id, document.text).
Without properly "guiding" the filtering (usually by re-repeating the "where user_id = 1" fragment on every joined table), I will not be able to take advantage of my FTS index, because Postgres decides that path isn't needed (and doesn't know about the relation between document.client.user_id and document.user_id). It instead tries hard to implement this either by pulling all of the client's documents into memory and then doing an ad-hoc search there, or by doing an FTS across all tenants (because I have an index on document.text for cross-tenant searching in an admin) and then scanning through that result and removing rows by client_id.
This is a query plan I would not write if I wrote it myself, because it would be "obviously wrong". It would be a lot of work and clearly incorrect for what I want.
Meanwhile people writing features like this add indexes specifically to support certain workflows, and are often blissfully unaware that those indexes are not being used when it makes the most sense. Of course you gotta check experimental data and measure etc etc. But I like the idea of doing more data design before you start having performance issues, and I think that explicit index usage in particular would be a huge benefit on that front.
(there is a "fix" for the original query, involving repeating the user_id filter at other levels)
I may have used it at most once. But I'm not mr big data over here.
The only way is to ship all of the "modify" logic into the database via FQL. Given how many times I've seen code that does this unsafely with SQL, probably safe to say that if the bar is raised to "rewrite the logic in a foreign query language" developers will opt for thoughts and prayers instead.
> ORMs solve this partially at best at the cost of hiding the capabilities of the underlying database.
This is because all the popular ORMs target SQL instead of a specific RDBMS technology, like Postgres or SQLite. That doesn't have to be the case.
> They also obscure the structure of generated queries, making it difficult to understand the resulting performance implications.
Not quite true: Django, for example, makes it easy to drop down to raw queries, and every generated QuerySet exposes its SQL via `.query` and has an `.explain()` method to analyze the resulting plan. This has always been a lame excuse to dismiss ORMs.
Is there anyone here who knows enough about these two products to do a compare/contrast?
I think they also recently nerfed the free tier.
I've been using CouchDB and Pocketbase.io much more successfully for my mini web apps.
Yeah, it's not perfect, but the suggested replacements are marginal gains, if anything.
In order to replace SQL, you probably need something like a 10x improvement, so it sells like hotcakes. Nothing about those examples screams that sort of improvement.
I mean, cool they put some stuff on the left instead of the right. And I guess the select clause is in a little better order now. Yay.
On the other hand, it shows how, when something gets popular enough, it gets stuck in place, unable to progress slowly toward better, because we keep calling it good enough, and all the marginal improvements that would accumulate into something significant over time get rejected.
And this friction can be a significant problem, because it can cause systems to die over time: suddenly replaced, disrupted from below, by something that is a major improvement but completely incompatible with the status quo. Think of it as a devastating earthquake after a long period of complete calm.
This is not good for anyone; we would be much better served by small, gradual improvements to existing systems. That is how we ourselves came to be, through evolution.
Previous HN discussion from a month ago: https://news.ycombinator.com/item?id=36866861
I want to write another soon. A free consumer-facing SaaS-type web app, let's say.
Is there a reason I should learn this product instead of just continuing with modern mysql methods that seem to work and seem secure enough?
- Fauna is distributed and multi-region and therefore more resilient to hardware or regional outages (for example we barely noticed the last AWS us-east outage, except for the fact that it affected customer traffic to Fauna).
- You gain a lot of flexibility in terms of where and how you deploy your compute layer. Fauna works very well in concert with serverless platforms or edge-based compute like Cloudflare Workers. It's also possible to connect directly from your client/front-end, using Fauna to enforce end-user permissions.
- Even if you know SQL, it's worth checking out FQL. Simple queries in SQL are also easy in FQL, but more importantly, FQL gives you much greater control over the shape of your query result, meaning you don't need an ORM to reconstruct your object graph. If you have ever used GraphQL, the experience is similar. You can also see a few examples and comparisons with SQL on our FQL product page: https://fauna.com/fql
Dread it, run from it, SQL arrives all the same
SQL: WHERE category = "electronics"
FQL: where(.product.type == "Computers")
Also, it changes the case of USER to user between the two examples.
Minor mistakes like that hardly inspire confidence in the product.
from users u join purchases p on p.user_id = u.id select u{defaults}, u[p{defaults} as purchases] limit 100 [purchases: 10]
returning rows that make `purchases` into a row-nested collection of up to 10 items instead of creating that many more overall rows and repeating the same user info multiple times.
E.g. in Postgres:
SELECT u.*,
       (SELECT JSON_AGG(p)
          FROM (SELECT *
                  FROM purchases
                 WHERE user_id = u.id
                 LIMIT 10) p) AS purchases
  FROM users u
  JOIN purchases pu ON pu.user_id = u.id
 GROUP BY u.id
 LIMIT 100;
Starting queries with FROM: what is the benefit of that besides personal preference?
And you can simply use a CTE or a view instead of a field set.
The typical pattern is that you find a list of ids and then pull on them with a particular view that you want them in.
“Developers cannot take advantage of the full power of the underlying database for fear that the complex, opaque nature of SQL query behavior will overwhelm their ability to understand how queries perform, and quickly address problems when they do come up.”
What does that mean? Developers are somehow sacrificing database performance (“full power”) because they’re too scared to write SQL?
> maladapted to modern cloud computing abstractions such as serverless, and is difficult to manage at scale.
Got me.
There’s little money in it, but having worked in an infra role for a while now, I don’t really see rolling your own infra becoming any easier, which is so sad.
There's nothing connection-centric at all to the language.
> Inflexibility of result structure: SQL is particularly inflexible in terms of control over response: A query result is always a set of tuples. A query is incapable of returning a rich structure [...]
Many modern SQL-based RDBMSes provide JSON support for this. Some support row/record values, which is, in SQL, more natural than JSON.
> Complicated, irregular syntax:
Well, few like the SQL syntax -- there is that. I wouldn't say it's complicated syntax though.
> [...] and allows only limited forms of composition.
CTEs and subqueries are the main methods of composition in SQL.
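A small SQLite sketch of that kind of composition, chaining one CTE into another (table and CTE names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (id INTEGER PRIMARY KEY, user_id INT, amount INT);
    INSERT INTO purchases VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5);
""")

# Each CTE is a named, reusable sub-result; the second composes over the first.
total = conn.execute("""
    WITH per_user AS (
        SELECT user_id, SUM(amount) AS total FROM purchases GROUP BY user_id
    ),
    big_spenders AS (
        SELECT * FROM per_user WHERE total > 15
    )
    SELECT COUNT(*) FROM big_spenders
""").fetchone()[0]
print(total)  # only user 1 has a total over 15
```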
> Opaque, unpredictable performance:
Yes, well, yes, of course, because SQL is declarative. That the programmer gets little control over the query planner is part of the point of SQL. The query language really needs to be like this. What could and should be done however is to have a) ways of addressing parts of the query, b) ways of specifying out of band (i.e., not in the query) hints or requirements for parts of the query's planning.
I really would like to be able to:
- specify indexing for CTEs
- pass in hints from outside a query, like what table source to make the "first" in a query plan, what indices to use for specific table sources, WHERE constraints, etc.
> Rigid tabular data model: Despite the theoretical adaptability of the relational model to many domains, the flat, uniform structure the traditional SQL database imposes on records is especially rigid, forcing a significant amount of upfront design.

I don't agree that this is a problem. You have to design your schema? The horrors.
> Introducing new query patterns and schema evolution is fraught with a significant amount of operational risk. Combined, these hinder iterative application development and risk locking in bad data model design decisions.
This is no less true of alternatives to SQL-based RDBMSes...
TFA makes very strained arguments.
There are good arguments for a new language, but there's no need to make strained arguments along the way -- it detracts from TFA.