Really, the big thing I see here is the transaction retry logic. If the writes in your transaction depend only on the reads and not on side effects (which you'd need to declare), that would be a nice feature for any DB engine; and if the side effects are packaged together in a safe way with the relevant reads and writes (which is more general), it would also be a nice feature for a high-level DB client library, independent of backend. But that's a pretty thin reed to hang a dependency on a proprietary SaaS solution (over, say, Postgres) on.
There's no way to express data structure knowledge in sql, only relations and keys. Discoverability can be quite lacking. And every time I want to join table A to B I have to re-define everything, because SQL doesn't store that.
ORMs help. They help because they encode relationships in meaningful ways. A has many Bs, so A.B works, and I don't need to repeat this join logic every damn time. But ORMs have downsides too. Sometimes they generate queries that are really suboptimal.
I think my favorite ORM usage was with Hibernate. I KNOW!!! THAT DEVIL! But honestly we wrote SQL in hibernate, and then invoked it to populate our data. Yes it was a bit more work than RoR's "order.items.where("price > 40")" but when complex things happened, it was always easier.
I've always looked at nosql as "absolutely, sounds great, how do you represent relationships?"
Relations, keys, and (though you forgot them) constraints express…quite a lot of data structure knowledge.
> And every time I want to join table A to B I have to re-define everything, because SQL doesn’t store that.
SQL stores that if you tell it to; the usual way being CREATE VIEW.
For example, you have a many to one relationship between posts and users. Instead of this:
select *
from user as u
join post as p
on p.user_id = u.user_id;
You could do: alter table post
add constraint fk_user_id foreign key (user_id)
references user (user_id);
select *
from user as u
left referent join post as p;
Any good SQL auto-completer will also look up the foreign key information and auto-generate the ON clause for you as well. Redgate SQL Prompt (MSSQL only) is one of the best tools out there for this reason.
https://www.postgresql.org/docs/current/queries-table-expres....
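The CREATE VIEW route mentioned above can be sketched end to end. SQLite is used here only to keep the demo self-contained; the table and column names follow the hypothetical posts/users example:

```python
import sqlite3

# Sketch of storing join knowledge in the database via CREATE VIEW.
# Schema is illustrative; SQLite keeps the demo self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (
        post_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES user(user_id),
        title   TEXT
    );

    -- Define the join once; every consumer can now query the view.
    CREATE VIEW user_posts AS
        SELECT u.user_id, u.name, p.post_id, p.title
        FROM user AS u
        JOIN post AS p ON p.user_id = u.user_id;
""")
conn.execute("INSERT INTO user VALUES (1, 'alice')")
conn.execute("INSERT INTO post VALUES (10, 1, 'hello')")

rows = conn.execute("SELECT name, title FROM user_posts").fetchall()
print(rows)  # [('alice', 'hello')]
```

The join logic lives in the view definition, so callers never re-type the ON clause.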
A richer type system would help there, but to my knowledge this isn't offered by any major relational DBMS.
Postgres allows you to define custom types at least; making some enums or composites might give you some measure of sanity..?
Wow.
And now both of us are back on non-SQL datastores: Convex for him, and Nomad's combination of Raft+MemDB for me. While Convex sounds influenced by transactional memory (define your critical section and let the platform keep trying to apply it until it succeeds), Nomad opted for Raft's serialization of mutations. On the read side we both opted for explicit index use, so it seems fair to assume both of us feel like SQL's decoupling of indexes from queries does more harm than good. Query planners are great until they're not.
I would love a SQL (well SELECT) interface to Nomad's datastore (MemDB). SELECT is like the C ABI for data instead of code: imperfect, but a lowest common denominator that if implemented unlocks a ton of workflows.
I wonder if jamwt feels the same and both projects will eventually grow a SQL adapter (or at least a SQL-like DSL... Nomad is close with bexpr filtering on a number of APIs, but JOINs would be really useful).
SQL is the C ABI of querying for sure. BI tools will never adapt to use Convex directly, and nor should they.
So... yes, Convex actually had a prototype SQL adapter for the read side of things back in the early months when we were figuring things out. Convex's read semantics are very compatible with SQL.
We've kept this adapter on ice in part because of point #3 in the article -- we don't want to overpromise things which are a bad idea.
Meaning, if we exposed SQL on the thing as-is, it would presumably be used for more analytical-type queries involving patterns normal Convex queries can't express. Right now that would be a Bad Idea, because your website would slow down just like every other database system allows you to.
So the current recommended practice is to use our Airbyte egress connector (https://airbyte.com/connectors/convex), get yourself into an offline ClickHouse/MySQL/Snowflake/whatever, and jam SQL over there to your heart's content. That's basically what we do.
We may one day abstract this away by embedding some sort of mirrored column-store SQL thing (maybe DuckDB-based? who knows) so you can do your analytical work without impacting your OLTP side or integrating three more systems. But the team hasn't invested in that yet and probably won't for some time.
> Convex actually had a prototype SQL adapter
Ha, this sounds like the heaps of scripts we Nomad developers have lying around for converting ad hoc bits of Nomad state to sqlite for debugging particularly stateful problems.
> So the current recommended practice is use our Airbyte Egress connector
Ah this is where we would like to end up with Nomad's event stream API: https://developer.hashicorp.com/nomad/api-docs/events#event-...
Sadly we've been unable to identify that First Sink to implement and open source. It's a lot of work to build and maintain production quality connectors, so it would be nice to do just one like your Airbyte Egress. This is an area where Nomad's onprem nature makes it hard to pick one solution that will work for a wide number of users. Someday!
Haskell: IO -> STM -> (pure)
Convex: (The browser / Convex Actions / other effectful environments) -> Mutations -> queries
All the same benefits re: retry and memoization.
Steal, steal from Haskell; there are so many great ideas there.
It may actually be fine to just require everything to be TypeScript, but the idea that you'd require your application to be written in the same language as your data store, and thus implement a different data store for each language you might want to write an app in (and not share between languages)... would once have been thought of as pretty ridiculous?
But maybe it's not anymore?
But that seems to be the requirement for "the application and database types are automatically equivalent because the entire data pipeline from frontend to database uses the same exact types and definitions," yes?
In general, most of the purported benefits of Convex seem to effectively require the DB to be in the same language as the app, to make it all so seamless.
I guess we'll see if Convex can get popular enough to spawn lots of "what I wish I knew about convex a year ago" and "Goodbye Convex, hello Postgresql" posts...
> It may actually be fine to just require everything to be TypeScript, but the idea that you'd require your application to be written in the same language as your data store, and thus implement a different data store for each language you might want to write an app in (and not share between languages)... would once have been thought of as pretty ridiculous?
The vision is definitely aspirational, and reflects the fact that JS/TS is becoming overwhelmingly the most popular language for projects. With things like React Native, for many teams even mobile apps are all taken care of in one language.
There will obviously always be apps written in other languages, and in those cases, a bit more type mapping will be necessary, because (for now) Convex functions are always side-effect free, automatically cached JS/TS running in a very particular runtime. But we'll work on making that translation as smooth as possible in our client libraries.
We have a Python client library out now ( https://pypi.org/project/convex/ ) and a Rust one coming soon, just to keep us honest in this respect.
> I guess we'll see if Convex can get popular enough to spawn lots of "what I wish I knew about convex a year ago" and "Goodbye Convex, hello Postgresql" posts...
Well, that's the dream, isn't it?
What about "all write operations on the database are done in embedded functions"? For the python and rust client libraries, do you write these embedded functions in python or rust, or typescript/JS? Do you effectively have to write a specialized update procedure in typescript for every update you want to do? This starts seeming harder to spin as an advantage if you aren't typescript in the first place. ("A custom stored procedure required for every update" is actually not something anyone's been looking for!)
The OP's arguments seem to me to be written assuming you are writing TypeScript in your app; it's unclear to me how the advantages you claim translate if you are not -- such that I figured the product was actually only for TypeScript apps! It would be interesting to see a rewrite of it that explicitly targets those writing in, say, Python or Rust.
Underlying vision or assumption that we’ll all be using JavaScript/TypeScript in any case: Freaking really??
My limited experience with LINQ to SQL was very positive in these regards - but one is not writing a type safe SQL at all, instead a language that is type safe and is ultimately interpreted as SQL.
Same experience with some libraries in Scala - where it's possible to get compile-time type safety for SQL - but required onerous setup or synchronization/code generation tools to achieve. (Sort of comes with the territory.)
Have other implementations done anything for making queries easier (that don't involve requiring an IDE or anything, so just query language innovations)?
Edit: Not query language innovations but ClickHouse and Snowflake's "standard library" of builtin functions just keep getting better and better. Once I saw `parseDateTimeBestEffort` in ClickHouse I wanted that everywhere...
Another interesting bit of work in this space is LINQ on the microsoft side of things. And even list comprehensions end up having a scanning/summing language kind of feel to them that would be interesting to see translated into a new database query language.
This is a false statement. Both INSERT and UPDATE support JOINs and subqueries / CTEs. At least according to the standard - not every engine implementing them is another story.
Which don't? I'd have assumed anything inside of `SELECT`'s `FROM` would be allowed inside of `INSERT` and `UPDATE`.
Or maybe you're not saying you know there are implementations that have these restrictions just that any random implementation might not be there (yet).
Redshift (a PostgreSQL derivative, more or less) can run an INSERT with a join, but cannot do an UPDATE with a join.
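To make the distinction concrete, here's a minimal sketch of an INSERT fed by a join, using Python's built-in sqlite3 (the schema and names are illustrative). An UPDATE driven by a join is the part that varies more across engines, as the Redshift example shows:

```python
import sqlite3

# An INSERT whose source is an arbitrary SELECT, joins included.
# Schema is illustrative; SQLite keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, active INTEGER);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE active_order_log (order_id INTEGER, user_id INTEGER);

    INSERT INTO users  VALUES (1, 1), (2, 0);
    INSERT INTO orders VALUES (100, 1), (101, 2);
""")

# Copy only orders belonging to active users, via a join.
conn.execute("""
    INSERT INTO active_order_log (order_id, user_id)
    SELECT o.order_id, o.user_id
    FROM orders AS o
    JOIN users AS u ON u.user_id = o.user_id
    WHERE u.active = 1
""")
rows = conn.execute("SELECT * FROM active_order_log").fetchall()
print(rows)  # [(100, 1)]
```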
Also, regarding the transaction functions, are those run locally, or are they serialized and run remotely on the database server? Both have their caveats.
Number 1 (and that approach to retries) already works just fine with Postgres; you just need a proper library (one that wraps your function and retries on transaction-isolation errors). But keep in mind that if you interface with third-party services during your transactions, those calls will need to be idempotent as well, since the function may run more than once. This is actually the proper way to do transactions in application code with a SQL database, especially if you're running with serializable transaction isolation.
Number 2 is very limiting, as you can't use all those third-party services and libraries between different operations in a transaction, which is often (I'd even argue usually) very useful.
Since you're citing stored procedures as a viable alternative to convex's take, while not listing the above number 1, it sounds like you're doing 2.
Anyhow, good luck!
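The retry-wrapper library described in point 1 can be sketched in a few lines. SerializationError, run_transaction, and the demo function are all illustrative stand-ins; a real wrapper would catch your driver's serialization-failure exception (e.g. from psycopg) and re-run the whole transaction:

```python
# Minimal sketch of a transaction retry wrapper; names are illustrative.
class SerializationError(Exception):
    """Stand-in for a driver's serialization-failure exception."""

def run_transaction(fn, max_retries=5):
    """Run fn, retrying when the database reports a serialization conflict."""
    for attempt in range(max_retries):
        try:
            return fn()
        except SerializationError:
            if attempt == max_retries - 1:
                raise  # give up after max_retries attempts

# Demo: a "transaction" that conflicts twice before succeeding.
attempts = {"n": 0}
def transfer():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationError("could not serialize access")
    return "committed"

result = run_transaction(transfer)
print(result)  # committed
```

Because fn may run several times, any side effects inside it must be idempotent, which is exactly the caveat about third-party calls above.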
SELECT preceding FROM is such a thorn in the side. :(
; SELECT FROM sources ... WHERE / ORDER / ETC ... DATA fields ...
What if DATA was a reserved keyword for column identifiers and other data fields that would normally be immediately after SELECT so they could appear anywhere in the query syntax?
Can you imagine how much time and effort that would save people?
Edit: You do need commas in ORDER BYs; that slipped my mind when typing out this pet peeve of mine.
> Can you imagine how much time and effort that would save people?
I can imagine how much extra time I'd spend typing "AS" in queries, which would be necessary for column aliases to be distinct from new select items. And how much more time I'd spend reading unreadable queries in contexts like logs, where they aren't pretty-printed. Net, it seems to be a big loss.
> We don’t need commas in between joins.
That’s because each join is introduced by a string of one or more reserved words (including JOIN).
If each select field were prefaced with FIELD you wouldn't need commas, but...
Also, I seem to recall that very old versions of Oracle did require commas between joins. (The join conditions had to be stated in the WHERE clause back then, instead of supporting the ON clause, iirc).
There are indeed times when a simple key-value store is appropriate, but the encoding of hierarchy and the relational calculus are useful for so many more use cases in business applications. SQL as a language is certainly flawed - it would be nice to be able to algebraically/programmatically join select clauses and conditions, for instance, or to perform multi-inserts - but the semantics of the storage and engine are based on some pretty mathematically sound foundations.
You may find the seminal paper by Codd as illuminating as I did if you read it.
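As a rough illustration of the "algebraically/programmatically join select clauses and conditions" wish above, here is a toy Python sketch. The AND/OR helpers are hypothetical; real query builders (SQLAlchemy Core, for instance) implement this idea properly, with parameter binding:

```python
# Toy combinators for WHERE conditions; purely illustrative.
def AND(*conds):
    """Conjoin condition fragments."""
    return "(" + " AND ".join(conds) + ")"

def OR(*conds):
    """Disjoin condition fragments."""
    return "(" + " OR ".join(conds) + ")"

published = "status = 'published'"
recent = "created_at > date('now', '-7 days')"
mine = "author_id = ?"

# Conditions compose like values instead of being re-typed per query.
where = AND(published, OR(recent, mine))
query = "SELECT * FROM posts WHERE " + where
print(query)
```

The point is that condition fragments become first-class values you can pass around and combine, which SQL-the-language doesn't offer on its own.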
I think any SaaS APIs that reach any degree of widespread adoption eventually get OSS implementations (eg S3's API is widely supported by FOSS and proprietary implementations). This evolutionary path is awfully similar to SQL's for better and/or worse.
Agree this is a necessary part of the change. Convex is working on our OSS strategy this year. Thanks for the feedback!
It's not new "in the world", but often new to our users, who understand it less intuitively than pessimistic locking.
Unless they're using Haskell or probably other very exotic environments with strict understanding of mutability and so on, they know Mutexes but not STM or other OCC patterns. So we lean heavily on explaining it. Not because it is original, but because it is unfamiliar.
The elephant in the room is, I suppose, that the modern internet literally would not be possible without NoSQL. It may be possible without SQL; that seems likely to me. Part of that is because NoSQL is a big umbrella, and covers extremely critical databases like Redis or even databases like Cloudflare's proprietary edge cache. But, even document stores are extremely critical to enterprise scaling; during Prime Day 2022, DynamoDB peaked at 150M rps. There's no SQL setup on the planet that could handle volume like that while still maintaining all the things that Make It SQL; you could start throwing read replicas and removing joins and cross-table lookups and sharding data and ope, you just re-invented a document store.
Here are a couple of conclusions I've started operating by:
1. Document stores are, today, a strong choice at both the low and high ends of the systems-scaling spectrum. They're great at low scale because you can avoid thinking about them too much. They're great at high scale because, once you have the space to think about it, you can attain substantially higher efficiency (performance + cost).
2. Making a document store operate more like SQL, for the things SQL is good at (joins, validation, etc) is a lot easier than making a SQL database operate like a document store for the things document stores are good at (Planetscale? there's a couple players in this game).
3. SQL-the-language sucks. There I said it; I'll die on that hill. The language was invented at a time +/- 2 years of Scheme, ML, Prolog, and Smalltalk. Our industry has rejected all of those. We haven't rejected SQL (yet). SQL is demonstrably, as a syntax, just as bad as those languages, evidenced by all the ORMs, injection attacks, etc. Databases tend to have a longer shelf life than programming languages, but SQL-the-language will die.
4. But, duh, SQL is fine. Go ahead and use it. In some situations it makes sense. In others it doesn't. It's a tool; one that has had 60 years to form around solving as many problems as possible.
[1] https://aws.amazon.com/blogs/aws/amazon-prime-day-2022-aws-f...
For a lot of NoSQL document stores, the simple act of having multiple indexes was a bridge too far. Couchbase comes to mind here.
I can't think of many use cases for a document store over PostgreSQL. Maybe as a cache layer... but I'd just use a materialized view. Where some of the data structure is going to be unknown or user-defined, such as in a system-monitoring tool? Server Density was one of the first big MongoDB adopters for this reason. Now we can store that in a JSONB column, though.
> 3. SQL-the-language sucks.
You're entitled to your opinion. I've been doing this 20 years, and basic SQL knowledge has been by far the most valuable and portable skill of my career. I'm by no means an expert, but ask developers who have been in the field for 5+ years what a HAVING clause does and they'll have no idea. There's a huge gap in basic SQL knowledge, stuff that can be learned in under a week. I taught somebody who came out of a code boot camp basic SQL for a couple of weeks during an internship. Her first job made her acting DBA because she was the only person who knew SQL.
The hill I'll die on is that ORMs and frameworks are creating an astounding lack of basic SQL knowledge in today's developers, which leads to a lot of completely unwarranted griping about SQL. (Not directed at you, just in general on this topic.)
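For reference, the HAVING clause in question filters groups after aggregation, whereas WHERE filters rows before it. A minimal, self-contained sketch using Python's built-in sqlite3, with an illustrative schema:

```python
import sqlite3

# HAVING filters groups after aggregation; WHERE filters rows before it.
# Schema and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('alice', 50), ('alice', 70), ('bob', 20), ('bob', 10);
""")

# Customers whose total spend exceeds 100 -- a per-group condition,
# so it must go in HAVING, not WHERE.
rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 100
""").fetchall()
print(rows)  # [('alice', 120)]
```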
It's the same argument, just at a grander scale, as the one the generation before us made about the COBOL mainframe programmers of old; database technologies just tend to move slower because data is very sticky.
You might not see the use cases, but Apple does. Their usage of Cassandra sustains millions of QPS across over 300,000 instances [1]. That's a scale SQL can't even begin to dream of. Discord does; they moved to ScyllaDB [2]. Amazon does [3]. BlackRock's Aladdin depends on Cassandra and manages over $21T in assets [4].
Which kind of circles back to the fact that, while you say "a lack of SQL knowledge is what causes unwarranted griping about SQL" -- sure, I can accept that -- I'd conversely argue that a lack of solving real problems is what causes unwarranted griping about NoSQL. Schematization isn't a real problem; we have a dozen ways of solving that at different layers. Imperfect ACID compliance isn't a real problem; it's a pseudo-academic expression of the guarantees that SQL offers, which then gets unfairly transplanted onto NoSQL databases as if those issues can't or wouldn't be addressed elsewhere. Even original-article claims like "data is inherently relational": data isn't anything, it's just data. It only appears relational because that's how we've been trained to think about data after 60 years of SQL monoculture. If you start thinking about data end to end -- how it's displayed to users, on pages, components, screens, feeds -- it really doesn't look relational at all. Life isn't a graph; the graph is an expression of human pattern-matching on things that could reasonably be related but in reality aren't. Life is a series of snapshots. This moment, this page, this feed, this component, in this moment, for this request, for this session; that is the end-to-end view of data: documents, not a web.
But SQL is fine at that! In fact, it's the best option at medium scale, the middle of the bell curve where 80% of people live. It may not always be, but it's there today. It's the bottom 10% and top 10% where the solutions that work for most don't work as well; and I think NoSQL gets a lot of majority hate despite being an extremely strong solution at those scales. Know your tools, and be willing to replace them when the circumstances change.
[1] https://twitter.com/erickramirezau/status/157806381149547724...
[2] https://www.scylladb.com/press-release/discord-chooses-scyll...
[3] https://aws.amazon.com/blogs/aws/amazon-prime-day-2022-aws-f...
It's amazing how it all works. But only when it works. I've found it so flaky that I don't use it in production, but if the bugs were ironed out it would be amazing.
> We paused just long enough to take a sip of our Spicy Maya Mocha from Coupa Cafe.
To be honest, I think almost all drinks at Coupa Cafe are pretty bad.

If we are begging DB engineers for things, can we get a graph layer for accessing SQL tables, please?
BEGIN;
SELECT post_count, ... from users where ... FOR UPDATE;
INSERT INTO posts VALUES(...);
UPDATE users SET post_count = new_post_count WHERE ...;
COMMIT;
For one, it's unlikely that the number of posts per user is so important and so often requested that it needs to be cached in the database itself.

Secondly, why would that value need to be stored with each new insert? Simply insert posts as they come, and calculate metadata about posts at a later time as a batch process, or when some part of the application actually requests it.
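A minimal sketch of that alternative, using Python's built-in sqlite3 with an illustrative schema: posts are inserted as plain rows, and per-user counts are derived with an aggregate only when someone asks for them:

```python
import sqlite3

# Alternative to the cached post_count column: derive counts on demand.
# Schema and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (10, 1), (11, 1), (12, 2);
""")

# Inserts stay a single plain INSERT -- no counter to keep in sync,
# no SELECT ... FOR UPDATE, no serialization hot spot on the user row.
conn.execute("INSERT INTO posts VALUES (13, 2)")

# Compute the counts only when actually requested.
rows = conn.execute("""
    SELECT u.name, COUNT(p.post_id) AS post_count
    FROM users AS u
    LEFT JOIN posts AS p ON p.user_id = u.user_id
    GROUP BY u.user_id
    ORDER BY u.user_id
""").fetchall()
print(rows)  # [('alice', 2), ('bob', 2)]
```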