Lots of caveats about difficulty to test and weird syntax.
But it is just that SQL is the most terse and standard way so express logic.
And that in itself is the most important factor to avoid bugs. Not what testing strategy you choose.
As someone who has written a ton of complex SQL... I couldn't disagree more.
Trying to shoehorn things that can trivially and intuitively be expressed in a couple of for-loops with a couple of variables, into SQL expressions making use of joins and aggregate functions and GROUP BY's and (god forbid) correlated subqueries... having to replicate subqueries because their results in one part of the query can't be re-used in another part... teaching people arcane terminology distinctions like between WHERE and HAVING... not having basic functions for calculating percentiles or even a basic median... certain basic kinds of logic operations that simply can't be done... flipping a coin on whether the query engine will use the index and execute in 10 ms, or decide to follow a different execution plan and take 5 minutes to run...
I have never encountered more bugs in business logic than in dealing with SQL. It obfuscates what should be clear. SQL isn't a solution for avoiding bugs, it's what causes so many of them.
> not having basic functions for calculating percentiles or even a basic median
I don't think this is really true anymore; windowing functions are pretty prolific, and I think every major database will have some percentile functions
It also helps me understand my colleagues.
It may be how our brains are wired. For me whenever I see a for loop I see bugs.
It is also about that you get the concurrency from map/reduce-type jobs for free without having to think about it.
But yeah there are a couple of places you need to be careful. Avoiding duplicates and handling of NULL.
In fact testability is one of the best parts. You can safely test read-only queries against a prod secondary database to see that it gives reasonable results on real data, and use the repl to explore parts of your query to really get a sense that things are working.
For some reason, folks assume SQL must be written badly since they have only written it badly or seen it written so.
It is absolutely possible and preferable to write maintainable SQL logic into user defined functions, stored procedures, views, materialized views, CTEs, temporary tables, etc. If you're looking at one huge pile of monolithic, untestable SQL, the problem isn't the SQL.
One doesn't write O(n^3) algorithms in C++ and then blame C++ for it being slow. For some reason, folks seem pleased with themselves to do as such with SQL though every day and twice on Sunday.
Got subselects in each of the fifteen outer joins with NULLs all over your schema, and now you're upset performance is horrible and inconsistent? PEBKAC.
A join or a group by is going to be much clearer than writing the code the query plan is going to generate, creating temporary hash maps, doing nested loops, etc.
We are headed back in the other direction, but with some hedge. Code with a very simple & transparent CI/CD experience seems equivalent to some SQL configurable thing in our minds now, but with way more potential upside.
In our latest code-based abstractions, nothing is stopping you from breaking out a SQL connection and running any arbitrary query. In fact, all of the data is still in SQL and it's still authoritative, you just now have access to way more powerful tooling to work with that data. Code+SQL together is the answer.
For non-expert team members, modern C# is turning out to be way more intuitive than SQL the moment you encounter a 3+ table join scenario.
> For non-expert team members, modern C# is turning out to be way more intuitive than SQL the moment you encounter a 3+ table join scenario.
Partly because this is so inefficient that it’s never going to work at scale, but also because of how you’re moving the complexity into the realm of “magic”.
I know it’s very easy and intuitive to use one of C#s ORMs, but with that comes an reliance on things like linq and a model builder, both of which may not work the way you think they do. If your developers think about it at all, and having seen so many C# developers use IEnumerable where they really should have been using IQuerable… well if your developers can’t do relatively basic SQL then I’m not sure I’d trust them with the abstraction either.
I don’t think you’re really, wrong, either, but I think it’s much more a question of building developer tools to help your developers handle good data access than it is about picking a particular tech on the consuming side.
As far as data storage goes, however, I think we’re just beginning to see the move away from the classics because the classics just weren’t build to support how we use data in 2023.
I can’t begin to tell you the nightmare it is to update old school SQL “data wells” to be capable of temporality, good BI access as well as being compliant with the various EU legislative rules, and almost none of that can be done above the database. Well I guess you could, but you’d probably go insane. And that’s just path of it, the other part is just how much data we transact now. It used to be that banks were basically the only organisations to move massive amounts of tiny bits of data, and now we all do it. Like, a single solar plant moves a gazillion points of data into your various systems a minute, where 30 years ago the entire data for that place might’ve been a couple of kb a year, it’s now mbs a day.
When teammates look at my code, and logic is all right there instead of scattered around, and there's a schema file backing it all that makes it clear what all the relations are, they have an easy time making tweaks or adding on. Yes it's very testable too.
This also kinda depends on having a multi-service architecture if your system is large. Separate database for each. That's a good thing anyway.
To me the most infuriating thing is the "SQL query scattered around multiple files" pattern, where a backend engineer will decompose a perfectly fine SQL query into 3 or 4 files, with multiple functions, often with very artificial separations (for example: a function just for the "select ..." part, another for the joins).
All that in the name of having small files, small functions, small lines. You take complexity away from the "micro" parts and embed it into the invisible parts of your program.
I'm not sure about that. SQL was the first language I learned and the language that has always been there throughout the decades, and is also where I make the most mistakes.
Transaction-safety is also sublime for business logic.
At the same time, the level of hype about ECS today reminds me an awful lot of the amount of hype surrounding OOP in the 90s. Can ECS be a better way to structure your game entities and make your game loop faster? Yes. Will it make your teeth whiter and your partner love you more? No.
(There are ECS frameworks in JavaScript, which gives you absolutely no control over memory layout and thus completely defeats one of the primary purposes of the pattern.)
Like any pattern, it exists to solve concrete problems. It shouldn't be the One True Way To Think About All Programming Henceforth and Forever.
When the author says things like:
> For example, how would you model chat messages in your game? I suppose you’d have to represent that as an entity in your game. How would you represent a constraint that would prevent a health component from being added to your chat message erroneously? In ECS it’s straightforward to create a system which operates on chat messages individually, but how would you query your chat messages so that you can display them in order?
To me, that just means "Don't use ECS for those." I have a really nice coffee mug that is just perfect for holding coffee. It does its job very well. That does not mean I feel any need to use that coffee mug for digging holes in my garden.
Databases and the relational model are great. ECS is great. Object-oriented programming is great. Functional programming is great. But treat them all as tools that should be used for the right job.
I appreciate that most of the ECS hype has been around specific use cases, though.
OOP was claimed as not a specific useful tool, but the answer to all programming, a billing it has not lived up to. It achieved "useful tool", no question, especially as some of the very rough bits were sanded off the original proposals (particularly the idea that objects should only and exactly match real world objects, an idea which I believe in hindsight was accidentally carried over from a simulation worldview in Simula where maybe it worked into the general programming world where it didn't), but it certainly has failed to be the one true programming methdology.
I haven't seen anyone claiming everything should be rewritten in ECS.
I recently watched the first 30 minutes of Mike Acton's 2014 talk, and while that portion of the talk wasn't about ECS specifically, it very much presented an absolutist perspective.
Depends a lot on where you hang out. On amateur gamedev fora, I have seen many many many posts from beginners where they are struggling to cram ECS into their game and feel they need to because it's simply "the way" that one architects a game. Even if their game is written in a language that offers no performance benefits and the their simulation benefits nothing from it, they just think they have to.
It's heartbreaking watching someone go, "I know I could just store this piece of data right here in my entity class, but I'm not supposed to because of DoD, so how should I do this?" And then they get back confidence answers that involve pages of code and unnecessary systems.
It's exactly like the OOP fad of the 90s, just in the opposite direction. Yes, it turned out you don't need to encapsulate all data in classes. But, also, it is OK to just store data in stuff. You don't have to make every letter of your pop-up dialog a separate component.
I try not to focus too much on the ECS side. It is all about understand the problem you need to solve. If you are making a game, and you know exactly what the game needs to do from a programming perspective... write it.
Yes, while I would keep the "data oriented" viewpoint... but do you really need to spent time on some ECS layer? If you know exactly what each character in the game does, write it and solve it. Are you making a game... or trying to create the next Unreal Engine?
While JS does not provide great support for bit packing complex structs, typed arrays give you quite a bit of control over memory layout for simple numeric types, which is what you usually want for optimal data-oriented code anyway. This is a common technique used in fast JS libs for data visualization, ie:
https://github.com/mourner/flatbush
There are also basic operators required for bitarrays, which are useful for ECS and memory-efficient code generally.
I’ve spent a lot of time thinking about how to incorporate Data Oriented Design, SoA and ECS into the normal boring business logic at work, and I think it’s interesting to think about keeping data in the same form as it is in the relational database and skip the impedance mismatch in Object-Relational Mapping.
ECS only makes sense if you have real structs like in .NET or C/C++/Rust/Swift/etc, and tight latency requirements, but I think when someone learns a powerful concept like ECS, they want to invent a reason to use it so they can actually use it in action. I know that is what I’ve experienced at least.
It doesn't give you direct control over the memory layout, but it's still fairly safe to assume that arrays are going to end up in relatively contiguous memory, which the relevant part for the performance difference between structs of arrays and arrays of structs.
I don't recall explicitly telling many compiled languages to stick all the items in an array together in memory either - it just happens by default, same as in JS.
Depending on how the GC works, the elements themselves might still wind up next to each other in memory some of the time. But that is definitely not a pattern that you would expect to hold in general- components will be added and removed over the course of the game, with lots of other stuff happening in between.
In C++, the layout of the array elements is actually part of the language semantics. In JS, the language semantics don't even have a way to talk about that layout, and engines in practice don't use the layout you want.
A contiguous array of pointers to the actual objects doesn't buy you much. You're still doing an indirection and risking blowing your cache each time you do something with each element. It may be the case that the objects the array elements point to are contiguous in memory, but that's entirely a roll of the dice, and those dice get re-rolled on every garbage collection.
ECS, OOP, functional programming, and others serve as methods to organize software — the developer interface. However that role should belong to IDEs — an actual user interface for developers.
Everybody has different hierarchy of their software, esentially developing their own software engine which sucks, because doing it right is extremely hard and time consuming.
What Steve Jobs told long time ago is still true today, "Paradoxically, we need more sophisticated software to make it more easy for the user" (paraphrasing). This is what we need - trully general software engine with IDE made right, what will unify organization of all sofotware and beyond.
For some deeper insights, in the context of the Web, I suggest reading Graydon Hoare's post [1].
>Instead of deploying a web or game server that sits in between your clients and your database, your clients connect directly to the database and execute your application logic inside the database itself. You can write all of your permission and authorization logic right inside your module just as you would in a normal server.
Why? Databases should never, ever, ever, be used to perform logic, they are datastores, that is it. Your logic goes elsewhere. Stored procedures are the worst "feature" of any database, you are just asking for a hard time debugging, troubleshooting, and increasing the chance that you will fuck up the most valuable part of your system, your data.
> This means that you can write your entire application in a single language, Rust, and deploy it as a single binary.
It also means you have a single point of failure, no read-replicas or redundancy. Hate everything about this.
It’s not difficult to debug at all, you might just be unskilled.
I agree with your larger point but this seems too harsh: it’s definitely harder to debug simply because, as with microservices, understanding how the app is functioning now requires you to understand different code in multiple languages and locations, you’re highly likely to hit non-portable behavior across databases for authoring and debugging, and you’re never going to get a debugger with the whole flow in context.
That doesn’t mean there aren’t benefits as well and it could be especially useful as a way to force distinctions about contracts for common operations, but I wouldn’t say it’s right for all or even most projects. The sweet spot is going to vary widely.
You're talking about a best practice like it's a fundamental law. It's not, it's just how we've mostly been doing things. A lot of interesting innovations in distributed systems / architecture (serverless, graphql, thin clients, thick clients, ORMs, RSC, WSGI, nodejs) have been made because the designers tried relaxing a constraint or taking a counterintuitive idea to a maximalist place.
In fact, if you look harder, there are a fair number of existence proofs of successful systems built on stored procedures. There's even a "best practice" phrase recommending doing compute as close as possible to the data.
If the language is nice and has well designed access to db facilities like records and such, it can be better than writing your code outside the database, especially coupled with a data-oriented design model/ECS (which can be extremely debuggable and offer great visibility).
>and increasing the chance that you will fuck up the most valuable part of your system, your data.
Since you can write anything outside you can inside, no. I can send a "delete from/drop <table>" from any client at any moment, or make any mistake in updating. That's what backups are for, and databases make them even easier (not to mention transactions being very good and neglected part of business logic).
You cannot sanely use a database with multiple heterogenous clients without putting logic at least about what valid data should be in it. This is gonna include some “business logic” in practically any real-world system.
Otherwise you have to elevate the same functionality to some gatekeeper-daemon that’ll almost certainly perform far worse, lack features, and be an eternal source of dumb bugs, including, I can just about guarantee, data corruption bugs.
I wouldn't go that far. Relational algebra is performing logic. Constraints and foreign keys are logic, as well.
I'm not going to argue that you should go back in time 15-20 years and start shredding XML strings in stored procedures again. But thinking of the database as one step above a flat file is similarly backward thinking.
More than that, the concept of putting application logic adjacent to the data store is sound. That's exactly what a web API or a microservice is doing. They allow a uniform mechanism of requests and responses to a data store. At a concept level, that's exactly the same thing. The concept of keeping logic at or immediately adjacent to the data layer so that a whole range of disparate applications can request and maintain data from the source is what the design goal of "database side logic" is.
But at a practical level they very different. If you have a middleware layer as you are describing, it's written in a real programming language with all the adjacent tools (source control, debugging, etc).
I'm not hard-core against stored procedures used lightly but they have a lot of downsides and they simply aren't needed. There's no performance advantage. There are complexity advantages and disadvantages that might be a wash.
Like debugging - if every memory access is a database access, then you have a builtin logging for all your memory access that is both more performant and optimal then any normal normal debugger. You can take snapshots of your 'memory' pretty much at any time, the data layout is clearly defined. You can manually edit your memory any time you want, and serialization is already solved for you.
The more I think about it, the more possibilities I come up with. Damn, this is actually genius!
1: Separating stores was crucial during the 90s and earlier when people were still writing in memory-unsafe languages (C/C++ cough) since it could cause wild corruptions with simple stray pointers. Process-separation was just a sanity thing. As you notice these are other languages in play here so random corruptions shouldn't happen (memory exhaustion can still be a thing though with their model)
2: Yes, debugging stored procedures/triggers on SQL-Server,etc is a PITA because they're database first centric objects, however the idea here is to make the database fill the job of app-servers of the 90s/00s with "regular" debugging workflows for developers. (Don't confuse implementations with the concept)
3: And MOST importantly, this is a game-focused thing, gamedevs will often end up replicating most cache/database functionality (badly) to squeeze things into main memory with the goal to achieve realtime performance targets anyhow, why not forgo that duplicated crazy work with a solid framework?
4: As a corollary to the above, the benefits of separating storage from applications (to run multiple applications against the dataset as it often happens in enterprise scenarios) isn't really the focus, application to database mappings are intended to be more or less 1:1
How does writing an application as a single language/binary prevent read-replicas or redundancy?
The application could do that.
Or you could do it at disk level (RAID).
How do you manage multiple instances of the application and not introduce split brain?
> Or you could do it at disk level (RAID).
That is not comparable. RAID's redundancy is not the same as the redundancy in a multi node database cluster. You have one service, not multiple services, network card gets fried, your database goes down, you can't promote a standby to master and be on your way. Also RAID is a single disk as far as the OS is concerned, so you could hit I/O limits (especially if you have a single binary) that cause your app to chug, you cannot split your writes and reads across different physical or virtual machines that have different disks.
This sounds like a tooling problem. One could imagine a database that doesn't have these issues.
> It also means you have a single point of failure, no read-replicas or redundancy.
What? Why would any of this be the case?
less moving parts, diversity and complexity in your stack (different languages, servers, build/deployment systems, etc).
> you are just asking for a hard time debugging, troubleshooting
what specifically hard about this in your opinion?..
It seems like it would solve a lot of problems by eliminating the need for data broker apps / endpoints that simply put POST or GET parameters into different SQL queries based on which endpoint you hit. I have no idea if it would scale but I don’t see why it couldn’t.
This sort of idea came up earlier this month in another post as well:
PostgREST does this and it was a pretty nice experience when I used it.
There's also little benefit because you still need to track application user data, so you still need a user table.
Firestore is a NoSQL db so it's a quite different paradigmatically though.
[0] - https://firebase.google.com/docs/firestore/security/get-star...
EDIT: nevermind, fixed after a refresh ¯\_(ツ)_/¯ still maybe worth a look
I'm on desktop (Edge) and don't see a scrollbar.
Very distracting.
firefox on android no longer supports mobile view?
Edit: it actually does not allow any unchecking. Just "you agree to these marketing and tracking cookies by using this site" Nope.
Baffling. It does not lend confidence in your core product. If you're not respectful of your casual reader's data, how can we expect you to be careful with your user's data?
I recommend zero tracking, or if you must, have it be genuinely opt-in.
Strongly recommended viewing if you are interested in understanding why DoD is a big deal for performant applications (hint: it's about the cache).
For those seeing this (SpacetimeDB) and immediatelly conjuring images of nightmares to be, consider the following: if you had an extremely latency sensitive usecase and had the opportunity to host your database and business logic on the same machine, why wouldn't you?
I first learned about SpacetimeDB from a HN thread posted three months ago: https://news.ycombinator.com/item?id=37146952
Everything else about how you might query these layouts is more superficial... you can provide the same API with either layout, the same way you can in relational database systems (both layouts can be queried with SQL, but with different performance characteristics.)
I was questioning whether this was the career but, after a few years, decided to give it a go.
Job interviews, particulary then, were about "OOP this" and "OOP that" and I would purposely be agreeable but was unahppy with the code I was writing. Eventually I would come out of my shell with views to other devs.
Roll on to 2014 watching Mike Actons "Data Oriented Design" - and I immediately felt at home, not because I consider myself an 'expert' or on the same league as seasoned game programmers but I felt I had the right mindset all along!
In my 20 years, primarily a C# developer to pay my bills but at home my personal projects are written in C, and tried D.. and now Odin. I just prefer the control they give.
(Scheme as well, I do like Scheme)
Whenever someone asks me (a developer) what Data Oriented Design is, I try to explain to them it is like building a decent database. You are not thinking about the code (so much) but the representation of data. With databases, you create tables that have relations with other tables, with keys and indexes. You are setting up "lookups" etc.
Once they grasp the idea of building a database (which most developers can easily understand) - it is a case of transferring that energy not to tables in a database, but to data structures in your programming language.
Of course, this idea is easier to understand if you have experience in languages like C. For those coming from Python or Java or C#, etc, can be a little trickier but only if they think purely in the OOP mindset.
However, if they struggle to grasp this, then ECS is a great way for understanding in that OOP-ish way. The penny drops when I discuss Entities and Components. You are not building classes and inheritence, you are building entities and components. It is more flexible creating an Entity and "attaching" a Component for it to do something. Then you don't have some kind of Update method in a class, you just have one function which passes in all of it's type.
Anyway - it all comes back full circle now reading this article. Great reading! Now I can just refer people to this link instead.
I am thinking that iteration is just traversal and traversal is just execution.
Take iterators in C++ or standard library "algorithms" library in C++. Or iterators in Rust or in Java.
You just want to traverse and collect values (and calls on functions in some interleaving), which is like joining tables as this article says.
I'm thinking that the definition of the traversal (such as Kafka pipeline, clojure threading syntax, Clojure data driven design, Javascript lodash pipeline) can be mapped to tuples and then the computer can optimise arbitrary chains of traversals based on the number of potential tuples are available and what traversals are equivalent paths or routes to the same traversal.
In other words, every program is a compiler pipeline or database query pipeline.
Maybe Prolog and Datalog can help here. Optimisation of arbitrary traversals and determination of identical traversals.
Or monads are just ordered traversals and OCaml compilers are just traversals of execution (function application) and relationship following, which are joins.
The relational model is everywhere.
This is interesting - could you elaborate a bit more?
Fortunately there’s Reader mode.
The great promise of ECS is that once we all have 128-512 core CPUs, that we will actually be making great use of them in the videogames of the future rather than throwing away the vast majority of performance which is the current world with the vast majority of Unity and Unreal games.
Once you hit the real world and start making queries of everything, everywhere, all at once, maybe it'll be lock, lock, lock on your "database queries"
And also, it seem like this DB really demands a game engine like Bevy actually get finished!
He makes the point that databases are today prescriptive on the structure of data you provide to them
I think databases as we know them today are not yet "end game"
I’m sorry, but the difference between "databases are slow" and "the only way I can interact with databases is slow" is only pedantics. I understand that, as a database developer working hard to squeeze every bit of performance possible this is frustrating, but as a developer I couldn’t care less about the difference.