He is not the only one I've met who insisted on not having FKs. I've even seen large systems prohibit using JOIN statements.
Personally, I think the data integrity outweighs the inconveniences. Do you use FKs in your systems? What are your experiences?
Another classic is the “joins are slow” argument, which I believe goes back to a period in the late 1990s when in one, not highly regarded at the time, database, namely MySQL, they were indeed slow. But the reason “everyone” knew about this was precisely the oddness of this situation: in fact RDBMSes are highly optimized pieces of software that are especially good at combining sets of data. Much better than ORMs, anyway, or, god forbid, whatever you cobble together on your own.
There is, in my mind, only one valid reason to not use foreign keys in a database schema. If your database is mostly write-only, the additional overhead of generating the indexes for the foreign keys may slow you down a little (for reading, these very same foreign keys in fact speed things up quite considerably). Even in such a case, however, I’d argue you’re doing it wrong and there should be a cache of some sort before things are written out in bulk to a properly set up RDBMS.
I recently migrated to EntityFramework Core (from the non-core version) and I’m actually impressed. Most SQL is pretty much what I’d write by hand.
Now granted, if there are complex joins, subqueries and stuff, I don’t even try wrangling the ORM to somehow give me that output, but still. I feel more comfortable just using EF than I used to.
Take a simple operation:

    x = Ef.Find(xid)
    x.Name = "something"
    y = Ef.Find(xid)

What is y.Name? Even though you didn't save anything to the database yet? And the second Find didn't actually refresh from the database?

Oh, and the random bugs where people improperly include related entities, but it somehow ends up working because they are automatically added as you're firing off other related queries, until eventually it does not (usually in production only).
It's a really really complex system designed to look simple and pave over important details with "works most of the time" defaults.
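That surprise comes from the ORM's identity map: the second Find returns the already-tracked, locally modified instance instead of re-reading the row. Here is a toy sketch of that caching behavior in Python; the Session class and users table are made-up stand-ins, not EF's actual implementation:

```python
import sqlite3

# Toy identity map: find() returns the cached, locally modified object
# for keys it has already seen, instead of re-reading the row.
class Session:
    def __init__(self, conn):
        self.conn = conn
        self.identity_map = {}  # primary key -> tracked object

    def find(self, pk):
        if pk in self.identity_map:  # already tracked: no SQL is issued
            return self.identity_map[pk]
        row = self.conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (pk,)).fetchone()
        obj = {"id": row[0], "name": row[1]}
        self.identity_map[pk] = obj
        return obj

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'original')")

session = Session(conn)
x = session.find(1)
x["name"] = "something"  # modified in memory only; nothing saved to the DB
y = session.find(1)      # served from the identity map, not the database
print(y["name"])         # prints "something", not "original"
```

The point being that find() only hits the database on a cache miss, so unflushed in-memory edits leak into every later lookup of the same key.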
Most of the time when I have performance issues, it isn't EF. It's a missed index or a higher-level query issue.
The only person I knew who died on that hill would insist on doing two queries to the database, and then would insist on doing a client side cartesian join.
Unlikely in practice, though.
It was a pretty central table, and the inability to use FKs there kinda spread outward.
Along with the "indexes slow down INSERTs and UPDATEs" argument that you touch on. I mean, it is literally true that indexes make writes slightly slower, and an excessive quantity of indexes (which I have seen) can slow down writes enough to cause problems. But - in general - the slowdown is irrelevant compared with the overhead of querying a table that contains 2 billion rows using, oh, I don't know, a table scan because you don't have even a single index (I have also seen this).
> Much better than ORMs
These two things are not mutually exclusive though right?
It’s entirely possible to have a lightweight and relatively transparent ORM which makes full use of the underlying RDBMS.
I use JPA/Hibernate professionally, as a decision maker, but I don't think I'm in either camp entirely. ORMs aren't a magic wand, but they do help you standardize the boilerplate that you'd end up with one way or the other, in most cases.
ORMs are just a wrapper around RDBMSes. If your ORM is producing incredibly stupid SQL to query the DB with, you might want to check that you're not modelling your data in a stupid way.
I am by no means an expert, but in general I have found that if the ORM is doing something particularly crazy, it's because my underlying assumptions about the data model are wrong.
Got any arguments to back up this bald assertion?
In particular, I'd love to hear more about how to manage schema migrations on large tables with FK's without incurring lengthy locks or downtime.
Betting the answer is going to involve some variation on "well, don't do that" which is when I'll rest my case.
They made a query and indexing system on top of it, called TAO, to make it fast.
Without it you need to send a distinct SQL query per parent object to get the list of associated child objects, which would be awfully slow.
I just ran into severe data corruption at a large client because a programmer four years ago wrote an empty catch block. The system would open a transaction, hit a fault, roll back, then continue writing to the database as if it were still in the context of the transaction.
I spent some time trying to pin down exactly what it did, and found that many writes went through because of a missing foreign key constraint.
In short: if a particular table of minor importance had a foreign key constraint, there would have been no damage whatsoever, because it would have faulted immediately after the rollback.
You can’t rig up a constraint against every dumb write. But you can rig them up against some of the dumb writes. And sometimes that’s enough.
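For illustration, here is that failure mode in miniature using SQLite from Python (orders/line_items are made-up tables): with the constraint declared and enforced, the dangling write faults immediately instead of silently landing.

```python
import sqlite3

# Autocommit mode so the PRAGMA takes effect outside any transaction
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE line_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id))""")

conn.execute("INSERT INTO orders VALUES (1)")
conn.execute("INSERT INTO line_items VALUES (1, 1)")  # fine: order 1 exists

try:
    # A write referring to a rolled-back or nonexistent parent row
    conn.execute("INSERT INTO line_items VALUES (2, 999)")
except sqlite3.IntegrityError as e:
    print("caught:", e)  # faults immediately; the bad row never lands
```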
For that reason having constraints enforced by the system that stores the data, external to the app code which developers will inevitably mess up, is so useful and important.
So many issues in my experience have been similar to the one you describe. If the right constraints had been present so much work and so many headaches could have been avoided.
I feel like people who claim they're not needed or not important or "the app code will do it" need to be wrangled into maintenance work of old systems for a year or two until they repent. Rather than boshing out an ill-considered prototype and then moving on before the bugs are discovered.
Sure, you can guard against some forms of bad data and fix it as it comes in (or abort with errors if your system can handle that), but in order to do anything interesting, you need to make assumptions about the data. The only operation you can perform without assumptions is the identity function, and most software requires much more than that.
Since it just gets worse the longer you accept corrupted data, this is a good justification for "crash early" programming, like DB enforcement of data integrity.
Nothing is a hard and fast rule, but in this case the only way for an error to occur is if the query schema differed from the code select statement (column names changed). At that point wtf are you doing trying to keep running without errors, something is fundamentally mismatched between your application and the data.
In other words, the world is full of idiots; and any time I start forgetting about it, I read something like your post and I get a wake-up call.
What does the R stand for in RDBMS if you don't use foreign keys and joins?
Please, keep using your FKs, stay safe and don't mingle too much with idiots.
But I still agree that OP's colleague is an idiot.
But it's definitely not what it means for the great majority of contemporary contexts.
Relations in modern RDBMS are usually aliases to foreign keys unless otherwise specified.
I agree that application logic goes into the application, but data integrity is NOT application logic.
Personally, I don't use foreign key constraints because:
1. It makes schema migrations and other data-management operations more difficult.
2. On insertion, the database needs to perform an additional check to verify that the record exists at the foreign key; this carries a performance cost; IMO, this is something which should be enforced at the application layer anyway.
3. It makes it more difficult to scale the database later because you can't separate tables onto 2 different hosts if one table references another using a foreign key.
BTW, about #3: the same argument can be made against using table joins. Once you start using foreign keys or table joins, you will be forced to run those two tables on the same host for the foreseeable future; it's very difficult, error-prone and time-consuming to migrate away from such an architecture if you have a lot of data in a live environment. Personally I prefer to design all my tables and front-end applications not to rely on foreign keys or table joins. There is a good reason why databases which are focused on scalability (like MongoDB) do not support foreign keys or joins (or at least try to avoid them).
I prefer to assemble data on the front end as much as possible because it allows my REST API calls to be granular; each one only refers to a single kind of resource, which helps to simplify caching and real-time updates; it also uses fewer resources on the server side, and I find that it makes the front-end code more maintainable. Also, I like to design my front ends to mirror the natural separation of resources within the database. When the user wants to open a related record, they need to click on a link (the foreign key ID/UUID is used to construct the link to the related resource); this loads up the other record as a separate step. This creates a very smooth (and fast) user experience. I also like it because this approach does not overload the user with information; collections of items don't show much detail, while individual resources may show a lot of detail.
The real reason people use joins is because they want to pack a lot of details onto the user's screen when they are looking at a list view... Sometimes the reason why they want to do that is because they didn't design their tables correctly; maybe the tables which they use to generate list views don't contain enough columns/detail to be useful on their own so they feel forced to do joins. I find that drawing ER diagrams helps a lot with that. It's very important to get the cardinality of relationships between the different tables exactly right. Also, I find it very helpful to represent any many-to-many relation between two tables as a distinct table.
You can always migrate that data to a more useful format if you find it starts hurting you at scale. If you start with the assumption that you need the scale, you're hurting yourself in the here and now for a theoretical future benefit.
> The real reason people use joins is because they want to pack a lot of details onto the user's screen when they are looking at a list view
This is completely, emphatically wrong. I'm somewhat miffed at the air of authority you're using here. People use joins for the normalization of data.
It's clear you have never worked with a lot of data.
> The real reason people use joins is because they want to pack a lot of details onto the user's screen
I hate this illusion that web programming is the whole of software development.
For example audit logs get no FKs, when a delete happens the audit logs about the deletion shouldn't be deleted.
I always FK a large table (millions or more rows) to a small table (tens to hundreds of rows).
But I will pause and ask hard questions about FK a large table to a large table... will this impact migrations? Do I need this FK? Is data integrity at risk without this FK even assuming a buggy app? Does the app utilise this FK for queries, or is there zero performance benefit from having the FK? If I don't have the FK are both tables always queryable by a PK? Should I have an index, potentially a UNIQUE index, in place of a FK?
Like most things... it depends. A dogmatic insistence on using them everywhere isn't always healthy, and the inverse is true that an avoidance everywhere isn't healthy.
The DB is there to store data and make it available, whilst enforcing data integrity... if it makes sense to use a FK to achieve those things do it, otherwise don't.
Base FK decisions on the possible size of a table rather than its current size.
FKs are incredibly performant on a near-empty local dev database ;)
on delete set null
The parent's point is that the choice should depend on the situation. Sometimes a foreign key is appropriate, sometimes it is not.
Or updated. Or prevent the deletion.
But that’s configurable anyway.
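The configurable referential actions being discussed - nulling the reference, cascading, or blocking the delete (the default) - can be sketched with SQLite; the users/audit_log tables here are hypothetical:

```python
import sqlite3

# Autocommit mode so the PRAGMA takes effect outside any transaction
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
# ON DELETE SET NULL: deleting the parent nulls the reference instead of
# deleting the log row (CASCADE) or blocking the delete (the default).
conn.execute("""CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id) ON DELETE SET NULL,
    message TEXT)""")

conn.execute("INSERT INTO users VALUES (42)")
conn.execute("INSERT INTO audit_log VALUES (1, 42, 'deleted a record')")
conn.execute("DELETE FROM users WHERE id = 42")

row = conn.execute("SELECT user_id, message FROM audit_log").fetchone()
print(row)  # (None, 'deleted a record') -- the log survives, the link is nulled
```

This is also one answer to the audit-log objection above: the log rows outlive the deleted parent, only the link is severed.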
I’ve never come across a scenario yet where this wasn’t sound advice.
I tend to lean heavily on my DB as well. E.g. I tend to push all state down to the DB and out of the application. I work in environments where it’s common for developers to want to disable FK constraints, and I temporarily do sometimes during specific bulk operations during releases, for example. The usual reasons others will suggest relaxing FK constraints permanently will be the need for audit logs or soft deletes, but I have patterns for these too.
Foreign key constraints are pretty awesome, all databases I’ve worked in so far have escape hatches for when they hurt too much.
Still, in my experience, database normalization has seemed like less of a performance impediment than queries written without considering an execution plan and indexes.
I like foreign keys, check constraints and tight data types. Might as well constrain it and limit the scope for errors.
Application programmers can write some buggy code. The DB should provide a line of defence.
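A small sketch of that line of defence, using SQLite and a made-up payments table: tight types plus CHECK constraints reject buggy application writes at the door.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE payments (
    id INTEGER PRIMARY KEY,
    amount_cents INTEGER NOT NULL CHECK (amount_cents > 0),
    currency TEXT NOT NULL CHECK (length(currency) = 3))""")

conn.execute("INSERT INTO payments VALUES (1, 1999, 'EUR')")  # valid row

# Buggy writes bounce off the constraints instead of landing
for bad in [(2, -50, 'EUR'), (3, 100, 'EURO')]:
    try:
        conn.execute("INSERT INTO payments VALUES (?, ?, ?)", bad)
    except sqlite3.IntegrityError as e:
        print("rejected", bad, "->", e)
```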
If you don’t have foreign keys that should be a design choice with a legitimate reason the entity can become orphaned.
You need a definition of what that means in real life. E.g. PERSONID 1012 where there is no associated record. This kinda means you need to look at another entity to know what the first entity “means”.
This might be useful for data that needs to sit in distributed databases.
I've found that proponents of this usually either don't fully understand the feature in question or made some major mistake in the implementation, which in turn causes problems that manifest themselves when they try to use said feature.
You also need to index your FKs so that the database does not have to do a full table scan before it can delete a row. This is often overlooked.
Your friend can do his data migrations without FKs and create them afterwards. This is quite a common procedure.
There are a few different schools of thought. I will list them, but the important thing to remember is not to be dogmatic. They are all right or wrong depending on your circumstance.
One school of thought says "I want all data in my DB to be normalized. I want it to be right when it goes in so it never breaks the application layer." That school would say foreign key constraints are critical.
Another school of thought says "I want all the data in my DB to be retrieved and inserted quickly. I want the application layer to do any error-checking that is necessary, or, I want to be in a situation where I can always fail gracefully if there's errors in data validation."
Still another school of thought says "I don't trust those programmers to write good application code, so I will insist on normalized data for that reason," and yet another says, "I don't have control over the DB schema, that's some DBA's job, so I will just do all my validation in-app."
The point of this being, there's tradeoffs either way you go.
Personally, I typically would rather have the application layer do the validation and even the joining of data, a lot of the time, when the application is high-volume. At the volumes my organization works with, it is very difficult to write performant SQL queries that use JOINs and other relationships as a developer - even as a DBA! - and often much easier, for me, to write performant application code. The DB is also a pet with many owners, whereas the infra for my applications is owned by my team. So, it's better for me to do relationship validation in code myself. (We also do not use a heavy ORM, again for performance reasons. Just Dapper.)
At my previous job, the situation was the opposite - we weren't under such load at most times that it mattered if the queries were performant, we had Entity Framework building relationships, and EF will blow up if you ask it to build relationships where none exist. So, we needed more normalized data, and that was what we went for. But even then, not in every situation.
How can this possibly be true? Won't that result in sending unnecessary data over the wire, stressing network and SQL buffer?
What are these queries and what are these volumes? I just can't wrap my head around the performance statement. Do you know better join algorithms than SQL Server is capable of (Loop/Hash/Merge)? Given that you have appropriate indexes in place, perhaps query hints are what you need to control SQL plan guides, if you know a thing or two about your data and its distribution that the SQL statistics don't.
Also, some ORMs write dreadful SQL when it comes to joins.
A badly written join (or collection of joins) will take longer to run and, when the system is under load, will backlog other queries. If these errant queries make up a significant portion of your workload, they will hit performance significantly.
It's not the joins themselves, just the incorrect use of them.
Of course each scenario is different, YMMV, and as always "it depends".
Leaving aside my initial snark reactions¹ as they are not really relevant.
What you say may be true if the data is not arranged in a manner conducive to efficient queries of the type you are trying to make (for instance, if the DB was optimised for a different sort of output because the needs were, or were expected to be, different at design time). BUT read performance is not relevant to foreign keys. A constraint is assessed at INSERT/UPDATE/other write time to maintain referential integrity and has no effect on later reads. You can do your own linking in the application if you want, but I'm keeping my foreign keys to stop bad data getting in - they won't affect your process of getting data out either way.
Also note that a foreign key does not imply an index exists in most DBMSs⁴ so if you are expecting the constraint to help performance when referring to a table from its parent then you may be disappointed. An index will exist where the key is referring to as FKs will always refer to a primary key or unique index but the other side is not usually indexed unless you explicitly ask for it to be. I've seen a few people run into this trap, expecting an index to be there because an FK constraint is, and coming to the conclusion that JOINs are just slow because one isn't so their queries that would benefit from it are slow.
----
[1] Sorry, not a good enough person: “sounds like you need a better DBA!”²
[2] Well, a better database developer. Even the best can't get good performance from an inappropriate design. Or maybe a time machine; everyone who has worked with DBs long enough will have been stuck with a bad or inappropriate design³ we have no power or time to fix…
[3] Possibly of our own making!
[4] Some DBs create one automatically, and some ORMs & other data manipulation libraries built on top do too. But it is generally not done because it is far from always necessary, so it could waste space, and you may want a compound index instead depending on other properties of the data and desired outputs.
There is basically no way it is faster for "high-volume" systems to return excess data to the application rather than doing the joins on the DB and returning the record set to the application.
Even if we were talking about multiple billions of records you'd still be better off with a completely denormalized data warehouse style table and doing filtering against indexed columns db-side before sending vast quantities of data to the application.
I think people think this way because of licensing and the specialized nature of DBAs.
But when you say "return excess data to the application" I'm not sure what you mean. Not doing lots of complex JOINs doesn't mean not filtering the queries at the DB at all. Nobody is pulling back a billion records at a time.
Here's an example of what I'm talking about - read the first comment on https://www.brentozar.com/archive/2015/05/do-foreign-keys-ma..., another venue for this same debate.
The small number of people with high enough scale that they can't use them know who they are, the rest of us need to think carefully when performing database migrations and reason out the order of operations required to maintain data integrity (sounds like a good idea anyway?).
Heck, yes. If you are butting against constraints then you may have misunderstood the problem.
I don't mind people turning constraints off for a mass migration under the following conditions:
• This is not production, or if it is production you are in a maintenance window during which you have exclusive access to this DB
• The data you are piling in will be verified against the constraints once the job is complete (no turning things back on with “WITH NOCHECK”), and you have a rollback plan for if that fails (“restore from a backup taken before the maintenance window” might be acceptable, if that can be done in the timeframe of the maintenance window and you are happy taking questions from your team/clients/management if this means planned updates have to be completely postponed).
• You can explain why not getting the data to be modified in an order that allows the constraints to be kept on through the process would be significantly more complicated (simplifying migration code is a valid reason for temporarily turning off constraints if it makes maintaining the relevant code less error prone) or significantly slower (sometimes doing it most right unavoidably takes more time) or both.
When building systems for scale where the databases may grow large, foreign keys can cause many issues -
- ORM features around foreign keys can easily bring your system down when joining large tables with incorrect/missing indexes during heavy loads
- As the table grows, not having foreign keys makes it simpler to move large tables out into big-data solutions in the future
- The schemas and relations are sometimes hard to understand during the initial phases of application development. Not having those relations makes changing schemas simpler and faster.
- Sharding tables is much simpler when there are no foreign keys
- It helps to add some of the reference logic in the application rather than the database. Databases are the bottlenecks when it comes to IOPS and scaling. The more processing you move to your application server, the better scalability you can achieve.
In general it depends on whether you intend to use the database to drive a system, or whether you intend to use the data for reporting and data analysis.
Inconvenience in migrations is not a legitimate reason. If your migration would fail with FKs but doesn't because you don't have them, you just broke your data; the errors are there to protect you. In other words, your colleague is straight up wrong.
All my issues with databases usually stem from the fact that someone did not use enough checks and we got dirty data nobody knows what to do with.
My only non-conventional usage of database schemas is text field lengths. I have a rule: it's either 20, 200 or 2000. It should be big enough to fit any non-insane value; 200, for example, should be enough to fit any phone number. The main point is not to let absurdly broken data in.
I have seen over and over firsthand the kinds of data integrity problems that come from leaving the decision to the business software and those who meddle asynchronously with data.
You can always rewrite software. Rewriting bad data is not only difficult but often impossible.
"You can always rewrite software. Rewriting bad data is not only difficult but often impossible."
Isn't that an argument against FKs? It's easier to rewrite the software to handle FKs than to deal with trying to set up FKs with bad data, since it's difficult to fix.
> Isn't that an argument against FKs?
No, it's an argument for FK checks.
> It's easier to rewrite the software to handle FKs
Except once you have that bad data, you don't know what to do to fix it in whatever language your app is coded in any more than you do in SQL. Once you know that, you can just as well do it in SQL as in any other language. Or, if it's just that you know that language better than SQL, by all means write a separate one-time fix-the-data app in that language and run it once to fix the data...
And then enable the FK checks so you won't have to do it again.
> than to deal with trying to setup FKs with bad data since it's difficult to fix.
Which is why you want FK checks in your DB from the beginning.
[Edit: Fix bad original editing.]
Well, one might ask, what is the point of having foreign keys if they are disabled? And the answer is, there are several benefits. Here are a couple:
1. Foreign keys, disabled or not, create a record of your data design that itself serves as documentation, and that can be programmatically queried, extracted, copied, modified etc., all of which reduces technical debt and is useful for other users or developers to understand your schema and work more efficiently.
2. You still have the option to enable the foreign keys. Furthermore, you can pass this job over to someone in your organization who may not have the authority to create or modify foreign keys, but does have the authority to enable them and/or to fix the data however required to enable them.
Entire generations of applications may rise and fall. New languages, frameworks, and developers all lead to rot over time. Heck, in some legacy projects the application code is incomplete or lost.
But the database doesn't rot. Show me a database that is 20 years old, and it is as fresh as the day it was created. If it has FK constraints, it's even healthier.
That means that the more value is embedded in the database - FKs and of course many more constraints - the longer that value will live, for decades.
Commenting just to draw attention to your comment.
It sounds like your colleagues want NoSQL (key-value) given their no JOINs policy.
This example makes it pretty clear: https://en.wikipedia.org/wiki/Finitary_relation#Example
I always thought it was referring to foreign keys too. Pretty bad name in hindsight.
But some things like adding a column with default value is still hard.
https://www.braintreepayments.com/blog/safe-operations-for-h... this article has a good guideline on this.
Why? FKs don’t require uniqueness. Or do you mean that you target non-unique columns? That sounds… horrendous.
But regardless sounds like something you’d want to resolve before declaring the migration finished.
It's not difficult; it merely requires that you think about the order of the migration operations.
> He'd rather have a smooth data migration process than having an unexpected error
This should be grounds for dismissal. Data integrity problems should be dealt with immediately instead of making it someone else's problem later. He's just kicking that "unexpected error" down the road to some poor soul that will spend weeks trying to figure out how the data got so messed up.
If you don't care about data integrity then use a different data storage solution.
There are any number of tools that will generate me a pretty and useful database schema diagram if I point them at a relational database. This is incredibly handy when you're new to a database and need to figure out which tables to query and update, and which (gasp) sprocs you need to call. I've been on projects where people have been poking around in the dark, and then a good database diagram has saved us days or weeks of effort trying to figure out how to make something work. As I say, there are plenty of tools capable of generating one of these in seconds or minutes[0].
However, if you don't have foreign keys, the utility of such a tool is severely diminished because you just get a big pile of nodes representing tables clustered at the top or bottom of your diagram (depending on exactly which layout algorithm is being used and how it's been configured).
[0] Many years ago I and three colleagues built one of these: Redgate's SQL Dependency Tracker (https://www.red-gate.com/products/sql-development/sql-depend...). It was pretty neat because you could build a diagram spanning databases, or even linked servers, and unlike most other tools at the time it could handle thousands and thousands of database objects, but the product name doesn't really help get across that it's fundamentally a diagramming tool. I built the dependency engine, the graph, and radar views. We used yFiles from yWorks for the graph layout calculations, with a bit of extra hackery, but I seem to remember yFiles lost compatibility with a newer version of .NET at some point (or something along those lines), so RG ended up swapping it out for something else.
I'd even go so far as to design database schema with these ER diagram tools in mind: if the automatic diagram is messy then that's more-often-than-not a code-smell in need of refactoring (before being hit with production data).
> He'd rather have a smooth data migration process than have an unexpected error abort the whole operation
That's not a good reason. It doesn't sound like he's making an educated decision based on context-- it sounds like he's sacrificing data consistency to make his own work easier. If that's the case, perhaps he should do away with all integrity constraints (primary keys, checks, et al). Then, he could enjoy a "smooth process" for adding new data, too. No more pesky errors-- just blissful, silent corruption.
>To be fair, as far as I know, he never had major data problems.
Yet, anyway. He has intentionally limited his visibility over at least one kind of problem (invalid references). His problems might not be clear until they start causing noticeable issues elsewhere.
> He is not the only one I've met who insisted on not having FK. I've even seen large systems prohibit using JOIN statements.
Allergy to the relational model. Many such cases!
In a strict relational model implementation, the only way to reify that is by the FK of the Order in the Line Item, but in some other implementation, say a generic programming language, the Line Items might be an array or similar data structure that is part of the Order, and the programming language implementation keeps track of the pointers or address offsets or whatever detail it cares about.
Anyway that's all beside the point.
In my experience, if the IDs are not autogenerated by the relational system, then it's relatively easy to migrate. When the data is full of IDs defined via some AUTO_INCREMENT behavior, however, migrations become an awful mess. And in any system where a specific ID starts to have semantic meaning (oh, the customer's ID is 387437. whoops) - and this happens more frequently than you might expect - all bets are off, and you might as well give up and accept that your auto-generated IDs are now fixed for all time and can't be changed.
Oh and just to add, I have notebooks from previous employers where I have written THE Important IDs to Know, which started life as auto-generated numbers but which now are enshrined and encased in acrylic to the extent that years later they are important tribal knowledge.
I think trying to manage consistency in your application is probably a bad idea.
FKs maintain your database integrity. If you care about your data, just use them!
In SQLite it's a pragma, in Postgres you turn triggers off, those are the ones I've personally done but surely any relational database has this ability for this specific reason?
There _is_ a scale, though. I have worked in enterprises where the database is the "contract" and where the business logic is implemented in the database, with triggers and all sorts of constraints and what not, in addition to foreign key constraints. But I now mostly work in service- or micro-service-oriented places where the "contract" is a REST(-ish) API, business logic is implemented in Java, and the database is exclusively owned by that one code-base, where not all business logic is replicated from the Java code into the database (still foreign keys, though!)
...Of course, if nobody understands relational databases at the company, then that won't help...
Even if data migration was difficult with foreign keys for some reason, they can be disabled at the time of migration - no need to put weird constraints on day-to-day operations.
I see them the same as stored procedures. If you use them, you better use them everywhere and have all your data consistency model in the fk and stored procedures. Also, figure out how to disable them during the more interesting online schema migrations. If you can't commit to that, they'll only cause issues down the line.
As an alternative you can use Foreign keys without constraints. This way you get the conveniences of them without the migration problems. You can do this permanently (a bit wilder) or just before an import, and re-enable it later.
Your data migration aborted on an unexpected error? Well in my view that's a problem and you need to figure out why that is happening and fix it.
But hey, you can always drop the constraint when you decide it's not worth it, and you can always add it back again after. Depends on the application too, sometimes it just doesn't matter if you make a mess.
I'm remembering the time I was working at a place that had a huge number of Microsoft Access, Microsoft SQL Server and MySQL databases, and I was the first person they'd hired who knew how to do joins, and they thought it was pretty scary.
He's just kicking bugs down the road. That's not engineering.
If you have no FK, the day will come that your data corrupts. Tiny application bug, wrong manual data fix, incomplete data model communication. It's a case of when, not if. And when it happens, it can fester for weeks or months, corrupting all kinds of data in unfixable ways.
If you have FK, you can go all-in on relational. The stuff is so powerful, you'd be dumb not to. But then comes the too-smart-optimizer problem. Just like with compilers and OpenGL shaders, the tiniest change might make you fall off the optimizer's preferred path and get a way-too-slow version. Even when no code has changed, a minor DB version bump or even a tiny shift in statistics will kill you.
Personally, I tend to consider integrity more important than performance, but both have a zone where they are good enough to make trading off worthwhile.
So I do FKs and joins, and count on rigorous testing and monitoring to keep things in check. Someone else might decide otherwise. Both strategies require you to do the unsexy part of the work, or a harsh punishment will follow.
The issue with managing the relationship just in code is if you ship a bug to break the relationship, you now have to manually fix your data, and if you want to find out when or where the bug was introduced, you're looking at commit history instead of a migration history. Same thing when it comes to making manual updates or adds in the db. Even if it's just on a dev stage, if your code makes an assumption about the constraint which isn't true, you can end up with bugs or exceptions on dev, which is also annoying. If you want to remove the assumption of the relationship from the code entirely, that would be more understandable, but not if instead it means replacing what would be an efficient constraint and join with a separate query.
In most cases, a well designed database covers a multitude of sins and makes life for future you and your future team much easier. Some of the biggest dumpster fires of code I have seen started as a smoldering dumpster from terrible db choices that were made far harder to fix when it had been in production for a few years.
If someone put a gun to a developer's head and threatened to pull the trigger if the database was crap, sadly they would be forced to pull the trigger most of the time.
That's some take, because if that "unexpected error" is not reported because FK integrity was turned off, the data containing integrity errors goes right in. Now your database is corrupt. Dealing with a DB where random rows here and there don't conform to implied-only constraints is zero fun. In my own experience it looks like this: the main page of the production site is a 500 error, with a stack trace deep into some logic nobody has looked at in two years; run the identical code on staging, works fine. Fun stuff! Seems like an odd choice to let errors like that stream right into your production DB without checking.
"It's never gone wrong yet" is a good way to get yourself electrocuted when you're changing a light bulb.
Mainly to cascade delete though, since we were robust enough to insert proper data.
This was spurred on in new projects because in a very very large older project with many many tables we had problems with orphaned records.
(PostgreSQL)
One day, I decided to try it out. And it turned out to be easier than I thought. When deleting data, just make sure to use cascade or ensure its associated data is deleted, which makes sense.
Once I embraced it, FK became my friend for enforcing data consistency. I cannot imagine how I lived without it before.
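A minimal sketch of what the cascade variant looks like (SQLite via Python here; the table names are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("PRAGMA foreign_keys = ON")  # SQLite needs this opt-in
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
con.execute(
    "CREATE TABLE order_lines (id INTEGER PRIMARY KEY, "
    "order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE)"
)
con.execute("INSERT INTO orders VALUES (1)")
con.executemany("INSERT INTO order_lines VALUES (?, 1)", [(1,), (2,)])

# Deleting the parent removes its lines too, so no orphans are left behind.
con.execute("DELETE FROM orders WHERE id = 1")
count = con.execute("SELECT COUNT(*) FROM order_lines").fetchone()[0]
print(count)  # 0
```

Without `ON DELETE CASCADE`, the same DELETE would instead fail with an integrity error until the child rows are removed first, which is the other valid way to handle it.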
Specifically, I hadn't thought through an Order model, and the OrderLines ended up depending directly on the Product through an FK. The orders wouldn't "settle" once they had been completed, since the product could be updated and change the values of the order line and order. Dumb dumb. It was one of the cases where denormalizing data very much makes sense.
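One common fix for that is to copy the price into the order line at the time of sale, so later product edits can't rewrite history. A sketch with invented names (SQLite via Python):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
con.execute(
    "CREATE TABLE order_lines (id INTEGER PRIMARY KEY, "
    "product_id INTEGER REFERENCES products(id), "
    "unit_price REAL)"  # denormalized: frozen at order time
)
con.execute("INSERT INTO products VALUES (1, 9.99)")
# Copy the current price into the line instead of joining to it later.
con.execute(
    "INSERT INTO order_lines (id, product_id, unit_price) "
    "SELECT 1, id, price FROM products WHERE id = 1"
)

con.execute("UPDATE products SET price = 14.99 WHERE id = 1")  # later repricing
frozen = con.execute("SELECT unit_price FROM order_lines").fetchone()[0]
print(frozen)  # 9.99
```

The FK to the product is kept for integrity; only the value that must not drift (the price) is duplicated.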
We do not use them for write performance concerns.
There's plenty of nice features of the DB we're not allowed to use under the excuse of "performance". But I'm told this by people who live and breathe SQL, so I trust them and I hope they have evidence to back it up. Because a lot of these features we're not allowed to use would make our lives 100x easier if we could!
In this talk, we start at 10,000 transactions per second, and just by altering the design we get to 20,000 transactions per second... all on a 5 year old laptop.
And at no time did we ditch any constraints (primary or foreign). The claim that you can't get performance with constraints on a database is a myth
A) The world is full of idiots.
> I've even seen large systems prohibit using JOIN statements.
B 1) WTF is wrong with these people? What do they think an RDBMS is for?!?
B 2) See above, exhibit A.
Then it's up to all your apps to guarantee that they fill those fields in correctly, and do the reference check when inserting or updating rows in the dependent table.
You also need to write checking programs that report foreign key errors on a timetable that suits (every minute, every hour, every day, whatever).
Report the errors to someone who understands the significance of them, and can fix them.
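A checking program of that sort is usually just an anti-join. A sketch (SQLite via Python, table names invented):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)
# No REFERENCES clause here: integrity is the application's problem.
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_id INTEGER)")
con.execute("INSERT INTO customers VALUES (1)")
con.executemany("INSERT INTO invoices VALUES (?, ?)", [(1, 1), (2, 42)])

# Anti-join: invoices whose customer_id matches no customer row.
orphans = con.execute(
    """SELECT i.id, i.customer_id
       FROM invoices i
       LEFT JOIN customers c ON c.id = i.customer_id
       WHERE c.id IS NULL"""
).fetchall()
print(orphans)  # [(2, 42)]
```

This is exactly the check the database would have done for free, at insert time, had the constraint been declared.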
Performance can be another reason for skipping them but modern dbs are pretty good.
In fact, Snowflake really really doesn't care to the extent that I've seen non unique primary keys in tables.
NUTS!
> I suspect the errors might be mainly caused by not considering data integrity at all at the first place
Spot on.
----
I deal with interfaces to MANY ERP-like software packages, with as many "database designs" as you can imagine, and have faced the fun of interfacing with many systems in my long career.
Among DOZENS, only 2 major packages have zero issues and are the only ones with competent schema designs.
Let me tell you how bad it is:
One of my interfaces is with a package that somehow manages to mix all the ways to be wrong about RDBMSes:
- MySQL, not a good start
- Only string and int data types, and not many int types at that
- This means ALL data is suspect (dates, money, and yes, strings too)
- The tables are named `UNCC_00001, UNCC_00002...`
- The fields are named `FT_0001, FT_0002...` and obviously the first is the PK. This PK is purely advisory; other fields can be part of the PK, but you can't know that by looking at the schema
- NO FKs at all, so ALL TABLES have integrity issues.
- NO indexes, because why make joins or lookups faster?
- Weird normalization, like yes, no, maybe?
- And no option to let me add some sanity to the DB, because the database is (in my niches) "property" of the software makers, so it can't be touched, only queried.
And I won't tell the rest, because this is only the query side; the CRUD side is nuts, and I don't want to remember much about it.
---
An RDBMS is one of the simplest ways to get, FOR FREE and with minimal effort, a sane, well-behaved and performant system.
Refusing to use them is NUTS. Refusing to use the abilities they have is NUTS.
In 20+ years I have seen only ONE actual company with a real reason to go NoSQL, and even then only for a fraction of the system (in fact, the data was mirrored to an RDBMS for reporting!). And maybe that was true 5-10 years ago; today I think modern SQL engines catch up more and more on those scenarios (recently I used Timescale just to speed up log processing, and it was so much nicer than dealing with weird log storage engines).
Personally, I would prefer errors to become immediately obvious, while I'm active and there's a roll-back ready to be applied.
Dear Database Architect,
Directive 595 Part 2 is as follows.
"Foreign and Primary Key constraints give lack of flexibility, more
costly evolution, inhibit the use of the database acting as a
service to applications and make it an inhibitor to evolution."
As such, please remove from all production databases.

Sincerely,
Chief Architect Gerald
I create all fields as NOT NULL and use empty string in place of NULL.
Last time I tried, foreign key constraints couldn't work in an environment like this.
...but why?
Say, if I want to check how many records don't have a value for "ref", I don't want the count(*) query to show

  count(*)  ref
  12000     (null)
  17030     ""

I want both added together. That's, for example, one simple reason out of many others.

Referential integrity saves hours of pain from weird DB issues down the line.
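The null-vs-empty counting behaviour being described can be reproduced like this (SQLite via Python; table and column names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE t (ref TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [(None,), (None,), ("",), ("a",)])

# NULL and '' land in separate groups...
groups = dict(con.execute("SELECT ref, COUNT(*) FROM t GROUP BY ref").fetchall())
print(groups)  # {None: 2, '': 1, 'a': 1}

# ...though COALESCE can fold them into one "missing" bucket without
# giving up NULL in the schema.
missing = con.execute(
    "SELECT COUNT(*) FROM t WHERE COALESCE(ref, '') = ''"
).fetchone()[0]
print(missing)  # 3
```

So the single-bucket count is achievable either way; using empty string everywhere just makes it the default at the cost of losing the NULL/FK machinery.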
I also interact with VAX applications that use Rdb (now owned by Oracle) that avoid constraints of all types for performance reasons.
I mean, there can be reasons why not, but why use an RDBMS at all if you’re not going to take advantage of its features?
A join is simply an index scan. And it is guaranteed to be cheaper than the ORM having to send many requests to the server.
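The N+1 pattern versus a single join can be sketched like this (SQLite via Python, invented schema; in-process SQLite hides the per-request network latency that makes the difference dramatic in practice):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)")
con.executemany("INSERT INTO authors VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
con.executemany("INSERT INTO books VALUES (?, ?, ?)",
                [(1, 1, "A"), (2, 2, "B"), (3, 1, "C")])

# Naive ORM-style N+1: one extra query per book to fetch its author.
pairs_n1 = []
for book_id, author_id, title in con.execute("SELECT * FROM books"):
    name = con.execute("SELECT name FROM authors WHERE id = ?",
                       (author_id,)).fetchone()[0]
    pairs_n1.append((title, name))

# One join: a single round trip, and the planner can use the PK index.
pairs_join = con.execute(
    "SELECT b.title, a.name FROM books b JOIN authors a ON a.id = b.author_id"
).fetchall()

print(sorted(pairs_n1) == sorted(pairs_join))  # True
```

Same result set, but the join version is one statement instead of N+1, which is where the "joins are slow" folklore gets it backwards.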
Everyone here seems to be super diligent about FKs. I wonder if that's a sampling bias.
I've worked on a few projects where they were considered an unnecessary hassle, especially when the RDBMS had some performance or functionality limitations when using them. Also "on delete cascade" seems scary. Plenty of systems even just set deleted=true instead of actually deleting (at least before GDPR).
Not every CRUD app treats data integrity as a holy grail. A social network for hamsters can lose a comment, no big deal. Some applications treat databases as a bag of key-value pairs, and the inconvenience of migrations ends up with a JSON in an "everything_else" column. Move fast and break relational integrity.
Good, that's the correct reaction.
> I feel better having read this thread!
Don't.
Though the answers would probably vary and there's most likely lots of nuance per individual case (which might matter more than just yes/no), personally I can think of the following as examples:
- yes, we use foreign keys
- yes, but we use them in testing environments and turn them off in prod
- no, we don't use them because our database doesn't support them (e.g. distributed like TiDB)
- no, we don't use them and check integrity and orphaned data ourselves
- no, because our system design doesn't allow us to use them meaningfully (e.g. OTLT and EAV)
(also, talking about whether to cascade or not might be useful, e.g. whether you want to manually clean up related data, or not)
Someone else mentioned varying schools of thought, which rings true. Personally, my opinions about database design in general are along the lines of:
- avoid EAV and OTLT outside of very specific cases, have multiple tables over few (e.g. employees, employee_contact_information, employee_vacations, employee_notes instead of employee_fields and/or employee_field_values)
- have foreign key constraints across your tables, so that you might not end up with orphaned data, *consider* cascading the constraints (depends on requirements)
- use views liberally, especially for complex requirements in regards to selecting data, so that your app (or ORM in it) can map against it in a simple manner
- outside of batch operations, prefer to modify data through the app, instead of procedural SQL, since that's easier to debug; I've yet to see someone use breakpoints/watches for stored procedures successfully
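The view idea in that list can be sketched quickly (SQLite via Python; the employee tables are the illustrative names from above):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
con.execute(
    "CREATE TABLE employee_vacations (id INTEGER PRIMARY KEY, "
    "employee_id INTEGER REFERENCES employees(id), days INTEGER)"
)
con.execute("INSERT INTO employees VALUES (1, 'Ann')")
con.executemany("INSERT INTO employee_vacations VALUES (?, 1, ?)", [(1, 5), (2, 3)])

# The view hides the join/aggregate; the app (or its ORM) can map
# against it as if it were a plain table.
con.execute(
    """CREATE VIEW employee_vacation_totals AS
       SELECT e.id, e.name, SUM(v.days) AS total_days
       FROM employees e
       LEFT JOIN employee_vacations v ON v.employee_id = e.id
       GROUP BY e.id, e.name"""
)
rows = con.execute("SELECT * FROM employee_vacation_totals").fetchall()
print(rows)  # [(1, 'Ann', 8)]
```

The complex SELECT lives in one place in the schema instead of being repeated across application queries.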
Though my ideal database design probably looks way different and scales slightly differently (which hasn't mattered as much yet) than someone else's.

There are people who want to build their entire database around a "classifier" system, about which I wrote previously here: https://news.ycombinator.com/item?id=32416093 (this also makes the DB hard to visualize as an ER diagram because of meaningless links, and sometimes makes the DB hard to use without the app, e.g. type_enum_value vs table_name).
There are people who want to do everything in procedural SQL (I've seen application views call stored procedures to fetch all data and validate forms), there are those who don't want to touch it with a 10 foot pole.
It really varies a lot, though in my experience it's invaluable to be able to feed a database into something like DbVisualizer and get an overview about how the different tables are related to one another, basically like documentation: https://www.dbvis.com/