Second, “anti-pattern” is a very technical rating of this phenomenon. Business logic and its databases may contain data that is deletable, possibly deletable, or never deletable, all at the same time (and in the same table). Soft deletion isn’t a blanket rule. For example, you hard-delete generated records, because they are a cached projection of higher-level data. You may hard-delete drafts the way you trash your notes, but you never hard-delete a signed contract or declaration.
That's right, I think it's really "soft deletion as a blanket rule" which is the anti-pattern; soft deletion is one option which (IMO) is used too often without thinking about what specifically you need to achieve. If soft deletion is used as a blanket rule, you're more likely to want to abstract it away via an ORM or similar, which tends to be fragile (I agree views aren't fragile, but they do add another layer of complexity in defining the relationship between the application logic and the schema). If soft deletion is chosen judiciously and represented explicitly in the business logic, it's less likely to cause problems (the "archived state" in the post is essentially an explicitly represented soft delete).
It all probably stems from a rule that as a developer you must never [force/allow anyone to] lose expensive input or make it hard to recover. So ORM and platform developers try to ensure that no one really deletes anything, as the presumably simplest solution. It's okay-ish sometimes, but it's really a poor division of responsibilities. If data is valuable, then its owner is the most responsible party by definition. So the actual responsibility should be moved there, with explicitness and reasonable safety nets where needed. Otherwise a developer has to get defensive on all fronts, which comes with additional costs for both them and the user, for reasons that are never well defined.
People get so attached to ORMs with object tracking that they invent whole categories of blog articles to work around cases where a simpler abstraction would be less work.
* availability / recovery - soft deletes provide the best RPO/RTO in archival / lifecycle planning
* auditability / compliance - much easier to achieve with 1 system than 2 or 3 systems
* security - see above
You certainly can achieve these objectives with CDC / snapshotting / warehousing / archival practices, but the soft delete pattern has its place at the application layer in spite of the performance costs, which the article only begrudgingly acknowledges.
But I'd usually consider soft delete alongside this approach, as it really depends on what you're doing and what your needs are. If you constantly query the dependent records joined to the entity you may or may not delete, then a separate deleted-entity table means you now need to left join two tables where before you could inner join one. So soft delete might be simpler.
But if that's a rare use case, then soft delete might be more complex depending on how many separate codepaths are querying the primary entity.
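To make the join-shape difference above concrete, here's a small sketch using SQLite via Python's stdlib. The table and column names (`customers`, `deleted_customers`, `orders`) are hypothetical, not from the article:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Split-table approach: a hard delete moves the row out of customers.
    CREATE TABLE customers         (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE deleted_customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers         VALUES (1, 'Acme');
    INSERT INTO deleted_customers VALUES (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 9.99), (11, 2, 5.00);
""")

# Showing every order with its customer's name now needs both tables:
rows = con.execute("""
    SELECT o.id, COALESCE(c.name, d.name) AS name
    FROM orders o
    LEFT JOIN customers         c ON c.id = o.customer_id
    LEFT JOIN deleted_customers d ON d.id = o.customer_id
    ORDER BY o.id
""").fetchall()
# rows == [(10, 'Acme'), (11, 'Globex')]

# With a soft-delete flag instead, the same report stays a single inner join:
#   SELECT o.id, c.name FROM orders o JOIN customers c ON c.id = o.customer_id
```

Whether the double left join or the extra `deleted_at IS NULL` filters elsewhere are the bigger cost depends, as the parent comment says, on which queries are common.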
My next blog post should be called "It depends - avoiding the overly broad generalisations anti-pattern".
If you want to preserve history (not just the special case of deletion), you'd need to move 'updated' entities as well.
The article isn't just pointing out that a 'deleted' column is a hassle, it's also pointing out it's insufficient for preserving history.
For example you might not care to record the 20 different names/birthdays a user changed but you might care to remember that the user existed.
Unless you automate it, devs will have to remember to migrate both when making a change, which adds some overhead. Not a lot, but it's something to consider here IMO, as some migrations (schema and/or data) can become nasty and complex.
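One way to automate the "remember to record history" half of that overhead is a trigger-maintained history table, which also captures updates, not just deletes, as the comments above point out. A minimal SQLite sketch with hypothetical `users` / `users_history` tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    -- Mirrored history table: prior versions of each row, plus why they changed.
    CREATE TABLE users_history (id INTEGER, name TEXT, change TEXT,
                                changed_at TEXT DEFAULT CURRENT_TIMESTAMP);
    CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
        INSERT INTO users_history (id, name, change)
        VALUES (OLD.id, OLD.name, 'update');
    END;
    CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
        INSERT INTO users_history (id, name, change)
        VALUES (OLD.id, OLD.name, 'delete');
    END;
""")
con.execute("INSERT INTO users VALUES (1, 'Ann')")
con.execute("UPDATE users SET name = 'Anne' WHERE id = 1")
con.execute("DELETE FROM users WHERE id = 1")

history = con.execute("SELECT name, change FROM users_history").fetchall()
# history == [('Ann', 'update'), ('Anne', 'delete')]
```

The schema-migration pain remains, though: every change to `users` still has to be mirrored into `users_history` and its triggers.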
So, not the opposite of a pattern, but the opposite of good. There you have it.
Do you want to support reversible deletion in the business logic sense? Soft delete is a trivial way to do this.
Do you want to support business logic deletion in a normalised schema while retaining other records that relate to that entity for auditing requirements? Probably worth looking into soft delete first.
Of course at large entity counts, soft delete can impact performance, but that's usually a rather large entity count, and then you can start considering approaches like a delete log or denormalisation.
Afraid of throwing away data you worry you might need later but don't have an existing use case for right now? There are better ways to data hoard, and you should regularly analyse how often that hoarded data is actually accessed before your data lake turns into a data swamp.
This allows for undoing a soft delete and gets rid of soft deleted rows eventually.
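A sketch of that combination, undo plus eventual hard deletion, again in SQLite; the `notes` table, the helper names, and the 30-day retention window are all made up for illustration:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT, deleted_at TEXT)")

def soft_delete(note_id, now):
    con.execute("UPDATE notes SET deleted_at = ? WHERE id = ?",
                (now.isoformat(), note_id))

def undelete(note_id):
    con.execute("UPDATE notes SET deleted_at = NULL WHERE id = ?", (note_id,))

def purge(retention_days, now):
    # Hard-delete rows whose soft delete is older than the retention window.
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    con.execute("DELETE FROM notes WHERE deleted_at IS NOT NULL AND deleted_at < ?",
                (cutoff,))

con.executemany("INSERT INTO notes (id, body) VALUES (?, ?)",
                [(1, "keep"), (2, "trash"), (3, "old trash")])
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
soft_delete(2, now)                       # recently trashed: still recoverable
soft_delete(3, now - timedelta(days=90))  # trashed long ago: will be purged
undelete(2)                               # user changed their mind
purge(retention_days=30, now=now)

remaining = [r[0] for r in con.execute("SELECT id FROM notes ORDER BY id")]
# remaining == [1, 2]
```

The periodic `purge` is what keeps the soft-deleted rows from accumulating forever.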
— Greenspun's tenth rule of programming
Seriously, event sourcing is hard to do right, maybe soft delete is the simpler approach, it depends on what you're doing.
IMO, if an entity might be resurrected/revived/“undeleted”, then either it cannot be killed/deleted, or, more likely, what you thought of as “deletion” was actually something else (e.g. suspending, archiving, hiding, or putting into a trash bin).
It’s better to model a lifecycle of such an entity as an FSM.
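A minimal sketch of that FSM idea in Python; the particular states and edges are invented for illustration, not prescribed:

```python
from enum import Enum, auto

class State(Enum):
    ACTIVE = auto()
    TRASHED = auto()     # user-visible "deleted", reversible
    ARCHIVED = auto()    # read-only, kept for audit
    PURGED = auto()      # terminal: actually gone

# Allowed transitions make "undelete" an explicit, legal move, while
# resurrecting a purged entity is simply not representable.
TRANSITIONS = {
    State.ACTIVE:   {State.TRASHED, State.ARCHIVED},
    State.TRASHED:  {State.ACTIVE, State.PURGED},
    State.ARCHIVED: {State.ACTIVE},
    State.PURGED:   set(),
}

def transition(current: State, target: State) -> State:
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target

s = transition(State.ACTIVE, State.TRASHED)  # "delete" = move to trash
s = transition(s, State.ACTIVE)              # "undelete" is just another edge
```

Persisted as a `state` column, this replaces the boolean `deleted` flag with a lifecycle the business logic can actually talk about.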
And yes, Event Sourcing might be harder in some respects, but it makes things easier in others, as it renders soft delete and dozens of other design patterns redundant.
Soft delete is the only way to make this possible without horrible kludges.
Maybe a better recommendation is to give guidelines for implementing soft-delete?
A. Move deleted data to (an)other table(s): users, deleted_users
B. Read from a scope, view, or materialized view but update a raw table: deleted bool or deleted_at datetime
C. Sprinkle conditionals everywhere live data is desired: deleted bool or deleted_at datetime
There is no one "the way" for all use-cases.
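As an illustration of option A from the list above, here's a sketch of moving a row to a shadow table in a single transaction, using SQLite and hypothetical `users` / `deleted_users` tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE deleted_users (id INTEGER PRIMARY KEY, email TEXT,
                                deleted_at TEXT DEFAULT CURRENT_TIMESTAMP);
    INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
""")

def delete_user(user_id):
    # Move the row in one transaction so it can never end up in both
    # tables (or in neither) if something fails part-way.
    with con:
        con.execute("INSERT INTO deleted_users (id, email) "
                    "SELECT id, email FROM users WHERE id = ?", (user_id,))
        con.execute("DELETE FROM users WHERE id = ?", (user_id,))

delete_user(2)
live = [r[0] for r in con.execute("SELECT id FROM users")]
gone = [r[0] for r in con.execute("SELECT id FROM deleted_users")]
# live == [1], gone == [2]
```

Live queries stay clean, at the cost of a second schema to keep in sync and the join complexity discussed earlier in the thread.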
If you keep it as a mirrored schema, you'll have to manage an individual history table per table (high overhead) and touch each one on every migration, and you'll run into problems sooner or later: you won't be able to migrate it correctly, as historic records will be disconnected, unlike ordinary ones.
We do hard deletes on most things, mainly due to legacy reasons, and almost every week we get a request to restore data that a user deleted but later realized they needed.
And quite often the user realizes this after a week or more, at which point the only option is for the user to ask their IT to restore the DB from backup so we can extract the data from there.
So adding soft deletes to the tables the users commonly do mistaken deletes from is something we're planning on doing.
I don't see how the alternatives given in the article would work for us. For example, few of our customers even have a data warehouse. Our current DB doesn't support temporal tables, though we are migrating to MSSQL, which does, so that might be an option soon. But it's unclear how well that works with 3-4 levels of child tables, which would also need to be temporal, especially since we need to do hard deletes due to GDPR etc., and we have customers who work 24/7, so regular downtime is not tolerated. And users will have active locks against these key tables, so I'm not sure how that'll work out with the schema changes needed for hard deletes.
I have never implemented this, but I feel like it would work well (including not having to specify a deleted_date IS NULL on every query)
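For the view-based variant specifically (option B earlier in the thread), a sketch in SQLite of how readers can avoid repeating the filter; the `customers_raw` / `customers` naming is just an assumption:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers_raw (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT);
    -- Readers query the view; only delete/restore code touches the raw table.
    CREATE VIEW customers AS
        SELECT id, name FROM customers_raw WHERE deleted_at IS NULL;
    INSERT INTO customers_raw VALUES (1, 'Acme', NULL), (2, 'Globex', '2024-01-01');
""")

visible = [r[1] for r in con.execute("SELECT id, name FROM customers ORDER BY id")]
# visible == ['Acme']; no deleted_at filter needed at the call site
```

The filter lives in exactly one place, though as noted upthread, the view does add a layer between the application logic and the schema.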
For example, it could be the user deletes a customer entry in our system, the customer has contacts, and each contact has multiple contact methods say. There are many other child tables for a customer, like delivery addresses and official id numbers and so on, this was just an example.
And yes, user wants to use this data as before it was deleted. So has to go back into the same tables.
On the bright side, we don't have too many foreign keys to "associated data". For example, orders with that customer id would not change, just point to a non-existing customer. So after restore that all works fine.
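One way that kind of parent-plus-children restore is sometimes handled is to stamp the whole cascade with one timestamp, so a restore only resurrects rows deleted by that cascade. A sketch with invented `customers` / `contacts` tables, not the poster's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT);
    CREATE TABLE contacts  (id INTEGER PRIMARY KEY, customer_id INTEGER,
                            name TEXT, deleted_at TEXT);
    INSERT INTO customers VALUES (1, 'Acme', NULL);
    INSERT INTO contacts  VALUES (10, 1, 'Ann', NULL), (11, 1, 'Bob', NULL);
""")

STAMP = "2024-06-01T12:00:00Z"  # one timestamp shared by the whole cascade

def delete_customer(cid):
    with con:
        con.execute("UPDATE customers SET deleted_at = ? WHERE id = ?",
                    (STAMP, cid))
        con.execute("UPDATE contacts SET deleted_at = ? "
                    "WHERE customer_id = ? AND deleted_at IS NULL", (STAMP, cid))

def restore_customer(cid):
    # Restoring only rows carrying the cascade's stamp avoids resurrecting
    # contacts the user had deleted individually beforehand.
    with con:
        con.execute("UPDATE customers SET deleted_at = NULL WHERE id = ?", (cid,))
        con.execute("UPDATE contacts SET deleted_at = NULL "
                    "WHERE customer_id = ? AND deleted_at = ?", (cid, STAMP))

delete_customer(1)
restore_customer(1)
restored = con.execute(
    "SELECT COUNT(*) FROM contacts WHERE deleted_at IS NULL").fetchone()[0]
# restored == 2
```

The data comes back into the same tables, as wanted above, and unrelated references (like the orders mentioned) never needed to change at all.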