Second, “anti-pattern” is a very technical rating of this phenomenon. Business logic and its databases may contain data that is deletable, possibly deletable, or never deletable, all at the same time (and in the same table). Soft deletion isn’t a blanket rule. For example, you hard-delete generated records, because they are a cached projection of higher-level data. You may hard-delete drafts the way you trash your notes, but you never hard-delete a signed contract or declaration.
That's right, I think it's really "soft deletion as a blanket rule" which is the anti-pattern; soft deletion is one option which (IMO) is used too often without thinking about what specifically you need to achieve. If soft deletion is used as a blanket rule, you're more likely to want to abstract it away via an ORM or similar, which tends to be fragile (I agree views aren't fragile, but they do add another layer of complexity in defining the relationship between the application logic and the schema). If soft deletion is chosen judiciously and represented explicitly in the business logic, it's less likely to cause problems (the "archived state" in the post is essentially an explicitly represented soft delete).
It all probably stems from a rule that as a developer you must never [force/allow anyone to] lose expensive input or make it hard to recover. So ORM and platform developers try to ensure that no one really deletes anything, as the presumably simplest solution. It's okay-ish sometimes, but it's really a poor division of responsibilities. If data is valuable, then its owner is the most responsible party by definition. So the actual responsibility should be moved there, with explicitness and reasonable safety nets where needed. Otherwise a developer has to get defensive on all fronts, which comes with additional costs for both them and the user, for reasons that are never well defined.
People get so attached to ORMs with object tracking that they invent whole categories of blog articles to work around cases where a simpler abstraction would be less work.
* availability / recovery - soft deletes provide the best RPO/RTO in archival / lifecycle planning
* auditability / compliance - much easier to achieve with 1 system than 2 or 3 systems
* security - see above
You certainly can achieve these objectives with CDC / snapshotting / warehousing / archival practices, but the soft delete pattern has its place at the application layer in spite of the performance costs, which the article only begrudgingly acknowledges.
But I'd usually consider soft delete alongside this approach, as it really depends on what you're doing and what your needs are. If you constantly query the dependent records joined to the entity you may or may not delete, then a separate deleted-entity table means you now need to left join two tables where before you could inner join one. So soft delete might be simpler.
But if that's a rare use case, then soft delete might be more complex depending on how many separate codepaths are querying the primary entity.
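To make the join-shape difference above concrete, here's a small sketch using SQLite via Python's stdlib. The table and column names (`customers`, `deleted_customers`, `orders`) are hypothetical, not from the article:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Split-table approach: a hard delete moves the row out of customers.
    CREATE TABLE customers         (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE deleted_customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers         VALUES (1, 'Acme');
    INSERT INTO deleted_customers VALUES (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 9.99), (11, 2, 5.00);
""")

# Showing every order with its customer's name now needs both tables:
rows = con.execute("""
    SELECT o.id, COALESCE(c.name, d.name) AS name
    FROM orders o
    LEFT JOIN customers         c ON c.id = o.customer_id
    LEFT JOIN deleted_customers d ON d.id = o.customer_id
    ORDER BY o.id
""").fetchall()
# rows == [(10, 'Acme'), (11, 'Globex')]

# With a soft-delete flag instead, the same report stays a single inner join:
#   SELECT o.id, c.name FROM orders o JOIN customers c ON c.id = o.customer_id
```

Whether the double left join or the extra `deleted_at IS NULL` filters elsewhere are the bigger cost depends, as the parent comment says, on which queries are common.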
My next blog post should be called "It depends - avoiding the overly broad generalisations anti-pattern".
If you want to preserve history (not just the special case of deletion), you'd need to move 'updated' entities as well.
The article isn't just pointing out that a 'deleted' column is a hassle, it's also pointing out it's insufficient for preserving history.
For example you might not care to record the 20 different names/birthdays a user changed but you might care to remember that the user existed.
Unless you automate it, devs will have to remember to migrate both when making a change, which adds some overhead. Not a lot, but it's something to consider here IMO, as some migrations (schema and/or data) can become nasty and complex.
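One way to automate the "remember to record history" half of that overhead is a trigger-maintained history table, which also captures updates, not just deletes, as the comments above point out. A minimal SQLite sketch with hypothetical `users` / `users_history` tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    -- Mirrored history table: prior versions of each row, plus why they changed.
    CREATE TABLE users_history (id INTEGER, name TEXT, change TEXT,
                                changed_at TEXT DEFAULT CURRENT_TIMESTAMP);
    CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
        INSERT INTO users_history (id, name, change)
        VALUES (OLD.id, OLD.name, 'update');
    END;
    CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
        INSERT INTO users_history (id, name, change)
        VALUES (OLD.id, OLD.name, 'delete');
    END;
""")
con.execute("INSERT INTO users VALUES (1, 'Ann')")
con.execute("UPDATE users SET name = 'Anne' WHERE id = 1")
con.execute("DELETE FROM users WHERE id = 1")

history = con.execute("SELECT name, change FROM users_history").fetchall()
# history == [('Ann', 'update'), ('Anne', 'delete')]
```

The schema-migration pain remains, though: every change to `users` still has to be mirrored into `users_history` and its triggers.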
So, not the opposite of a pattern, but the opposite of good. There you have it.
Do you want to support reversible deletion in the business logic sense? Soft delete is a trivial way to do this.
Do you want to support business logic deletion in a normalised schema while retaining other records that relate to that entity for auditing requirements? Probably worth looking into soft delete first.
Of course at large entity counts, soft delete can impact performance, but that's usually a rather large entity count, and then you can start considering approaches like a delete log or denormalisation.
Afraid of throwing away data you worry you might need later but don't have an existing use case for right now? There are better ways to data hoard, and you should regularly analyse how often that hoarded data is actually accessed before your data lake turns into a data swamp.
This allows for undoing a soft delete and gets rid of soft deleted rows eventually.
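A sketch of that combination, undo plus eventual hard deletion, again in SQLite; the `notes` table, the helper names, and the 30-day retention window are all made up for illustration:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT, deleted_at TEXT)")

def soft_delete(note_id, now):
    con.execute("UPDATE notes SET deleted_at = ? WHERE id = ?",
                (now.isoformat(), note_id))

def undelete(note_id):
    con.execute("UPDATE notes SET deleted_at = NULL WHERE id = ?", (note_id,))

def purge(retention_days, now):
    # Hard-delete rows whose soft delete is older than the retention window.
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    con.execute("DELETE FROM notes WHERE deleted_at IS NOT NULL AND deleted_at < ?",
                (cutoff,))

con.executemany("INSERT INTO notes (id, body) VALUES (?, ?)",
                [(1, "keep"), (2, "trash"), (3, "old trash")])
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
soft_delete(2, now)                       # recently trashed: still recoverable
soft_delete(3, now - timedelta(days=90))  # trashed long ago: will be purged
undelete(2)                               # user changed their mind
purge(retention_days=30, now=now)

remaining = [r[0] for r in con.execute("SELECT id FROM notes ORDER BY id")]
# remaining == [1, 2]
```

The periodic `purge` is what keeps the soft-deleted rows from accumulating forever.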
— Greenspun's tenth rule of programming
Seriously, event sourcing is hard to do right, maybe soft delete is the simpler approach, it depends on what you're doing.
IMO, if an entity might be resurrected/revived/“undeleted”, then either it cannot be killed/deleted, or, more likely, what you thought of as “deletion” was actually something else (e.g. suspending, archiving, hiding, or putting into a trash bin).
It’s better to model a lifecycle of such an entity as an FSM.
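A minimal sketch of that FSM idea in Python; the particular states and edges are invented for illustration, not prescribed:

```python
from enum import Enum, auto

class State(Enum):
    ACTIVE = auto()
    TRASHED = auto()     # user-visible "deleted", reversible
    ARCHIVED = auto()    # read-only, kept for audit
    PURGED = auto()      # terminal: actually gone

# Allowed transitions make "undelete" an explicit, legal move, while
# resurrecting a purged entity is simply not representable.
TRANSITIONS = {
    State.ACTIVE:   {State.TRASHED, State.ARCHIVED},
    State.TRASHED:  {State.ACTIVE, State.PURGED},
    State.ARCHIVED: {State.ACTIVE},
    State.PURGED:   set(),
}

def transition(current: State, target: State) -> State:
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target

s = transition(State.ACTIVE, State.TRASHED)  # "delete" = move to trash
s = transition(s, State.ACTIVE)              # "undelete" is just another edge
```

Persisted as a `state` column, this replaces the boolean `deleted` flag with a lifecycle the business logic can actually talk about.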
And yes, Event Sourcing might be harder in some respects, but it makes things easier in others, as it renders soft delete and dozens of other design patterns redundant.
Soft delete is the only way to make this possible without horrible kludges.
Maybe a better recommendation is to give guidelines for implementing soft-delete?
A. Move deleted data to (an)other table(s): users, deleted_users
B. Read from a scope, view, or materialized view but update a raw table: deleted bool or deleted_at datetime
C. Sprinkle conditionals everywhere live data is desired: deleted bool or deleted_at datetime
There is no one "the way" for all use-cases.
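As an illustration of option A from the list above, here's a sketch of moving a row to a shadow table in a single transaction, using SQLite and hypothetical `users` / `deleted_users` tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE deleted_users (id INTEGER PRIMARY KEY, email TEXT,
                                deleted_at TEXT DEFAULT CURRENT_TIMESTAMP);
    INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
""")

def delete_user(user_id):
    # Move the row in one transaction so it can never end up in both
    # tables (or in neither) if something fails part-way.
    with con:
        con.execute("INSERT INTO deleted_users (id, email) "
                    "SELECT id, email FROM users WHERE id = ?", (user_id,))
        con.execute("DELETE FROM users WHERE id = ?", (user_id,))

delete_user(2)
live = [r[0] for r in con.execute("SELECT id FROM users")]
gone = [r[0] for r in con.execute("SELECT id FROM deleted_users")]
# live == [1], gone == [2]
```

Live queries stay clean, at the cost of a second schema to keep in sync and the join complexity discussed earlier in the thread.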
If you keep it as a mirrored schema, you'll have to manage an individual history table per table (high overhead) and touch each one on every migration, and you'll run into problems sooner or later: you won't be able to migrate it correctly, as historic records will be disconnected, unlike ordinary ones.
We do hard deletes on most things, mainly due to legacy reasons, and almost every week we get a request to restore data that a user deleted but later realized they needed.
And quite often the user realizes this after a week or more, at which point the only option is for the user to ask their IT to restore the DB from backup so we can extract the data from there.
So adding soft deletes to the tables the users commonly do mistaken deletes from is something we're planning on doing.
I don't see how the alternatives given in the article would work for us. For example, few of our customers even have a data warehouse. Our current DB doesn't support temporal tables, though we are migrating to MSSQL, which does, so that might be an option soon. But it's unclear how well that works with 3-4 levels of child tables, which would also need to be temporal, especially since we need to do hard deletes due to GDPR etc., and we have customers who work 24/7, so regular downtime is not tolerated. And users will have active locks against these key tables, so I'm not sure how that'll work out with the schema changes needed for hard deletes.
I have never implemented this, but I feel like it would work well (including not having to specify a deleted_date IS NULL on every query)
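For the view-based variant specifically (option B earlier in the thread), a sketch in SQLite of how readers can avoid repeating the filter; the `customers_raw` / `customers` naming is just an assumption:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers_raw (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT);
    -- Readers query the view; only delete/restore code touches the raw table.
    CREATE VIEW customers AS
        SELECT id, name FROM customers_raw WHERE deleted_at IS NULL;
    INSERT INTO customers_raw VALUES (1, 'Acme', NULL), (2, 'Globex', '2024-01-01');
""")

visible = [r[1] for r in con.execute("SELECT id, name FROM customers ORDER BY id")]
# visible == ['Acme']; no deleted_at filter needed at the call site
```

The filter lives in exactly one place, though as noted upthread, the view does add a layer between the application logic and the schema.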
For example, it could be the user deletes a customer entry in our system, the customer has contacts, and each contact has multiple contact methods say. There are many other child tables for a customer, like delivery addresses and official id numbers and so on, this was just an example.
And yes, user wants to use this data as before it was deleted. So has to go back into the same tables.
On the bright side, we don't have too many foreign keys to "associated data". For example, orders with that customer id would not change, just point to a non-existing customer. So after restore that all works fine.
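One way that kind of parent-plus-children restore is sometimes handled is to stamp the whole cascade with one timestamp, so a restore only resurrects rows deleted by that cascade. A sketch with invented `customers` / `contacts` tables, not the poster's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT);
    CREATE TABLE contacts  (id INTEGER PRIMARY KEY, customer_id INTEGER,
                            name TEXT, deleted_at TEXT);
    INSERT INTO customers VALUES (1, 'Acme', NULL);
    INSERT INTO contacts  VALUES (10, 1, 'Ann', NULL), (11, 1, 'Bob', NULL);
""")

STAMP = "2024-06-01T12:00:00Z"  # one timestamp shared by the whole cascade

def delete_customer(cid):
    with con:
        con.execute("UPDATE customers SET deleted_at = ? WHERE id = ?",
                    (STAMP, cid))
        con.execute("UPDATE contacts SET deleted_at = ? "
                    "WHERE customer_id = ? AND deleted_at IS NULL", (STAMP, cid))

def restore_customer(cid):
    # Restoring only rows carrying the cascade's stamp avoids resurrecting
    # contacts the user had deleted individually beforehand.
    with con:
        con.execute("UPDATE customers SET deleted_at = NULL WHERE id = ?", (cid,))
        con.execute("UPDATE contacts SET deleted_at = NULL "
                    "WHERE customer_id = ? AND deleted_at = ?", (cid, STAMP))

delete_customer(1)
restore_customer(1)
restored = con.execute(
    "SELECT COUNT(*) FROM contacts WHERE deleted_at IS NULL").fetchone()[0]
# restored == 2
```

The data comes back into the same tables, as wanted above, and unrelated references (like the orders mentioned) never needed to change at all.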