Soft deletion probably isn't worth it

(brandur.org)

654 points · lfittl · 3y ago · 494 comments

241 comments · 108 top-level

JohnBooty3y ago· 48 in thread

I've been a software dev since the 90s and at this point, I've learned to basically do things like audit trails and soft deletion by default, unless there's some reason not to.

Somebody always wants to undelete something, or examine it to see why it was deleted, or see who changed something, or blah blah blah. It helps the business, it helps you as a developer by giving you debug information as well as helping you to cover your ass when you are blamed for some data loss bug that was really user error.

Soft deletion has obvious drawbacks but is usually far less work than implementing equivalent functionality out-of-stream, with verbose logging or some such.

Retrofitting your app and adding soft deletion and audit trails after the fact is usually an order of magnitude more work. Can always add it pre-launch and leave it turned off.

If performance is a concern, this is usually something that can be mitigated. You can e.g. have a reaper job that runs daily and hard-deletes everything that was soft-deleted more than n days ago, or whatever.
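
A minimal sketch of such a reaper job, assuming a hypothetical `documents` table whose `deleted_at` column holds ISO-8601 UTC text. The table, the column, and the `reap_soft_deleted` helper are illustrative names, and SQLite (via Python's standard `sqlite3` module) is used only to keep the example self-contained:

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def reap_soft_deleted(conn, now, retention_days):
    """Hard-delete rows that were soft-deleted more than retention_days before now."""
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    cur = conn.execute(
        "DELETE FROM documents WHERE deleted_at IS NOT NULL AND deleted_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount  # number of rows physically removed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, deleted_at TEXT)"
)

# Fixed "now" so the example is deterministic; a real job would use the clock.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO documents (title, deleted_at) VALUES (?, ?)",
    [
        ("live", None),
        ("recently deleted", (now - timedelta(days=2)).isoformat()),
        ("long gone", (now - timedelta(days=45)).isoformat()),
    ],
)

reaped = reap_soft_deleted(conn, now, retention_days=30)
remaining = [row[0] for row in conn.execute("SELECT title FROM documents ORDER BY id")]
```

In practice something like this would presumably be run from cron or a job scheduler, with the retention window chosen by whoever owns the data-retention policy.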

gmiller1234563y ago

The author uses the "no one ever undeleted anything" as the primary justification. I think this is the part they miss. I've never undeleted a user either, but there have been many times I've gone back to look at something. Either a complaint finally gets around to me as to why the user wanted their account deleted (e.g. feature not working) and it helps to figure out why. Or they're returning and want things set up like they were. Or someone is taking over their role and needs to be set up like the last person who's already gone.

Though you really shouldn't be relying on a database for an audit trail. It might help find some issues, but things actually used for security shouldn't be writable so easily.

JohnBooty3y ago

    I think this is the part they miss. I've never 
    undeleted a user either, but there have been many 
    times I've gone back to look at something.

Yeah. As far as a user-facing "Undelete" button existing or being used... that's very rare in my experience.

What's much more common is a user accidentally deletes some data. They deny they made an error. The developers are blamed. You then have to go on a wild goose chase figuring out if it was possible for the app to actually screw up in that way. There's usually no definitive answer, and even if there is, management can't understand it. And regardless of how any of that plays out, you still probably have to try and recover the data from backups or something.

Alternately, maybe it was the app's fault. Still plays out nearly the same!

Soft deletes and/or audit trails save you from all of that.

    Though you really shouldn't be relying on a 
    database for an audit trail. It might help 
    find some issues, but things actually used 
    for security shouldn't be writable so easily.

I mean, at some level you need to trust the database right?

Been ages since I did it, but it's usually possible to set up a "secure" audit trail with use of database permissions. For example, the application's DB credentials can have SELECT and INSERT permissions on the audit trail table, but no UPDATE or DELETE perms.
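
A rough illustration of the append-only idea. On PostgreSQL or MySQL this would typically be done with permissions, granting the application's role `SELECT` and `INSERT` on the audit table and nothing else. SQLite has no `GRANT`, so the sketch below swaps in triggers that reject modifications instead; that is a weaker guarantee, since anyone who can alter the schema can drop the triggers. The `audit_log` table and the `try_tamper` helper are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_log (
    id        INTEGER PRIMARY KEY,
    actor     TEXT NOT NULL,
    action    TEXT NOT NULL,
    logged_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Stand-ins for "no UPDATE or DELETE permission": abort any such statement.
CREATE TRIGGER audit_log_no_update BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;

CREATE TRIGGER audit_log_no_delete BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
""")

# Inserting is still allowed.
conn.execute(
    "INSERT INTO audit_log (actor, action) VALUES (?, ?)",
    ("alice", "deleted invoice 42"),
)


def try_tamper(sql):
    """Run a statement and report whether the database let it through."""
    try:
        conn.execute(sql)
        return "allowed"
    except sqlite3.DatabaseError as exc:
        return str(exc)


update_result = try_tamper("UPDATE audit_log SET actor = 'mallory'")
delete_result = try_tamper("DELETE FROM audit_log")
```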

How would you set up a secure audit trail that didn't rely on the application and/or database at some level? Even if it lives outside of the database, that data came from the database.

Not a rhetorical question. Genuinely curious!

coldtea3y ago

>I've never undeleted a user either, but there have been many times I've gone back to look at something.

I, for one, have undeleted things tons of times, taking them out of the trash can before emptying it, undoing the delete action (in apps where this is possible), and so on.

Akronymus3y ago

I messed up a mass update query enough times to leave myself SOME provision to undo it.

The exceptions are when there is a well tested query that affects a single account or something. Like GDPR

pustan3y ago

>The author uses the "no one ever undeleted anything" as the primary justification. I think this is the part they miss.

But did they though?

"Although I’ve never seen an undelete work in practice, soft deletion wasn’t completely useless because we would occasionally use it to refer to deleted data – usually a manual process where someone wanted to see to a deleted object for purposes of assisting with a support ticket or trying to squash a bug."

alerighi3y ago

It may be convenient, but under the GDPR it is illegal. When a user deletes an account, all the personal data associated with that user must be deleted (or anonymized in a way that it's no longer possible to associate it back to the particular user).

You cannot just keep user information forever "just in case" they are useful again.

pradn3y ago

Soft deletion is just one way to achieve undeletion. The author's proposed solution of moving the resource to another table works just as well. You can move it back to the non-deleted table to perform the undeletion. You can keep around these deleted objects as long as you want; they work as a subset of a proper audit trail. The cost of course is you have more tables, but that is less of a cost than having to add "deleted=False" predicates in all of your queries.

Also note, if you use a soft-deleted column, indexes need to be keyed by that column as well if you want to access non-deleted objects quickly. That's extra complexity.
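
One way to limit that index overhead, on engines that support it (PostgreSQL and SQLite both do), is a partial index that covers only live rows. This is an alternative to keying every index by the deleted column rather than what the comment describes. A sketch with a hypothetical `users` table and `insert_user` helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id         INTEGER PRIMARY KEY,
    email      TEXT NOT NULL,
    deleted_at TEXT            -- NULL means the row is live
);

-- Only live rows are indexed, so soft-deleted rows add nothing to the index,
-- and uniqueness of email is enforced among live rows only.
CREATE UNIQUE INDEX users_live_email ON users (email) WHERE deleted_at IS NULL;
""")


def insert_user(email, deleted_at=None):
    """Insert a user; return False if a live user already has this email."""
    try:
        conn.execute(
            "INSERT INTO users (email, deleted_at) VALUES (?, ?)",
            (email, deleted_at),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    insert_user("a@example.com", "2024-01-01T00:00:00Z"),  # already soft-deleted
    insert_user("a@example.com"),                          # live row: accepted
    insert_user("a@example.com"),                          # second live row: rejected
]
```

Queries then need to repeat the `deleted_at IS NULL` predicate for the planner to consider the partial index.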

mjevans3y ago

Even more important; the deleted records don't need to live in your cache / RAM / etc. Potentially faster queries.

JohnBooty3y ago

    but that is less of a cost than having to add "deleted=False" 
    predicates in all of your queries.

Maybe or maybe not. You can use a view. Or you may be using an ORM that lets you set a default scope (essentially, a default WHERE clause - ActiveRecord lets you do this)

It also depends on if you're designing an app for this from the ground up or if you're trying to retrofit an existing app with 90 million different hardcoded queries.

Also depends on what you want to do with the deleted records. Do you want to do things with them? Maybe after a user is deleted, you want them to be able to log in, but you would prefer them to see a "sorry - your account has been deleted" message instead of "user not found." Maybe you want your support staff to be able to look at deleted users. Etc. Now you need to update your app logic so that it's flexible enough to look at "users_deleted" and "users". Which may be at least as onerous as messing with the WHERE clause on every single query.

To be clear, I don't hate the "another table" solution. It's the right choice in a lot of situations IMO.

Gordonjcp3y ago

> but that is less of a cost than having to add "deleted=False" predicates in all of your queries.

It's like people have forgotten what views are.
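
For anyone who has, a small sketch of the view approach, assuming a hypothetical base table `users_all` with a `deleted_at` column and a view named `users` that application code reads from. SQLite is used here only for self-containment; its views are read-only, so writes would still go to the base table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_all (id INTEGER PRIMARY KEY, name TEXT NOT NULL, deleted_at TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, item TEXT NOT NULL);

-- Application queries read from "users"; the filter lives in one place.
CREATE VIEW users AS
    SELECT id, name FROM users_all WHERE deleted_at IS NULL;

INSERT INTO users_all (id, name, deleted_at) VALUES
    (1, 'ann', NULL),
    (2, 'bob', '2024-03-01T12:00:00Z');
INSERT INTO orders (id, user_id, item) VALUES (10, 1, 'book'), (11, 2, 'lamp');
""")

# No "deleted_at IS NULL" predicate in either query.
live_names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]

# Joining through the view also hides rows that belong to soft-deleted users.
live_orders = [
    row[0]
    for row in conn.execute(
        "SELECT o.item FROM orders o JOIN users u ON u.id = o.user_id ORDER BY o.id"
    )
]
```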

smackeyacky3y ago

What seems to be missing here is the DB tech he is using. On a proper database you can do your "undeleted" with triggers and it's relatively trivial. Nonsense like a "deleted" column on your main data table just seems silly.
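
A sketch of what the trigger approach might look like, assuming hypothetical `customers` and `customers_deleted` tables. The comment does not say which database is meant; SQLite is used here only because it ships with Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customers_deleted (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    deleted_at TEXT NOT NULL
);

-- Every hard DELETE on customers first copies the row into the archive table.
CREATE TRIGGER customers_archive BEFORE DELETE ON customers
BEGIN
    INSERT INTO customers_deleted (id, name, deleted_at)
    VALUES (OLD.id, OLD.name, CURRENT_TIMESTAMP);
END;

INSERT INTO customers (id, name) VALUES (1, 'acme'), (2, 'globex');
""")


def undelete(customer_id):
    """Move an archived row back into the main table in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO customers (id, name) "
            "SELECT id, name FROM customers_deleted WHERE id = ?",
            (customer_id,),
        )
        conn.execute("DELETE FROM customers_deleted WHERE id = ?", (customer_id,))


def ids(table):
    """Return the ids currently present in the given (trusted) table name."""
    return [row[0] for row in conn.execute(f"SELECT id FROM {table} ORDER BY id")]


# The application issues an ordinary DELETE; the trigger does the archiving.
with conn:
    conn.execute("DELETE FROM customers WHERE id = 1")
after_delete = (ids("customers"), ids("customers_deleted"))

undelete(1)
after_undelete = (ids("customers"), ids("customers_deleted"))
```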

efsavage3y ago

Also if people know that deletion is reversible, they're more likely to actually do it, which can keep things generally tidier.

I don't actually like using a "deleted" column, my standard table has a status column, and deleted is one of those states, along with active/pending/suspended/etc, as the needs dictate. This way I get soft deletes for basically free both in the schema, but also in the queries (which would generally default to active), so it's not really the spaghetti that the author discusses.

JohnBooty3y ago

For some applications this is fine, depending on your app/business logic but for a lot of applications states like active/pending/suspended and "deleted" are not mutually exclusive.

Suppose I soft-delete an active, pending, or suspended user using your scheme.

Now I need to un-delete the user. What status should they have? We don't know.

This is another "best practice" I've learned over the years. Those flags/statuses you think are mutually exclusive? Maybe they aren't. Or maybe they are now but won't always be. It's usually easier in the long run to give each status its own column even if you think they'll always be mutually exclusive. Because I mean, what are you really saving? A few bytes per record? In most cases that's not worth it.
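
To make that concrete, a small sketch with a hypothetical `accounts` table that keeps `status` and `deleted_at` in separate columns, so an undelete can restore the row without guessing its previous status:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (
    id         INTEGER PRIMARY KEY,
    status     TEXT NOT NULL CHECK (status IN ('active', 'pending', 'suspended')),
    deleted_at TEXT            -- independent of status; NULL means not deleted
);
INSERT INTO accounts (id, status) VALUES (1, 'suspended');
""")


def soft_delete(account_id, when):
    """Mark the account deleted without touching its status."""
    conn.execute("UPDATE accounts SET deleted_at = ? WHERE id = ?", (when, account_id))


def undelete(account_id):
    """Clear the deletion marker; the original status is still there."""
    conn.execute("UPDATE accounts SET deleted_at = NULL WHERE id = ?", (account_id,))


def fetch(account_id):
    return conn.execute(
        "SELECT status, deleted_at FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()


soft_delete(1, "2024-05-01T09:00:00Z")
while_deleted = fetch(1)

undelete(1)
after_undelete = fetch(1)
```

With a single status column holding "deleted", the 'suspended' value would have been overwritten at deletion time.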

robertlagrant3y ago

That still has the same issues. You have to remember to set every linked table's records to the same state, or remember to query every linked table through the table that has the lifecycle column on it.

rjzzleep3y ago

In rails you get these things for free. What I don't get is why everyone rolls their own framework with node.js. It's basically 90s PHP all over again.

EDIT: Soft delete is a trivial piece of code when the framework has a well defined transaction system for its ORM. It's not really related to Rails per se. Your statement is extremely disingenuous, while trying to look smart. Audit trails _can_ be (but don't have to be) more complex, especially when the framework uses a lot of stored procedures to handle operations. But other than that these frameworks are specifically designed to REDUCE complexity of such operations, dependency costs - which are huge in node.js, specifically because you can mix and match anything into everything.

Node.js people tend to stitch together XSS solutions, random templating solutions based on their frontend work, even basic salting of auth passwords becomes unpredictable because you have 30 options on minimal auth libraries.

But yes nothing is ever free. If you want to use Rails you still have to learn Ruby and the framework, and have a basic understanding of how ActiveRecord builds queries, if you want to be writing performant code. And the same applies to Laravel, Django, or whatever of the 50 patchwork node.js solutions you want to base your code on.

joshmanders3y ago

I don't know why you're ragging on Node.js users or even PHP for that matter as both ecosystems have this stuff covered too.

Also you're comparing language/runtime with an actual framework and then dogging those users...

If you want to compare Rails with Node/PHP then I'd suggest comparing with things like Laravel (PHP), Adonis (Node) and you'll find everything you can do in Rails is done in Node/PHP too.

sky_rw3y ago

Nothing is free. In Rails you have _currently well maintained libraries_ for this. There are still complexity costs, dependency costs, data costs, performance costs, etc, etc, etc.

tomschlick3y ago

> It's basically 90s PHP all over again.

Which is funny because Laravel gives you this for free as well through its ORM. Soft deletes are an easily solved problem with $table->softDeletes() in your migration.

jesseryoung3y ago

I was getting ready to disagree with you - but then I tried to think of any time I've actually pushed code to production with the "DELETE" keyword in it. The problems that I've had to solve in my career very rarely call for deleting something.

"Soft deletion" and "audit trail" are technical terms we developers come up with for solutions the business wants but maybe hasn't asked for yet. It's not really a soft deletion it's a "deactivate" or "hide". Likewise, it's not an audit trail it's a "history" or "undo". Most of the time your stakeholders and users actually want these features, but don't ask because they perceive this as more expensive to build than just a "delete" button.

pwm3y ago

> Likewise, it's not an audit trail it's a "history" or "undo".

Depends on the industry. In the one I work in, an audit trail is a well-defined and mandatory business concern.

augustl3y ago

This is why I don't understand why Datomic isn't more popular. Pretty much every system I've worked on never needed to scale past 100s of writes per second due to hard limits on the system (internal backoffice stuff, fundamentally scoped/shardable to defined regions, etc etc). And since Datomic is built with that in mind, you get the trade-off of full history and first class transactions, and being able to query for things like "who changed this attribute to its current value, when, and why" is such a superpower!

Scarbutt3y ago

Too niche of a technology, tied to Clojure, not open source and very slow (doesn't matter for most internal apps though). For many, it's better to do the tedious thing here and there with postgres. SQL also has a strong grip on databases and if you look at it the other way around, postgres has lots of features that datomic lacks; with datomic you almost always need a secondary database.

tsuujin3y ago

It’s hard to get adoption for expensive toys.

I think Datomic is neat, and I’d like to use it, but it is prohibitively expensive for a personal or hobby project. Personal projects are where I get excited about tech, and when I’m excited I’m more likely to adopt it in my day job.

They’re really shooting themselves in the foot by not having a one-click install free tier or a self hosted option.

nightski3y ago

Personally looking at their pricing since it is so tied to AWS it is completely non-transparent how much it's going to cost us now or in the future.

I really like the concept of datomic though.

rjbwork3y ago

In the past I've used MSSQL's Temporal Tables (also called System-Versioned Tables) to implement this kind of functionality. This also gets you, for free, Type 2 SCD functionality for OLAP-style queries.

I can't wait until Postgres has this kind of functionality baked in. It's such a nice feature.

th0ma53y ago

For me it is simply that it isn't open source (at least last I checked.)

synthc3y ago

Datomic is great but I think it's missing some features that many enterprises need (access control and a query planner). Also it seems to be mostly built for DynamoDB/AWS.

wnevets3y ago

This is something that I was forced to learn the hard way more than once. Literally today I needed to undelete a record because a customer was confused by what the "delete" button did and wanted their record back.

wst_3y ago

Isn't that a UI problem, though? If the user were informed about the consequences (possibly with bold red font and with a confirmation checkbox), would they still click that button?

corrral3y ago

I'd add tagging, for anything that could conceivably use it, when you're doing DB design. May as well start with support, even if the functionality's initially dormant. Someone will ask for it, directly or indirectly, and it won't take long before they do.

JohnBooty3y ago

I personally would not agree.

But... your experiences are also real and I believe you when you say that your experiences DO support your conclusion. :)

The "soft delete" design decision is usually pretty impactful and is in my experience potentially much more of a pain in the butt to implement later if you haven't included it from day 0.

Audit trails and soft-deletes are also crazy useful for developers (both for debugging and for general cover-thine-ass utility) even if end users never touch them.

Whereas, tags are easier to tack on later and are not intrinsically useful to developers.

nicoburns3y ago

+1 on audit trails. And one should always store audit trails in machine readable format. That way you can not only manually inspect what happened, but you can query it too (and reconstruct the entire state as it existed in the past if necessary).

aboodman3y ago

  > Somebody always wants to undelete something
  > or examine it to see why it was deleted
  > or see who changed something, or blah blah blah

People for whom this resonates should look into Dolt: https://dolthub.com/.

It's a mysql-compat database that is versioned and forkable - basically imagine git and mysql had a baby. Every transaction creates an entry in the commit log that lets you see when, how, and why the database changed. And just like git you can `dolt checkout <hash>` to see what the data was like at some point in time, or `dolt checkout -b temp-branch <hash>` to make exploratory changes to some historical version, or `dolt revert` to revert a transaction... etc.

There is a lot more power that comes with making the entire database versioned and forkable by default. For example it makes it much easier to recover from catastrophic bad code pushes, etc.

note: Dolt was forked from Noms, my previous project, but I don't work for Dolt or have a stake. Just a fan.

makeitdouble3y ago

> unless there's some reason not to.

Yes.

To note, with GDPR there are now legal reasons to do so regarding user personal data. That can be the moment the devs realize they actually can't delete the data, because they soft deleted for so long, many relations are now interlocked and the data model needs to be changed to give a starting point to the deletion cascade.

My lesson from that was to at least have one test deleting a mock user that spans the maximum data breadth of the service. We caught a bunch of these loops in test at dev time and that was pretty great.
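
A sketch of what such a test might look like, assuming a hypothetical three-table schema (`users`, `posts`, `comments`) with `ON DELETE CASCADE` foreign keys. The helper names are illustrative, and a real service would seed every table that can hold user data:

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
        body    TEXT NOT NULL
    );
    CREATE TABLE comments (
        id      INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
        user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
        body    TEXT NOT NULL
    );
    """)
    return conn


def seed_max_breadth_user(conn):
    """Create mock user 1 with rows in every user-owned table, plus bystander user 2."""
    conn.executescript("""
    INSERT INTO users (id, email) VALUES (1, 'mock@example.com'), (2, 'other@example.com');
    INSERT INTO posts (id, user_id, body) VALUES (1, 1, 'by mock'), (2, 2, 'by other');
    INSERT INTO comments (id, post_id, user_id, body) VALUES
        (1, 1, 2, 'other on mock post'),
        (2, 2, 1, 'mock on other post'),
        (3, 2, 2, 'other on own post');
    """)


def delete_user(conn, user_id):
    # Raises sqlite3.IntegrityError if some table still blocks the deletion.
    with conn:
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))


def leftover_counts(conn, user_id):
    """Count rows per table that still reference the deleted user."""
    queries = {
        "users": "SELECT COUNT(*) FROM users WHERE id = ?",
        "posts": "SELECT COUNT(*) FROM posts WHERE user_id = ?",
        "comments": "SELECT COUNT(*) FROM comments WHERE user_id = ?",
    }
    return {t: conn.execute(q, (user_id,)).fetchone()[0] for t, q in queries.items()}


conn = make_db()
seed_max_breadth_user(conn)
delete_user(conn, 1)
leftovers = leftover_counts(conn, 1)
surviving_comment_ids = [r[0] for r in conn.execute("SELECT id FROM comments ORDER BY id")]
```

If a new table references `users` without a deletion path, `delete_user` fails in the test rather than in production.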

JohnBooty3y ago

    My lesson from that was to at least have one test 
    deleting a mock user that spans the maximum data 
    breadth of the service . We caught a bunch of these 
    loops in test at dev time and that was pretty great.

Thank you. That is SUPER insightful. If it wasn't too late to edit my initial post I would add this.

Falkon13133y ago

Good point. That could be tricky when it crosses the boundaries of schema changes and data migrations.

cal853y ago

> Somebody always wants to undelete something, or examine it to see why it was deleted, or see who changed something, or blah blah blah.

In my experience this happens “rarely”, not “always”.

It can happen, and in some ultra-rare cases the impact of not being able to recover some data might be huge (company-ending, even), and engineers are good at worrying about such edge cases. That’s why we have protective measures like soft deleting and event sourcing - because of nightmare edge cases, not because we are always having to actually use them. It’s driven by engineers avoiding their worst nightmare: having to say “I’m sorry, I cannot solve this problem for you. The data is gone.” It’s a peace-of-mind thing, not an everyday-need thing.

nivertech3y ago

> That’s why we have protective measures like soft deleting and event sourcing

IMO soft deletion is a hack trying to fix problems in CRUD, which is a hack in itself.

CRUD attempts to model everything in the universe as collections of mutable items, loosely based on RDBMS/SQL.

Event Sourcing is more realistic: it models everything as append-only logs of immutable events/facts, which preserves both the historical data and, more importantly, the original intent.

Unlike CRUD, Event Sourcing is technology agnostic.
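
A toy sketch of that idea, using a hypothetical `Event` type and `replay` function rather than any particular event-sourcing library. Current state is derived by folding over the log, so a "delete" is just another fact and earlier states stay reconstructible:

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: events are immutable once recorded
class Event:
    kind: str        # "created", "renamed", "deleted" or "restored"
    item_id: int
    payload: str = ""


def replay(events):
    """Fold an event log into the current state: {item_id: name} for live items."""
    names, deleted = {}, set()
    for event in events:
        if event.kind in ("created", "renamed"):
            names[event.item_id] = event.payload
        elif event.kind == "deleted":
            deleted.add(event.item_id)
        elif event.kind == "restored":
            deleted.discard(event.item_id)
    return {item_id: name for item_id, name in names.items() if item_id not in deleted}


# The log is only ever appended to; nothing is updated or removed in place.
log = [
    Event("created", 1, "draft"),
    Event("renamed", 1, "final"),
    Event("created", 2, "notes"),
    Event("deleted", 1),
]

state_now = replay(log)
state_before_delete = replay(log[:3])  # replaying a prefix gives a past state
```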

femiagbabiaka3y ago

Having multiple use cases for data is normal and okay. Taking a primary data store that is designed for one set of use cases and using it for all of them is very, very bad, even before you hit scale. Knowing which data store to use, and when, for a given use case is a superpower that can allow you to scale to much higher magnitudes than one would intuit.

dcow3y ago

What's wrong with the author's proposed compromise/solution?

ReptileMan3y ago

The proper thing is database snapshots or something like that.

I don't think soft delete is wrong per se, but it is something that should be native to the database engine.

cpursley3y ago

Can you recommend any best practices, design patterns, etc - for implementing soft delete in SQL systems?

tomschlick3y ago

The biggest one is to use a timestamp instead of a boolean. For instance Laravel used `deleted_at` as the field name with the type of TIMESTAMP (for mysql at least).

This allows you to do a `select * from users where deleted_at is null` to get the active records, but also know when the user was deleted if you need to audit / rollback.

JohnBooty3y ago

I think the best answer is to optimize for the expected data and use cases.

- How many records are there? Enough that performance will be an issue? If I'm planning on keeping things in the same table, can I utilize database features like indexes and partitions to mitigate any perf issues? (For some access patterns, partitions might solve 100% of your perf concerns)

- How common is deletion? Are we expecting 1% of the records to be soft-deleted, or more like 90%? If it's the latter, you may not want all those records clogging up your main table.

- Is this greenfield development, or am I adding soft-delete to an existing app? Greenfield favors "same table" soft delete; if you're retrofitting an existing app it may be better to keep deleted stuff in a separate table so that you don't break existing functionality.

- What do you want to do with the soft-deleted records? Are there times when you want to treat them just like regular records, e.g. "Give me a list of all the users who joined last week, even if we've deleted their accounts?" If the answer is "yes" then a lot of those things will probably be easier if you keep deleted and non-deleted in a single table.

brightball3y ago

Agreed. The key to soft deletes is to actually move the record out of the original table to ensure they don’t accidentally end up in a query join.

But there’s always something that needs to be undeleted. You can either have an easy way to do it or restore an entire DB backup and cross query the missing records. Soft deletes are a lot easier.

4m1rk3y ago

Isn't an audit log enough for that? Deletion should be logged as well.

xxs3y ago

GDPR compliance and all - we encrypt and delete stuff... or at the very least "lose" encryption keys 1st.

pg_12343y ago

This 100%

dafelst3y ago· 43 in thread

Views are a simple solution to this problem. Pretty much all modern RDBMSs support updatable views, so creating views over your tables with a simple WHERE deleted_at IS NULL solves the majority of the author's problems, including (IIRC) foreign key issues, assuming the deletes are done appropriately.

I feel like a lot of developers underutilize the capabilities of the massively advanced database engines they code against. Sure, concerns about splitting logic between the DB and app layers are valid, but there are fairly well developed techniques for keeping DB and app states, logic and schemas aligned via migrations and partitioning and whatnot.

pbardea3y ago

> assuming the deletes are done appropriately

This is one gripe I have with soft-deletion. Since I can no longer rely on ON DELETE CASCADE relationships, I need to re-define these relationships between objects at the application layer. This gets more and more difficult as relationships between objects increase.

If the goal is to keep a history of all records for compliance reasons or "just in case", I tend to prefer a CDC stream into a separate historical system of record.

dspillett3y ago

> Since I can no longer rely on ON DELETE CASCADE relationships

Cascaded deletes scare me anyway. It only takes one idiot to implement UPSERT as DELETE+INSERT because it seems easier, and child data is lost. You could always use triggers to cascade your soft-delete flags as an alternative method, though that would be less efficient (and more likely to be buggy) than the built-in solution that cascaded deletes are.

If you look at how system-versioned (or “temporal”) tables are implemented in some DBMSs, that is a good compromise. The history table is your audit, containing all old versions of rows even deleted ones, and the base table can be really deleted from, so you don't need views or other abstractions away from the base data to avoid accidentally resurrecting data. You can also apply different storage options to the archive data (compression/not, different indexes, ... depending on expected use cases) without manually setting up partitioning based on the deleted/not flag. It can make some queries less efficient (you need to union two tables to get the latest version of things including deleted ones, etc.) but they make other things easier (especially with the syntactic sugar like FOR SYSTEM_TIME AS OF <when> and so forth) and yet more things are rendered possible (if inefficient) where they were not before.

> I tend to prefer a CDC stream into a separate historical system of record.

This is similar, though with system versioned tables you are pretty much always keeping the history/audit in the same DB.

---

FWIW: we create systems for highly regulated finance companies where really deleting things is often verboten, until it isn't and then you have the other extreme and need to absolutely purge information, so these things are often on my mind.

wvenable3y ago

Often you don't have to rely on ON DELETE CASCADE relationships. Because you are never deleting anything, you will never have any orphaned records. If you don't want to see say Invoices for a deleted Customer then that's just another filter feature.

Mostly I use soft-delete because for auditing requirements we pretty much can't remove anything but also because nothing ever truly goes away. If we have an Invoice or Order then, from our perspective, we must have those forever even if the corresponding client is deleted and can never place another one.

dexwiz3y ago

You may end up doing this anyways if you have any application code that needs access to delete hooks, or access control varies across objects. At this point, you are probably using an ORM instead of direct queries, and placing logic that could be in the db at the app layer instead.

danielrhodes3y ago

Being unable to effectively use foreign key relationships is definitely a downside of using soft deletes. But it's also worth asking if these types of behaviors, which would also include a feature like triggers, really belong in a database or whether it's better to have them at the application level (or at least at a layer above the data layer). I'd argue that ultimately you probably don't want these things at the DB level because you get into a situation where you're sharing business logic between two (or more) places.

munk-a3y ago

If we're assuming you're using a view based approach which elides the soft deleted rows automatically then you'll get a lot of these dependent objects correctly updated for free assuming you're pulling them out of the DB with JOINs - SELECT FROM foo JOIN bar (assuming bar is a view into barwithdeleted) will automatically filter out the invalid rows from foo... if you're using this information to populate a CRUD interface it's likely you'll be JOINing bar already to get some metadata for display (like maybe bar.name instead of the surrogate bar.id key you use for joining).

kukx3y ago

Isn't there any attempt to improve soft deletion at the engine/SQL level? I can see it as a possible feature request.

dragonwriter3y ago

> This is one gripe I have with soft-deletion. Since I can no longer rely on ON DELETE CASCADE relationships

If you use soft deletes on all tables, you can also cascade them as long as you either cascade updates to the real keys as well, or prevent such updates, by having a deleted flag column on each table, including it in a unique constraint with the actual key column(s), and including it in the foreign key.
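
One possible reading of that scheme, sketched with hypothetical `customers` and `invoices` tables in SQLite: the `deleted` flag is part of a unique constraint on the parent and part of the composite foreign key on the child, with `ON UPDATE CASCADE` propagating the flag:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE customers (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    deleted INTEGER NOT NULL DEFAULT 0,
    UNIQUE (id, deleted)              -- target of the composite foreign key
);

CREATE TABLE invoices (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    deleted     INTEGER NOT NULL DEFAULT 0,
    -- The flag is part of the key, so changing it on the parent cascades here.
    FOREIGN KEY (customer_id, deleted)
        REFERENCES customers (id, deleted) ON UPDATE CASCADE
);

INSERT INTO customers (id, name) VALUES (1, 'acme'), (2, 'globex');
INSERT INTO invoices (id, customer_id) VALUES (10, 1), (11, 1), (20, 2);
""")

# Soft-delete the parent only; the database updates the child rows.
with conn:
    conn.execute("UPDATE customers SET deleted = 1 WHERE id = 1")

invoice_flags = conn.execute("SELECT id, deleted FROM invoices ORDER BY id").fetchall()
```

One consequence of this particular sketch is that an invoice cannot be soft-deleted on its own while its customer is live, because the foreign key would no longer match; handling that would need a separate flag column.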

semiquaver3y ago

The main problem with views for this use case in practice is that they ossify your schema. Views and matviews are effectively a dependency tree, and many common types of schema evolution become substantially more difficult when the system forces you to wrap your DDL in a series of view drop/recreation steps.

This is merely annoying when dealing with regular views because recreating even a large number of views is fast, but can be catastrophic if you have any matviews in your table dependency tree. A matview can easily turn what should be an instantaneous DDL operation into a partial outage while the matview is being regenerated.

(this is all postgres specific, it may be untrue for other systems)

enepture3y ago

As an FYI, using a tool like DBT solves this problem. As someone who was not a data engineer, I was not aware that there were tools like this.

rodw3y ago

> The main problem with views for this use case in practice is that they ossify your schema. Views and matviews are effectively a dependency tree, and many common types of schema evolution become substantially more difficult when the system forces you to wrap your DDL in a series of view drop/recreation steps.

You're not wrong, especially with the second part. I.e., deeply nested or convoluted dependencies between views can definitely make it awkward or painful to make adjustments near the root of the tree.

When I started this reply I was going to say "I hear you, but it's not an issue I run into very often". But that's not true. I've actually been burned by that moderately often, and have sometimes avoided or redesigned the root-level table change to avoid having to propagate all those changes to the rest of the dependency tree.

That said, in my experience (also mostly with postgres for this context) I feel like that's usually been more of a developer laziness issue (my own laziness that is), rather than an "ossified schema" issue. It's definitely a PITA when some simple change is going to break a dozen inter-connected views, but that's a coding issue not a deployment issue almost all of the time.

To be fair I don't really use matviews very often, but for true basic views I am guessing that the actual execution of the DDL to rebuild changed views is manageable in all but the most extreme cases. Even then there _should_ be a maintenance window of some sort available.

Thinking this thru a little bit, I believe the "anti-pattern" you're warning against isn't really views themselves but deeply nested/interconnected views (views that query other views, etc). I use views often (for this logical-delete type idiom for example) and I have rarely regretted it. I have often regretted creating complicated view-based dependency trees however, so I think I'm wholeheartedly in agreement on that point.

smallnamespace3y ago

To avoid an outage, have you tried fronting the matview with an additional view to allow hot-swapping?

slt20213y ago

it is very dangerous to have dependency on materialized view - it is a poor architectural decision from DBA to do that.

if you want view depending on mat view - materialize it yourself in a table, and refresh it yourself controllably.

rodw3y ago

Seriously. That "Downsides: Code leakage" point is nonsensical.

```
CREATE OR REPLACE VIEW active_customer AS
SELECT *
FROM customer
WHERE deleted_at IS NULL
   OR deleted_at > NOW();  -- a future deleted_at means "not deleted yet"
```

There, I fixed it.

Just use `active_customer` instead of `customer ... deleted_at IS NULL`.

In fact, since the deleted_at column is a timestamp, the original "leakage" query:

```
SELECT *
FROM customer
WHERE id = @id
  AND deleted_at IS NULL;
```

is actually broken. A non-null `deleted_at` timestamp that's in the future implies the record hasn't been deleted yet, right?

I've often had junior devs assert that views are some kind of code smell, but these sorts of "canned query/filter that you want to apply very very often" seem like the perfect use case for a view to me. It's DRY, and the fact that your standard "query" is in the database means you can change it more readily than trying to make sure you hit all the points it might be embedded in the application code.

> I feel like a lot of developers underutilize the capabilities of the massively advanced database engines they code against

Early-ish in the JDBC days a senior dev I was working with at the time (as a junior dev myself) made a pretty good case that "the database is part of the application" that's always stuck with me. Full database independence via software level abstractions is a pretty silly goal outside of library code. If you have a service that makes extensive use of the database, don't throw away the database features in the interest of some abstract "we could swap out oracle with mysql without changing anything" objective. If you want it to be generic, use the SQL standard, but don't be afraid to have a few db-specific bits in the app code if that's a subsystem you might replace once a decade or something.

I blame the DBA/Dev divide for a lot of this. A lot of the impedance between these layers is social/procedural. If you can change the DB as easily as the code, there's a lot less fear of using the right tool for the specific job.

djur3y ago

The query isn't broken. In the Rails community at least it is very common to use a nullable frobbed_at column to indicate both "was it frobbed" and "when was it frobbed". In that context, the boolean check is always NULL/NOT NULL, rather than a time comparison.

doctor_eval3y ago

> the database is part of the application

100% this. If you accept that the database is part of the application, you give yourself permission to use the full feature set of the database, and life becomes a lot simpler. Using views, stored procedures and other features lets you implement things like soft delete trivially, without it infecting all your application code.

In my entire career I've changed backend databases for an application exactly twice. It's not easy, and no amount of abstraction is likely to make it easier.

madisp3y ago

I'd argue that the simple `deleted_at IS NULL` check is not broken - unless your product / domain specifically allows and requires scheduled future deletions, adding such logic can easily introduce bugs. For example, you could get the comparison flipped by accident, and if it's only in one place out of many that bug could go unnoticed for a while.

aeyes3y ago

At least in Postgres, having a huge amount of "dead" data in large tables is problematic because vacuum always has to read the full data set.

Even with conditional indexes where you exclude deleted data you take a significant performance hit reading dead blocks because there is no way to quickly vacuum them. You accumulate hours of bloat until your vacuum finishes.

You can't beat a separate insert only archive table which you never have to vacuum.

paulmd3y ago

This is practically the #1 use-case for partitions imo. Partitions are tables with syntactic sugar (that Postgres understands in its query planner) so partitions maintain their own indexes and are vacuumed individually. If you structure your partitions into hot/cold, or hot/cold_day1/cold_day2/etc then you get several advantages:

* hot and cold do not churn each other's indexes or tables, you effectively have only one set of indexes (and data) that's actually churning and the others are stable.

* hot and cold can be treated differently - you can perform more aggressive indexing on data once you know it's out of the hot-set, or partition your hot data onto a tablespace with dedicated hardware while cold data is on bulk hardware, etc. Since queries are planned onto each partition individually, postgres can often select "the right way" for each partition.

* "deleted_at" is a special case of cold table. Dropping any partition is basically free, so if you partition into date ranges, then at the end of your data retention period you just drop the partition for that date range, which doesn't churn indexes or induce vacuuming

If data can never exit the cold-data state, then it's effectively an append-only table too, it just exists as a supplement to your online/hot-data table but it doesn't require special attention/etc. So we're in agreement on that point, that's a good design feature if you can swing it!

(note that for audit logging, I think it's simpler to just do the separate table. But the partition strategy does have some advantages for "cold" or "dead" data as a more generic concern imo)

anarazel3y ago

Vacuum does not have to read the full data set every time. The visibility map tracks, on a block level, whether all rows in a page are known to be visible to everyone (starting in 8.4 or such) and whether the page is "frozen", i.e., does not contain visibility information that might need vacuuming (starting in 9.6 IIRC).

However, indexes do currently have to be scanned as a whole. But that's only done by autovacuum if there's enough row versions for that to be worth it (in recent versions).

layer83y ago

Shouldn’t partitioning help with that? (I have no experience with Postgres.)

jaydub3y ago

Based on my experience, I like the author's approach since it makes things pretty clear-cut and optimizes the storage in the core table (in my experience as well, deletes happen frequently and the soft deletes are rarely touched). In large, row-oriented tables that storage can add up, and even with views/materialized views there's a cost to using/maintaining those as well.

SoftTalker3y ago

A problem (unless something has changed, my context is Oracle from some time ago) is that NULL values are not indexed. So the "WHERE deleted_at IS NULL" could trigger a full table scan. It can also cause row migration when the NULL value is eventually filled in. Unless you explicitly need the deleted date, it's probably better to use a non-nullable Y/N for this.

remram3y ago

It seems Oracle does it although there is a special syntax to opt-in. That seems wild. I am not aware of another DBMS having that limitation though.

firloop3y ago

Views can really bite you performance wise, at least with Postgres. If you add a WHERE against a query on a view, Postgres (edit: often) won't merge in your queries' predicates with the predicates of the view, often leading to large table scans.

dafelst3y ago

IIRC Postgres has supported predicate push down on trivial views like this for over a decade now, and possibly even more complex views these days (I haven't kept up with the latest greatest changes).

AdrianB13y ago

There is no impact with views in MS SQL. You can also have indexed views and filtered indexes, so you can have even better performance.

5e92cb50239222b3y ago

This is one of those situations where a good ORM can simplify things greatly. For example, with EF Core you can add a global filter which will filter out soft-deleted rows in all queries automatically (unless you add .IgnoreQueryFilters()).

It couples nicely with some hackery which turns removes into soft-deletes. You can remove objects as usual and they get soft-deleted in the database.

I've used this in a few projects and it's fantastic.

https://docs.microsoft.com/en-us/ef/core/querying/filters

https://www.thereformedprogrammer.net/ef-core-in-depth-soft-...

7crow3y ago

> there are fairly well developed techniques for keeping DB and app states, logic and schemas aligned via migrations and partitioning and whatnot.

Hi, <1 yr experience swe here. Would HN mind unpacking "whatnot" with specific names of some of these techniques?

OJFord3y ago

I had the same reaction to the 'code leakage' section, but 'foreign keys'? You can't reference a view; so you either don't use them (fks) or they point at the underlying table and you have the problem described.

You could have views that say 'thing I have a foreign key to is not deleted' of course, but that sort of seems like 'code leakage' again, just in SQL this time.

ibejoeb3y ago

> developers underutilize the capabilities of the massively advanced database engines

So true. There are so many amazing, powerful features in all of the major players.

Also: updatable views are amazing. With query rewriting (whatever your vendor calls it) you can effect some truly material changes to the system without any changes to the client applications. An example would be implementing temporal relations.

layer83y ago

How does this help with foreign keys? Normally you can’t have foreign keys referencing a view.

I agree that one should make use of RDBMS capabilities. A check constraint may be practical instead of (or in addition to) the foreign-key constraint.

nousermane3y ago

> (...updatable view...) WHERE deleted_at IS NULL

This is the way. Also, save record creation timestamp, and you can have very flexible "time-machine" selects/views of your table essentially for free.
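
A rough sketch of what such a "time-machine" select could look like, assuming each row carries both `created_at` and `deleted_at`. SQLite via Python's `sqlite3` is used only to keep it runnable; the `item` table, its rows, and the `items_as_of` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        created_at TEXT NOT NULL,  -- YYYY-MM-DD HH:MM:SS in UTC
        deleted_at TEXT            -- NULL while the row is live
    );
    INSERT INTO item (id, name, created_at, deleted_at) VALUES
        (1, 'a', '2022-01-01 00:00:00', NULL),
        (2, 'b', '2022-02-01 00:00:00', '2022-03-01 00:00:00'),
        (3, 'c', '2022-04-01 00:00:00', NULL);
""")

def items_as_of(connection, moment):
    """Return the names of rows that existed (created, not yet deleted) at `moment`."""
    rows = connection.execute(
        """
        SELECT name FROM item
        WHERE created_at <= :t
          AND (deleted_at IS NULL OR deleted_at > :t)
        ORDER BY id
        """,
        {"t": moment},
    )
    return [name for (name,) in rows]

print(items_as_of(conn, "2022-02-15 00:00:00"))  # while 'b' still existed
print(items_as_of(conn, "2022-05-01 00:00:00"))  # after 'b' was soft-deleted
```

Note that this only reconstructs which rows existed at a given moment. It says nothing about what their other columns contained back then, which is where full history or temporal tables come in.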

moggers3y ago

I'm glad to see this thread. I've been mulling over this exact issue of deleted_at code leakage with a naive soft-delete implementation. My immediate thought was to use views, so it's nice to see this is not yet another case of me using crack-brain ideas out of inexperience.

What's an appropriate naming convention?

Should I do it universally and put transparent views in front of all my tables so I don't have to refactor my code to point to new views whenever I do suddenly need to put in a model constraint that isn't 1:1 with my data layer? Is a transparent view a no-op in terms of perf? if it matters, this is being done in Postgres

I will probably make my constraints partial over the not-deleted records, particularly for unique constraints used for upserts. Am I about to footgun myself? Is it even necessary with the new uniqueness behavior with NULLs being implemented in postgres? Will my performance characteristics be better one way or the other in particular circumstances? It sounds like if I have a high ratio of deleted to not-deleted records a partial index becomes necessary.
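
Not an answer to the performance questions, but here is a small sketch of the partial unique constraint idea, assuming SQLite via Python's `sqlite3` so it is runnable. The `account` table and the `try_insert` helper are hypothetical. Postgres accepts the same `CREATE UNIQUE INDEX ... WHERE` form, though its planner behaviour is not something this demonstrates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        id         INTEGER PRIMARY KEY,
        email      TEXT NOT NULL,
        deleted_at TEXT  -- NULL while the account is live
    );

    -- Uniqueness is enforced only among rows that are not soft-deleted.
    CREATE UNIQUE INDEX account_live_email
        ON account (email) WHERE deleted_at IS NULL;
""")

def try_insert(connection, email):
    """Insert a live account; return False if the partial unique index rejects it."""
    try:
        connection.execute("INSERT INTO account (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False

first = try_insert(conn, "a@example.com")      # no live row yet
duplicate = try_insert(conn, "a@example.com")  # clashes with the live row

# Soft-delete the live row, which removes it from the partial index.
conn.execute(
    "UPDATE account SET deleted_at = '2022-07-01 00:00:00' WHERE email = ?",
    ("a@example.com",),
)
reuse = try_insert(conn, "a@example.com")      # the email is free again

print(first, duplicate, reuse)
```

The index only contains live rows, so it also stays small when the ratio of deleted to not-deleted records is high.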

airstrike3y ago

Views are such a powerful concept I’m honestly disheartened by how hard it is to use, replicate or leverage that functionality outside of dropping straight into the db shell

quickthrower23y ago

See http://materialize.com

zozbot2343y ago

Views can be used to implement pretty much any kind of automated inference, reasoning, rules etc. on the "raw" table data. The example of filtering out deleted records is just one of the simplest. That one single feature can easily transform a simple DB platform into a fully-featured knowledge base system, easily usable to support even complex reasoning tasks.

coding1233y ago

I was going to chime in with this. thanks. One issue with views however is that a lot of these features require more and more nuanced knowledge of RDBMSes where these days unless you have a veteran architect, most of the team just knows the various library/tooling that interacts with "a variety of databases" so there is often less effort to go deeper.

quickthrower23y ago

Where is this anti-db culture? Is it a startup thing?

Everywhere I have worked people know a decent amount about their data store. Not architects, just mid devs and higher.

vlunkr3y ago

That doesn't solve the foreign key problem. You can still easily have a reference to a record that is "deleted"

yen2233y ago

Also in Postgres, you cannot have a foreign key constraint that references a view, not even a materialised view.

I'm with the author on this one. Any soft delete logic does have a tendency to bleed into other systems and make your systems more complicated, for very little gain.

chrisshroba3y ago

How would a view solve the foreign key issue? Are you suggesting coding specific deletion triggers into the view such that appropriate foreign keys are "cascade" deleted when a row in the view is deleted?

dimgl3y ago

100% this. And yes, with inner joins it also solves the relationship issues.

giantg23y ago· 10 in thread

"The concept behind soft deletion is to make deletion safer, and reversible."

That's one part. The other part is that in many industries you have regulatory data retention and audit requirements. This is arguably the most valuable and common reason to perform Logical deletes.

brtkdotse3y ago

In banking and bookkeeping, there’s no such thing as a “delete”. Once something is in the ledger you can’t undo it - you have to make a new entry that negates the old one.
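
As a toy illustration of that rule, here is a sketch of an append-only ledger where a mistake is corrected by posting a negating entry rather than deleting anything. It assumes SQLite via Python's `sqlite3`; the `ledger` schema, the triggers, and the helper names are all hypothetical, and real bookkeeping systems are far more involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ledger (
        id           INTEGER PRIMARY KEY,
        account      TEXT NOT NULL,
        amount_cents INTEGER NOT NULL,
        reverses_id  INTEGER REFERENCES ledger (id)  -- set on negating entries
    );

    -- Guard rails: existing entries can be neither changed nor removed.
    CREATE TRIGGER ledger_no_update BEFORE UPDATE ON ledger
    BEGIN SELECT RAISE(ABORT, 'ledger is append-only'); END;

    CREATE TRIGGER ledger_no_delete BEFORE DELETE ON ledger
    BEGIN SELECT RAISE(ABORT, 'ledger is append-only'); END;
""")

def post(connection, account, amount_cents, reverses_id=None):
    """Append one entry and return its id."""
    cur = connection.execute(
        "INSERT INTO ledger (account, amount_cents, reverses_id) VALUES (?, ?, ?)",
        (account, amount_cents, reverses_id),
    )
    return cur.lastrowid

def reverse(connection, entry_id):
    """Undo an entry by appending a new one with the opposite amount."""
    account, amount = connection.execute(
        "SELECT account, amount_cents FROM ledger WHERE id = ?", (entry_id,)
    ).fetchone()
    return post(connection, account, -amount, reverses_id=entry_id)

def balance(connection, account):
    (total,) = connection.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM ledger WHERE account = ?",
        (account,),
    ).fetchone()
    return total

def delete_is_blocked(connection, entry_id):
    """Return True if the append-only trigger rejects a DELETE."""
    try:
        connection.execute("DELETE FROM ledger WHERE id = ?", (entry_id,))
        return False
    except sqlite3.DatabaseError:
        return True

good = post(conn, "cash", 10000)
mistake = post(conn, "cash", 2500)
reverse(conn, mistake)  # the mistake stays on record, plus its negation
print(balance(conn, "cash"))
```

The balance comes out as if the mistaken entry had never been posted, but both the mistake and its reversal remain visible to an auditor.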

jewayne3y ago

I would argue that in many cases the concept behind soft deletion is to make deletion permanent.

Hard deletes retain no memory of what you wanted to be gone, so any malfunctioning sync process will continuously recreate the deleted record soon after it's deleted. Soft deletes are often the only way to make sure deleted records don't reappear.
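
A small sketch of that failure mode, assuming a deliberately naive sync job that inserts any upstream row it cannot find locally. SQLite via Python's `sqlite3` keeps it runnable; the `item` table, the `UPSTREAM` data, and the helper names are hypothetical.

```python
import sqlite3

UPSTREAM = [(1, "widget")]  # a stale upstream that still has the record

def make_db():
    """Create a local table that starts out holding the one upstream record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL, deleted_at TEXT)"
    )
    conn.execute("INSERT INTO item (id, name) VALUES (1, 'widget')")
    return conn

def sync_from_upstream(conn, upstream_rows):
    """Naive sync: insert whatever is missing locally, leave existing ids alone."""
    conn.executemany(
        "INSERT OR IGNORE INTO item (id, name) VALUES (?, ?)", upstream_rows
    )

def live_ids(conn):
    rows = conn.execute("SELECT id FROM item WHERE deleted_at IS NULL ORDER BY id")
    return [item_id for (item_id,) in rows]

# Hard delete: the row is gone, so the sync job happily recreates it.
hard = make_db()
hard.execute("DELETE FROM item WHERE id = 1")
sync_from_upstream(hard, UPSTREAM)

# Soft delete: the row stays behind as a tombstone, so the sync job skips it.
soft = make_db()
soft.execute("UPDATE item SET deleted_at = '2022-07-01 00:00:00' WHERE id = 1")
sync_from_upstream(soft, UPSTREAM)

print(live_ids(hard), live_ids(soft))
```

In the hard-delete database the record is live again after the sync, while in the soft-delete database it stays gone because the tombstone row still occupies the primary key.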

nirvdrum3y ago

My experience at a few start-ups has been that account deletion just isn't prioritized. It's not a focus when building an MVP. If the application ever gains traction, everyone is then so terrified they'll accidentally delete customer data that they never delete anything. It's a shame. As a user, when I delete my account or data in my account, I want you to permanently delete it, not keep it around and just make it inaccessible to me.

silisili3y ago

Why not just use an audit table, to keep from littering your indices?

MonkeyMalarky3y ago

Then there's always the joy of a situation where your client is being sued by one of their clients and now needs your help recovering everything possible from your platform. And you're going to help them because you'd like to keep them as a client rather than let them be sued into oblivion.

jiggywiggy3y ago

Ha, and then there is the opposite regulation that you have to delete user data.

taeric3y ago

This is almost certainly going to bite you if you don't push all customer identification data out of your main data stores.

And it will go a long way to making your services harder to use if you don't allow users to associate friendly names with things. And to assume that the same friendly name will be used for a future item. (For example, if you name devices based on the room you put them in. Is reasonable to think that when you replace a device, that you are likely to want to reuse the name.)

necovek3y ago

And yet another part is making deletes (appear) instantaneous: useful when it involves cleaning up a bunch of "related" data possibly living on different services (eg. S3, ES...).

This also helps with the original goal of making them safer by manually implementing "eventual consistency" for data living outside the transactional world.

dexwiz3y ago

I think billions have been spent bridging the gap between “ideal” software and what businesses actually need. Access control is another thing I see developers wanting to simplify or push to implement later, but is actually a key feature.

yen2233y ago

Nowadays, regulations go both ways. Data sovereignty regulations and GDPR-like laws can mandate that data must get hard-deleted.

dfee3y ago· 6 in thread

My experience is that soft-deletes are blunt tools bridging the gap between hard deletes and event sourcing (capturing all the changes against the table, in a replay-worthy stream).

Event sourcing is hard – because the engineers responsible for setting it up and managing it aren't generally well skilled in this domain (myself included) and there aren't a wealth of great tools helping engineers find their way into the pit of success.

The downsides of soft-deletes (as identified in the article) are numerous. The biggest problem is that it appears "simple" at first blush (just add a deleted_at column!), but it rots your data model from the inside out.

rodelrod3y ago

Or you can see it the other way around: soft-deletes are a pragmatic alternative to event sourcing that provides a lot of the value without requiring a team of super-humans and a radical redesign of the existing systems.

redavni3y ago

Just want to add that the downsides as identified in the article make little sense. Deleting a customer's invoices should be a very rare thing. I can't imagine any accountant or auditor is going to be happy with an IT guy deciding when to delete invoices.

If accidentally writing the wrong query is a problem, then writing the wrong query is your problem.

btown3y ago

The question for either of these systems IMO is: do you trust that a change from your upstream represents a true, everlasting intention, or is it something that may need to be reinterpreted or rolled back in the future?

At my startup, soft deletes for our SKUs are critical, because we work with data sources where notoriously both the technical systems and the humans driving them will all-too-frequently accidentally represent something to our connection as deleted. Or there might be an irrecoverable error when asking "what things are still active upstream" - but that doesn't mean the SKUs are deleted, we might just not have certain live details until a bugfix is made. So "error status" and "soft delete" are somewhat synonymous, and both require investigation into root causes and root intents. Yes, the concept of "unerrored and active" is peppered through our codebase and analytics - but our ability to recover from supplier technical mistakes is much higher as a result. And we could absolutely do this with an event sourced system - but the tooling for relational databases is so much better, it's night and day.

zozbot2343y ago

An event store is just a special case of a temporal database. The whole point of temporal databases is to natively support the notion of historical vs. current data.

Terr_3y ago

> My experience is that soft-deletes are blunt tools bridging the gap between hard deletes and event sourcing

Agreed, sometimes it makes business-sense to implement it, but in the big picture it's still kludgy and not-ideal.

While full-on event-sourcing isn't always the answer, once business-rules prevent you from un-deleting anything there's not much point of having all those dead-rows interspersed in your regular tables.

vbezhenar3y ago

You can restore to any point of time from your database backup. So it can cover some requirements.

danielrhodes3y ago· 3 in thread

In a previous place I worked, we were programmatically using Box to store files. One day we were presented with a case study in Murphy's Law: a script went awry and deleted everything (10s of thousands of files). There was no clear way to recover these files, they were gone from what we could see. It was a disaster. We got a Box support person on the phone and described what had happened. There was a pause, some mouse clicking and then: "Ok, those files will be back in your account in an hour."

It was 100% our fault. But soft deletes saved us that day. If you're in a situation where you or your customers could benefit from the same, it's wise to not only embrace them but also make sure they work.

pradn3y ago

The author agrees with you in principle. All the author is arguing against is the use of a "deleted" bool column to indicate deletion. His solution of moving deleted objects to their own table gives you the ability to un-delete, just as before. Only now, your queries and indexes are simpler and you get to use foreign keys and other useful features.

foolfoolz3y ago

Box has a well defined schedule for the various stages of trashing. some of them are user configurable. i would call this workflow expected behavior of the application. this sort of soft delete is something you design in intentionally knowing it’s a customer use case. there’s many other objects in Box that do not need this workflow. i think soft deletes don’t need to be available for all tables but some it’s immensely helpful

latchkey3y ago

That sounds more like a lack of backups and disaster recovery than it does soft deletes.

jelkand3y ago· 3 in thread

Soft deletion is certainly very situationally worth it. I've found the most value when 1. it is well supported at the ORM layer and 2. business requirements dictate strong auditability of data. While I have undeleted items on occasion, I've used soft deletes more frequently to debug and build a timeline of events around the data.

For context, I've worked in fintech where I often needed to review backoffice approvals, transactions, offers, etc.

hinkley3y ago

In my limited experience, soft deletion also has better prospects where partial indexes are involved, since it reduces the size of the index and reduces search and insert time a little bit. If soft deletes are rare, you aren't going to see much of a payback for your investment in code complexity.

And since you can never really be sure what you'll need 2 years from now, I imagine there are a lot of anecdotes out there of people who implemented it thinking it would be used a lot, and turned out to be wrong.

krstf133y ago

Wouldn’t storing the deleted data in immutable storage, with a timestamp, be much better for auditability? I mean, how could you audit deleted, restored and deleted-again data with that setup? Also, while I know it’s not really accurate, I tend to understand relations as sets, and it makes me uncomfortable to have soft-deleted data that is neither a member nor a non-member of the set.

chomp3y ago

Yep, we have an abstraction layer on top of the ORM to provide common queries. "Give me all X" will always return stuff not soft deleted. Data people also like to go diving through old data, and without getting into data warehousing and stuff like that, it's not too complex to support a single flag to enable us to keep old stuff.

Gurgler3y ago· 2 in thread

There's a very legitimate case that I've seen made for soft-deletion in several different situations: foreign keys related to "created-by" columns. Hard-deleting a user who created an object that remains in use after they're gone would trigger referential integrity complaints on those columns. Without being able to reference a "deactivated" user's primary key in such a situation, you'd have to come up with some counterintuitive system for revisiting such objects. And the result (short of removing the foreign key) would be to give you inaccurate information about who created the object. Maybe one of you smarter people has already thought of an elegant way to handle this, but I've never seen one that satisfies my taste.

unemployable3y ago

Yeah that is the main problem with not using soft deletes. The question is though, if you delete a user, should the user's personal information still exist in your database, or does that violate some kind of privacy regulations? The idea of the deleted user's table is that it can be kept around and then pruned after x number of days to satisfy both privacy and undeleting. To keep the references around, I think one way might be to create two tables, so one table is used for all of the references and it stays around, and the other one gets deleted. Eg Account and User tables, or something.

dubswithus3y ago

There are Rails gems that can handle this in various ways.

But the easiest way is to deactivate the user account (is_active boolean) and continue to reference the user in internal records.

munk-a3y ago· 2 in thread

I just wanted to touch on the fact that eliding soft-deleted rows from queries is really, really easy - this article makes it out to be a constant headache but here's my suggested approach.

    ALTER TABLE blah ADD COLUMN deleted_at TIMESTAMP NULL;
    ALTER TABLE blah RENAME TO blahwithdeleted;
    CREATE VIEW blah AS SELECT * FROM blahwithdeleted WHERE deleted_at IS NULL;

And thus your entire application just needs to keep SELECTing from blah while only a few select pieces of code related to undeleting things (or generating reports including deleted things) need to be shifted to read from blahwithdeleted.

bjourne3y ago

This is not a solution. It introduces a leaky abstraction which sooner or later will lead to errors. Sure, all code you write will access the view and not the table. But how can you ensure all other code in the organisation uses the view? Perhaps you add some access control to the table so that only authorized users can read directly from it, but that's even more technical overhead. Then you have foreign keys. If you have a "deleted" column in the Customer table you need to remake the Invoice table as a view so that it hides invoices for deleted customers. The same goes for the InvoiceItem table (foreign key of a foreign key) and all other auxiliary information related to the soft-deleted customers.

Furthermore, the cost of an error is potentially massive. Someone new at the company makes a revenue report based on the billed Invoices and does not realize they should query the view and not the table... Not great if 90% of all invoices belong to soft-deleted customers!

The author is right; soft-deletes are probably most definitely not worth it. There are many better ways to solve the problem.

yladiz3y ago

But, assuming you don't really need the data, why make your queries more complex instead of just actually deleting the data?

GartzenDeHaes3y ago· 2 in thread

Personally I like no delete designs, which give you a full audit history of changes. This is similar to generally accepted accounting principles. https://en.wikipedia.org/wiki/Generally_Accepted_Accounting_...

munk-a3y ago

So if an account was active in your system and is active no longer... do you soft delete it (even if that means UPDATE ... SET active = 'f') or hard delete it?

znpy3y ago

Your taste in database design is probably not GDPR compliant, I hope you don’t work in the EU.

adrianmsmith3y ago· 2 in thread

> Instead of keeping deleted data in the same tables from which it was deleted from, there can be a new relation specifically for storing all deleted data

The disadvantage of this is that if you ever do want to access this "deleted" data, e.g. in admin or compliance tools, you now have to do it in two different ways, one way for the main data and a different way in case the data has been "deleted".

The article asserts you'll never need to "undelete" the data. So they're presenting a solution with that assumption, fair enough. Without that assumption, however, moving the data back from an archive table becomes a pain, and if there are any unique constraints e.g. on username or email address, you'll have a problem if you've moved the data out of the main table and another user has used that username or email address.

Terr_3y ago

> The article asserts you'll never need to "undelete" the data.

IMO it's worth distinguishing between (A) some kind of "click to undelete" feature versus (B) simply having that old-data conveniently exposed for a developer to manually-edit things or craft database-change scripts.

In practice I've only ever seen the latter get used, because it requires a developer to figure out how the heck to get "the parts that matter" back while preserving the integrity of other newer data and obeying certain business-rules.

layer83y ago

> now have to do it in two different ways

Use a view.

> if there are any unique constraints e.g. on username or email address

Have those in a dedicated table where they aren’t deleted, and add a synthetic key referenced by the other tables.

scott_w3y ago· 1 in thread

The example the author gives is… frankly awful.

I can’t think of a single case where you’d want to remove the invoices of a customer you delete. Ever. In fact, the opposite is more likely to be a big problem, accidentally cascading your delete to your financial records!

Using a soft delete, your invoices won’t “disappear” because your app WILL have a view for looking at just the invoices.

Source: I built a bookkeeping system and soft deletes is a necessary feature.

decebalus13y ago

> I can’t think of a single case where you’d want to remove the invoices of a customer you delete. Ever.

CCPA will require you to delete the invoices. And I would love for all platforms to support deleting everything, including invoices, considering some things are illegal in other places and if there's proof of you buying said illegal thing, you can get in serious trouble (think gay dating apps in the UAE).

But I don't really agree with the author on his take about soft deletions.

vyrotek3y ago· 1 in thread

I've found SQL Server Temporal Tables are a good alternative to get the benefits of soft-deletes without some of the drawbacks.

https://docs.microsoft.com/en-us/sql/relational-databases/ta...

tfigment3y ago

MySQL also has this now. I've wanted to rewrite our apps to use it but haven't gotten around to it. Postgres has it as an add-on, but it feels like it wouldn't work for us until it's supported first-class.

habibur3y ago· 1 in thread

Which is why I don't add that extra deleted field. Rather duplicate all the tables into a new database called "archive" and then insert there before deleting from main.

That works for updates too, by preserving the old data and showing you a time machine like backlog. But the archive database gets too large over time and you need to purge it periodically. You can create some delete triggers for automating this "save before delete" behavior.
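
For the curious, a sketch of what the "save before delete" trigger plus a periodic purge might look like. It assumes SQLite via Python's `sqlite3` and, to stay self-contained, an archive table in the same database rather than a separate `archive` database. The `customer_archive` schema and the `purge_archive` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );

    -- The archive gets its own surrogate key and no unique constraint on email,
    -- so the same email can be archived any number of times.
    CREATE TABLE customer_archive (
        archive_id  INTEGER PRIMARY KEY,
        id          INTEGER NOT NULL,
        email       TEXT NOT NULL,
        archived_at TEXT NOT NULL DEFAULT (datetime('now'))
    );

    -- Copy the row into the archive just before it is hard-deleted.
    CREATE TRIGGER customer_save_before_delete
    BEFORE DELETE ON customer
    BEGIN
        INSERT INTO customer_archive (id, email) VALUES (OLD.id, OLD.email);
    END;
""")

def purge_archive(connection, max_age_days):
    """Drop archived rows older than max_age_days; return how many were removed."""
    cur = connection.execute(
        "DELETE FROM customer_archive WHERE archived_at < datetime('now', ?)",
        (f"-{int(max_age_days)} days",),
    )
    return cur.rowcount

# The same email signs up and is deleted twice; both copies land in the archive.
for _ in range(2):
    conn.execute("INSERT INTO customer (email) VALUES ('a@example.com')")
    conn.execute("DELETE FROM customer WHERE email = 'a@example.com'")

print(conn.execute("SELECT COUNT(*) FROM customer_archive").fetchone()[0])
```

The main table keeps its unique constraint on `email`, while the archive just accumulates history until `purge_archive` trims it. The application's own `DELETE` statements do not need to change.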

tehbeard3y ago

How do you account for maintaining integrity in the archive?

E.g. you have 3 users sign up with the same email (a unique field) one after the other with deletions in-between each sign-up?

tehbeard3y ago· 1 in thread

> Instead, we rolled forward by creating a new app, and helping them copy environment and data from the deleted app to it. So even where soft deletion was theoretically most useful, we still didn’t use it

I don't get this statement. You wouldn't have had the env or data without soft delete? You did use it!

I would say, soft delete isn't a tick the box and done solution as many ORMs make it.

You need to consider the data model, and adjust your queries to that.

It may make sense for a product to be deleted, but orderlines still able to access it to display product name etc.

With blob data, I tend to move that to a "bin" with a 30-60 day grace period. Customers know that if they report quickly, we can fully recover, while outside that time they'll have to provide images etc. It's a decent compromise.

Reuse of unique fields is the sticking point I run into often, as mysql interprets null as not clashing with other nulls so composite uniques using the ID and deletion date don't work.

TylerE3y ago

> mysql interprets null as not clashing with other nulls

Which is correct per SQL. Null is NaN, not zero (or negative infinity).
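
A quick way to see the behaviour being described, using SQLite via Python's `sqlite3` as a stand-in (it also treats NULLs as distinct in unique constraints). The `account` table and the `try_insert` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id         INTEGER PRIMARY KEY,
        email      TEXT NOT NULL,
        deleted_at TEXT,             -- NULL while the account is live
        UNIQUE (email, deleted_at)   -- the composite unique being discussed
    )
""")

def try_insert(connection, email, deleted_at):
    """Return True if the row was accepted, False if the unique constraint fired."""
    try:
        connection.execute(
            "INSERT INTO account (email, deleted_at) VALUES (?, ?)",
            (email, deleted_at),
        )
        return True
    except sqlite3.IntegrityError:
        return False

# Two live rows with the same email: NULL never equals NULL, so both get in.
live_1 = try_insert(conn, "a@example.com", None)
live_2 = try_insert(conn, "a@example.com", None)

# Two deleted rows with the same non-NULL timestamp do clash.
gone_1 = try_insert(conn, "a@example.com", "2022-07-01 00:00:00")
gone_2 = try_insert(conn, "a@example.com", "2022-07-01 00:00:00")

print(live_1, live_2, gone_1, gone_2)
```

So the composite constraint guards the rows that matter least (deleted ones) and lets duplicate live rows through, which is the sticking point described above.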

whack3y ago· 1 in thread

We use soft-deletes extensively at our startup. Here's a couple reasons:

- Feature creep. "Sometimes our users accidentally hit the delete button, or change their minds a minute later. We want to give them a way to undo the deletion." Or "I know we said last quarter that users want to delete stuff, but they also want to see a list of everything they've deleted in the past." Soft-deletes handle feature-creep a lot better than hard-deletions

- It simplifies foreign-keys management. If you want to hard-delete something that some other entity is referencing, you'll have to hard-delete or modify that other entity first. And potentially repeat this process recursively for their own references. This is a pain. One could argue that if you really want to delete something, you should be deleting all children as well. Such arguments are highly domain specific, and very bad universal claims. We've seen some use-cases where such pedantry is not necessary

- It makes it easier to recover from mistakes and bugs. Customer deleted something accidentally and emailed you begging for help? Your code has a bug causing stuff to get deleted when it shouldn't be? You'll be thankful you did a soft-delete and not a hard-delete. Is it going to solve every single problem where the data has system-wide ripple effects in a unicorn sized organization? No. But it'll still solve a number of problems where the data impact is more localized

- It makes debugging easier. You have a clear record of everything that used to exist. You don't have to go digging through your logs to find something that used to exist but has now been deleted

- Speed. All of the above problems can be solved in other ways too. The author suggests putting all deleted data in a "deleted records table." So now you need to maintain a 2nd table for every table that you may want to delete stuff from. All schema updates will need to be mirrored on this 2nd table. And you'll need to write and maintain code to populate this deleted-records-table every time you delete stuff from the original table. All doable and straight-forward but takes time away from other things you could be doing instead

The main benefit from hard-deletions is data compliance and liability. Ie, being able to tell privacy-conscious customers that you actually deleted their data. If you're handling any sensitive data, you should definitely do hard-deletions at some point for this reason. But the other reason the author gave - "it's annoying having to check for `deleted_at` when writing SQL queries" - seems pretty minor compared to the benefits.
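
For anyone who hasn't seen the pattern being argued about, here is a minimal sketch using Python's built-in sqlite3. The `customer` table and the helper names are made up for illustration; only the `deleted_at` column name comes from the article.

```python
import sqlite3
from datetime import datetime, timezone


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " deleted_at TEXT)"  # NULL means the row is live
    )
    conn.executemany(
        "INSERT INTO customer (id, name) VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )
    return conn


def soft_delete(conn, customer_id):
    # Stamp the row instead of removing it.
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "UPDATE customer SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
        (now, customer_id),
    )


def undelete(conn, customer_id):
    conn.execute("UPDATE customer SET deleted_at = NULL WHERE id = ?", (customer_id,))


def live_names(conn):
    # This is the filter every read query has to remember.
    rows = conn.execute(
        "SELECT name FROM customer WHERE deleted_at IS NULL ORDER BY id"
    )
    return [r[0] for r in rows]
```

The `WHERE deleted_at IS NULL` in `live_names` is the part the article calls annoying; everything else is one UPDATE either way.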

waspight3y ago

It seems that it is too easy to delete things in your system. Rather than solving it with reversible soft deletes I would suggest to improve the UX. I don’t agree that it simplifies foreign key management, it is most often the opposite from my point of view.

Pxtl3y ago· 1 in thread

Trivial case I hit:

1) Client wants to remove users who have left their org from the system, but

2) There are objects that were contributed by that user which are required to persist beyond the user's deletion.

Those are ideal cases for soft deletion. We can still query information about the deleted user to explain who created this object, with the note that their account has been deleted.

Probably I should be doing full event-sourcing for this case, but delete flag works well. MS offers temporal tables for this use case and I'm still considering the implications there -- AFAIK ORM support is WIP.

And unlike the article author, I have used soft deletion to undelete things. Many times. Maybe he has better users than I do, I don't know.

mixmastamyk3y ago

Another way to handle is to set each obj.user to a “Deleted User” record.
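
A rough sketch of that idea, assuming a reserved row with id 0 and hypothetical `users` / `documents` tables (SQLite via Python's sqlite3, just to keep it runnable):

```python
import sqlite3

DELETED_USER_ID = 0  # hypothetical reserved id for the placeholder row


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE documents (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            user_id INTEGER NOT NULL REFERENCES users(id)
        );
        INSERT INTO users VALUES (0, 'Deleted User'), (1, 'alice');
        INSERT INTO documents VALUES (10, 'spec', 1);
    """)
    return conn


def delete_user(conn, user_id):
    if user_id == DELETED_USER_ID:
        raise ValueError("the placeholder user cannot be deleted")
    with conn:  # one transaction: reassign first, then hard-delete
        conn.execute(
            "UPDATE documents SET user_id = ? WHERE user_id = ?",
            (DELETED_USER_ID, user_id),
        )
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
```

The trade-off is that you can no longer tell which deleted user created a given object, which is exactly the information the soft-delete version keeps.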

muhaaa3y ago· 1 in thread

Always use a temporal database (datomic, postgres with temporal_tables extension). You get out of the box the full history of your data. That is really helpful for business intelligence and analytics, auditing / audit log (security, accountability), live sync & real-time features and as a bonus easy recovery after application fails.

If the disk gets too full, project the latest time slice into a new database and move the old database onto cold storage.

hu33y ago

MariaDB as well: https://mariadb.com/kb/en/system-versioned-tables/

justin_oaks3y ago· 1 in thread

The author didn't mention it, but restoring data from a database backup is a perfectly reasonable way to handle undeletes. By this I mean the situations that are "Oh crap, we didn't mean to delete that!" instead of usual business operations.

I've probably restored data from backup maybe 4 times in my career. I greatly prefer to do this on the rare one-off scenario than to deal with the overhead of soft deleting everything.

marcosdumay3y ago

The difference in framing one gets by looking around is amazing, even funny.

> I've probably restored data from backup maybe 4 times in my career.

Yet, I often use soft-deletes because it allows people to undelete things from the software interface and not call me all day long.

But that's not the most common reason I have for them. Normally it is because the data just can not be gone, and the full table is still important somewhere.

justin_oaks3y ago· 1 in thread

For those who are expressing favor with soft deletes, do you default to soft deletes on every table unless you know you won't need them? Or do you only apply them where you know you'll need them?

I think people arguing for and against soft deletes both understand that there are cases where you want to use them and when you don't.

baq3y ago

soft delete everywhere by default. true deletes only after retention policy expires, if FK constraints allow it (best if you can drop whole partitions).
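
A minimal sketch of the "true deletes after retention expires" part, assuming an `items` table whose `deleted_at` holds Unix epoch seconds. The names are hypothetical, and it ignores FK ordering and partition drops entirely:

```python
import sqlite3
import time

DAY = 86400  # seconds


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, deleted_at INTEGER)")
    return conn


def purge_expired(conn, retention_days, now=None):
    """Hard-delete rows that were soft-deleted more than retention_days ago."""
    now = int(time.time()) if now is None else now
    cutoff = now - retention_days * DAY
    with conn:
        cur = conn.execute(
            "DELETE FROM items WHERE deleted_at IS NOT NULL AND deleted_at < ?",
            (cutoff,),
        )
    return cur.rowcount  # number of rows actually purged
```

Run it from whatever scheduler you already have; the `now` parameter is only there so the cutoff can be tested deterministically.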

codemac3y ago· 1 in thread

Well, there are several problems with this analysis when you go very large (>10000 machines):

- For many applications, it's easiest to put the state of the object in the primary key, and thus point reads will fail when something gets deleted. This has other problems though with hotspotting and compaction during deletes. The deleted table doesn't really solve this either.

- For storage systems, GC is critical functionality to implement. Most systems whether they want to believe it or not are glorified storage systems. Garbage collection is hard to do at scale, and I've never seen it implemented as SQL statements rather than code. Especially for GDPR etc.

- For large scale distributed systems, foreign key constraints are rare if not impossible to implement with reasonable latency, so they don't exist either way. I haven't worked on a system in >15 years that had fk constraints.

- For large scale restores where you need to undelete trillions of rows, keeping the rows basically pre-assigns the distribution of writes. When you have to re-create the rows, you tend to get intense hotspotting and failures along the way as you attempt to load balance on the keyspace of the writes.

A deleted records table is good for smaller (<10000 machine) systems when latency between nodes can be kept within the same campus. It can really improve performance of your GC if reading by column isn't fast compared to reading by table.

mixmastamyk3y ago

Advice should be aimed at the 99% rather than the 1%, right? I guess Heroku and Stripe don’t have the biggest datasets in the world but they are probably larger than most folks will need to manage.

llimos3y ago· 1 in thread

Do any databases let you refer to constant values in foreign keys?

Then you could do

  FOREIGN KEY (foreign_id, NULL) REFERENCES foreign_table(id, deleted_at)

munk-a3y ago

I don't believe that's possible in postgres at least - but I don't think it's a huge concern either - you can have deleted_at cascade via trigger or just use views to hide the data - both are extremely easy to implement at the DB level without the application devs ever needing to worry about what's what.
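
To make both options concrete, here is a sketch in SQLite (through Python's sqlite3) rather than postgres, so the trigger syntax differs from what you'd write in PL/pgSQL. The `*_all` table names and the view names are my own assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_all (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        deleted_at TEXT              -- NULL means live
    );
    CREATE TABLE invoice_all (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer_all(id),
        deleted_at TEXT
    );

    -- Soft-deleting a customer stamps its live invoices with the same time.
    CREATE TRIGGER customer_soft_delete_cascade
    AFTER UPDATE OF deleted_at ON customer_all
    WHEN NEW.deleted_at IS NOT NULL AND OLD.deleted_at IS NULL
    BEGIN
        UPDATE invoice_all
        SET deleted_at = NEW.deleted_at
        WHERE customer_id = NEW.id AND deleted_at IS NULL;
    END;

    -- Application code reads these views and never sees deleted rows.
    CREATE VIEW customer AS
        SELECT id, name FROM customer_all WHERE deleted_at IS NULL;
    CREATE VIEW invoice AS
        SELECT id, customer_id FROM invoice_all WHERE deleted_at IS NULL;

    INSERT INTO customer_all (id, name) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO invoice_all (id, customer_id) VALUES (10, 1), (11, 2);
""")

conn.execute(
    "UPDATE customer_all SET deleted_at = '2022-07-19T00:00:00Z' WHERE id = 1"
)

live_customers = [r[0] for r in conn.execute("SELECT name FROM customer ORDER BY id")]
live_invoices = [r[0] for r in conn.execute("SELECT id FROM invoice ORDER BY id")]
```

After the UPDATE, only bob and invoice 11 are visible through the views, while the underlying tables still hold every row.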

matusp3y ago· 1 in thread

The "Code leakage" problem can easily be solved by using views. Or am I missing something?

dboreham3y ago

Solved with "...and deleted = false"

lowercased3y ago

"Instead, we rolled forward by creating a new app, and helping them copy environment and data from the deleted app to it. So even where soft deletion was theoretically most useful, we still didn’t use it."

But... weren't you using all those env and data info from the soft-deleted set?

I've typically been using soft-deletes for most projects for years. People have accidentally deleted records, and having a process to undelete them - manually or giving them a screen to review/restore - has usually been great.

Yes, if there's a lot of related artefacts not in the database (files/etc) that were literally deleted, you may not be able to get them back. But that's an ever greater edge case in projects I work in as to not be a huge issue. We probably have some files in a backup somewhere, if it's recent. Trying to 'undelete' a record from years ago - yeah, likely ain't gonna happen.

People are used to 'undo' and 'undelete'. Soft-deletes are one way to provide that functionality for some projects.

vivegi3y ago

If you do want to retain the deleted records for any purpose (audit, compliance etc.,) it is better to design a DELETED table to maintain the history (just as suggested in the article towards the end).

Once your main tables start getting to the order of tens of millions of records, the filtering by 'deleted_at is NULL' or 'deleted_at is NOT NULL' gets in the way of query performance.

NULL is also not indexed. So, that throws the spanner in the works sometimes (depending on the query).

deerIRL3y ago

As someone who has done development work with Class A data and specifically in the realm of justice, soft deletes aren't simply a good idea, they are required by law.

Most of these downsides are easily mitigatable issues as well. As many users have stated, something like views solves the issue of forgetting the 'deleted' clause.

Lastly, I'm not sure the issue with foreign keys/stray records really resonates with me. I'd be hard pressed to be comfortable allowing a developer or DBA who isn't fully comfortable with the data model to be hard deleting records, let alone flagging them as soft deleted.

willlll3y ago

For the control plane part of Crunchy Bridge, on day one I decided to go with the deleted_records table that is mentioned at the end of this post. It's been great. No need to keep around dead data that no one ever looks at.

We don't need to have `where deleted_at is null` on every single query. But the best part though is our actual working data set of records we actually care about is tiny compared to the deleted cruft that would have otherwise been just sticking around forever. Backups and restores take no time at all. It's really cool that postgres lets you have conditional indexes on things, but it's even cooler not to need them.

jonstaab3y ago

If you implement soft delete, you should surface it to your user. That's who is accidentally deleting things, and that's who will want to un-delete them. As for side effects like spinning up/down servers, build that into your data model (of course, in a case like Heroku's that can be prohibitively expensive, so don't).

Source: I write back of house software for resale store owners, and accidental deletes happen occasionally. Being able to restore things instills a lot of confidence for our customers.

radu_floricica3y ago

Is nobody using log tables? Pretty much every time I touch something in my db, there's a log call that records who did it, when, IP, URL and a (JSON) snapshot of the changed record, which in a pinch can be used for undelete.

It's surprisingly manageable. I mean, yes, it's definitely the largest table in the db, but:

1. it's well worth it

2. most of the stuff in it isn't the main scenario above (a human does something and I record the change) but various automated processes I also want to track, like API calls. which leads to:

3. it's easy to prune - both in time period kept, and by selectively deleting the automated stuff earlier

But it mostly helps by localizing things. It's just one meta-data log table, and everything related to logging actions is there. Not very elegant to keep adding fluff fields to every table, like "add_date" or "deleted_at". When I decided I want to also track the URL of the request I had to change things in just one place, and now I have it for every action everywhere.

Note: don't fall into the "everything is a nail" mistake. Some other dedicated log tables may be necessary, for high-volume or distinct stuff. I also have a mail_log, a sms_log and a separate table for events coming from mobile users (like location history).
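
Roughly what that looks like, as a sketch with Python's sqlite3. The `action_log` layout and function names are assumptions for illustration, and the IP/URL columns are left out to keep it short:

```python
import json
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript("""
        CREATE TABLE product (
            id INTEGER PRIMARY KEY, name TEXT NOT NULL, price INTEGER NOT NULL
        );
        CREATE TABLE action_log (
            id INTEGER PRIMARY KEY,
            logged_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            actor TEXT NOT NULL,
            action TEXT NOT NULL,
            table_name TEXT NOT NULL,
            record_id INTEGER NOT NULL,
            snapshot TEXT NOT NULL   -- JSON copy of the row as it was
        );
        INSERT INTO product VALUES (1, 'widget', 250);
    """)
    return conn


def delete_product(conn, product_id, actor):
    with conn:  # log entry and hard delete commit together
        row = conn.execute(
            "SELECT * FROM product WHERE id = ?", (product_id,)
        ).fetchone()
        if row is None:
            return False
        conn.execute(
            "INSERT INTO action_log (actor, action, table_name, record_id, snapshot)"
            " VALUES (?, 'delete', 'product', ?, ?)",
            (actor, product_id, json.dumps(dict(row))),
        )
        conn.execute("DELETE FROM product WHERE id = ?", (product_id,))
    return True


def restore_product(conn, product_id):
    with conn:
        log = conn.execute(
            "SELECT snapshot FROM action_log"
            " WHERE action = 'delete' AND table_name = 'product' AND record_id = ?"
            " ORDER BY id DESC LIMIT 1",
            (product_id,),
        ).fetchone()
        if log is None:
            return False
        # Assumes the snapshot still matches the current columns.
        conn.execute(
            "INSERT INTO product (id, name, price) VALUES (:id, :name, :price)",
            json.loads(log["snapshot"]),
        )
    return True
```

The restore only works while the snapshot keys still match the table, so old snapshots may need hand-mapping after schema changes.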

phibz3y ago

I've definitely seen soft delete work in practice. A couple of things: for small data sets you can implement the naive deleted_at and hide the records from your users by forcing them to use a view. You can also handle updates on the view to prevent data conflicting with deleted data if you need to.

For foreign key constraints you can set the foreign key to null and orphan the records if the relation is deleted. You could also hard delete them in this case. It depends on your use case.

When the data volume grows or the ratio of soft deleted to normal records is high, you should consider another solution. One solution you suggested, moving the record to a deleted table is a fine one.

The other solution that I've used successfully is to journal your deletions in another table or system. For smaller volumes, an audit table that journals the data and stores the pk, fkeys, and a serialized version of the record works well (json works great in postgres). For large volumes or frequent deletions, something like Kafka or PubSub works better.

You may very well find others interested in consuming your audit journal to track changes. Updates and even inserts fit great in the more general case.

Apreche3y ago

I agree with the author that a separate table is the way to go, but I go one step further than the author and use database triggers to manage that second table. Alternatively, a combination of database views and triggers can do the same thing without having an actual extra table to manage.

Either way, it allows you to have soft deletion and/or full activity logging functionality without the application having to know about it.

Ensorceled3y ago

I use soft deletes in our system and literally used it to restore an accidentally deleted item about 3 hours ago. Took a second to toggle the deleted item.

I don't get how this is rocket science. Almost every query in the system is some kind of where clause on a fk to account or user or project or some other critical object ... so there are only a few places in the ORM where I need to support this.

Minor49er3y ago

It's interesting that the author notes that, as far as he's aware, nobody's ever undeleted something. It could be true. But I'm wondering if maybe he simply hasn't seen it first-hand since the action of recovering something is often handled by a customer-facing team and not by a developer.

dunkelheit3y ago

This brings back memories... Some time ago I was an intern in a team working on a UGC map editor. We were using this soft-delete pattern and for some task I needed to deploy a database migration that fiddled with the "deleted" status field. It was quite late and after the migration finished I almost went home but for some reason decided to check community forums. There users were having the time of their lives taking screenshots of deleted objects that suddenly became visible (many of them quite amusing, including swear words written in 500km letters). Dunno how this escaped testing, but the horror of what I had done brought clarity of mind and I quickly found the error and devised another migration that fixed the data. That worked and I was able to finally go home.

So yeah, be careful with the soft-delete pattern :)

unemployable3y ago

Everybody always did soft deletes with the is_deleted column at companies I once worked for so that is what I would do. I noticed that a lot of bugs would occur this way because you would forget to add the is_deleted to the query somewhere. The queries were also longer due to the longer where clause and so on.

These days I use a deleted table as per the article as I decided it would be better to deal with the more complex undelete process. It keeps that process to a single section instead of spreading it all throughout your database.

Some of the suggestions here like "use views" don't really work for two reasons - sometimes the is_deleted check must be performed in the ON clause, not in the WHERE clause, and sometimes you want to count the deleted or show the deleted, while other times you don't.
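
A small illustration of the ON vs WHERE point, using made-up `author` / `post` tables in SQLite via Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL,
        is_deleted INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO author VALUES (1, 'alice'), (2, 'bob');
    -- alice has one live and one deleted post; bob's only post is deleted.
    INSERT INTO post VALUES (10, 1, 0), (11, 1, 1), (12, 2, 1);
""")

# Check in ON: every author is kept, only live posts are counted.
in_on = conn.execute("""
    SELECT a.name, COUNT(p.id)
    FROM author a
    LEFT JOIN post p ON p.author_id = a.id AND p.is_deleted = 0
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# Check in WHERE: rows are filtered after the join, so bob drops out.
in_where = conn.execute("""
    SELECT a.name, COUNT(p.id)
    FROM author a
    LEFT JOIN post p ON p.author_id = a.id
    WHERE p.is_deleted = 0
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()
```

Here `in_on` is `[('alice', 1), ('bob', 0)]` while `in_where` is `[('alice', 1)]`, so moving the check silently changes which authors show up.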

astura3y ago

>so you can be left with your customer being “deleted”, but its invoices still live.

This is not a problem; it is almost always what's desired, otherwise you have no records for, for example, the tax auditor.

Obviously when, say, an employee leaves basically all things they did on a corporate system can't disappear. Any documents they created/updated still need to be accessed, their git history/commits can't disappear.

When you switch classrooms you don't want all the events that ever happened in the old classroom to disappear.

These sorts of systems are the kinds of systems I've worked with my entire career. Undeletion happens all the time too (employees get rehired, for example).

Most computer systems aren't B2C free social media sites where you CAN just delete anything you want because no data is important.

rubyist5eva3y ago

One thing that I couldn't find in the article: performance.

At least for our use case, soft deletes made everything slower because it's much harder to index. For our database we basically had to do an audit of all of our WHERE clauses and create partial indexes on "not yet deleted" records. Of course, this bloats your indexes/disk and hurts write performance so it's not a silver bullet.
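
For reference, a partial index on "not yet deleted" rows looks roughly like this. The sketch uses SQLite through Python's sqlite3 (Postgres accepts the same `CREATE INDEX ... WHERE` form), and the table and index names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        deleted_at TEXT
    );
    -- Only live rows go into this index; soft-deleted rows are left out.
    CREATE INDEX idx_users_live_email ON users (email) WHERE deleted_at IS NULL;
    INSERT INTO users VALUES
        (1, 'a@example.com', NULL),
        (2, 'b@example.com', '2022-07-19T00:00:00Z');
""")


def find_live_user(email):
    # The query repeats the index predicate so the planner can consider
    # the partial index for it.
    row = conn.execute(
        "SELECT id FROM users WHERE email = ? AND deleted_at IS NULL",
        (email,),
    ).fetchone()
    return row[0] if row else None
```

Queries that leave out `deleted_at IS NULL` can't use this index, which is part of why the audit of every WHERE clause was needed.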

We've also taken to inserting into "delete records tables" for records we may want to recover or for historical reasons. You still lose foreign keys but indexing and query optimization is a lot easier, and your old data is just still a simple query away.

armchairhacker3y ago

Dumb solution: make soft deletes explicit in your backup system.

Your company has a database backup system right? That system should be configured so that when it runs a backup, it will not remove deleted entries from the previous backup, instead just mark them as "deleted_since" the current backup time.

Idk if any backup system actually supports this, if there's some glaring problem (like you can't just overwrite parts of a database backup for some reason), or if most companies just don't have backups because they're too expensive (probably not), but this is the solution I would go with. It works for other sorts of data like file systems as well.

joshuanapoli3y ago

Atlassian's deletion-related outage demonstrates why soft deletion should be the default. Use hard deletion after a grace period for data that truly needs to be expunged. Even if undelete is not part of the normal workflow, experience shows that swift recovery from bugs and operator errors is a universal part of serving users. The less data motion involved in deletion (and recovery) the better for both the original deletion process and also any recovery process.

https://news.ycombinator.com/item?id=31015813

lolsal3y ago

In my 20 years of software experience the soft delete is not so often used to undelete something, but more often used to know what has been deleted. If you delete a record from a table, did it ever exist? Can you reference that customer/user/product ever again? Not to mention the one-in-a-million case where a customer had their account erroneously or fraudulently deleted - undeleting saves time/money/bacon when it's needed and is relatively inexpensive to maintain.

dragonwriter3y ago

> The concept behind soft deletion is to make deletion safer, and reversible

IME, as with “updated_by” and “last_modified_at” columns, it's usually hazy audit requirements, not making deletion reversible, that motivates it.

A proper history store maintained by appropriate triggers solves this, and leaves the referential integrity constraints on the base table intact. (It can also be used for reversibility if you need that.)

Views conceptually would work, but then you get bitten by all the ways that all relations are not equal in real-world RDBMSs.

jasonhansel3y ago

It's pretty easy to solve the foreign key issue (where you need to write elaborate DELETE queries to avoid breaking foreign keys) in Postgres using deferrable constraints. Just start a transaction, run "SET CONSTRAINTS ALL DEFERRED," delete rows from various tables in any order, then commit the transaction. The DELETE statements will effectively ignore the foreign key constraints, but any remaining "broken" foreign keys will be caught when the transaction commits.
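
One caveat: in Postgres, SET CONSTRAINTS only affects constraints that were declared DEFERRABLE, and foreign keys are not deferrable by default. The sketch below shows the same behaviour in SQLite (via Python's sqlite3) rather than Postgres, with the FK declared `DEFERRABLE INITIALLY DEFERRED`; table names are made up:

```python
import sqlite3


def make_db():
    # isolation_level=None: autocommit mode, so BEGIN/COMMIT are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE invoice ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL"
        "   REFERENCES customer(id) DEFERRABLE INITIALLY DEFERRED)"
    )
    conn.execute("INSERT INTO customer VALUES (1)")
    conn.execute("INSERT INTO invoice VALUES (10, 1)")
    return conn


def delete_customer_then_invoices(conn):
    conn.execute("BEGIN")
    # Parent first: allowed because the FK check waits until COMMIT.
    conn.execute("DELETE FROM customer WHERE id = 1")
    conn.execute("DELETE FROM invoice WHERE customer_id = 1")
    conn.execute("COMMIT")


def delete_customer_only(conn):
    conn.execute("BEGIN")
    conn.execute("DELETE FROM customer WHERE id = 1")
    try:
        conn.execute("COMMIT")  # the dangling invoice is caught here
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return False
    return True
```

Deleting in the "wrong" order works as long as nothing is left dangling at commit time; if something is, the commit fails and the transaction can be rolled back.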

ThePhysicist3y ago

I don't get what the problem is with cascading deletes. I mean you typically only use them for foreign keys where deletion of the parent object makes the referencing object simply invalid, so there would be no reason to leave the referencing object in the database.

The point that is true is that queries get more complicated as you'll have to add a "WHERE deleted_at IS NULL" to every SELECT (once for each table you refer to), but that can be automated if you use an ORM. A paradigm that I often use is that all objects in the database belong to a role object that determines who can read/write/delete the given object. So before doing anything with an object I always check the role object (e.g. the "user" referencing an "invoice", to stay with the example OP gives), and as part of this I check whether the user object still exists. Alternatively, you could automate most of the required update logic using triggers as well.

But otherwise I agree, soft deletes often don't seem to be a worthwhile tradeoff, not sure if I would use them again when designing a relational schema. They are very useful for auditing and undo though: In a current project, whenever a set of objects gets updated I soft-delete the old versions and create new objects, keeping the UUIDs intact. That allows me to display the entire version history of each object to the user, which can be necessary e.g. for compliance reasons. You can achieve this with an audit log as well but that would require more logic and different queries, whereas querying soft-deleted objects just requires a slight modification of existing queries.

agentultra3y ago

Soft deletion by storing the row in JSON won’t survive months of schema migrations. If restoring a record is rare you don’t want to have to find out that there’s no way to map the old data to the new table when it matters.

There are cases where you shouldn’t be deleting or updating data; auditable and non-repudiation systems for some regulatory compliance come to mind. Best to use patterns that don’t require those operations.

Soft deletion does come at a cost. Choose carefully!

nwah13y ago

If you have a lot of stored procs then the argument makes some sense. If you do most things in code, then I would argue these complaints are moot.

In your code you can isolate all soft-deleting from business logic in the ORM layer or data layer, so the complaint about littering your codebase is moot for me. For instance, using Entity Framework, you can change deletes to soft deletes in a centralized place for all records matching a particular interface, then add a query filter that applies in the background for all queries.
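
This isn't Entity Framework, but the same idea can be sketched in plain Python over sqlite3: one hypothetical wrapper class owns both the "delete becomes update" rewrite and the default filter, so business code never mentions `deleted_at`.

```python
import sqlite3
import time


class SoftDeleteTable:
    """Hypothetical data-layer wrapper for tables with (id, name, deleted_at)."""

    def __init__(self, conn, table):
        # `table` is interpolated into SQL, so it must be a trusted identifier.
        self.conn = conn
        self.table = table

    def all(self, include_deleted=False):
        sql = f"SELECT id, name FROM {self.table}"
        if not include_deleted:
            sql += " WHERE deleted_at IS NULL"  # the centralized query filter
        return self.conn.execute(sql + " ORDER BY id").fetchall()

    def delete(self, row_id):
        # "Delete" is rewritten to an UPDATE in this one place.
        with self.conn:
            self.conn.execute(
                f"UPDATE {self.table} SET deleted_at = ?"
                " WHERE id = ? AND deleted_at IS NULL",
                (int(time.time()), row_id),
            )

    def restore(self, row_id):
        with self.conn:
            self.conn.execute(
                f"UPDATE {self.table} SET deleted_at = NULL WHERE id = ?",
                (row_id,),
            )
```

Anything that bypasses the wrapper (raw SQL, stored procs, reporting tools) still has to remember the filter, which is the stored-proc caveat above.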

The complaint that soft deleting is never done is maybe valid since you can review things with audit logging or backups without risking unknown effects of an un-delete. But if you need a recycle bin feature then you get that for free if you just build that in from the start, and it is one more guarantee.

The risk of orphaned records is real, although you could probably handle most cases generically in the data layer or ORM as well. It seems like there's just tradeoffs to the various approaches. Do you want to err on the side of deleting data, or on the side of keeping it? Do you worry more about orphaned records or data loss?

viiralvx3y ago

I don't know if I 100% agree with this blog post. Additionally, having foreign key constraints isn't a "catch-all" solution and breaks at scale. There's frameworks like Rails that can still handle these discards for the user via the `dependent` option on the model with some extra code.

At my current employer, we noticed that `acts_as_paranoid`'s default behavior was not what we wanted, so we migrated over to `discard`. We also added a concern that reflects on dependent associations, finds if they are discardable, and discards them if possible. And that cascades down, easing those concerns. This `Discardable` concern is automatically added to every single soft-deletable model and it has been working out great for us.

[1]: https://github.com/jhawthorn/discard

rvr_3y ago

TFA is nonsense. Hard deletes should almost never be used, period. The application credentials should not even have permission to issue delete statements, thus reducing potential damage from bad actors. Things like ON DELETE CASCADE should not even exist. Anyone using them must stop and rethink their life decisions.

kardianos3y ago

This poster misses the point completely. Soft delete is a must have for historical data, where you want to keep history, but keep the current set clean.

Effectively, you don't check for the soft delete flag if you get to it from an un-deleted record, but you do check for it if you access it the other way around.

spfzero3y ago

I like the deleted-items-table suggestion the author makes. It's useful though, to think about the cases where you'd want to delete, say a customer with existing invoices. In one situation, you may have made a mistake and want to start over, say an operator creates the customer and order, but then the customer changes their mind. In that situation a hard delete is in order; you want to "undo" the _creation_ of the customer and invoice, and nothing further has happened as far as referencing their key

In other situations though, you may have some reason to treat the customer as if they were deleted, but better to examine the reason for that, and use an attribute more relevant to that reason, such as active/inactive etc. Would be different for different entities of course.

mrinterweb3y ago

For audit trails in rails, I still like papertrail. https://github.com/paper-trail-gem/paper_trail. It provides the ability to restore records as well as auditing abilities.

rtpg3y ago

I believe you can get most of the advantages of soft deletion through a notion of archival.

Usually archival does the main thing (“get this out of my main resource list”) without breaking audit trails or resource links. For people for whom this is insufficient, you can of course offer hard deletion.

khaledh3y ago

One reason we encourage keeping soft-deleted records at least for a while is synchronizing data across systems. We want to propagate deletions downstream. At some point when all downstream consumers have caught up, we can purge the soft-deleted records.

wizofaus3y ago

The assumption seems to be that the undelete operation is performed by the vendor's support staff, rather than the end user. I've been involved in the implementation/ maintenance of systems with soft delete that was entirely for that purpose - it allowed the user to delete/undelete at will. In our case it also meant certain uniqueness constraints were kept in place effectively reserving things like email addresses or business registration numbers that couldn't be reused until a hard delete was issued. Arguably it's more like an "is active" flag in such a case, but it's debatable what the distinction is.
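
A sketch of that reservation effect, assuming a plain UNIQUE constraint on a hypothetical `account.email` column (SQLite via Python's sqlite3):

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " email TEXT NOT NULL UNIQUE,"  # applies to soft-deleted rows too
        " deleted_at TEXT)"
    )
    # One account that has already been soft-deleted.
    conn.execute(
        "INSERT INTO account (id, email, deleted_at)"
        " VALUES (1, 'a@example.com', '2022-07-19T00:00:00Z')"
    )
    conn.commit()
    return conn


def try_register(conn, email):
    try:
        with conn:
            conn.execute("INSERT INTO account (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False  # address is still held by an existing row


def hard_delete(conn, account_id):
    with conn:
        conn.execute("DELETE FROM account WHERE id = ?", (account_id,))
```

Registration with the same email fails while the soft-deleted row exists and succeeds once it is hard-deleted. If you wanted the opposite (reuse allowed immediately), a unique index restricted to live rows would be the usual alternative.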

AdrianB13y ago

Soft deletes are really worth it in the right scenario. There are cases when they can be avoided, cases when they are not worth it, and cases when they are; for the problems presented in the article there are solutions or workarounds.

timomax23y ago

We just have a history table (for each table) where all deleted and past versions of record are stored. Seems to solve all the issues. The history table is NOT part of the application, but is there for audit and diagnostics etc.

ivank3y ago

https://github.com/xocolatl/periods implements SYSTEM VERSIONING for PostgreSQL and moves deleted rows to a history table.

mizzao3y ago

The most famous example is perhaps the recent weeks-long Jira outage, right?

ccleve3y ago

Someday we'll have a database that handles this for us. We'll specify whether a particular table should have an audit trail. The system will know about foreign keys and related tables, and save them as well. Everything will be configurable, of course. Internally, the system will save the relevant data using the write-ahead log. Restoring deleted data will be easy, a simple command. Purging data that should disappear forever will be another command. This is all very possible.

Someday.

I'm embarrassed to admit how many decades I've been waiting for this.

hn_throwaway_993y ago

The deleted records table he mentions at the end is a good approach, but:

1. This can easily be done with a trigger, so that you just call a DELETE on the table and deleted rows are copied to the deletion table automatically.

2. I prefer, instead of having a jsonb column, that each table has a corresponding `deleted_original_table_name` table that exactly matches the schema of the base table, with the addition that the first column is a `deleted_at` timestamp. It's easy to use helper methods in schema migrations to always keep the table definitions in sync.
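
Both points together, as a sketch in SQLite via Python's sqlite3 (Postgres trigger syntax differs). The `invoice` / `deleted_invoice` names are just an example of the naming scheme:

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE invoice (
            id INTEGER PRIMARY KEY,
            customer TEXT NOT NULL,
            amount INTEGER NOT NULL
        );
        -- Same columns as invoice, with deleted_at added as the first column.
        CREATE TABLE deleted_invoice (
            deleted_at TEXT NOT NULL,
            id INTEGER NOT NULL,
            customer TEXT NOT NULL,
            amount INTEGER NOT NULL
        );
        -- A plain DELETE on invoice copies the old row across first.
        CREATE TRIGGER invoice_archive_on_delete
        BEFORE DELETE ON invoice
        BEGIN
            INSERT INTO deleted_invoice (deleted_at, id, customer, amount)
            VALUES (CURRENT_TIMESTAMP, OLD.id, OLD.customer, OLD.amount);
        END;
    """)
    return conn
```

The application just issues `DELETE FROM invoice WHERE id = ?`; the cost is that every migration on `invoice` has to touch `deleted_invoice` and the trigger as well.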

runeks3y ago

This article touches on something I’ve always wondered: how do I determine whether to add a BOOLEAN column to a table or create a new table instead?

For all tables containing a BOOLEAN column it’s always possible to simply split this table into two separate tables with the same columns, where the name of the table signals whether the factored-out BOOLEAN column would be TRUE or FALSE.

My gut instinct says it’s cleaner to have two separate tables, but I’ve never found a definite answer.

jandrewrogers3y ago

The complexity of soft deletes is that they implicitly introduce the difficult semantics of bi-temporality into the data model, typically without the benefit of a formal specification that minimizes the number of edge cases that have to be dealt with.

Mechanically, I've typically supported soft deletes with audit tables that shadow the primary table, with a bunch of automation in the database to make management mostly automagic. It isn't too bad in PostgreSQL.

sfink3y ago

I don't really have enough experience with this stuff for my opinion to have value, but a lot of the opinions I see here appear to me to be dancing around the real question.

I disagree with the terminology in the article. "Soft deletion" suggests that the complexity is in the "soft" part, and that a common way of implementing it is problematic. I disagree. There isn't some orthogonal "soft vs hard" dimension to a generic concept of deletion. The complexity is in the meaning of deletion.

In accounting, you don't simply delete. Or when you do, you really do, and the two operations aren't the same in any meaningful sense. If you want data to still be available—whether it's for debugging or analysis or auditing or whatever—you should be thinking about the semantics of what you need, and structure your data model accordingly.

The `deleted_at` column approach is a DB design smell if it isn't supporting application logic (where "application" may include auditing or whatever). It works against the DB's mechanisms to maintain data integrity. FKs are just one example.

An example: consider the place where you want to keep historical data, but you're also going to be modifying your schema. If you use a `deleted_at` column, your migrations will start inventing more and more things that were simply not true at the time a deleted row was alive. It will lie to you. It's the same if you move data to a single deleted data table and then migrate that table repeatedly. For maximal semantic purity, you probably ought to leave "deleted" data in a historical table matching the historical schema, and then use views to glue things together for convenience. If you update the live data schema in an incompatible way, you might even be saved by the FK constraints on the archive tables.

But that's a pain, and whether or not it's less pain than the other approaches depends again on what deletion semantics you are targeting. Crossing your fingers and closing your eyes and hoping that your chosen mechanism's semantics are close enough to the semantics you need is going to bite you.

A `deleted_at` column can absolutely be the right solution if your rows have a status that changes over time, and one of the statuses that you're willing to support (with potentially brittle code) is "archived".

mst3y ago

Soft delete has always caused me more trouble than it was worth.

Keeping a deleted records table via app code or triggers has always been more trouble than it took to build.

jb36893y ago

Dealing with a separate table is still hard (I know because we do this). What happens when you do a migration or need to shard something and want consistent partitioning across your data? You have to consider your one off table that everyone inevitably forgets about. I agree that a deleted_at column is too big of a liability for compliance reasons though

scifibestfi3y ago

> When I worked at Heroku, we used soft deletion. When I worked at Stripe, we used soft deletion. At my job right now, we use soft deletion.

> As far as I’m aware, never once, in ten plus years, did anyone at any of these places ever actually use soft deletion to undelete something.

That's wild. So it seems the idea of needing undelete is largely an unfounded fear.

pilgrimfff3y ago

All you need is a layer of abstraction to get past the downsides of soft deletion. You can use views or your ORM (if you use one)

In Django, it's really easy to create almost seamless soft deletion logic in the model manager or in your querysets.

Over the last decade, I find myself using soft deletion more and more - usually to accommodate user/client requests.

jaitsu3y ago

Very similar thoughts to an article I wrote back in 2014: https://jameshalsall.co.uk/posts/why-soft-deletes-are-evil-a...

Excuse the dramatic title of the post

jarek833y ago

I wonder how the author handles relations that have to stay even when the origin needs to be gone. Like in the given example with invoices: they have to stay, otherwise your accounting people will visit you quite a lot. Whenever we thought we could do a hard delete, it almost always proved wrong.

ajuc3y ago

If you need this why reimplement it when you can use database history (dbms_flashback or SELECT AS OF in Oracle)?

dudeinjapan3y ago

At my company, the soft-deleted items in our DB were a source of massive confusion for our data engineers. "How can a row be both deleted and undeleted at once, like Schroedinger's cat?" they puzzled.

We renamed "deleted_at" to "archived_at". And there was much rejoicing.

Smoosh3y ago

DB2 has implemented temporal tables which can automatically capture all changes to the primary table.

https://www.ibm.com/docs/en/db2/10.1.0?topic=tables-history

openthc3y ago

I use a delta-log table, so each INSERT/UPDATE/DELETE on objects I care about is captured (via trigger) -- but that one has to get date partitioned. So in my system a DELETE statement (and DELETE CASCADE) works as expected -- any history has to be discovered from the logs.
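For anyone who wants to picture the trigger-based log, here is a minimal sketch using Python's built-in `sqlite3`. The `widget` and `delta_log` tables and their columns are hypothetical, and the date partitioning mentioned above is left out, so treat it as an illustration rather than the setup actually described.

```python
import sqlite3

# Hypothetical schema: a `widget` table plus a `delta_log` table fed by triggers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE widget (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE delta_log (
    log_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    op        TEXT NOT NULL,      -- 'INSERT', 'UPDATE' or 'DELETE'
    widget_id INTEGER NOT NULL,
    old_name  TEXT,
    new_name  TEXT,
    logged_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER widget_ins AFTER INSERT ON widget BEGIN
    INSERT INTO delta_log (op, widget_id, new_name)
    VALUES ('INSERT', NEW.id, NEW.name);
END;
CREATE TRIGGER widget_upd AFTER UPDATE ON widget BEGIN
    INSERT INTO delta_log (op, widget_id, old_name, new_name)
    VALUES ('UPDATE', NEW.id, OLD.name, NEW.name);
END;
CREATE TRIGGER widget_del AFTER DELETE ON widget BEGIN
    INSERT INTO delta_log (op, widget_id, old_name)
    VALUES ('DELETE', OLD.id, OLD.name);
END;
""")

conn.execute("INSERT INTO widget (id, name) VALUES (1, 'a')")
conn.execute("UPDATE widget SET name = 'b' WHERE id = 1")
conn.execute("DELETE FROM widget WHERE id = 1")  # a real, hard delete

# The live table is empty, but the full history survives in the log.
live_rows = conn.execute("SELECT COUNT(*) FROM widget").fetchone()[0]
history = conn.execute(
    "SELECT op, old_name, new_name FROM delta_log ORDER BY log_id"
).fetchall()
```

The application keeps issuing plain `DELETE` statements, and anything historical is read from `delta_log` instead of the live table.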

waspight3y ago

I use soft deletes to maintain insights. For instance, I would like to know how many users have been created in total even if some have been deleted later on. Is this a bad approach? Most of the other comments here seem to use it only to be able to restore deleted entries.

n4jm43y ago

I can't even tell you how much political capital I lost at a major retailer recommending against wasting time implementing soft deletions... on an internal portal that babysat linter configurations.

Don't ask me why the linter configurations weren't simply persisted in git.

msie3y ago

I've used Soft Deletion so many times so I'll say it's been worth it. I believe using an audit table would have made recovery more difficult for me. Anyways take this advice with a grain of salt. It's only one guy's opinion. As is mine.

kleebeesh3y ago

Maybe a more accurate take: Half-assed soft deletion definitely isn't worth it.

If you're just going to throw in some deleted bool or deleted_at timestamp without thorough testing, you might as well just skip it. It's virtually certain to go wrong.

bob10293y ago

If you are going to think about this pattern, why not go one step further and simply event source everything with an append-only, immutable log?

You could even sprinkle cryptographic guarantees into the mix. This would be very challenging to do with mutable DB rows.
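A rough sketch of what that could look like, assuming a simple SHA-256 hash chain over an in-memory list. The event names (`customer_created`, `customer_deleted`) and the helper functions are made up for illustration, and a real system would persist the log somewhere durable.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, payload):
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log, payload):
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev_hash, "payload": payload,
                "hash": _digest(prev_hash, payload)})


def verify(log):
    """Return True if no entry was altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


def live_customers(log):
    """Fold the events into the current set of live customer ids."""
    live = set()
    for entry in log:
        event = entry["payload"]
        if event["type"] == "customer_created":
            live.add(event["id"])
        elif event["type"] == "customer_deleted":
            live.discard(event["id"])
    return live


log = []
append_event(log, {"type": "customer_created", "id": 1})
append_event(log, {"type": "customer_created", "id": 2})
append_event(log, {"type": "customer_deleted", "id": 1})

# Rewriting history breaks the chain and is detected.
tampered = copy.deepcopy(log)
tampered[0]["payload"]["id"] = 99
```

Here "deleting" is just another appended fact, and the current state is derived by replaying the log.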

krascovict3y ago

If it's a case of deleting files safely, I recommend shred, it's very good...

https://wiki.archlinux.org/title/Securely_wipe_disk

AtNightWeCode3y ago

”The concept behind soft deletion is to make deletion safer, and reversible.” Well, that is one reason. The actual data can be kept for many reasons: audits, reports, laws and so on.

Edit: Deletion is always reversible btw since there are backups.

qxxx3y ago

in one project I was working on, we used a similar version of the 2nd method from the article:

Every table had the same table with _del suffix (eg. users_del). If a record was deleted, it was simply moved to _del table. We used code for this but later we started to use db triggers.

It worked quite well, and yes, there was always someone who wanted to undelete things. One downside was, if the schema changed on the source table, we needed to also change the schema in _del table. I like the approach with storing the data as json. That way there could be only 1 deleted_stuff table because it was looking quite strange having all the _del tables.
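A sketch of the single-table JSON variant, done in application code with Python's `sqlite3`. The `deleted_stuff` layout and the `archive_and_delete` helper are assumptions for illustration, not the schema from the article or from the project described above.

```python
import json
import sqlite3

# Table names cannot be bound as SQL parameters, so only allow known ones.
ARCHIVABLE_TABLES = {"users"}

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE deleted_stuff (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    source_table TEXT NOT NULL,
    source_id    INTEGER NOT NULL,
    data         TEXT NOT NULL,   -- the whole row serialized as JSON
    deleted_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
""")


def archive_and_delete(conn, table, row_id):
    """Copy one row into deleted_stuff as JSON, then hard-delete it."""
    if table not in ARCHIVABLE_TABLES:
        raise ValueError(f"unexpected table: {table}")
    with conn:  # commits on success, rolls back if anything raises
        row = conn.execute(
            f"SELECT * FROM {table} WHERE id = ?", (row_id,)
        ).fetchone()
        if row is None:
            return False
        conn.execute(
            "INSERT INTO deleted_stuff (source_table, source_id, data) "
            "VALUES (?, ?, ?)",
            (table, row_id, json.dumps(dict(row))),
        )
        conn.execute(f"DELETE FROM {table} WHERE id = ?", (row_id,))
    return True


conn.execute("INSERT INTO users (id, email) VALUES (1, 'alice@example.com')")
archived = archive_and_delete(conn, "users", 1)
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
saved = json.loads(
    conn.execute("SELECT data FROM deleted_stuff").fetchone()["data"]
)
```

Because the archived row is just JSON, a schema change on `users` does not force a matching change on the archive table, though old archived rows keep the old shape.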

wruza3y ago

> But with soft deletion, this goes out the window. A customer may be soft deleted with its deleted_at flag set, but we’re now back to being able to forget to do the same for its invoices.

What? You do not delete invoices, unless you’re trying to take revenge on your accountant. This is what soft deletion is (partially) for: you don’t want to see Alice in a customer list for some reason, but her invoices are the accomplished fact. You can even visit her card from there, but it is unlisted everywhere else.

Of course that depends on which sense you put into deletion, e.g. you may put obsolete cards into a special group instead and only use deletion to remove data completely with all references. But then deletion is useless, because only a programmer to the bone can imagine deletion of a customer together with all historical (legal) documents they participated in.

outworlder3y ago

I wish Datomic was made open-source (with maybe some features available as an 'enterprise' offering) so that we could actually have a decent alternative for this 'soft-delete' problem.

satyrnein3y ago

We switched a lot of tables to soft deletes so we could replicate those deletes into our data warehouse. You can also use bin log replication for hard deletes, but every schema change would break it.

duxup3y ago

> All our selects look something like this: SELECT * FROM customer WHERE id = @id AND deleted_at IS NULL;

Solution… a whole other table of deleted stuff… in a new structure.

Man soft deletes just look better to my eye.

dcdc1233y ago

If you are using a state manager with models in something like rails/django/etc then it is trivial to support soft deletion without it infecting your entire code base.

pierrebai3y ago

The author claims pruning soft-deleted entries requires a complex query, but hard-deleting an entry would have required the same complexity. So it's really not an argument.

mmmuhd3y ago

I remember when a rogue employee of a client went ahead and did stupid deletions on students' and staff data; soft delete saved the day and made us some money.

magundu3y ago

We use soft deletion by moving all related rows into a different archive database, which is cleaned of entries older than 60 days.

For accidental delete, we will undelete from archive.

jtwebman3y ago

The bigger reason to use soft deletes is to keep history. Just because someone no longer has access doesn't mean we shouldn't report on the things they did months ago.

encoderer3y ago

Even if you don’t “undelete” something, soft deletes make it possible to instantly hide something while saving the expensive sql delete for processing later.

BatteryMountain3y ago

The purpose of soft deleting is not to be reversible...that's just a free side effect if you really need it.

galaxyLogic3y ago

Couldn't soft deletion happen behind the scenes in the database engine?

Then have a statement like RESTORE * FROM ...

sam_lowry_3y ago

I do not understand the foreign keys issue. Do not use the deleted_at timestamp that is nullable by default. Instead, nullify the field when the line is deleted. Foreign keys on NULL values will be possible.

In any case, soft deletion is usually a sign of incompetence. Whenever I saw it on a project, both soft deletion and the project turned sour.

tgbugs3y ago

One use case that I think is not sufficiently considered in this is related to two comments I made about a year ago [0, 1].

If you can _actually_ delete something, then that means that a malicious actor can fabricate data and claim that you deleted it. GDPR may be well intentioned but systems that have the ability to remove any record of a thing lay the groundwork for systematic fabrication of data, because any record of the past has been erased.

Operationally, I can totally see why soft delete might be considered to be problematic in certain cases, but from an information security point of view I think it is absolutely critical for protecting users against a whole class of attacks.

0. https://news.ycombinator.com/item?id=27249738 1. https://news.ycombinator.com/item?id=27691442

jacobsenscott3y ago

Deletion is never worth it full stop. How do you delete from a backup? You can't delete all your backups. Effective dates and app level encryption to allow for cryptographic "deletes" is the way to go.

whoomp123423y ago

Good idea, but only if you don't use foreign keys. If you do use foreign keys, then you must create custom deletion logic for each relationship. Yuck!

revskill3y ago

In the real world, there's no such concept as deletion from a DB!

There's only deactivating an account, archiving a legacy product, ...

Because there's no such thing as deleting something from the real world.

Pakdef3y ago

not worth it for short term profits... which is why most of today's internet will disappear

jacksnipe3y ago

The ONLY reason that you should avoid soft deletion is that deleting things permanently in a soft-deletion-based system is hard and error prone.

GDPR, among other regulations, requires that you be able to do this sometimes; and it requires that the data REALLY BE GONE.

But I really think that soft deletion should be the default unless you think you’ll be fielding user data deletion requests.

amerine3y ago

Brandur will know what I mean when I say, it’s always worked out in our favor at Heroku to soft-delete logicals, but hard-delete physicals. It’s not hard to remember to append a “where deleted_at is null” to some sql, or build into higher order UI’s.

However, GDPR/customer data demands across regimes make me agree with him, and I would suggest folks listen. <3

kache_3y ago

wait until this guy finds out about financial regulations

marginalia_nu3y ago

Including "deleted_at IS NULL" is surely something that you'd solve using a view, rather than explicitly entering it into the queries.

GDPR is the big thing to consider, I think.

pgt3y ago

XTDB: xtdb.com

gigatexal3y ago

There’s a lot wrong with this write up. Why would anyone want to delete corresponding invoices when you “delete” the corresponding user? And GDPR provides a caveat that if you need the data for a biz usecase like legacy reporting you can keep the data (I think it has to be masked or something but it’s not insane to say you must delete data on request that could materially affect a company like removing transactions).

Just put a filtered index on the column to better query non deleted data.
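As a sketch of the filtered-index idea, here is the SQLite flavor (a "partial index") driven from Python's `sqlite3`. The table, column, and index names are hypothetical, and other databases spell this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id         INTEGER PRIMARY KEY,
    email      TEXT NOT NULL,
    deleted_at TEXT               -- NULL means the row is live
);
-- Partial ("filtered") index: only live rows are indexed.
CREATE INDEX customer_live_email ON customer (email) WHERE deleted_at IS NULL;
""")
conn.executemany(
    "INSERT INTO customer (id, email, deleted_at) VALUES (?, ?, ?)",
    [
        (1, "alice@example.com", "2022-01-01 00:00:00"),  # soft-deleted
        (2, "alice@example.com", None),                   # live
        (3, "bob@example.com", None),                     # live
    ],
)

live_query = "SELECT id FROM customer WHERE email = ? AND deleted_at IS NULL"
live_ids = [row[0] for row in conn.execute(live_query, ("alice@example.com",))]

# The plan text depends on the SQLite version, so it is printed for
# inspection rather than asserted on.
for row in conn.execute("EXPLAIN QUERY PLAN " + live_query, ("alice@example.com",)):
    print(row)

index_sql = conn.execute(
    "SELECT sql FROM sqlite_master "
    "WHERE type = 'index' AND name = 'customer_live_email'"
).fetchone()[0]
```

The index only covers live rows, so it stays small even if most of the table ends up soft-deleted. The query has to repeat the `deleted_at IS NULL` predicate for the planner to consider it.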

On the whole I don’t think in practice the author’s take makes much sense.

OOPMan3y ago

Nice way to get on HN.

Post a daft hot take.

kleer0013y ago

Yea, it is.

rzwitserloot3y ago

> But the technique has some major downsides. The first is that soft deletion logic bleeds out into all parts of your code. All our selects look something like this:

A view solves that problem. Make a view that only has the non-deleted stuff. Give it ON UPDATE and ON INSERT triggers so that it walks, talks, swims, and quacks like a table. Voila, no more code bleeding.
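A sketch of that view-plus-triggers setup, using SQLite's `INSTEAD OF` triggers from Python so it runs anywhere. The names are hypothetical, the `DELETE` trigger is an extra not mentioned above, and Postgres would use its own rule/trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_raw (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    deleted_at TEXT               -- NULL means the row is live
);

-- Application code only ever talks to this view.
CREATE VIEW customer AS
    SELECT id, name FROM customer_raw WHERE deleted_at IS NULL;

CREATE TRIGGER customer_insert INSTEAD OF INSERT ON customer BEGIN
    INSERT INTO customer_raw (id, name) VALUES (NEW.id, NEW.name);
END;
CREATE TRIGGER customer_update INSTEAD OF UPDATE ON customer BEGIN
    UPDATE customer_raw SET name = NEW.name WHERE id = OLD.id;
END;
-- A DELETE against the view becomes a soft delete on the base table.
CREATE TRIGGER customer_delete INSTEAD OF DELETE ON customer BEGIN
    UPDATE customer_raw SET deleted_at = CURRENT_TIMESTAMP WHERE id = OLD.id;
END;
""")

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO customer (id, name) VALUES (2, 'Bob')")
conn.execute("UPDATE customer SET name = 'Alicia' WHERE id = 1")
conn.execute("DELETE FROM customer WHERE id = 2")

visible = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM customer_raw").fetchone()[0]
```

Queries against `customer` never mention `deleted_at`, while `customer_raw` still holds every row.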

> Another consequence of soft deletion is that foreign keys are effectively lost.

Bit trickier, you have a few options:

* Make the constraint include that the foreign object has `deleted_at IS NULL`.

* Add a trigger that automatically marks as deleted (I'm assuming a setup where you'd ordinarily use ON DELETE CASCADE) anything that refs row X when you mark row X as deleted. If you prefer commit failure ON DELETE instead, triggers can do that too.

> GDPR scaremongering

You don't have to delete records from backup tapes either (SOURCE: I read the whole thing). Using soft delete is actually making life easier for you - you presumably _do_ have certain data storage requirements (for audit trails and the like), and now you can just have the one canonical database that contains it all. When it's time to prune an entire customer/user into oblivion, it's simpler to do that then - just `DELETE` the right rows away (actual DELETE, not UPDATE SET deleted_at).

Yes, GDPR has something to say about keeping data around where you have no feasible auditing or any other reason to have it, but that's a red herring: You don't want your database tables to grow humongous with 99% of the rows 'deleted'. That view with some indexes can do a lot but it isn't magic. Presumably you want a cleanup task that, every month or so, DELETEs anything with a deleted_at value that's older than a month or what not. This fully takes care of your GDPR requirements as far as unreasonable data retention goes: A script automatically runs to wipe out all rows in all tables whose deleted_at is too long ago, and then reports that it did this so that you have the audit trail.
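A sketch of such a cleanup task, assuming a `customer` table whose `deleted_at` holds ISO-style text timestamps. The 30-day retention and the function name are placeholders, and a real job would loop over every soft-deletable table and write its report to a proper audit log.

```python
import sqlite3
from datetime import datetime, timedelta


def purge_soft_deleted(conn, now, retention=timedelta(days=30)):
    """Hard-delete rows soft-deleted before (now - retention); return the count."""
    cutoff = (now - retention).strftime("%Y-%m-%d %H:%M:%S")
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "DELETE FROM customer "
            "WHERE deleted_at IS NOT NULL AND deleted_at < ?",
            (cutoff,),
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO customer (id, deleted_at) VALUES (?, ?)",
    [
        (1, None),                   # live
        (2, "2024-01-01 00:00:00"),  # soft-deleted long ago
        (3, "2024-02-25 00:00:00"),  # soft-deleted recently
    ],
)

purged = purge_soft_deleted(conn, now=datetime(2024, 3, 1))
remaining_ids = [r[0] for r in conn.execute("SELECT id FROM customer ORDER BY id")]

# The report line is the audit trail for the purge itself.
print(f"purged {purged} soft-deleted customer rows")
```

Passing `now` in explicitly keeps the job easy to test; a scheduler would pass the current time.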

So, for your requirement to delete specific records upon request, it's easier. For your requirement to not keep unnecessary data around beyond reasonable bounds, it's a simple script.

> Here’s a snippet from one that I wrote recently which keeps all foreign keys satisfied by removing everything as part of a single operation

If you set your constraints explicitly to checking only at the end of a commit this isn't at all difficult the way the author says it is. Just delete what you wanna delete, commit at the end, and poof - all is well. You can force postgres specifically into a 'yeah yeah do not check any constraints until I commit' mode if you don't want to change your ON DELETE clauses.
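A runnable sketch of deferred constraint checking, using SQLite's `DEFERRABLE INITIALLY DEFERRED` foreign keys as a stand-in. In Postgres the per-transaction switch is `SET CONSTRAINTS ALL DEFERRED`, which only affects constraints declared `DEFERRABLE`. Table names here are hypothetical.

```python
import sqlite3

# isolation_level=None: transactions are managed by hand with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY);
CREATE TABLE invoice (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
        REFERENCES customer (id) DEFERRABLE INITIALLY DEFERRED
);
INSERT INTO customer (id) VALUES (1), (2);
INSERT INTO invoice (id, customer_id) VALUES (10, 1), (20, 2);
""")

# Delete in any order: the foreign key is only checked at COMMIT.
conn.execute("BEGIN")
conn.execute("DELETE FROM customer WHERE id = 1")        # parent first
conn.execute("DELETE FROM invoice WHERE customer_id = 1")
conn.execute("COMMIT")                                    # check passes here

# Forgetting the child rows is still caught, just later.
conn.execute("BEGIN")
conn.execute("DELETE FROM customer WHERE id = 2")        # leaves an orphan
try:
    conn.execute("COMMIT")
    commit_failed = False
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")
    commit_failed = True

customers_left = [r[0] for r in conn.execute("SELECT id FROM customer")]
invoices_left = [r[0] for r in conn.execute("SELECT id FROM invoice")]
```

The constraint still protects you; it just stops dictating the order of the statements inside the transaction.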

> data deletion has non-data side effects

That depends on the use case. It feels like a bit of a strawman argument - obviously if the delete action does irreversible damage, marking the database row using a soft delete is rather silly. Of course. Most delete operations are nothing like that though.

> Alternative: A deleted records table

Author's previous point about non-data side effects kills this just as badly. But, sure, this isn't a bad idea. However, most of the complaints about soft deletion apply in a different fashion to this model. For example, if you have reference constraints, and are using ON DELETE CASCADE, you need to do a heck of a lot of copying. You don't just 'copy' the row you want to delete, you also have to copy every row of every table that refs your table with ODCascade constraints to its 'deleted' variant first, and only then can you delete the lot.

> Hard deleting old records for regulatory requirements gets really, really easy: DELETE FROM deleted_record WHERE deleted_at < now() - '1 year'::interval.

It is _exactly_ as simple to do this if you use soft-delete. Bit of an own goal.

Author's got the right idea (soft delete needs some thought), but the technical aspects are a swing and a miss, I think. However, some databases make some of these solutions hard. As far as I remember, they don't all support ON UPDATE/ON INSERT rules on views, for example. Fortunately, postgres supports all of this stuff.

2022062412033y ago

It's something that a team of a PM, a QA and two developers can bill for at least a sprint. So, well worth it.

nikanj3y ago

”Here’s my argument for why airbags are useless: In my 15 years of driving, I haven’t needed them once”

Remember the Atlassian outage from earlier this year. They sure would have appreciated a soft delete.

JohnBooty3y ago· 48 in thread

I've been a software dev since the 90s and at this point, I've learned to basically do things like audit trails and soft deletion by default, unless there's some reason not to.

Soft deletion has obvious drawbacks but is usually far less work than implementing equivalent functionality out-of-stream, with verbose logging or some such.

Retrofitting your app and adding soft deletion and audit trails after the fact is usually an order of magnitude more work. Can always add it pre-launch and leave it turned off.

gmiller1234563y ago

Though you really shouldn't be relying on a database for an audit trail. It might help find some issues, but things actually used for security shouldn't be writable so easily.

JohnBooty3y ago

    I think this is the part they miss. I've never 
    undeleted a user either, but there have been many 
    times I've gone back to look at something.

Yeah. As far as a user-facing "Undelete" button existing or being used... that's very rare in my experience.

Alternately, maybe it was the app's fault. Still plays out nearly the same!

Soft deletes and/or audit trails save you from all of that.

    Though you really shouldn't be relying on a 
    database for an audit trail. It might help 
    find some issues, but things actually used 
    for security shouldn't be writable so easily.

I mean, at some level you need to trust the database right?

How would you set up a secure audit trail that didn't rely on the application and/or database at some level? Even if it lives outside of the database, that data came from the database.

Not a rhetorical question. Genuinely curious!

coldtea3y ago

>I've never undeleted a user either, but there have been many times I've gone back to look at something.

I, for one, have undeleted things tons of times, taking them out of the trash can before emptying it, undoing the delete action (in apps where this is possible), and so on.

Akronymus3y ago

I messed up a mass update query enough times to leave myself SOME provision to undo it.

The exceptions are when there is a well tested query that affects a single account or something. Like GDPR

pustan3y ago

>The author uses the "no one ever undeleted anything" as the primary justification. I think this is the part they miss.

But did they though?

alerighi3y ago

You cannot just keep user information forever "just in case" they are useful again.

pradn3y ago

Also note, if you use a soft-deleted column, indexes need to be keyed by that column as well if you want to access non-deleted objects quickly. That's extra complexity.

mjevans3y ago

Even more important; the deleted records don't need to live in your cache / RAM / etc. Potentially faster queries.

JohnBooty3y ago

    but that is less of a cost than having to add "deleted=False" 
    predicates in all of your queries.

Maybe or maybe not. You can use a view. Or you may be using an ORM that lets you set a default scope (essentially, a default WHERE clause - ActiveRecord lets you do this)

It also depends on if you're designing an app for this from the ground up or if you're trying to retrofit an existing app with 90 million different hardcoded queries.

To be clear, I don't hate the "another table" solution. It's the right choice in a lot of situations IMO.

Gordonjcp3y ago

> but that is less of a cost than having to add "deleted=False" predicates in all of your queries.

It's like people have forgotten what views are.

smackeyacky3y ago

efsavage3y ago

Also if people know that deletion is reversible, they're more likely to actually do it, which can keep things generally tidier.

JohnBooty3y ago

For some applications this is fine, depending on your app/business logic but for a lot of applications states like active/pending/suspended and "deleted" are not mutually exclusive.

Suppose I soft-delete an active, pending, or suspended user using your scheme.

Now I need to un-delete the user. What status should they have? We don't know.

robertlagrant3y ago

rjzzleep3y ago

In rails you get these things for free. What I don't get is why everyone rolls their own framework with node.js. It's basically 90s PHP all over again.

joshmanders3y ago

I don't know why you're ragging on Node.js users or even PHP for that matter as both ecosystems have this stuff covered too.

Also you're comparing language/runtime with an actual framework and then dogging those users...

If you want to compare Rails with Node/PHP then I'd suggest comparing with things like Laravel (PHP), Adonis (Node) and you'll find everything you can do in Rails is done in Node/PHP too.

sky_rw3y ago

Nothing is free. In Rails you have _currently well maintained libraries_ for this. There are still complexity costs, dependency costs, data costs, performance costs, etc, etc, etc.

tomschlick3y ago

> It's basically 90s PHP all over again.

Which is funny because Laravel gives you this for free as well through its ORM. Soft deletes are an easily solved problem with $table->softDeletes() in your migration.

jesseryoung3y ago

pwm3y ago

> Likewise, it's not an audit trail it's a "history" or "undo".

Depends on the industry. The one I work in audit trail is a well-defined and mandatory business concern.

augustl3y ago

Scarbutt3y ago

tsuujin3y ago

It’s hard to get adoption for expensive toys.

They’re really shooting themselves in the foot by not having a one-click install free tier or a self hosted option.

nightski3y ago

Personally looking at their pricing since it is so tied to AWS it is completely non-transparent how much it's going to cost us now or in the future.

I really like the concept of datomic though.

rjbwork3y ago

I can't wait until Postgres has this kind of functionality baked in. It's such a nice feature.

th0ma53y ago

For me it is simply that it isn't open source (at least last I checked).

synthc3y ago

Datomic is great but I think it's missing some features that many enterprises need (access control and a query planner). Also it seems to be mostly built for DynamoDB/AWS.

wnevets3y ago

wst_3y ago

Isn't it a UI problem, though? If the user were informed about the consequences (possibly with bold red font and with a confirmation checkbox), would they still click that button?

corrral3y ago

JohnBooty3y ago

I personally would not agree.

But... your experiences are also real and I believe you when you say that your experiences DO support your conclusion. :)

The "soft delete" design decision is usually pretty impactful and is in my experience potentially much more of a pain in the butt to implement later if you haven't included it from day 0.

Audit trails and soft-deletes are also crazy useful for developers (both for debugging and for general cover-thine-ass utility) even if end users never touch them.

Whereas, tags are easier to tack on later and are not intrinsically useful to developers.

nicoburns3y ago

aboodman3y ago

  > Somebody always wants to undelete something
  > or examine it to see why it was deleted
  > or see who changed something, or blah blah blah

People for whom this resonates should look into Dolt: https://dolthub.com/.

There is a lot more power that comes with making the entire database versioned and forkable by default. For example it makes it much easier to recover from catastrophic bad code pushes, etc.

note: Dolt was forked from Noms, my previous project, but I don't work for Dolt or have a stake. Just a fan.

makeitdouble3y ago

> unless there's some reason not to.

Yes.

JohnBooty3y ago

    My lesson from that was to at least have one test 
    deleting a mock user that spans the maximum data 
    breadth of the service . We caught a bunch of these 
    loops in test at dev time and that was pretty great.

Thank you. That is SUPER insightful. If it wasn't too late to edit my initial post I would add this.

Falkon13133y ago

Good point. That could be tricky when it crosses the boundaries of schema changes and data migrations.

cal853y ago

> Somebody always wants to undelete something, or examine it to see why it was deleted, or see who changed something, or blah blah blah.

In my experience this happens “rarely”, not “always”.

nivertech3y ago

> That’s why we have protective measures like soft deleting and event sourcing

IMO soft deletion is a hack trying to fix problems in CRUD, which is a hack in itself.

CRUD attempts to model everything in the universe as collections of mutable items, loosely based on RDBMS/SQL.

Event Sourcing is more realistic: it models everything as append-only logs of immutable events/facts, which preserves both the historical data and, more importantly, the original intent.

Unlike CRUD, Event Sourcing is technology agnostic.

femiagbabiaka3y ago

dcow3y ago

What's wrong with the author's proposed compromise/solution?

ReptileMan3y ago

The proper thing is database snapshots or something like that.

I don't think soft delete is wrong per se, but it is something that should be native to the database engine.

cpursley3y ago

Can you recommend any best practices, design patterns, etc - for implementing soft delete in SQL systems?

tomschlick3y ago

The biggest one is to use a timestamp instead of a boolean. For instance Laravel uses `deleted_at` as the field name with the type of TIMESTAMP (for mysql at least).

This allows you to do a `select * from users where deleted_at is null` to get the active records, but also know when the user was deleted if you need to audit / rollback.

JohnBooty3y ago

I think the best answer is to optimize for the expected data and use cases.

- How common is deletion? Are we expecting 1% of the records to be soft-deleted, or more like 90%? If it's the latter, you may not want all those records clogging up your main table.

brightball3y ago

Agreed. The key to soft deletes is to actually move the record out of the original table to ensure they don’t accidentally end up in a query join.

But there’s always something that needs to be undeleted. You can either have an easy way to do it or restore an entire DB backup and cross query the missing records. Soft deletes are a lot easier.

4m1rk3y ago

Isn't an audit log enough for that? Deletion should also be logged.

xxs3y ago

GDPR compliance and all - we encrypt and delete stuff... or at the very least "lose" encryption keys 1st.

pg_12343y ago

This 100%

dafelst3y ago· 43 in thread

pbardea3y ago

> assuming the deletes are done appropriately

If the goal is to keep a history of all records for compliance reasons or "just in case", I tend to prefer a CDC stream into a separate historical system of record.

dspillett3y ago

> Since I can no longer rely on ON DELETE CASCADE relationships

> I tend to prefer a CDC stream into a separate historical system of record.

This is similar, though with system versioned tables you are pretty much always keeping the history/audit in the same DB.

---

wvenable3y ago

dexwiz3y ago

danielrhodes3y ago

munk-a3y ago

kukx3y ago

Is there no attempt to improve soft deletion at the engine/SQL level? I can see it as a possible feature request.

dragonwriter3y ago

> This is one gripe I have with soft-deletion. Since I can no longer rely on ON DELETE CASCADE relationships

semiquaver3y ago

(this is all postgres specific, it may be untrue for other systems)

enepture3y ago

As an FYI, using a tool like dbt solves this problem. As someone who was not a data engineer, I was not aware there were tools like this.

rodw3y ago

smallnamespace3y ago

To avoid an outage, have you tried fronting the matview with an additional view to allow hot-swapping?

slt20213y ago

it is very dangerous to have dependency on materialized view - it is a poor architectural decision from DBA to do that.

if you want view depending on mat view - materialize it yourself in a table, and refresh it yourself controllably.

rodw3y ago

Seriously. That "Downsides: Code leakage" point is nonsensical.

```
-- A row is active if it was never deleted, or its deletion is still in the future.
CREATE OR REPLACE VIEW active_customer AS
SELECT * FROM customer
WHERE deleted_at IS NULL OR deleted_at > NOW();
```

There, I fixed it.

Just use `active_customer` instead of `customer ... deleted_at IS NULL`.

In fact, since the deleted_at column is a timestamp, the original "leakage" query:

```
SELECT * FROM customer WHERE id = @id AND deleted_at IS NULL;
```

is actually broken. A non-null `deleted_at` timestamp that's in the future implies the record hasn't been deleted yet, right?

> I feel like a lot of developers underutilize the capabilities of the massively advanced database engines they code against

djur3y ago

doctor_eval3y ago

> the database is part of the application

In my entire career I've changed backend databases for an application exactly twice. It's not easy, and no amount of abstraction is likely to make it easier.

madisp3y ago

aeyes3y ago

At least in Postgres, having a huge amount of "dead" data in large tables is problematic because vacuum always has to read the full data set.

You can't beat a separate insert only archive table which you never have to vacuum.

paulmd3y ago

* hot and cold do not churn each others indexes or tables, you effectively have only one set of indexes (and data) that's actually churning and the others are stable.

(note that for audit logging, I think it's simpler to just do the separate table. But the partition strategy does have some advantages for "cold" or "dead" data as a more generic concern imo)

anarazel3y ago

However, indexes do currently have to be scanned as a whole. But that's only done by autovacuum if there's enough row versions for that to be worth it (in recent versions).

layer83y ago

Shouldn’t partitioning help with that? (I have no experience with Postgres.)

jaydub3y ago

SoftTalker3y ago

remram3y ago

It seems Oracle does it although there is a special syntax to opt-in. That seems wild. I am not aware of another DBMS having that limitation though.

firloop3y ago

dafelst3y ago

IIRC Postgres has supported predicate push down on trivial views like this for over a decade now, and possibly even more complex views these days (I haven't kept up with the latest greatest changes).

AdrianB13y ago

There is no impact with views in MS SQL. You can also have indexed views and filtered indexes, so you can have even better performance.

5e92cb50239222b3y ago

It couples nicely with some hackery which turns removes into soft-deletes. You can remove objects as usual and they get soft-deleted in the database.

I've used this in a few projects and it's fantastic.

https://docs.microsoft.com/en-us/ef/core/querying/filters

https://www.thereformedprogrammer.net/ef-core-in-depth-soft-...

7crow3y ago

> there are fairly well developed techniques for keeping DB and app states, logic and schemas aligned via migrations and partitioning and whatnot.

Hi, <1 yr experience swe here. Would HN mind unpacking "whatnot" with specific names of some of these techniques?

OJFord3y ago

You could have views that say 'thing I have a foreign key to is not deleted' of course, but that sort of seems like 'code leakage' again, just in SQL this time.

ibejoeb3y ago

> developers underutilize the capabilities of the massively advanced database engines

So true. There are so many amazing, powerful features in all of the major players.

layer83y ago

How does this help with foreign keys? Normally you can’t have foreign keys referencing a view.

I agree that one should make use of RDBMS capabilities. A check constraint may be practical instead of (or in addition to) the foreign-key constraint.

nousermane3y ago

> (...updatable view...) WHERE deleted_at IS NULL

This is the way. Also, save record creation timestamp, and you can have very flexible "time-machine" selects/views of your table essentially for free.

moggers3y ago

What's an appropriate naming convention?

airstrike3y ago

Views are such a powerful concept I’m honestly disheartened by how hard it is to use, replicate or leverage that functionality outside of dropping straight into the db shell

quickthrower23y ago

See http://materialize.com

zozbot2343y ago

coding1233y ago

quickthrower23y ago

Where is this anti-db culture? Is it a startup thing?

Everywhere I have worked people know a decent amount about their data store. Not architects, just mid devs and higher.

vlunkr3y ago

That doesn't solve the foreign key problem. You can still easily have a reference to a record that is "deleted"

yen2233y ago

Also in Postgres, you cannot have a foreign key constraint that references a view, not even a materialised view.

I'm with the author on this one. Any soft delete logic does have a tendency to bleed into other systems and make your systems more complicated, for very little gain.

chrisshroba3y ago

dimgl3y ago

100% this. And yes, with inner joins it also solves the relationship issues.

giantg23y ago· 10 in thread

"The concept behind soft deletion is to make deletion safer, and reversible."

That's one part. The other part is that in many industries you have regulatory data retention and audit requirements. This is arguably the most valuable and common reason to perform logical deletes.

brtkdotse3y ago

In banking and bookkeeping, there’s no such thing as a “delete”. Once something is in the ledger you can’t undo it - you have to make a new entry that negates the old one.
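
A rough sketch of that ledger rule, where a mistake is corrected by posting a negating entry rather than removing the original. The `Entry` and `Ledger` classes and their fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    entry_id: int
    account: str
    amount_cents: int
    reverses: Optional[int] = None  # id of the entry this one negates, if any


class Ledger:
    """Append-only: entries are never updated or removed."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def post(self, account: str, amount_cents: int,
             reverses: Optional[int] = None) -> Entry:
        entry = Entry(len(self._entries) + 1, account, amount_cents, reverses)
        self._entries.append(entry)
        return entry

    def reverse(self, entry_id: int) -> Entry:
        # "Undo" by posting the opposite amount; the original stays on record.
        original = self._entries[entry_id - 1]
        return self.post(original.account, -original.amount_cents,
                         reverses=entry_id)

    def balance(self, account: str) -> int:
        return sum(e.amount_cents for e in self._entries if e.account == account)

    def __len__(self) -> int:
        return len(self._entries)


ledger = Ledger()
ledger.post("acme", 10_000)
mistake = ledger.post("acme", 2_500)
ledger.reverse(mistake.entry_id)
```

The balance returns to its earlier value while all three entries, including the mistake and its reversal, remain visible to an auditor.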

jewayne3y ago

I would argue that in many cases the concept behind soft deletion is to make deletion permanent.

nirvdrum3y ago

silisili3y ago

Why not just use an audit table, to keep from littering your indices?

MonkeyMalarky3y ago

jiggywiggy3y ago

Ha, and then there is the opposite regulation that you have to delete user data.

taeric3y ago

This is almost certainly going to bite you if you don't push all customer identification data out of your main data stores.

necovek3y ago

And yet another part is making deletes (appear) instantaneous: useful when it involves cleaning up a bunch of "related" data possibly living on different services (eg. S3, ES...).

This also helps with the original goal of making them safer by manually implementing "eventual consistency" for data living outside the transactional world.

dexwiz3y ago

yen2233y ago

Nowadays, regulations go both ways. Data sovereignty regulations and GDPR-like laws can mandate that data must get hard-deleted.

dfee3y ago· 6 in thread

My experience is that soft-deletes are blunt tools bridging the gap between hard deletes and event sourcing (capturing all the changes against the table, in a replay-worthy stream).

rodelrod3y ago

redavni3y ago

If accidentally writing the wrong query is a problem, then writing the wrong query is your problem.

btown3y ago

zozbot2343y ago

An event store is just a special case of a temporal database. The whole point of temporal databases is to natively support the notion of historical vs. current data.

Terr_3y ago

> My experience is that soft-deletes are blunt tools bridging the gap between hard deletes and event sourcing

Agreed, sometimes it makes business-sense to implement it, but in the big picture it's still kludgy and not-ideal.

vbezhenar3y ago

You can restore to any point of time from your database backup. So it can cover some requirements.

danielrhodes3y ago· 3 in thread

pradn3y ago

foolfoolz3y ago

latchkey3y ago

That sounds more like a lack of backups and disaster recovery than it does soft deletes.

jelkand3y ago· 3 in thread

For context, I've worked in fintech where I often needed to review backoffice approvals, transactions, offers, etc.

hinkley3y ago

krstf133y ago

chomp3y ago

Gurgler3y ago· 2 in thread

unemployable3y ago

dubswithus3y ago

There are Rails gems that can handle this in various ways.

But the easiest way is to deactivate the user account (is_active boolean) and continue to reference the user in internal records.

munk-a3y ago· 2 in thread

I just wanted to touch on the fact that eliding soft-deleted rows from queries is really, really easy - this article makes it out to be a constant headache but here's my suggested approach.

    ALTER TABLE blah ADD COLUMN deleted_at TIMESTAMP NULL;
    ALTER TABLE blah RENAME TO blahwithdeleted;
    CREATE VIEW blah AS SELECT * FROM blahwithdeleted WHERE deleted_at IS NULL;
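
The rename-and-view approach can be tried end to end with a small sketch. It assumes SQLite purely for self-containment and reuses the `blah` / `blahwithdeleted` names from the snippet. SQLite views are read-only, so the soft delete here goes through the underlying table; engines with updatable views may allow writing through the view instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blah (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO blah (name) VALUES ('kept'), ('removed');

    -- The three steps from the comment, in SQLite syntax.
    ALTER TABLE blah ADD COLUMN deleted_at TEXT;
    ALTER TABLE blah RENAME TO blahwithdeleted;
    CREATE VIEW blah AS
        SELECT * FROM blahwithdeleted WHERE deleted_at IS NULL;
""")

# Soft-delete one row via the underlying table.
conn.execute(
    "UPDATE blahwithdeleted SET deleted_at = CURRENT_TIMESTAMP"
    " WHERE name = 'removed'"
)

# Existing queries against "blah" now see only live rows.
visible = [name for (name,) in conn.execute("SELECT name FROM blah ORDER BY id")]
all_rows = conn.execute("SELECT COUNT(*) FROM blahwithdeleted").fetchone()[0]
```

Code that keeps selecting from `blah` needs no extra `WHERE` clause, while the soft-deleted row stays available in `blahwithdeleted`.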

bjourne3y ago

The author is right; soft-deletes are probably most definitely not worth it. There are many better ways to solve the problem.

yladiz3y ago

But, assuming you don't really need the data, why make your queries more complex instead of just actually deleting the data?

GartzenDeHaes3y ago· 2 in thread

munk-a3y ago

So if an account was active in your system and is active no longer... do you soft delete it (even if that means UPDATE ... SET active = 'f') or hard delete it?

znpy3y ago

Your taste in database design is probably not GDPR compliant; I hope you don’t work in the EU.

adrianmsmith3y ago· 2 in thread

> Instead of keeping deleted data in the same tables from which it was deleted from, there can be a new relation specifically for storing all deleted data

Terr_3y ago

> The article asserts you'll never need to "undelete" the data.

layer83y ago

> now have to do it in two different ways

Use a view.

> if there are any unique constraints e.g. on username or email address

Have those in a dedicated table where they aren’t deleted, and add a synthetic key referenced by the other tables.

scott_w3y ago· 1 in thread

The example the author gives is… frankly awful.

Using a soft delete, your invoices won’t “disappear” because your app WILL have a view for looking at just the invoices.

Source: I built a bookkeeping system and soft deletes are a necessary feature.

decebalus13y ago

> I can’t think of a single case where you’d want to remove the invoices of a customer you delete. Ever.

But I don't really agree with the author on his take about soft deletions.

vyrotek3y ago· 1 in thread

I've found SQL Server Temporal Tables are a good alternative to get the benefits of soft-deletes without some of the drawbacks.

https://docs.microsoft.com/en-us/sql/relational-databases/ta...

tfigment3y ago

MySQL also has this now. I've wanted to rewrite our apps to use it but haven't gotten around to it. Postgres has it as an add-on, but it feels like it wouldn't work for us until it's first-class supported.

habibur3y ago· 1 in thread

Which is why I don't add that extra deleted field. Rather duplicate all the tables into a new database called "archive" and then insert there before deleting from main.

tehbeard3y ago

How do you account for maintaining integrity in the archive?

E.g. you have 3 users sign up with the same email (a unique field) one after the other with deletions in-between each sign-up?

tehbeard3y ago· 1 in thread

I don't get this statement. You wouldn't have had the env or data without soft delete? You did use it!

I would say, soft delete isn't a tick the box and done solution as many ORMs make it.

You need to consider the data model, and adjust your queries to that.

It may make sense for a product to be deleted, but orderlines still able to access it to display product name etc.

Reuse of unique fields is the sticking point I run into often, as mysql interprets null as not clashing with other nulls so composite uniques using the ID and deletion date don't work.

TylerE3y ago

> mysql interprets null as not clashing with other nulls

Which is correct per SQL. Null is NaN, not zero (or negative infinity).
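
The NULL behaviour described here can be reproduced in a short sketch, along with one common workaround on engines that offer partial (filtered) indexes. This uses SQLite, which does support them, so it illustrates the technique rather than giving a MySQL recipe; the `users` table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        deleted_at TEXT
    );
    -- Composite unique index: NULLs count as distinct, so it does not
    -- stop two live rows from sharing an email.
    CREATE UNIQUE INDEX users_email_deleted ON users (email, deleted_at);
""")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")  # accepted
live_duplicates = conn.execute(
    "SELECT COUNT(*) FROM users"
    " WHERE email = 'a@example.com' AND deleted_at IS NULL"
).fetchone()[0]

# Workaround: enforce uniqueness among live rows only.
conn.executescript("""
    DELETE FROM users;
    DROP INDEX users_email_deleted;
    CREATE UNIQUE INDEX users_email_live
        ON users (email) WHERE deleted_at IS NULL;
""")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
try:
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    second_live_rejected = False
except sqlite3.IntegrityError:
    second_live_rejected = True

# Once the live row is soft-deleted, the address can be reused.
conn.execute(
    "UPDATE users SET deleted_at = '2022-07-01' WHERE email = 'a@example.com'"
)
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
total_rows = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
live_rows = conn.execute(
    "SELECT COUNT(*) FROM users WHERE deleted_at IS NULL"
).fetchone()[0]
```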

whack3y ago· 1 in thread

We use soft-deletes extensively at our startup. Here's a couple reasons:

- It makes debugging easier. You have a clear record of everything that used to exist. You don't have to go digging through your logs to find something that used to exist but has now been deleted

waspight3y ago

Pxtl3y ago· 1 in thread

Trivial case I hit:

1) Client wants to remove users from the system who have left their org, but

2) There are objects that were contributed by that user which are required to persist beyond the user's deletion.

Those are ideal cases for soft deletion. We can still query information about the deleted user to explain who created this object, with the note that their account has been deleted.

And unlike the article author, I have used soft deletion to undelete things. Many times. Maybe he has better users than I do, I don't know.

mixmastamyk3y ago

Another way to handle is to set each obj.user to a “Deleted User” record.
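
A minimal sketch of the "Deleted User" idea, assuming a hypothetical schema with `users` and `documents` tables and a sentinel row with id 0. The contributed objects are re-pointed at the sentinel before the real user row is hard-deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        user_id INTEGER NOT NULL REFERENCES users(id)
    );
    INSERT INTO users (id, name) VALUES (0, 'Deleted User'), (1, 'alice');
    INSERT INTO documents (title, user_id) VALUES ('report', 1), ('notes', 1);
""")

DELETED_USER_ID = 0  # hypothetical sentinel row


def hard_delete_user(conn, user_id):
    """Re-point the user's documents at the sentinel, then delete the user."""
    with conn:  # both statements commit or roll back together
        conn.execute(
            "UPDATE documents SET user_id = ? WHERE user_id = ?",
            (DELETED_USER_ID, user_id),
        )
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))


hard_delete_user(conn, 1)
owners = [
    name
    for (name,) in conn.execute(
        "SELECT u.name FROM documents d JOIN users u ON u.id = d.user_id"
        " ORDER BY d.id"
    )
]
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The trade-off is that the link to the specific former user is gone, which suits cases where the personal data must actually be removed.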

muhaaa3y ago· 1 in thread

If the disk gets too full, project the latest time slice into a new database and move the old database onto cold storage.

hu33y ago

MariaDB as well: https://mariadb.com/kb/en/system-versioned-tables/

justin_oaks3y ago· 1 in thread

I've probably restored data from backup maybe 4 times in my career. I greatly prefer to do this on the rare one-off scenario than to deal with the overhead of soft deleting everything.

marcosdumay3y ago

The difference in framing one gets by looking around is amazing, even funny.

> I've probably restored data from backup maybe 4 times in my career.

Yet, I often use soft-deletes because it allows people to undelete things from the software interface and not call me all day long.

But that's not the most common reason I have for them. Normally it is because the data just can not be gone, and the full table is still important somewhere.

justin_oaks3y ago· 1 in thread

For those who are expressing favor with soft deletes, do you default to soft deletes on every table unless you know you won't need them? Or do you only apply them where you know you'll need them?

I think people arguing for and against soft deletes both understand that there are cases where you want to use them and when you don't.

baq3y ago

soft delete everywhere by default. true deletes only after retention policy expires, if FK constraints allow it (best if you can drop whole partitions).
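
A sketch of a retention purge along those lines, assuming a hypothetical `items` table, ISO-8601 timestamps with a UTC offset, and a 30-day policy. It ignores foreign keys and partitions, which a real job would have to respect.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # hypothetical policy

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO items (name, deleted_at) VALUES (?, ?)",
    [
        ("live", None),
        ("recently deleted", "2022-06-20T00:00:00+00:00"),
        ("long deleted", "2022-01-01T00:00:00+00:00"),
    ],
)
conn.commit()


def purge_expired(conn, now):
    """Hard-delete rows soft-deleted before now - RETENTION; return the count."""
    cutoff = (now - RETENTION).isoformat()
    with conn:
        cur = conn.execute(
            "DELETE FROM items"
            " WHERE deleted_at IS NOT NULL AND deleted_at < ?",
            (cutoff,),
        )
    return cur.rowcount


purged = purge_expired(conn, datetime(2022, 7, 1, tzinfo=timezone.utc))
remaining = [name for (name,) in conn.execute("SELECT name FROM items ORDER BY id")]
```

Comparing timestamps as text works here only because every value uses the same ISO-8601 format and offset.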

codemac3y ago· 1 in thread

Well, there are several problems with this analysis when you go very large (>10000 machines):

mixmastamyk3y ago

Advice should be aimed at the 99% rather than the 1%, right? I guess Heroku and Stripe don’t have the biggest datasets in the world but they are probably larger than most folks will need to manage.

llimos3y ago· 1 in thread

Do any databases let you refer to constant values in foreign keys?

Then you could do

  FOREIGN KEY (foreign_id, NULL) REFERENCES foreign_table(id, deleted_at)

munk-a3y ago

matusp3y ago· 1 in thread

The "Code leakage" problem can easily be solved by using views. Or am I missing something?

dboreham3y ago

Solved with "...and deleted = false"

lowercased3y ago

But... weren't you using all those env and data info from the soft-deleted set?

People are used to 'undo' and 'undelete'. Soft-deletes are one way to provide that functionality for some projects.

vivegi3y ago

Once your main tables start getting to the order of tens of millions of records, the filtering by 'deleted_at is NULL' or 'deleted_at is NOT NULL' gets in the way of query performance.

NULL is also not indexed. So, that throws the spanner in the works sometimes (depending on the query).

deerIRL3y ago

As someone who has done development work with Class A data and specifically in the realm of justice, soft deletes aren't simply a good idea, they are required by law.

Most of these downsides are easily mitigatable issues as well. As many users have stated, something like views solves the issue of forgetting the 'deleted' clause.

willlll3y ago

jonstaab3y ago

Source: I write back of house software for resale store owners, and accidental deletes happen occasionally. Being able to restore things instills a lot of confidence for our customers.

radu_floricica3y ago

It's surprisingly manageable. I mean, yes, it's definitely the largest table in the db, but:

1. it's well worth it

2. most of the stuff in it isn't the main scenario above (a human does something and I record the change) but various automated processes I also want to track, like API calls. which leads to:

3. it's easy to prune - both in time period kept, and by selectively deleting the automated stuff earlier

phibz3y ago

For foreign key constraints you can set the foreign key to null and orphan the records if the relation is deleted. You could also hard delete them in this case. It depends on your use case.
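
As an illustration of the first option, a foreign key declared with `ON DELETE SET NULL` orphans the dependent rows automatically when the parent is hard-deleted. The `customers` / `invoices` schema below is a hypothetical example, run on SQLite for self-containment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        total_cents INTEGER NOT NULL,
        customer_id INTEGER REFERENCES customers(id) ON DELETE SET NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'acme');
    INSERT INTO invoices (total_cents, customer_id) VALUES (5000, 1), (1200, 1);
""")

with conn:
    conn.execute("DELETE FROM customers WHERE id = 1")

# The invoices survive, with their customer reference cleared.
orphaned = conn.execute(
    "SELECT COUNT(*) FROM invoices WHERE customer_id IS NULL"
).fetchone()[0]
```

Declaring the key with `ON DELETE CASCADE` instead would give the second option, removing the invoices together with the customer.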

You may very well find others interested in consuming your audit journal to track changes. Updates and even inserts fit great in the more general case.

Apreche3y ago

Either way, it allows you to have soft deletion and/or full activity logging functionality without the application having to know about it.

Ensorceled3y ago

I use soft deletes in our system and literally used it to restore an accidentally deleted item about 3 hours ago. Took a second to toggle the deleted item.

Minor49er3y ago

dunkelheit3y ago

So yeah, be careful with the soft-delete pattern :)

unemployable3y ago

astura3y ago

>so you can be left with your customer being “deleted”, but its invoices still live.

This is not a problem; it is almost always what's desired. Otherwise you have no records for, for example, the tax auditor.

When you switch classrooms you don't want all the events that ever happened in the old classroom to disappear.

These sorts of systems are the kinds of systems I've worked with my entire career. Undeletion happens all the time too (employees get rehired, for example).

Most computer systems aren't B2C free social media sites where you CAN just delete anything you want because no data is important.

rubyist5eva3y ago

One thing that I couldn't find in the article: performance.

armchairhacker3y ago

Dumb solution: make soft deletes explicit in your backup system.

joshuanapoli3y ago

https://news.ycombinator.com/item?id=31015813

lolsal3y ago

dragonwriter3y ago

> The concept behind soft deletion is to make deletion safer, and reversible

IME, as with “updated_by” and “last_modified_at” columns, it's usually hazy audit requirements, not making deletion reversible, that motivates it.

Views conceptually would work, but then you get bitten by all the ways that all relations are not equal in real-world RDBMSs.

jasonhansel3y ago

ThePhysicist3y ago

agentultra3y ago

Soft deletion does come at a cost. Choose carefully!

nwah13y ago

If you have a lot of stored procs then the argument makes some sense. If you do most things in code, then I would argue these complaints are moot.

viiralvx3y ago

[1]: https://github.com/jhawthorn/discard

rvr_3y ago

kardianos3y ago

This poster misses the point completely. Soft delete is a must-have for historical data, where you want to keep history, but keep the current set clean.

Effectively, you don't check for the soft delete flag if you get to it from an un-deleted record, but you do check for it if you access it the other way around.

spfzero3y ago

mrinterweb3y ago

For audit trails in rails, I still like papertrail. https://github.com/paper-trail-gem/paper_trail. It provides the ability to restore records as well as auditing abilities.

rtpg3y ago

I believe you can get most of the advantages of soft deletion through a notion of archival.

khaledh3y ago

wizofaus3y ago

AdrianB13y ago

timomax23y ago

ivank3y ago

https://github.com/xocolatl/periods implements SYSTEM VERSIONING for PostgreSQL and moves deleted rows to a history table.

mizzao3y ago

The most famous example is perhaps the recent weeks-long Jira outage, right?

ccleve3y ago

Someday.

I'm embarrassed to admit how many decades I've been waiting for this.

hn_throwaway_993y ago

The deleted records table he mentions at the end is a good approach, but:

1. This can easily be done with a trigger, so that you just call a DELETE on the table and deleted rows are copied to the deletion table automatically.
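
A sketch of that trigger approach, using SQLite so it runs as-is. The `customer` and `deleted_customer` tables and the trigger name are hypothetical, and the per-table archive shown here is a simplification of the single deleted-records table the article describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE deleted_customer (
        id INTEGER,
        name TEXT,
        deleted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- Copy each row into the archive table just before it is deleted.
    CREATE TRIGGER customer_archive BEFORE DELETE ON customer
    BEGIN
        INSERT INTO deleted_customer (id, name) VALUES (OLD.id, OLD.name);
    END;

    INSERT INTO customer (name) VALUES ('acme'), ('globex');
""")

# The application issues an ordinary DELETE and knows nothing about archiving.
with conn:
    conn.execute("DELETE FROM customer WHERE name = 'acme'")

live = [name for (name,) in conn.execute("SELECT name FROM customer ORDER BY id")]
archived = conn.execute("SELECT id, name FROM deleted_customer").fetchall()
```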

runeks3y ago

This article touches on something I’ve always wondered: how do I determine whether to add a BOOLEAN column to a table or create a new table instead?

My gut instinct says it’s cleaner to have two separate tables, but I’ve never found a definite answer.

jandrewrogers3y ago

sfink3y ago

I don't really have enough experience with this stuff for my opinion to have value, but a lot of the opinions I see here appear to me to be dancing around the real question.

mst3y ago

Soft delete has always caused me more trouble than it was worth.

Keeping a deleted records table via app code or triggers has always been more trouble than it took to build.

jb36893y ago

scifibestfi3y ago

> When I worked at Heroku, we used soft deletion. When I worked at Stripe, we used soft deletion. At my job right now, we use soft deletion.

> As far as I’m aware, never once, in ten plus years, did anyone at any of these places ever actually use soft deletion to undelete something.

That's wild. So it seems the idea of needing undelete is largely an unfounded fear.

pilgrimfff3y ago

All you need is a layer of abstraction to get past the downsides of soft deletion. You can use views or your ORM (if you use one)

In Django, it's really easy to create almost seamless soft deletion logic in the model manager or in your querysets.

Over the last decade, I find myself using soft deletion more and more - usually to accommodate user/client requests.

jaitsu3y ago

Very similar thoughts to an article I wrote back in 2014: https://jameshalsall.co.uk/posts/why-soft-deletes-are-evil-a...

Excuse the dramatic title of the post

jarek833y ago

ajuc3y ago

If you need this why reimplement it when you can use database history (dbms_flashback or SELECT AS OF in Oracle)?

dudeinjapan3y ago

We renamed "deleted_at" to "archived_at". And there was much rejoicing.

Smoosh3y ago

DB2 has implemented temporal tables which can automatically capture all changes to the primary table.

https://www.ibm.com/docs/en/db2/10.1.0?topic=tables-history

openthc3y ago

waspight3y ago

n4jm43y ago

I can't even tell you how much political capital I lost at a major retailer recommending against wasting time implementing soft deletions... on an internal portal that babysat linter configurations.

Don't ask me why the linter configurations weren't simply persisted in git.

msie3y ago

kleebeesh3y ago

Maybe a more accurate take: Half-assed soft deletion definitely isn't worth it.

If you're just going to throw in some deleted bool or deleted_at timestamp without thorough testing, you might as well just skip it. It's virtually certain to go wrong.

bob10293y ago

If you are going to think about this pattern, why not go one step further and simply event source everything with an append-only, immutable log?

You could even sprinkle cryptographic guarantees into the mix. This would be very challenging to do with mutable DB rows.
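
One way to picture the cryptographic part is a hash-chained log, where each record's hash covers the previous record's hash. This is a sketch under stated assumptions: the record layout and the `append` / `verify` helpers are hypothetical, and tampering is only detectable if the latest hash is also kept somewhere the attacker cannot rewrite.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON encoding."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append(log, payload):
    """Add an event whose hash chains to the record before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {"payload": payload, "prev": prev_hash, "hash": _digest(prev_hash, payload)}
    )


def verify(log):
    """Recompute the whole chain; any edited record breaks it."""
    prev_hash = GENESIS
    for record in log:
        expected = _digest(prev_hash, record["payload"])
        if record["prev"] != prev_hash or record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True


log = []
append(log, {"type": "customer_created", "id": 1})
append(log, {"type": "customer_deleted", "id": 1})
intact = verify(log)

log[0]["payload"]["id"] = 2  # simulate someone rewriting history
tampered_detected = not verify(log)
```

Note that a deletion is itself just another appended event, so nothing is ever removed from the log.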

krascovict3y ago

If it's the case of deleting files safely, I recommend shred, it's very good...

https://wiki.archlinux.org/title/Securely_wipe_disk

AtNightWeCode3y ago

”The concept behind soft deletion is to make deletion safer, and reversible.” Well, that is one reason. To keep the actual data can be done for many reasons. Audits, reports, laws and so on.

Edit: Deletion is always reversible btw since there are backups.

qxxx3y ago

in one project I was working on, we used a similar version of the 2nd method from the article:

Every table had the same table with _del suffix (eg. users_del). If a record was deleted, it was simply moved to _del table. We used code for this but later we started to use db triggers.

wruza3y ago

But with soft deletion, this goes out the window. A customer may be soft deleted with its deleted_at flag set, but we’re now back to being able to forget to do the same for its invoices.

outworlder3y ago

I wish Datomic was made open-source (with maybe some features available as an 'enterprise' offering) so that we could actually have a decent alternative for this 'soft-delete' problem.

satyrnein3y ago

We switched a lot of tables to soft deletes so we could replicate those deletes into our data warehouse. You can also use bin log replication for hard deletes, but every schema change would break it.

duxup3y ago

> All our selects look something like this: SELECT * FROM customer WHERE id = @id AND deleted_at IS NULL;

Solution… a whole other table of deleted stuff… in a new structure.

Man soft deletes just look better to my eye.

dcdc1233y ago

If you are using a state manager with models in something like rails/django/etc then it is trivial to support soft deletion without it infecting your entire code base.

pierrebai3y ago

The author claims pruning soft-deleted entries requires a complex query, but hard-deleting an entry would have required the same complexity. So it's really not an argument.

mmmuhd3y ago

I remember when a rogue employee of a client went ahead and did stupid deletions on students' and staff data; soft delete saved the day and made us some money.

magundu3y ago

We use soft deletion by moving all related rows into a different archive database, which is cleaned of entries older than 60 days.

For accidental delete, we will undelete from archive.

jtwebman3y ago

The bigger reason to use soft deletes is to keep history. Just because someone no longer has access doesn't mean we shouldn't report on the things they did months ago.

encoderer3y ago

Even if you don’t “undelete” something, soft deletes make it possible to instantly hide something while saving the expensive sql delete for processing later.

BatteryMountain3y ago

The purpose of soft deleting is not to be reversible...that's just a free side effect if you really need it.

galaxyLogic3y ago

Couldn't soft deletion be happening behind the scenes by the database engine?

Then have a statement like RESTORE * FROM ...

sam_lowry_3y ago

In any case, soft deletion is usually a sign of incompetence. Whenever I saw it on a project, both soft deletion and the project turned sour.

tgbugs3y ago

One use case that I think is not sufficiently considered in this is related to two comments I made about a year ago [0, 1].

0. https://news.ycombinator.com/item?id=27249738 1. https://news.ycombinator.com/item?id=27691442

jacobsenscott3y ago

whoomp123423y ago

good idea but only if you don't use foreign keys. If you do use foreign keys, then you must create custom deletion logic for each relationship. Yuck!

revskill3y ago

In the real world, there's no such concept as deletion from a DB!

There's only deactivate account, archive a legacy product,...

Because there's no such thing as delete something from real world.

Pakdef3y ago

not worth it for short term profits... which is why most of today's internet will disappear

jacksnipe3y ago

The ONLY reason that you should avoid soft deletion is that deleting things permanently in a soft-deletion-based system is hard and error prone.

GDPR, among other regulations, requires that you be able to do this sometimes; and it requires that the data REALLY BE GONE.

But I really think that soft deletion should be the default unless you think you’ll be fielding user data deletion requests.

amerine3y ago

However, GDPR/customer data demands across regimes make me agree with him, and I would suggest folks listen. <3

kache_3y ago

wait until this guy finds out about financial regulations

marginalia_nu3y ago

Including "deleted_at IS NULL" is surely something that you'd solve using a view, rather than explicitly entering it into the queries.

GDPR is the big thing to consider, I think.

pgt3y ago

XTDB: xtdb.com

gigatexal3y ago

Just put a filtered index on the column to better query non deleted data.

On the whole I don’t think in practice the author’s take makes much sense.

OOPMan3y ago

Nice way to get on HN.

Post a daft hot take.

kleer0013y ago

Yea, it is.

rzwitserloot3y ago

> But the technique has some major downsides. The first is that soft deletion logic bleeds out into all parts of your code. All our selects look something like this:

> Another consequence of soft deletion is that foreign keys are effectively lost.

Bit trickier, you have a few options:

* Make the constraint include that the foreign object has `deleted_at IS NULL`.

> GDPR scaremongering

So, for your requirement to delete specific records upon request, it's easier. For your requirement to not keep unnecessary data around beyond reasonable bounds, it's a simple script.

> Here’s a snippet from one that I wrote recently which keeps all foreign keys satisfied by removing everything as part of a single operation

> data deletion has non-data sideeffects

> Alternative: A deleted records table

> Hard deleting old records for regulatory requirements gets really, really easy: DELETE FROM deleted_record WHERE deleted_at < now() - '1 year'::interval.

It is _exactly_ as simple to do this if you use soft-delete. Bit of an own goal.

2022062412033y ago

It's something that a team of a PM, a QA and two developers can bill for at least a sprint. So, well worth it.

nikanj3y ago

”Here’s my argument for why airbags are useless: In my 15 years of driving, I haven’t needed them once”

Remember the Atlassian outage from earlier this year. They sure would have appreciated a soft delete.
