Better HN

mike_d · 3y ago · 0 points

> If data validation doesn't belong in the database, then what does? At that point you're treating your RDBMS as a fancy filesystem and drastically underutilizing the tool.

Data storage and querying. Validation should be done in the application layer for security and scalability.

"Premature optimization" is the rallying cry of people standing in the corner of a room with a painted floor holding a paint brush.


3 comments · 3 top-level

billythemaniam · 3y ago

If you are thinking of SQL injection when you say "security", then yes, you need to sanitize the input before querying the database. But that is different from validating that the sanitized data is correct, and the database is generally a better place to do that. If scalability is a legitimate concern and not a premature optimization, then look at one of the distributed SQL databases (Spanner, TiDB, CockroachDB, Yugabyte), which are getting pretty good.
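To make the distinction concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for whichever database you actually run. The `accounts` table, its columns, and the `insert_account` helper are hypothetical. Bound parameters handle injection, and a `CHECK` constraint decides whether the value is acceptable at all.

```python
import sqlite3

# In-memory SQLite database as a stand-in; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE accounts (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        -- Validation: the database rejects ages outside this range.
        age   INTEGER NOT NULL CHECK (age BETWEEN 0 AND 150)
    )
    """
)


def insert_account(email: str, age: int) -> bool:
    """Insert a row and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            # Sanitizing: values are bound as parameters, never spliced into SQL.
            conn.execute(
                "INSERT INTO accounts (email, age) VALUES (?, ?)", (email, age)
            )
        return True
    except sqlite3.IntegrityError:
        # Safely escaped, yet still invalid: the CHECK constraint fires.
        return False


# An injection attempt is stored as inert text instead of being executed.
injection_stored = insert_account("x'); DROP TABLE accounts; --", 30)
# A well-formed but impossible value is rejected by the database.
invalid_stored = insert_account("bob@example.com", -5)
stored = conn.execute("SELECT email FROM accounts").fetchall()
```

The injection string is stored as plain text, and the negative age never gets in. Each mechanism covers a different failure, so neither replaces the other.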

sgk284 · 3y ago

> Data storage and querying

Storage + querying is vastly underutilizing a tool that is designed to help you safely manage the entire lifecycle of your data: from defining that data (DDL), making it accessible (queries, indexes), ensuring consistency (checks, constraints, types, etc.), managing cascading data events (triggers), and more!
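The sketch below illustrates several of those pieces declared in a single schema. It again assumes SQLite through Python's `sqlite3` as a stand-in. The `customers`/`orders` tables, the `order_count` column, the trigger, and the `rejected` helper are invented for the example. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# SQLite stands in for your RDBMS; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys once this is switched on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Defining the data (DDL): types plus consistency rules.
    CREATE TABLE customers (
        id          INTEGER PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE CHECK (email LIKE '%_@_%'),
        order_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
                    REFERENCES customers(id) ON DELETE CASCADE,
        total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
    );
    -- Making it accessible: an index for the common lookup.
    CREATE INDEX orders_by_customer ON orders(customer_id);
    -- Cascading data events: a trigger keeps the counter in sync.
    CREATE TRIGGER count_new_order AFTER INSERT ON orders
    BEGIN
        UPDATE customers SET order_count = order_count + 1
        WHERE id = NEW.customer_id;
    END;
    """
)


def rejected(sql: str, params: tuple = ()) -> bool:
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        with conn:
            conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


with conn:
    conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")
    conn.executemany(
        "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
        [(1, 500), (1, 250)],
    )
order_count = conn.execute(
    "SELECT order_count FROM customers WHERE id = 1"
).fetchone()[0]

# Constraints: a negative total and an order for a missing customer both bounce.
negative_total_rejected = rejected(
    "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)", (1, -1)
)
orphan_rejected = rejected(
    "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)", (99, 100)
)

# Deleting the customer cascades to its orders.
with conn:
    conn.execute("DELETE FROM customers WHERE id = 1")
remaining_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```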

It's a fool's errand to try to (poorly) replicate these things elsewhere when there's already a battle hardened system with decades of investment that allows you to do these things declaratively and consistently for free.

And it's much safer! The DB will enforce that validation even when services are doing massive migrations or new clients are mutating the data in ways you didn't anticipate. It's far less error prone than ensuring that all of your services are validating data the same way.
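The following is a small sketch of that failure mode. It assumes SQLite and two hypothetical write paths: a service that validates prices before writing, and a migration script that skips validation. The same bad write lands in the table that relies on application code and is rejected by the table with a `CHECK` constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Rule enforced only by application code.
    CREATE TABLE prices_app_checked (sku TEXT PRIMARY KEY, cents INTEGER NOT NULL);
    -- Same rule declared once, in the schema.
    CREATE TABLE prices_db_checked (
        sku   TEXT PRIMARY KEY,
        cents INTEGER NOT NULL CHECK (cents > 0)
    );
    """
)


def _insert(table: str, sku: str, cents: int) -> None:
    # Table names come from the fixed tuple below, never from user input.
    with conn:
        conn.execute(f"INSERT INTO {table} (sku, cents) VALUES (?, ?)", (sku, cents))


def service_write(table: str, sku: str, cents: int) -> None:
    """The 'blessed' path: a hypothetical service that validates first."""
    if cents <= 0:
        raise ValueError("price must be positive")
    _insert(table, sku, cents)


def migration_write(table: str, sku: str, cents: int) -> None:
    """A hypothetical backfill or new client that never calls the validator."""
    _insert(table, sku, cents)


outcome = {}
for table in ("prices_app_checked", "prices_db_checked"):
    service_write(table, "widget", 499)
    try:
        migration_write(table, "gadget", -100)  # bypasses service_write's check
        outcome[table] = "bad row stored"
    except sqlite3.IntegrityError:
        outcome[table] = "bad row rejected"
```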

> Validation should be done in the application layer for security and scalability.

Input validation, for sure. Sanitize as close to the perimeter as you can. But for consistency and integrity of data, let the database do what it's great at.

Your application server will probably not do it as efficiently as your highly tuned database. Pushing that work into the database may well be the better scalability strategy, and it's certainly the more robust one from an organizational perspective.

Fixing slow things is frequently trivial compared to fixing bad data or tracking down subtle data bugs. Here is a real-world scenario. In an Elixir service that uses Ecto to handle data mapping and validation, our code assumed a key in a JSON blob was a string, but in a handful of old rows it's a number. This inconsistency existed in only ~0.0005% of rows, and only in old data, but it surfaced in all sorts of weird ways and was difficult to track down.

Using the RDBMS to handle validation would never have allowed this to happen.
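The following sketch shows a constraint that could have caught this. It uses SQLite via Python rather than whatever database sits behind the Ecto service. With Postgres, a comparable constraint might use `CHECK (jsonb_typeof(payload->'external_id') = 'string')`. The `events` tables and the `external_id` key are hypothetical. The sketch assumes SQLite's JSON functions are available; they have been built in by default since SQLite 3.38. The same `json_type` expression also works as an ad-hoc query for hunting down the handful of drifted rows.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Legacy table: the payload shape is only checked in application code.
    CREATE TABLE events_legacy (id INTEGER PRIMARY KEY, payload TEXT NOT NULL);
    -- Guarded table: the key must be present and hold a JSON string.
    CREATE TABLE events (
        id      INTEGER PRIMARY KEY,
        payload TEXT NOT NULL
                CHECK (json_type(payload, '$.external_id') IS 'text')
    );
    """
)

# Old data where one row drifted from a string to a number.
legacy = [{"external_id": "abc-1"}, {"external_id": 42}, {"external_id": "abc-3"}]
with conn:
    conn.executemany(
        "INSERT INTO events_legacy (payload) VALUES (?)",
        [(json.dumps(p),) for p in legacy],
    )

# The same expression doubles as a query for finding drifted rows.
drifted_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM events_legacy "
        "WHERE json_type(payload, '$.external_id') IS NOT 'text' ORDER BY id"
    )
]


def store_event(payload: dict) -> bool:
    """Try to store a payload in the guarded table; report acceptance."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO events (payload) VALUES (?)", (json.dumps(payload),)
            )
        return True
    except sqlite3.IntegrityError:
        return False


string_ok = store_event({"external_id": "abc-4"})
number_ok = store_event({"external_id": 42})
```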

> "Premature optimization" is the rallying cry of people standing in the corner of a room with a painted floor holding a paint brush.

That strikes me as an uninformed opinion. Optimizations usually work by making lots of assumptions about the nature of your problem and then implementing something that depends on those assumptions always holding. This leads to more convoluted and fragile code, but in return you get performance.

Writing optimized code frequently involves removing optionality, which is the equivalent of painting yourself into a corner.

silversmith · 3y ago

Saying "Premature optimization" is fine, as long as you leave yourself a way out. And the way out is clear here: when your database can no longer handle the load, drop the DB check constraints and move them over to the application layer.
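One way to keep that exit open is sketched below with SQLite and a hypothetical `line_items.quantity` rule. Keep an application-side copy of the check, and confirm it accepts and rejects the same values as the database before the constraint goes away. The helper names `app_validate_quantity` and `db_accepts` are invented for illustration.

```python
import sqlite3

# Hypothetical table whose rule currently lives in the schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE line_items ("
    " quantity INTEGER NOT NULL CHECK (quantity BETWEEN 1 AND 1000))"
)


def app_validate_quantity(quantity: int) -> bool:
    """Application-layer copy of the CHECK, to take over once it is dropped."""
    return 1 <= quantity <= 1000


def db_accepts(quantity: int) -> bool:
    """Ask the database whether it would accept the value, keeping nothing."""
    try:
        conn.execute("INSERT INTO line_items (quantity) VALUES (?)", (quantity,))
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.rollback()  # undo the probe insert either way
    return True


# Boundary values plus a few ordinary ones.
samples = [-1, 0, 1, 2, 500, 999, 1000, 1001]
mismatches = [q for q in samples if app_validate_quantity(q) != db_accepts(q)]
rows_left = conn.execute("SELECT COUNT(*) FROM line_items").fetchone()[0]
```

In Postgres, the final step would be `ALTER TABLE ... DROP CONSTRAINT`. SQLite has no `DROP CONSTRAINT`, so there it would mean rebuilding the table. A hand-picked sample list like this one is only a smoke test. Comparing against real production values gives far more confidence.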

Could you please elaborate on the "security" part of your comment? To me it seems database level checks would be more secure, as they would work with every possible way to update data, rather than just the blessed paths going through the app layer.

And as far as scalability goes... it depends. If your ambition is to take on Twitter, you probably need to take scaling seriously at every step along the way. However, I've been building mainly B2B systems for 15 years now, and just about all of them have been purring away happily on the low end of AWS RDS offerings. There are orders of magnitude more performance available at the click of a button. Database scalability is not something that keeps me up at night.
