Either get rid of it, or allow nulls (and logically any other union) by supporting union types.
It is quite arbitrary to allow a type that makes sense (string, int) etc. and then also allow nulls so you allow INT | NULL union but no other unions. So you have this multi-purpose "other value" whose meaning is inferred by the application.
I guess they were added as a pragmatic "I dunno" field for CRUD systems without needing to go to all the effort to support unions. For example not everyone has a middle name, but you don't want to go all 5th Normal Form on it.
A simple fix would be to add a dialect option that makes any operation on a potentially null value a syntax error. This is possible because in SQL it's almost always possible to determine if a value X could potentially be NULL or not. If it could be NULL, something like (X == y) or (X + y) should generate a syntax error. Instead you should have to write (X is not null and X == y) or ISNULL(X + y, 0).
What doesn't make sense is to treat a nullable type the same as non-nullable type causing errors everywhere (Java), or make it so special that it basically has completely separate logic and operators attached (SQL)
Nulls in programming languages are a special “sentinel” value indicating the lack of an object. Nulls in sql means “unknown” or “I am not able to answer that question”.
This is why in most programming languages two nulls are equal, since they are the same sentinel value. Not so in SQL, since one unknown value is not necessaily the same as some other unknown value.
No it doesn't. It's a bad idea in programming languages just as it's a bad idea in SQL.
The problem is the way it's handled in many programming languages, where it can blend so easily with legitimate value and blow the whole program off at execution time.
It's a practical problem, not a theoretical one.
And it's not the same as union types. The division of two numbers always returns a number, and nothing else, it's just that, for some values, it's not defined.
We can all agree that the "silently accept null" approach is, by now, a quadrillion dollar mistake.
But having a safe "or null" union type as the _only_ union type in your language, isn't as far fetched as you make it sound.
It's the good old "allow zero, one or an infinite amount of any feature" rule. Sometimes "one" is the right choice.
And the same principle applies to values. A plain value is one value. A list/collection is an arbitrary number of values. And `null` is no value. Some would, rightly, say that `void` is no value, so `null` it's really a value representing no value. And then you will also want to represent zero-or-one value, which is where the -or-null type comes in.
This is a more fundamental union than just "a Foo or a Bar" which is always a value, then we're just arguing over type.
You can always use an `Option` class instead. If you have classes. But if you're going to use `Option.none` to represent the value of an uninitialized variable, you're going to build it into the language anyway, and then you might as well admit to that one union type.
Nulls are a special case though - even if null was not allowed in base tables, you would still need them in outer joins.
(actual question) how does 5NF apply here?
I thought that this case could be handled null-free by introducing a separate table
people_middlenames (human_id PK, middlename NOT NULL);
Is it already 5NF? I thought 5NF requires more complex structure of data, like explained in Wikipedia, for example.
Thanks,
That being said, names is an awful example; if you have to assume a specific structure from someone's name you kinda already lost [1][2].
Better to just assume it's a opaque mutable unicode string and hope for the best.
[1] https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...
[2] https://shinesolutions.com/2018/01/08/falsehoods-programmers...
You need something to represent lack of a value on LEFT JOINs. The author acknowledges as much, the two things he seems to be complaining about are:
1. It's impossible to distinguish between true lack of value from LEFT/RIGHT JOIN and a value that was directly set as NULL.
- Interestingly, this is exactly why JavaScript has both null and undefined, a design decision that many people also complain about and grumble that it should have been unified into one.
2. There's no reason for NULL to not equal NULL, which makes NULL a headache to handle and requires special IS NULL/IS NOT NULL operators to test for NULL-ness.
- Postgres 15 now addresses this with the DDL option UNIQUE NULLS NOT DISTINCT.
Finally, the author claims to have a solution that solves all these problems but sadly locked it behind a paywall on a website which is now defunct.
> Until recently, there was no logically correct, relational solution to missing data. We offer an outline of a possible such a solution in Practical Database Foundations paper #8, “The Final NULL in the Coffin,” which also summarizes the problems with NULLs.
http://www.dbdebunk.citymax.com/page/page/1396241.htm
> This website has been cancelled.
> Click here to go to the 5-minute website builder.
https://web.archive.org/web/20041209115415/http://www.dbdebu...
> ORDERING AND PRICING
> Delivery will be in PDF format. We strongly recommend to upgrade to the latest version of Acrobat reader and to have the following fonts installed: Verdana, Courier, Arial Narrow, Arial Black and Wingdings, so we don't have to embed these fonts and enlarge the files.
Only if you consider NULL to be an exact value, to have one meaning i.e. does not exist. In the mathematical set theory upon which relational database theory was built this is not the case.
NULL really represents “unknown” not just the subset that is “unknowable”/“does not exist” as which point not all NULLs are equal as they may represent entirely different things that you currently don't know - this is why NULL=<anything> == NULL (including NULL=NULL == NULL) and NULL≠<anything> == NULL.
Representing unknown/non-existent without allowing an explicit NULL value is can be modelled by the property being another entity, you have a child table that lists the values of the property so where you would store a NULL otherwise there simply exists no row in that table. In SQL this means extra joins and in most (all?) SQL implementations this results in (sometimes significantly) lower performance. The big argument there is whether this points to a problem in the theory (if arguing that NULL should not be “stored”), in SQL as an implementation of relational database theory, or in the implementations of SQL…
[I'm aware my terminology is all over the place, as most people's is: when talking about relational theory rather than SQL as a speicific implementation of it I should use tuple not row, relation not table, etc., but getting that right only serves to confuse people (a great many of those with a more self-taught and/or field trained programming background background than one that involves any computer science study) who know only SQL and have thus far not needed to be aware of the theory or history].
It's also clearly not a problem with SQL since people very loudly and clearly prefer the version with nulls and operations that create nulls.
It is a problem with the idea that people should program in relational algebra. They shouldn't. It's the same kind of issue as functional languages that extend lambda calculus; logical languages with explicit evaluation strategies; type systems with undecidable situations; module systems that allows access to private features... No actual usable system is a perfect representation of the math behind it.