Nulls Nullified (2005) (opens in new tab)

(dbazine.com)

28 pointsdeclanhaigh2y ago44 comments

44 comments

30 comments · 6 top-level

quickthrower22y ago· 21 in thread

NULL makes little sense from first principles.

Either get rid of it, or allow nulls (and logically any other union) by supporting union types.

It is quite arbitrary to allow a type that makes sense (string, int) etc. and then also allow nulls so you allow INT | NULL union but no other unions. So you have this multi-purpose "other value" whose meaning is inferred by the application.

I guess they were added as a pragmatic "I dunno" field for CRUD systems without needing to go to all the effort to support unions. For example not everyone has a middle name, but you don't want to go all 5th Normal Form on it.

rstuart41332y ago

I don't know about not making sense from first principles, but if a principle concern is bug free code then NULL is a disaster. I've seen so bugs caused by people forgetting a column could be NULL and using '<', '<=' I avoid them like the plague now. It's not just the unfamiliar behaviour, it's also that everyone forgets to write tests for it.

A simple fix would be to add a dialect option that makes any operation on a potentially null value a syntax error. This is possible because in SQL it's almost always possible to determine if a value X could potentially be NULL or not. If it could be NULL, something like (X == y) or (X + y) should generate a syntax error. Instead you should have to write (X is not null and X == y) or ISNULL(X + y, 0).

dtech2y ago

NULL/missing is so common that it makes sense to provide special language support, even if you don't have full unions. See e.g. Kotlin and Typescript.

What doesn't make sense is to treat a nullable type the same as non-nullable type causing errors everywhere (Java), or make it so special that it basically has completely separate logic and operators attached (SQL)

bazoom422y ago

A lot of confusion comes from SQL nulls having the same name as nulls in programming languages. They are conceptually completely different things.

Nulls in programming languages are a special “sentinel” value indicating the lack of an object. Nulls in sql means “unknown” or “I am not able to answer that question”.

This is why in most programming languages two nulls are equal, since they are the same sentinel value. Not so in SQL, since one unknown value is not necessaily the same as some other unknown value.

1 more reply

MrJohz2y ago

Typescript does have full unions, although only partially discriminated ones (in the sense that `number | string` is discriminated but `number | number` is not - custom discriminators can be written fairly easily though). This resolves the problem that the previous poster had - nullability is not some magic trait of certain types, but rather a function of the general case of union types. In that case, `string | null` is no different from `string | number`, and null is just one possible type among many.

1 more reply

lmm2y ago

> NULL/missing is so common that it makes sense to provide special language support, even if you don't have full unions. See e.g. Kotlin and Typescript.

No it doesn't. It's a bad idea in programming languages just as it's a bad idea in SQL.

1 more reply

sacado22y ago

In theory NULL (or any such sentinel value, such as the infamous NaN is floatin-point arithmetic) makes sense because there are partially defined functions. Division being the most famous example, but most basic operations on lists share that property too.

The problem is the way it's handled in many programming languages, where it can blend so easily with legitimate value and blow the whole program off at execution time.

It's a practical problem, not a theoretical one.

And it's not the same as union types. The division of two numbers always returns a number, and nothing else, it's just that, for some values, it's not defined.

AnimalMuppet2y ago

I could argue that { number that is defined | number that is not defined } is in fact a union type (or, perhaps more correctly, a sum type).

bruce3434342y ago

Being pedantic here but "sentinel" means "signalling the end of a sequence" (such as a 0 byte for C strings, or visually when you close an array or a block of code with `}`)

Mikhail_Edoshin2y ago

As far as I understand NULLs were indeed a pragmatic choice, but the chief reason was the need to compute derived table values. E.g. in a join it is normal to get unknown values for a cell. This has to be expressed somehow. How? NULL seems to be a reasonably good generic solution to this. It may be possible to come up with a different generic solution, but such a solution would probably require something like conditional fields in a table and this is a whole new level of relational logic.

cubefox2y ago

I wonder how Codd's original relational algebra handled this case, as he didn't have NULL cells.

2 more replies

reichstein2y ago

I'll have to disagree.

We can all agree that the "silently accept null" approach is, by now, a quadrillion dollar mistake.

But having a safe "or null" union type as the _only_ union type in your language, isn't as far fetched as you make it sound.

It's the good old "allow zero, one or an infinite amount of any feature" rule. Sometimes "one" is the right choice.

And the same principle applies to values. A plain value is one value. A list/collection is an arbitrary number of values. And `null` is no value. Some would, rightly, say that `void` is no value, so `null` it's really a value representing no value. And then you will also want to represent zero-or-one value, which is where the -or-null type comes in.

This is a more fundamental union than just "a Foo or a Bar" which is always a value, then we're just arguing over type.

You can always use an `Option` class instead. If you have classes. But if you're going to use `Option.none` to represent the value of an uninitialized variable, you're going to build it into the language anyway, and then you might as well admit to that one union type.

cubefox2y ago

While I'm a big proponent of union types in programming languages (against Haskell-like Option wrapping and unwrapping), it is not clear to me that they would work for something like SQL. Would it mean allowing columns with union types?

bazoom422y ago

As soon as you have composite types, you will want to filter and join on the individual components, which would require extending the query language to allow drilling down into nested composite types. Basically you are back to hieracical databases. This is why Codd recommends expressing composite types as relations rather than complex values. You can express the same information, but can use relational algebra rather then special-case operators.

Nulls are a special case though - even if null was not allowed in base tables, you would still need them in outer joins.

petalmind2y ago

> For example not everyone has a middle name, but you don't want to go all 5th Normal Form on it.

(actual question) how does 5NF apply here?

I thought that this case could be handled null-free by introducing a separate table

people_middlenames (human_id PK, middlename NOT NULL);

Is it already 5NF? I thought 5NF requires more complex structure of data, like explained in Wikipedia, for example.

Thanks,

ElectricalUnion2y ago

Looks 5NF enough to me.

That being said, names is an awful example; if you have to assume a specific structure from someone's name you kinda already lost [1][2].

Better to just assume it's a opaque mutable unicode string and hope for the best.

[1] https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

[2] https://shinesolutions.com/2018/01/08/falsehoods-programmers...

SPBS2y ago

> NULL makes little sense from first principles.

You need something to represent lack of a value on LEFT JOINs. The author acknowledges as much, the two things he seems to be complaining about are:

1. It's impossible to distinguish between true lack of value from LEFT/RIGHT JOIN and a value that was directly set as NULL.

- Interestingly, this is exactly why JavaScript has both null and undefined, a design decision that many people also complain about and grumble that it should have been unified into one.

2. There's no reason for NULL to not equal NULL, which makes NULL a headache to handle and requires special IS NULL/IS NOT NULL operators to test for NULL-ness.

- Postgres 15 now addresses this with the DDL option UNIQUE NULLS NOT DISTINCT.

Finally, the author claims to have a solution that solves all these problems but sadly locked it behind a paywall on a website which is now defunct.

> Until recently, there was no logically correct, relational solution to missing data. We offer an outline of a possible such a solution in Practical Database Foundations paper #8, “The Final NULL in the Coffin,” which also summarizes the problems with NULLs.

http://www.dbdebunk.citymax.com/page/page/1396241.htm

> This website has been cancelled.

> Click here to go to the 5-minute website builder.

https://web.archive.org/web/20041209115415/http://www.dbdebu...

> ORDERING AND PRICING

> Delivery will be in PDF format. We strongly recommend to upgrade to the latest version of Acrobat reader and to have the following fonts installed: Verdana, Courier, Arial Narrow, Arial Black and Wingdings, so we don't have to embed these fonts and enlarge the files.

dspillett2y ago

> There's no reason for NULL to not equal NULL

Only if you consider NULL to be an exact value, to have one meaning i.e. does not exist. In the mathematical set theory upon which relational database theory was built this is not the case.

NULL really represents “unknown” not just the subset that is “unknowable”/“does not exist” as which point not all NULLs are equal as they may represent entirely different things that you currently don't know - this is why NULL=<anything> == NULL (including NULL=NULL == NULL) and NULL≠<anything> == NULL.

Mikhail_Edoshin2y ago

It seems to be available for $14.99 on Amazon.

classified2y ago

It's not a union type, but an option. INT | NULL --> Option<INT>.

dtech2y ago

Option is an union type

1 more reply

contravariant2y ago

Why do pointed sets not make sense from first principles?

cubefox2y ago· 3 in thread

The author talks a lot about how NULL in SQL is supposedly unnecessary, and claims that it is totally possible to handle unknown SQL values without NULL. Yet he successfully avoids explaining how his solution is supposed to work. At the end he links to a (paid?) paper where his solution is apparently explained, though the link has since ceased to work. Great.

dspillett2y ago

There are many papers describing how NULL as an explicit “value” (you still at least need something to present empty results from left or right joins) can be made unnecessary – just look back to the development of relational databases and the theory that are based upon. No need to pay for this author's paper when it probably just restates things that are found in many open access papers/books/other.

Representing unknown/non-existent without allowing an explicit NULL value is can be modelled by the property being another entity, you have a child table that lists the values of the property so where you would store a NULL otherwise there simply exists no row in that table. In SQL this means extra joins and in most (all?) SQL implementations this results in (sometimes significantly) lower performance. The big argument there is whether this points to a problem in the theory (if arguing that NULL should not be “stored”), in SQL as an implementation of relational database theory, or in the implementations of SQL…

[I'm aware my terminology is all over the place, as most people's is: when talking about relational theory rather than SQL as a speicific implementation of it I should use tuple not row, relation not table, etc., but getting that right only serves to confuse people (a great many of those with a more self-taught and/or field trained programming background background than one that involves any computer science study) who know only SQL and have thus far not needed to be aware of the theory or history].

marcosdumay2y ago

It's not a problem with the theory, because the theory was not created to be a high-performance application.

It's also clearly not a problem with SQL since people very loudly and clearly prefer the version with nulls and operations that create nulls.

It is a problem with the idea that people should program in relational algebra. They shouldn't. It's the same kind of issue as functional languages that extend lambda calculus; logical languages with explicit evaluation strategies; type systems with undecidable situations; module systems that allows access to private features... No actual usable system is a perfect representation of the math behind it.

bazoom422y ago

But how to represent missing values in left/right/outer joins?

1 more reply

tragomaskhalos2y ago

I assume the author's alternative to null is to punt nullable values into their own table, where the presence of the row tells you that the value is in fact there. Thing is, if your table has three such columns then you're going to need three extra tables. The whole argument is reminiscent of the polemic against surrogate keys, viz that they are a violation of Codd. No-one is arguing that nulls in SQL aren't a mess, but an RDBMS is a tool, not a holy shrine, and most of us prefer to just use bloody things to get work done and ignore the purist bikeshedding.

TRiG_Ireland2y ago

The author is right about one thing. If NULL means "unknown", then the result of the comparison A > B where either or both values is NULL should itself be NULL, not FALSE.

cactusfrog2y ago

Null means unknown. I think the main issue is that people use it to mean “not applicable to this row”, which is different than null’s initial purpose. I think the problems with null could be solved by adding in another None value to distinguish between these two situations.

al2o3cr2y ago

LOL @ "I'm smarter than the authors of the SQL standard, buy my paper to find out why!"

j / k navigate · click thread line to collapse

44 comments

30 comments · 6 top-level

quickthrower22y ago· 21 in thread

NULL makes little sense from first principles.

Either get rid of it, or allow nulls (and logically any other union) by supporting union types.

rstuart41332y ago

dtech2y ago

NULL/missing is so common that it makes sense to provide special language support, even if you don't have full unions. See e.g. Kotlin and Typescript.

bazoom422y ago

A lot of confusion comes from SQL nulls having the same name as nulls in programming languages. They are conceptually completely different things.

Nulls in programming languages are a special “sentinel” value indicating the lack of an object. Nulls in sql means “unknown” or “I am not able to answer that question”.

This is why in most programming languages two nulls are equal, since they are the same sentinel value. Not so in SQL, since one unknown value is not necessaily the same as some other unknown value.

1 more reply

MrJohz2y ago

1 more reply

lmm2y ago

> NULL/missing is so common that it makes sense to provide special language support, even if you don't have full unions. See e.g. Kotlin and Typescript.

No it doesn't. It's a bad idea in programming languages just as it's a bad idea in SQL.

1 more reply

sacado22y ago

The problem is the way it's handled in many programming languages, where it can blend so easily with legitimate value and blow the whole program off at execution time.

It's a practical problem, not a theoretical one.

And it's not the same as union types. The division of two numbers always returns a number, and nothing else, it's just that, for some values, it's not defined.

AnimalMuppet2y ago

I could argue that { number that is defined | number that is not defined } is in fact a union type (or, perhaps more correctly, a sum type).

bruce3434342y ago

Being pedantic here but "sentinel" means "signalling the end of a sequence" (such as a 0 byte for C strings, or visually when you close an array or a block of code with `}`)

Mikhail_Edoshin2y ago

cubefox2y ago

I wonder how Codd's original relational algebra handled this case, as he didn't have NULL cells.

2 more replies

reichstein2y ago

I'll have to disagree.

We can all agree that the "silently accept null" approach is, by now, a quadrillion dollar mistake.

But having a safe "or null" union type as the _only_ union type in your language, isn't as far fetched as you make it sound.

It's the good old "allow zero, one or an infinite amount of any feature" rule. Sometimes "one" is the right choice.

This is a more fundamental union than just "a Foo or a Bar" which is always a value, then we're just arguing over type.

cubefox2y ago

bazoom422y ago

Nulls are a special case though - even if null was not allowed in base tables, you would still need them in outer joins.

petalmind2y ago

> For example not everyone has a middle name, but you don't want to go all 5th Normal Form on it.

(actual question) how does 5NF apply here?

I thought that this case could be handled null-free by introducing a separate table

people_middlenames (human_id PK, middlename NOT NULL);

Is it already 5NF? I thought 5NF requires more complex structure of data, like explained in Wikipedia, for example.

Thanks,

ElectricalUnion2y ago

Looks 5NF enough to me.

That being said, names is an awful example; if you have to assume a specific structure from someone's name you kinda already lost [1][2].

Better to just assume it's a opaque mutable unicode string and hope for the best.

[1] https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

[2] https://shinesolutions.com/2018/01/08/falsehoods-programmers...

SPBS2y ago

> NULL makes little sense from first principles.

You need something to represent lack of a value on LEFT JOINs. The author acknowledges as much, the two things he seems to be complaining about are:

1. It's impossible to distinguish between true lack of value from LEFT/RIGHT JOIN and a value that was directly set as NULL.

- Interestingly, this is exactly why JavaScript has both null and undefined, a design decision that many people also complain about and grumble that it should have been unified into one.

2. There's no reason for NULL to not equal NULL, which makes NULL a headache to handle and requires special IS NULL/IS NOT NULL operators to test for NULL-ness.

- Postgres 15 now addresses this with the DDL option UNIQUE NULLS NOT DISTINCT.

Finally, the author claims to have a solution that solves all these problems but sadly locked it behind a paywall on a website which is now defunct.

http://www.dbdebunk.citymax.com/page/page/1396241.htm

> This website has been cancelled.

> Click here to go to the 5-minute website builder.

https://web.archive.org/web/20041209115415/http://www.dbdebu...

> ORDERING AND PRICING

dspillett2y ago

> There's no reason for NULL to not equal NULL

Only if you consider NULL to be an exact value, to have one meaning i.e. does not exist. In the mathematical set theory upon which relational database theory was built this is not the case.

Mikhail_Edoshin2y ago

It seems to be available for $14.99 on Amazon.

classified2y ago

It's not a union type, but an option. INT | NULL --> Option<INT>.

dtech2y ago

Option is an union type

1 more reply

contravariant2y ago

Why do pointed sets not make sense from first principles?

cubefox2y ago· 3 in thread

dspillett2y ago

marcosdumay2y ago

It's not a problem with the theory, because the theory was not created to be a high-performance application.

It's also clearly not a problem with SQL since people very loudly and clearly prefer the version with nulls and operations that create nulls.

bazoom422y ago

But how to represent missing values in left/right/outer joins?

1 more reply

tragomaskhalos2y ago

TRiG_Ireland2y ago

The author is right about one thing. If NULL means "unknown", then the result of the comparison A > B where either or both values is NULL should itself be NULL, not FALSE.

cactusfrog2y ago

al2o3cr2y ago

LOL @ "I'm smarter than the authors of the SQL standard, buy my paper to find out why!"

j / k navigate · click thread line to collapse