undefined | Better HN

0 pointstalaketu6y ago0 comments

I'm sceptical. Do you have examples?

Take the middle name example: "What was U.S President Theodore Roosevelt’s middle name?" when you know he didn't have a middle name.

Are you suggesting that a blank is the correct choice here?

I don't think it's accurate to say they have a blank middle name. I think it's better to say they don't have a middle name.

0 comments

8 comments · 3 top-level

kbenson6y ago· 4 in thread

That's entirely a schema/data policy question. Null could indicate no name, or it just as easily could indicate tha data was not collected, and an empty string would indicate it was collected but is empty.

Both are entirely valid methods of encoding the data, and require knowing how the data is intended to be collected, entered and used to determine the best choice.

talaketuOP6y ago

Should there be an empty date, as a corollary to empty string, for representing dates that are "collected but empty" eg date of death?

kbenson6y ago

If that's the data policy of the group collecting the data, then an empty date (which some systems support, 0000-00-00 is valid in MySQL for certain modes of strictness IIRC), or a sentinel value could be used (1111-11-11, depending on expected values).

If the data needs to differentiate between those cases and a separate boolean bit of data to track whether it was collected is no feasible (as it so often isn't in the real world), then you do what you must. How that works out in the real world would be the policy for how to interpret the data in that field. It's really no different than any time you've seem a field that is named something counter-intuitive to what it holds in some schema. Something tracks that difference, whether it be institutional knowledge, come conversion code, or a label prior to display. That's what I mean by it being a "schema/data policy question".

1 more reply

zadokshi6y ago

No, what you suggest is counter intuitive and should never pass a code review.

kbenson6y ago

> No, what you suggest is counter intuitive and should never pass a code review.

You're assertion is that it's counter-intuitive to use the database's ability to encode a lack of information to denote a lack of knowledge about the information, and should only be used to denote the information does not universally exist? And additionally that this is so well accepted that to do otherwise would not be accepted by any peers?

If so, that's a fairly bold assertion to make when you haven't provided any any evidence or reasoning to back it up.

okennedy6y ago· 1 in thread

The problem is a bit more subtle. NULL is often used with two distinct meanings:

- I know that Theodore Roosevelt has no middle name.

- I don't know what Theodore Roosevelt's middle name is

The semantics of each case are different in a subtle but important way. Consider the following two predicates

'Jim' = [no value] vs 'Jim' = [I don't know]

The former predicate is obviously false. 'Jim' is not equivalent to a non-existent value. The latter predicate however, can't be assigned a truth value. It's entirely possible that the unknown value could later turn out to be 'Jim'.

The key issue is that SQL treats NULL values according to the latter ('I don't know') rule and ternary (Kleene) logic, as TFA discusses. That, in turn means that using NULL to represent [no value] will lead to painful and hard to diagnose logic bugs (e.g., WHERE x = 1 OR x != 1 not returning all records).

[edit: clarifying why bugs will arise]

talaketuOP6y ago

I do recognize (1) there is a logical difference between "missing because absent" and "missing because unknown", (2) empty string is not an element from the domain of personal names.

In a sense, empty string is available as a marker, just as NULL is available as a marker, to satisfy either scenario. I think the possibility of three-valued logic applies in either scenario.

I'm not at all convinced empty (non-null) string is appropriate for many real world applications, just as empty (non-null) date is appropriate for similar scenarios.

Eg. Date of death: unknown - NULL; not dead yet - NULL

zadokshi6y ago

This is the perfect example. In an enterprise, some forms might ask middle name and some forms might not.

In the database you need to represent if the user entered a blank name “”, or if the middle name was never even requested NULL.

Only in Oracle is it not possible to make this distinction. Oracle is definitely broken in this regard.

Inserting a string in the middle name column “unknown” or “unspecified” is a clunky workaround.

j / k navigate · click thread line to collapse

0 comments

8 comments · 3 top-level

kbenson6y ago· 4 in thread

Both are entirely valid methods of encoding the data, and require knowing how the data is intended to be collected, entered and used to determine the best choice.

talaketuOP6y ago

Should there be an empty date, as a corollary to empty string, for representing dates that are "collected but empty" eg date of death?

kbenson6y ago

1 more reply

zadokshi6y ago

No, what you suggest is counter intuitive and should never pass a code review.

kbenson6y ago

> No, what you suggest is counter intuitive and should never pass a code review.

If so, that's a fairly bold assertion to make when you haven't provided any any evidence or reasoning to back it up.

okennedy6y ago· 1 in thread

The problem is a bit more subtle. NULL is often used with two distinct meanings:

- I know that Theodore Roosevelt has no middle name.

- I don't know what Theodore Roosevelt's middle name is

The semantics of each case are different in a subtle but important way. Consider the following two predicates

'Jim' = [no value] vs 'Jim' = [I don't know]

[edit: clarifying why bugs will arise]

talaketuOP6y ago

I do recognize (1) there is a logical difference between "missing because absent" and "missing because unknown", (2) empty string is not an element from the domain of personal names.

In a sense, empty string is available as a marker, just as NULL is available as a marker, to satisfy either scenario. I think the possibility of three-valued logic applies in either scenario.

I'm not at all convinced empty (non-null) string is appropriate for many real world applications, just as empty (non-null) date is appropriate for similar scenarios.

Eg. Date of death: unknown - NULL; not dead yet - NULL

zadokshi6y ago

This is the perfect example. In an enterprise, some forms might ask middle name and some forms might not.

In the database you need to represent if the user entered a blank name “”, or if the middle name was never even requested NULL.

Only in Oracle is it not possible to make this distinction. Oracle is definitely broken in this regard.

Inserting a string in the middle name column “unknown” or “unspecified” is a clunky workaround.

j / k navigate · click thread line to collapse