Take the middle name example: "What was U.S President Theodore Roosevelt’s middle name?" when you know he didn't have a middle name.
Are you suggesting that a blank is the correct choice here?
I don't think it's accurate to say they have a blank middle name. I think it's better to say they don't have a middle name.
Both are entirely valid methods of encoding the data, and require knowing how the data is intended to be collected, entered and used to determine the best choice.
If the data needs to differentiate between those cases and a separate boolean bit of data to track whether it was collected is no feasible (as it so often isn't in the real world), then you do what you must. How that works out in the real world would be the policy for how to interpret the data in that field. It's really no different than any time you've seem a field that is named something counter-intuitive to what it holds in some schema. Something tracks that difference, whether it be institutional knowledge, come conversion code, or a label prior to display. That's what I mean by it being a "schema/data policy question".
You're assertion is that it's counter-intuitive to use the database's ability to encode a lack of information to denote a lack of knowledge about the information, and should only be used to denote the information does not universally exist? And additionally that this is so well accepted that to do otherwise would not be accepted by any peers?
If so, that's a fairly bold assertion to make when you haven't provided any any evidence or reasoning to back it up.
- I know that Theodore Roosevelt has no middle name.
- I don't know what Theodore Roosevelt's middle name is
The semantics of each case are different in a subtle but important way. Consider the following two predicates
'Jim' = [no value] vs 'Jim' = [I don't know]
The former predicate is obviously false. 'Jim' is not equivalent to a non-existent value. The latter predicate however, can't be assigned a truth value. It's entirely possible that the unknown value could later turn out to be 'Jim'.
The key issue is that SQL treats NULL values according to the latter ('I don't know') rule and ternary (Kleene) logic, as TFA discusses. That, in turn means that using NULL to represent [no value] will lead to painful and hard to diagnose logic bugs (e.g., WHERE x = 1 OR x != 1 not returning all records).
[edit: clarifying why bugs will arise]
In a sense, empty string is available as a marker, just as NULL is available as a marker, to satisfy either scenario. I think the possibility of three-valued logic applies in either scenario.
I'm not at all convinced empty (non-null) string is appropriate for many real world applications, just as empty (non-null) date is appropriate for similar scenarios.
Eg. Date of death: unknown - NULL; not dead yet - NULL
In the database you need to represent if the user entered a blank name “”, or if the middle name was never even requested NULL.
Only in Oracle is it not possible to make this distinction. Oracle is definitely broken in this regard.
Inserting a string in the middle name column “unknown” or “unspecified” is a clunky workaround.