Generally speaking you want context of what the date or datetime is being used for. If you want to know when Christmas takes place you will use a date: it does not take place on 2022-12-25 00:00:00 through 2022-12-25 11:59:59 because this would require a timezone and Christmas takes place at different UTC times around the globe. But you can reasonably say that Christmas takes place on 2022-12-25 and leave it at that to let the implementation of whatever program figure out if it is or is not currently Christmas based on the information it has about time and timezone.
This is way overstated. It's more like asking if the integer `1L` is between the float `1.0D` and `2.0D`. It requires an implicit cast. Decide whether operations involving LocalDate[0] and LocalDateTime should use the former or the latter as the working type (the latter, IMO), do the cast (LocalDate becomes a LocalDateTime with time component 00:00:00, or LocalDateTime becomes a LocalDate by dropping its time component), then do the operation.
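The java.time names aside, that explicit cast is a one-liner in most libraries. A sketch using Python's stdlib analogues, assuming the widen-to-datetime convention described above:

```python
from datetime import date, datetime, time

d = date(2022, 6, 5)               # a LocalDate analogue
dt = datetime(2022, 6, 5, 20, 5)   # a LocalDateTime analogue

# Widen: date -> datetime with a 00:00:00 time component.
widened = datetime.combine(d, time.min)

# Narrow: datetime -> date by dropping the time component.
narrowed = dt.date()

print(widened)        # 2022-06-05 00:00:00
print(narrowed == d)  # True: comparable once both sides are the same type
```

Either direction works; the point is that the programmer picks the working type and does the cast on purpose.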
If the types you're dealing with are LocalDate and Instant, then you'd need a contextual timezone to perform the conversion in either direction.
I don't think this is a hard problem, it's just one that requires more specificity than one might think at first glance. Users should be encouraged to read the spec of any language that does these sorts of implicit casts, in the same way that C programmers should be aware of the implicit typecasting rules in that language.
0. I'm using the java.time/org.joda.time type names here. Their equivalents should exist in any good time library. However, I think in reality many libraries fall short of the types you really need to express date & time concepts, which is what leads to so much confusion around them. Once you're familiar with LocalDate, LocalTime, LocalDateTime, OffsetDateTime, ZonedDateTime, Instant, Duration, and Period, (whew), it's possible to be clear about what you're doing.
Some of the other Orthodox Churches (such as the Greeks) technically celebrate Christmas, not using the Gregorian calendar, but rather the “Revised Julian” - which happens to be identical to the Gregorian until 2800. I wonder if, come 2800, they’ll remember to move the date of Christmas, or if they’ll think “there’s no point to it, let’s not” (assuming of course that both they, and humanity as a whole, are still around in 2800)
Wait, what did you do to the other 12 hours of the day?
Oh.
Time is hard.
My mental model for these kinds of things is that Times are instants and Dates are either:
1. Ranges (with the start and end Time depending on TZ and possibly other context)
2. Discrete cells of a calendar (which are mostly TZ independent - July 5th doesn't happen at the same time everywhere, but it is well-defined everywhere)
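That range model is easy to make concrete. A sketch using Python's zoneinfo (the half-open convention and the function name are my own):

```python
from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo

def day_range(d: date, tz: str):
    """The half-open [start, end) instant range a calendar date covers in tz."""
    zone = ZoneInfo(tz)
    start = datetime.combine(d, time.min, tzinfo=zone)
    end = datetime.combine(d + timedelta(days=1), time.min, tzinfo=zone)
    return start, end

# July 5th is well-defined everywhere, but doesn't happen at the same time:
ny_start, _ = day_range(date(2022, 7, 5), "America/New_York")
tokyo_start, _ = day_range(date(2022, 7, 5), "Asia/Tokyo")
print(ny_start.astimezone(ZoneInfo("UTC")))     # 2022-07-05 04:00:00+00:00
print(tokyo_start.astimezone(ZoneInfo("UTC")))  # 2022-07-04 15:00:00+00:00
```

The same calendar cell maps to two different instant ranges, which is exactly why the start and end Times depend on TZ.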
Also, I've been writing code for 25 years and I still have no idea what a DateTime is.
Pretty sure these are the two cases which the python stdlib datetime covers.
The question is whether the definition is useful or not.
Looked at as ranges, equality can be defined as whether they overlap, and that works for most use cases but not all.
I’d argue though, that without context, a date has the time 00:00 in your local time zone.
I'd argue that without context a Date cannot have any time inferred. In the programming context it should be treated as a programming error and comparisons should fail to compile unless additional context is given.
A lot of the confusion here seems to stem from how we traditionally store dates (seconds/millis after a reference point in time), and over time we've confused the predominant way in which the data is represented from what the data actually is intended to mean.
let date1 = Date("06/05/2022")
let date2 = DateTime("06/05/2022 20:05 UTC")
if date2 > date1 {
// SHOULD NOT COMPILE
}
let date3 = Date("06/05/2022").withTime("05:00 UTC")
if date2 > date3 {
// DOES COMPILE
}
Injecting context automatically, without the intentional action/declaration of the programmer, is where a billion bugs are born.

The issue here is around the semantics of the mathematical operators. It isn't even really about the types to which they apply; there are systems where `=` is well defined on heterogeneous types.
The reason the answer to all these questions is a clear "no" is that they do not satisfy the core properties of the operators. For example, take equality. Equality in almost all mathematical constructs means that it supports substitution, is symmetric, transitive and reflexive. There are also well defined properties for the concepts of `greater than` and `less than`.
So, no, the OP's conclusion "Literally we’re all just making it up" is incorrect. You cannot use the operators `=`, `<`, and `>` between dates and times because they do not satisfy the core properties that define those operators. (I guess you could try to document an alternate definition of equality without the symmetric property in your documentation but... good luck with that not leading to massive confusion.)
Where you can just make it up is to define new operators as you actually want them to be. It's not `=`, it's `myDateTime=()` and then you're free to write the definition of that yourself. As long as you're consistent in the UI of how you present it (don't pretend it's vanilla `=` to the user!) you will at least be telling the truth. It may not solve all your problems but at least you won't be feeding any more to the fires of confusion, which you will as long as you keep pretending it's possible to make `=` mean something that's not reflexive.
If I define a date MM/DD/YYYY as equal to the time MM/DD/YYYY 00:00 GMT, which of the equality properties am I lacking?
Similarly, if I use that definition of equality to convert the date into a time, and then compare it, can't I get '>' and '<'?
That's the exact problem - if you contrive a date into an instant then yes, it is compatible with the =, >, and < operators.
The point of the blog post is to assert that that contrivance is not appropriate in nearly all use cases, despite how popular it seems to be. In the vast majority of use cases contriving MM/DD/YYYY into MM/DD/YYYY 00:00 GMT is not reasonable and can mask a vast amount of unintended behavior.
For example, a person in California who enters a date of 06/05/2022 in reference to their local time zone will suddenly wind up the day before - because 06/05/2022 00:00 GMT is actually 06/04/2022 16:00 PST.
If the developer wants to contrive the date into a "midnight instant", they are welcome to do so and there should be plenty of convenience functions to allow them to do that, but implicitly performing that contrivance is dangerous.
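The mechanics of that day-shift are easy to demonstrate. A Python sketch (note that in June, California observes PDT, UTC-7, so midnight GMT lands at 17:00 the previous evening):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The user's "06/05/2022", implicitly contrived into a midnight-UTC instant:
instant = datetime(2022, 6, 5, 0, 0, tzinfo=timezone.utc)

# Viewed back in the user's own time zone, the date has slipped a day:
local = instant.astimezone(ZoneInfo("America/Los_Angeles"))
print(local)         # 2022-06-04 17:00:00-07:00
print(local.date())  # 2022-06-04 -- the day before what the user entered
```

An explicit `.withMidnight(tz)`-style convenience function would make the same conversion, but visibly, at the call site.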
If you only have instants, then sure, you could do what you're saying, but the context in which the question was posed seems to imply that "dates" and "times" are separate.
And if you do have separate "date" and "instant" types, you lose the property of substitution: f(x) = f(y) if x = y for any arbitrary function f.
For example non-mathematicians/programmers will specify date range as 2022-01-01 to 2022-01-31.
to map this you need a special mapping to convert to instant... (you need to map the upper range to 2022-02-01 (00:00), anything else is probably wrong).
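A sketch of that special mapping in Python (half-open on the right, with the exclusive bound falling on the day after the user's last date):

```python
from datetime import date, datetime, time, timedelta

def inclusive_dates_to_instants(first: date, last: date):
    """Map a user-facing inclusive date range to a half-open instant range."""
    start = datetime.combine(first, time.min)
    end = datetime.combine(last + timedelta(days=1), time.min)
    return start, end

start, end = inclusive_dates_to_instants(date(2022, 1, 1), date(2022, 1, 31))
print(start)  # 2022-01-01 00:00:00
print(end)    # 2022-02-01 00:00:00  (exclusive upper bound)
```

Mapping the upper bound to 2022-01-31 23:59:59 instead would drop anything that happens in the last second of January.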
The only question in a circumstance like this is whether my requirements allow for, or can reasonably be modified to allow, stripping off the time and comparing only dates.
If time is an inherent requirement to the project then the response I give is simple: "Then begin collecting time data."
If you want to ask "Does a given date contain a time?", you find the start and end of that date in the relevant timezone and check if the time value falls in that range.
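In code, that containment check might look like this (a Python sketch; the function name and the half-open convention are mine):

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

def date_contains(d: date, instant: datetime, tz: str) -> bool:
    """Does calendar date d, as observed in tz, contain the given instant?"""
    zone = ZoneInfo(tz)
    start = datetime.combine(d, time.min, tzinfo=zone)
    end = datetime.combine(d + timedelta(days=1), time.min, tzinfo=zone)
    return start <= instant < end

moment = datetime(2022, 6, 5, 3, 0, tzinfo=timezone.utc)
print(date_contains(date(2022, 6, 5), moment, "UTC"))                  # True
print(date_contains(date(2022, 6, 5), moment, "America/Los_Angeles"))  # False
```

The same instant is on June 5th in one zone and June 4th in another, which is why the "relevant timezone" has to come from somewhere.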
It's not a programming issue, a type casting issue or a CS issue.
It's a UI issue. At the point you got the date and the datetime values, there was a human being (a user, an admin, or a programmer) sitting in front of the screen, and they were asked a question in some way.
The precise way that question was presented affected their intent. It's their intent you are asking about here. So you need to think about what they were asked and how they were asked it. If that was consistent then you can make a reasonable call here. If it's inconsistent at different times and places then the meaning of that date is different and you can't solve this in any reasonable way.
Whether you are comparing a date to a datetime or saving a date as a datetime, almost every language turns '2022-01-02' into something like '2022-01-02T00:00:00+0000' i.e. the first second of the day.
In practice it is usually safer to convert to '2022-01-02T12:00:00+0000', centering the time in the day and minimizing the chance that timezone related or other errors will move you into a whole new day.
The time component is always a lie, but noon is the safest lie.
The static typing answer: type error, type error and type error.
The dynamic typing answer: exception, exception and exception.
The philosophy answer: category error, category error and category error.
Where I work (banks and other financial institutions) there is frequently a need to check whether something happened within a particular day. Or maybe select records from the database for a day, or do some other kind of logic or filtering.
For some unknown reason, most people decide that the best way to do this is to take the start date as the beginning of the period and then add to it 23 hours, 59 minutes, and 59 seconds and use that as the end of the period.
Explanations that they are missing a whole second do not seem to be working. People are absolutely convinced they are doing this correctly.
I thought about this for a long time and I arrived at an explanation.
It seems that some people use the time as a label for a span of time of unit length. March 21st is a label for an entire day. 12 pm is a label for an entire hour that starts at 12pm and lasts an hour, etc. 2021 is a whole year.
And in some contexts it makes sense. We say something happened on the 12th of January -- we use Jan 12th as a label for an entire day that we would otherwise have to denote with two timestamps. But in some contexts what we need is an exact point in time, a timestamp. And here is where a lot of people just don't think about / can't recognise the difference between timestamps and labels for a span of time.
If you use a wrong model then yes, the day starts with a second labeled 00:00:00 and ends with a second labeled 23:59:59.
Except that's not how most of the underlying software works. Most software in this case expects two exact timestamps to denote the ends of the span of time. And 23:59:59 is just 1 second shy of the actual end of day, which means that, even though we are missing an entire second of the day, most of the time everything seems to work fine.
Unless you are a large bank and you have millions of transactions around the clock that have to be accumulated exactly. Then yes, it makes a lot of difference.
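The missing second is easy to show. A minimal Python sketch, using a sub-second timestamp to make the bug concrete:

```python
from datetime import datetime

day_start = datetime(2022, 3, 21)
next_day = datetime(2022, 3, 22)
bad_end = datetime(2022, 3, 21, 23, 59, 59)      # start + 23h59m59s

tx = datetime(2022, 3, 21, 23, 59, 59, 500000)   # a transaction at 23:59:59.5

print(day_start <= tx <= bad_end)  # False: silently dropped
print(day_start <= tx < next_day)  # True: the half-open interval catches it
```

Anything landing in that final second falls through the inclusive endpoint but is caught by the half-open comparison.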
Another explanation is that people don't seem to be comfortable with the concept of selecting items from between 00:00:00 of one day and 00:00:00 of the next because they are seeing another date.
... and date >= '2020-02-13' and date < '2020-02-14' -- notice: >= and <
Since this also applies to real numbers, maybe it is easier to keep in mind with an example in that realm. You would never check for a real number in a given range with: ... and X >= 3.0 and X <= 3.9 // or 3.99, or 3.999999
but instead, of course, you would do: ... and X >= 3.0 and X < 4

DATE(widget_made) = :user_date
Which does what I want and avoids the problem you brought up. But sometimes I have some more complicated logic that falls down if I try to compare datewise. So I manually create the DateTime ranges, and use 23:59:59 as the end of the range. So something like widget_made BETWEEN :user_date '00:00:00' AND :user_date '23:59:59'
Anything that came off the line at 23:59:59.3421 will be excluded. Fortunately for me, I don't think that has ever actually happened. But now I know to be on the lookout, and use proper date-handling tools to ensure correctness.

Drawing examples from Python's stdlib, there are (roughly) three types: dates, datetimes (tz naive), and datetimes (tz aware)
I assume this is a response to specifically this part:
> Dates without times are ranges.
If we're talking conceptually about general date/time/datetime comparisons, not pegged to a specific programming language or type system, I agree with it: Dates are a 24-hour range.
You can frame this from either the perspective of ranges or the perspective of degrees of precision.
I think both are valid, useful conceptions. You could argue they're semantically identical, I think?
The easiest option is that comparators should only work on the same data type, to avoid any ambiguity, leaving it up to the user to do explicit casting, and throwing errors if they don’t.
Of course, like integers and decimals, maybe it makes sense to have implicit casting, but it's unintuitive to me whether “'2020-01-01' > '2020-01-01 12:00'” should be cast to two dates, two timestamps, or two timestamptzs. Even if a language allows implicit casting, relying on it as an author is probably a smell.
It seems they’re talking about Instants with different resolutions.
My answers are no, no, and yes.
If we see what they’re calling date as integers and times as floats, the questions become
Is 4 equal to 4.5? No
Is 4 between 4.5 and 200.5? No
Is 4.5 after 4? Yes
Is today between lunch and a few months from now? No, but also yes. Parts of today are, but the entire "today" isn't within that window.
Is today's lunch after today? No! It's during today, not tomorrow or some other future day. But also yes! Lunch happens after the calendar has flipped to today.
So we got Instants, Calendar Objects ("days", "weeks", etc), Durations, etc. Depending on context you could want some set operators to determine unions and intersections, or maybe you want simple numeric operators.
- what is assumed to be "today" was a range, and the fact that the question is being asked is a sign something went really wrong down the chain. So the question has no answer.
- it is assumed that "today" is a time. If the exact time value existed but was cut off at some point, we're again in a "no answer" situation. The only way this can be answered is if the date is implicitly a time at midnight.
To me too, the only proper answer is "no,no,yes" and any other context is a "what happened?"
"2022-02-12" is not always implicitly a representation of "2022-02-12 00:00:00".
Sometimes granularity is omitted because it is zero, and sometimes granularity is omitted because it doesn't exist.
(and sometimes granularity that doesn't exist is represented as zero due to encoding limitations/mistakes).
So, an hour range starting at a half-hour boundary might fall entirely within a day, entirely outside that day, or half in that day and half before or after it.
All of that neglects time zones, or assumes they're at most recorded for local convenience and/or historical purposes but converted for calculation into UTC or the like.
So, for instants, either they're equal or they're not; but for ranges, they could be equal, or a smaller range entirely within a larger, or partly overlapping, or disjoint. Pick whichever model (instant or range) works best, and be consistent thereafter, and at least the surprises shouldn't be too surprising.
In the generic case where one has conflicting data types to work with, including all sorts of lost precision, the boring answer is that it is context dependent.
Direct user interfacing applications should follow the principle of least surprise, which for an interval search would be to always treat the value as inclusive when in doubt.
For example, "time" can be used as "time of day" or "duration". In my own work we have to measure "from this time, to that time" and yhd result should be duration. [1]
We also use the phrase Time when we mean Timestamp (a date/time combination). And we use "local timezone" implicitly almost everywhere, where we should be storing everything as UTC and then displaying as desired. (This would make comparisons trivial.)
Daylight savings is an abomination since it means time is not contiguous, and the sooner it goes away the better. This is somewhat solved by using UTC when time-stamping.
Overall I prefer storing everything as UTC, then displaying that as preferred by each user (ie as their local time).
[1] should be, but is instead stored as a time field for historical design reasons.
A `date` is a time interval (from midnight to midnight in a given time zone). One could argue that `time` values in computer programs also cover a span, even if it is infinitesimally small for most purposes.
The correct operation to compare intervals of varying lengths is not equality, it is either containment or overlap.
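A sketch of those interval operations in Python, representing both a date and a time as half-open `[start, end)` pairs, the time with zero length, per the parent comment:

```python
from datetime import datetime

def contains(outer, inner):
    """Is interval `inner` entirely within interval `outer`?"""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def overlaps(a, b):
    """Do the two intervals share any point?"""
    return a[0] < b[1] and b[0] < a[1]

day = (datetime(2020, 6, 1), datetime(2020, 6, 2))           # a date: one full day
noon = (datetime(2020, 6, 1, 12), datetime(2020, 6, 1, 12))  # a time: zero length

print(contains(day, noon))  # True: the instant falls within the day
print(contains(noon, day))  # False: containment, unlike equality, is asymmetric
```

The asymmetry is the point: "does the day contain noon" and "does noon contain the day" are different questions, which equality can't express.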
Interesting take from 1999: Erik Naggum: The Long, Painful History of Time
Or is the provocation in measuring the contour of a coast? Can it be measured more precisely (surely it can)? In which case is the provocation one of mere precision?
Or is the provocation one which asks how many fairies may fit on the head of a pin?
Here both of those pieces of information are missing on all three questions, so no answers can be given. Any answer to these questions makes assumptions or deductions about those pieces of information.
It's that "Everything else" that programmers have to deal with.
And then you just draw the rest of the owl.
The Unix epoch is by definition at a granularity of seconds. How do you compare two epoch timestamps where one is in seconds and the other in milliseconds? We sort of end up in the same comparison game.
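Same cure as with dates: normalize to one resolution explicitly before comparing, and decide what "equal" means at that resolution. A Python sketch (the values are made up for illustration):

```python
a_s = 1654387200      # an epoch recorded in seconds
b_ms = 1654387200123  # (roughly) the same moment recorded in milliseconds

# Comparing the raw integers is meaningless; normalize to milliseconds first:
a_ms = a_s * 1000

print(a_ms == b_ms)                  # False: not equal at ms resolution
print(a_ms // 1000 == b_ms // 1000)  # True: equal when truncated to seconds
```

Whether truncating to seconds counts as "equal" is the same design decision as whether a date equals a datetime on that date.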
Date-of-birth?
Calendar-quarters?
Computer-local time? (No location, only the computer's UTC offset).
- Does this datetime equal this date? No, no it doesn't and never will.
- Does this datetime fall on this date? Yes, it might.
- Is this date, the date of this datetime? Yes, it might be.
First, are we talking about what _is_ or what _should be_?
Second, what is the business logic behind the choice of Date versus Time to represent data (a data point or a segment) in the timeline? And is that logic consistent with the original design, and therefore with how the data have been collected?
There is more, of course. Lots more, some touched upon in the article and comments.
In practical usage, you would probably want to slice off the times and only compare the dates, or use a fuzzier comparison, or just conclude that your input data is crap.
This is important in financial systems. Transactions in Hawaii can legitimately land in a fiscal period that has already closed in New York.
(assuming I’m not planning to need to know the history of the database entry)
date: the concept of a calendar date (Your birthday)
timestamp: the concept of a calendar datetime (You should get a New Year's kiss at 2022-01-01 00:00)
timestamptz: the concept of a precise moment in history (This comment was written at xxx time utc)
You can design a system where timestamps/datetimes are considered to be precise moments in time, UTC, but that's a matter of the implementation you're dealing with. Again, postgres does not assume timestamps as being in UTC (which has messed me up on more than one occasion).
I just had a long-standing bug come to the fore because I shifted a view to a materialized view, and embedded deep down in it was a date cast from a timestamp with time zone.
When running as a view, everything was done in the user's time zone (because it's set per connection). When it's a materialized view, it's the refresher's time zone.
This led to some inconsistencies, as the server is properly but inconveniently in UTC.
Say you are recording logins from a user. A suspicious login may be outside of work hours in local time. Without knowing the local time, you cannot apply this rule.
The article indicates the opposite: this is a real situation (it's explained why they are asking) and like Zach Holman states (somewhat exaggerated, true): unless your software has no users and is only one hour old, you'll find yourself in this situation.
So their problem is real-world, and has a solution. We're supposed to solve the problem without the benefit of the context that they have, which makes it a contrived academic problem, which was my point.
[1]: https://en.wikipedia.org/wiki/The_Nine_Billion_Names_of_God
Dates are Intervals of Timestamp (I put uppercase for clarity).
A Date is [01/06/2020 0h0m0s..23h59mn59.999s]
A Time can be seen as an Interval of Timestamp, albeit of length 0
A Time is [01/06/2020 12:00:00.. 01/06/2020 12:00:00]
With these definitions, it's
no (the intervals are different)
no, [01/06/2020 0h0m0s..1h59mn59.999s] is not included
no, the interval is fully included
Originally in the intro to the second Star Trek movie: https://en.wikipedia.org/wiki/Kobayashi_Maru