https://dev.arie.bovenberg.net/blog/python-datetime-pitfalls...
https://news.ycombinator.com/item?id=39417231 (147 comments)
Now at least there’s an LLM that might spot a bug every now and then so that’s nice.
I wonder what benefits this choice has that outweigh the risks of this behavior.
https://en.wikipedia.org/wiki/Acid3
I like this new lib (Thank You) but the name unfortunately implies the opposite of what it is. "Whenever" sounds like you don't care, but you'd only be using this if you did care! Also Shakira, haha. Hmm, pedantic is taken. Timely, precise, punctual, meticulous, ahorita, pronto, etc. I like that temporal name.
Finally, none of these links mention immutability, but it should be mentioned at the top.
I learned the hard way that dependencies kill projects.
Not saying this isn't great, thanks for creating it! It does have its use cases, of course.
I work in healthcare. If I have a choice between "reading docs/changelogs carefully, implementing functions", and "adding an extra dependency", I'm taking the dependency every single time.
I don't want footguns in my code, I don't want code I have to write and test myself, and I don't want to have to become an expert in a domain before I can write something that serves my purpose.
For the datetime library, specifically, I'm switching to whenever for everything, because I've been bitten by conversions and naive/aware datetime confusion too many times.
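The naive/aware confusion mentioned here can be shown with the stdlib alone (a minimal sketch; variable names are illustrative, not from any real codebase):

```python
from datetime import datetime, timezone

naive = datetime(2024, 1, 1, 12, 0)                       # no tzinfo
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # tz-aware

# Equality between naive and aware silently returns False instead of
# raising -- easy to miss in tests:
print(naive == aware)  # False

# Ordering comparisons do raise, but only at runtime, on the exact
# code path where a naive and an aware value finally meet:
try:
    _ = naive < aware
except TypeError as exc:
    print(exc)
```

This is precisely the class of bug that a library with distinct naive/aware types catches at the type level instead of at 3 a.m. in production.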
Link to Tom Scott & Computerphile from 10y ago on tz madness. https://www.youtube.com/watch?v=-5wpm-gesOY
This kinda sums up the sorry state of software engineering. People can't even be bothered to read docs but will just blindly install a package just because someone was able to package it and upload it to PyPI.
Taking on a dependency does not relieve you of reading docs, but it also adds a further burden as you now need to trust the code. The stdlib is much more heavily tested and documented than any 3rd party library will be.
That's why I use a Flake8 plugin to prohibit especially egregious footguns.
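The commenter doesn't name the plugin; one real example is flake8-datetimez, whose DTZ rules flag naive-datetime footguns such as `datetime.now()` without `tz=` or `datetime.utcnow()`. A hypothetical config:

```ini
# .flake8 -- illustrative setup, assuming flake8-datetimez is installed
# (pip install flake8-datetimez)
[flake8]
extend-select = DTZ
```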
These things are really frustrating
As long as there is a conscious decision to build or ‘buy’, it’s fine. I think some people can be a little too careless with adding dependencies though, not realising they can have an equal if not greater maintenance burden.
The only crazier idea I can think of is implementing character encoding conversions myself.
And this for every single problem: time, text, maths, network, parsing, formatting, validating, authenticating...
You know, they import 5 libraries, each of which imports 5 more libraries, each of which imports 5 more libraries, and suddenly they're buried in 'critical' updates because there's a denial-of-service bug in the date parser used by the yaml parser used by the configuration library used by the logging library used by the application.
E.g the JS project that uses the stdlib Date API, and pulls in moment.js, and also uses date-fns.
Or the one that pulls in bits and pieces of lodash, ramda, and other functional libraries.
And maybe it uses native fetch and axios depending on the current phase of the moon.
They don’t die but time is wasted in code review trying to understand if there is any kind of deliberate approach behind the scattershot application of packages with duplicated purposes.
(picking on JS is perhaps unfair but it’s probably the most egregious example of dependency hell)
Hard pass. The complexity of having to use binary packages or build things is not worth the performance benefit. The pure-Python version requires building from source and passing special flags, so it is not possible to specify it in requirements.txt.
An issue was closed as not planned: https://github.com/ariebovenberg/whenever/issues/158
You can put any flags in requirements.txt, including -r[equiring] another txt etc.
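For instance, pip requirements files accept global options on their own lines, so something like the following is possible (whether `whenever`'s pure-Python build actually works via this flag is a separate question; the flag placement is the point):

```text
# requirements.txt -- illustrative only; --no-binary forces a
# source build for the named package
--no-binary whenever
whenever
```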
Your point may apply to modern pyproject.toml tooling though, or at least that it wouldn't be simply another entry in the dependencies array.
Avoid general terms like "Pacific Standard Time" and stick to location-specific ones like: "Vancouver/Canada". The latter is how people expect their time to work, and correctly handles whatever quirky choices jurisdictions choose to do with their time.
Searching the list here: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
I cannot find an entry for "Pacific Standard Time" nor "Vancouver/Canada", but I can see: "America/Vancouver".
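A quick stdlib check confirms which key is real (assumes the system tz database is available):

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

# The IANA key is region/city, e.g. "America/Vancouver":
print(ZoneInfo("America/Vancouver"))

# "Vancouver/Canada" is not a valid key:
try:
    ZoneInfo("Vancouver/Canada")
except ZoneInfoNotFoundError as exc:
    print("not found:", exc)
```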
Then it would have been nice to see the benchmarks of the pure Python implementation as well. What if it's worse than arrow?
> In casual benchmarks, the pure-Python version is about 10x slower than the Rust version, making it 5x slower than the standard library but still (in general) faster than Pendulum and Arrow.
"(in general)" here since the speed compares differently per operation, while the Rust version is faster across the board. That said, there's no operation that is _significantly_ (or unnecessarily) slower than Arrow or Pendulum.
edit: I'm considering adding comparison to the pure Python version once I get the time for a more expanded "benchmarks" page in the docs
Almost all of the time UTC is enough. If I need to filter/bucket/aggregate by some range, I can build the filter/bucket/aggregate criteria as datetimes with tz, convert them to UTC, and from then on it's a simple `int` comparison.
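A minimal sketch of that pattern (the event data is made up for illustration):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Filter bounds expressed in the user's timezone...
lo = datetime(2025, 6, 1, tzinfo=ZoneInfo("Europe/Berlin"))
hi = datetime(2025, 7, 1, tzinfo=ZoneInfo("Europe/Berlin"))

# ...converted once to UTC epoch seconds; after that, filtering is a
# plain numeric comparison with no tz logic in the hot path:
lo_ts, hi_ts = lo.timestamp(), hi.timestamp()
events = [1748736000, 1750000000, 1752000000]  # epoch seconds (made up)
in_june = [t for t in events if lo_ts <= t < hi_ts]
print(in_june)
```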
I'd imagine the cases handled by Whenever are mostly those where a datetime is a long-lived object, which I don't see a need for at all.
I use it purely for accepting tz input from clients: convert to UTC immediately when it arrives, or, if I really need the tz, save it separately, which is rare. (One example is a calendar, where the tz should be stored, though probably not next to every UTC value but at the user level. Another is workforce scheduling, where 8am-4pm or 8pm-4am can mean different things in different locations -- but that's no longer a datetime, it's purely a time in a timezone.)
And so anything server-related with calendars will be making tons of these conversions constantly. And you can't cache things long-term in UTC because the conversions of future events can change, when countries change DST etc.
So you would not store that in UTC, but just as a local time.
But yes, I'm ignoring the standard calendar formats; maybe they are simpler.
I read through the article listing all the weirdness of other datetime libraries, and I'd say many covered cases where you assume a timezoned datetime is long-lived.
One case even pointed out datetime construction with an impossible hour.
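That pitfall is reproducible with the stdlib (assumes the system tz database is available): Europe/Berlin skips 02:00-03:00 on 2025-03-30, yet the constructor accepts the nonexistent wall time without complaint.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# 02:30 does not exist on this date in this zone (spring-forward gap),
# but the stdlib silently constructs it anyway:
dt = datetime(2025, 3, 30, 2, 30, tzinfo=ZoneInfo("Europe/Berlin"))
print(dt)
```

A stricter library can instead refuse such a value, or force the caller to say explicitly how to disambiguate it.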
I am currently enjoying DST-free life in Japan, and feel that people around the world deserve to get this much respect from their own official clocks.
(Schools tend to have earlier times. It's not so unusual for a school's workday to have its midpoint at about noon, I think.)
I've always been in favour of keeping the clocks at non-DST all year, but now I have a new proposal: keep them at DST and just hibernate in the winter. Work an hour or two less in the winter when it's miserable.
Maybe adjust the work schedule to e.g. start at 8 instead of 9?
Rather than mess with the actual clock.
And for program code, it wouldn’t really help as long as it’s still expected to be able to correctly handle dates in the past.
I am an amateur dev, though, so maybe someone who masters the language will be better off using the raw standard libraries.
I'm sure I'm in the top 1% of software devs by number of timestamps parsed. [1]
DST is not a problem in Python. It's parsing string timestamps. All libraries are bad, including this one, except Pandas. Pandas does great at DST too btw.
And I'm not shilling for Pandas either. I'm a Polars user who helicopters Pandas in whenever there's a timestamp that needs to be parsed.
Pandas has great defaults. Here are the string timestamps I expect to be parsed by default. I'm willing to pass a timezone in the case of naive timestamps:
* All ISO 8601 formats and all its weird mutant children that differ by a tiny bit.
* 2025-05-01 (parsed not as date, but as timestamp)
* 2025-05-01 00:00:00 (or 00.0 or 00.000 or 0.000000 etc)
* 2025-05-01 00:00:00z (or uppercase Z or 00.0z or 00.000z or 0.000000z)
* 2025-05-01 00:00:00+02:00 (I don't need this converted to some time zone. Store offset if you must or convert to UTC. It should be comparable to other non naive timestamps).
* 2025-03-30 02:30:00+02:00 (This is a non existent timestamp wrt European DST but a legitimate timestamp in timestamp representation, therefore it should be allowed unless I specify CET or Europe/Berlin whatever)
* There are other timestamp formats that are non-standard but obvious. Allow a Boolean parameter called accept_sensible_string_parsing and then parse the following:
* 2025-05-01 00:00 (HH:mm format)
* 2025-05-01 00:00+01:00 (HH:mm format)
[1] It's not a real statistic, it's just that I work with a lot of time series and customer data. Disclaimer: I'm on the phone and on the couch, so I wasn't able to test the lib's string parsing before posting this comment.
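For what it's worth, several of the wished-for forms already parse via the stdlib's `datetime.fromisoformat` (Python 3.11+ is notably more permissive, e.g. it also accepts a trailing 'Z'; the forms below work on older versions too):

```python
from datetime import datetime

print(datetime.fromisoformat("2025-05-01"))                 # date-only -> midnight
print(datetime.fromisoformat("2025-05-01 00:00"))           # HH:mm form
print(datetime.fromisoformat("2025-05-01 00:00:00+02:00"))  # fixed offset, aware
```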
Javascript's big datetime redesign (Temporal) has an interesting overview of the decisions they made [1]. Whenever is currently undergoing an expansion of ISO support as well, if you'd like to chime in [2].
[1] https://tc39.es/proposal-temporal/#sec-temporal-iso8601gramm... [2] https://github.com/ariebovenberg/whenever/issues/204#issueco...
Your customers are software devs like me. When we're in control of generating timestamps, we know we must use standard ISO formatting.
However, what do I do when my customers give me access to an S3 bucket with 1 billion timestamps in an arbitrary (yet decipherable) format?
In the GitHub issue you seem to have undergone an evolution from purity to pragmatism. I support this 100%.
What I've also noticed is that you seem to try to find grounding or motivation for "where to draw the line" from what's already been done in Temporal or Python stdlib etc. This is where I'd like to challenge your intuitions and ask you instead to open the flood gates and accept any format that is theoretically sensible under ISO format.
Why? The damage has already been done. Any format you can think of, already exists out there. You just haven't realized it yet.
You know who has accepted this? Pandas devs (I assume; I don't know them). The following are legitimate timestamps under Pandas (2.2.x):
* 2025-03-30T (nope, not a typo)
* 2025-03-30T01 (HH)
* 2025-03-30 01 (same as above)
* 2025-03-30  01 (two or more spaces is also acceptable)
In my opinion Pandas doesn't go far enough. Here's an example from real customer data I've seen in the past that Pandas doesn't parse.
* 2025-03-30+00:00 (this is very sensible in my opinion, unless there's a deeper regex-pattern conflict with other parts of the ISO format)
Here's an example that isn't decipherable under a flexible ISO interpretation and shouldn't be supported.
* 2025-30-03 (theoretically you can infer that 30 is a day, and 03 is month. BUT you shouldn't accept this. Pandas used to allow such things. I believe they no longer do)
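For comparison, the stdlib also refuses to guess here: a month of 30 is rejected outright.

```python
from datetime import datetime

# "2025-30-03" is ambiguous to a human but simply invalid as ISO
# (month must be 1-12), so parsing fails rather than guessing:
raised = False
try:
    datetime.fromisoformat("2025-30-03")
except ValueError:
    raised = True
print(raised)
```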
I understand writing these flexible regexes or if-else statements will hurt your benchmarks and will be painful to maintain. Maybe release them under a new call like `parse_best_effort` (or even `youre_welcome`) and document the pitfalls and performance degradation. Trust me, I'd rather use a reliable, generic but slow parser than spend hours writing a god-awful regex that I will only use once (I've spent literal weeks writing regexes and fixes over the last decade).
Pandas has been around since 2012 dealing with customer data. They have seen it all and you can learn a lot from them. ISOs and RFCs when it comes to timestamps don't mean squat. If possible try to make Whenever useful rather than fast or pure. I'd rather use a slimmer faster alternative to pandas for parsing Timestamps if one is available but there aren't any at the moment.
If time permits I'll try to compile a non exhaustive list of real world timestamp formats and post in the issue.
Thank you for your work!
P.S. seeing BurntSushi in the GitHub issue gives me imposter syndrome :)