Your customers are software devs like me. When we're in control of generating timestamps, we know we must use standard ISO formatting.
However, what do I do when my customers give me access to an S3 bucket with 1 billion timestamps in an arbitrary (yet decipherable) format?
In the GitHub issue you seem to have undergone an evolution from purity to pragmatism. I support this 100%.
What I've also noticed is that you seem to look for grounding or motivation for "where to draw the line" in what's already been done in Temporal, the Python stdlib, etc. This is where I'd like to challenge your intuitions and ask you instead to open the floodgates and accept any format that is theoretically sensible under the ISO format.
Why? The damage has already been done. Any format you can think of already exists out there. You just haven't realized it yet.
You know who has accepted this? Pandas devs (I assume, I don't know them). The following are legitimate timestamps under Pandas (2.2.x):
* 2025-03-30T (nope, not a typo)
* 2025-03-30T01 (HH)
* 2025-03-30 01 (same as above)
* 2025-03-30  01 (two or more spaces are also acceptable)
In my opinion Pandas doesn't go far enough. Here's an example from real customer data I've seen in the past that Pandas doesn't parse.
* 2025-03-30+00:00 (this is very sensible in my opinion, unless there's a deeper theoretical conflict between this regex pattern and other parts of the ISO format)
Here's an example that isn't decipherable under a flexible ISO interpretation and shouldn't be supported.
* 2025-30-03 (theoretically you can infer that 30 is the day and 03 is the month, BUT you shouldn't accept this. Pandas used to allow such things. I believe they no longer do)
I understand that writing these flexible regexes or if-else statements will hurt your benchmarks and be painful to maintain. Maybe release them under a new call like `parse_best_effort` (or even `youre_welcome`) and document the pitfalls and performance degradation. Trust me, I'd rather use a reliable, generic but slow parser than spend hours writing a god-awful regex that I will only use once (I've spent literal weeks writing regexes and fixes in the last decade).
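To make the ask concrete, here is a rough sketch of the kind of thing I mean by `parse_best_effort`, using only the Python stdlib. Everything in it is my own assumption for illustration: the regex, the function body, and the choice to return a plain `datetime` are not anything Whenever provides today, and a real version would need to cover far more formats (fractional seconds, offsets without a colon, and so on).

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical lenient pattern: a strict YYYY-MM-DD date, then an optional
# separator ('T' or one or more spaces), optional time parts, optional offset.
_LENIENT = re.compile(
    r"""
    (?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})
    (?:
        (?:T|[ ]+)                          # 'T' or one or more spaces
        (?:
            (?P<hour>\d{2})
            (?: :(?P<minute>\d{2})
                (?: :(?P<second>\d{2}) )?
            )?
        )?                                  # a dangling 'T' is tolerated
    )?
    (?P<offset>Z|[+-]\d{2}:\d{2})?          # offset may follow the date directly
    """,
    re.VERBOSE | re.ASCII,
)


def parse_best_effort(text: str) -> datetime:
    """Parse a loosely ISO-like timestamp; missing time parts default to 0."""
    m = _LENIENT.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognized timestamp: {text!r}")
    fields = ("year", "month", "day", "hour", "minute", "second")
    parts = {name: int(m[name] or 0) for name in fields}
    offset = m["offset"]
    tz = None
    if offset == "Z":
        tz = timezone.utc
    elif offset:
        sign = -1 if offset[0] == "-" else 1
        delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6]))
        tz = timezone(sign * delta)
    # datetime() itself rejects impossible values such as month 30,
    # so "2025-30-03" raises ValueError instead of being guessed at.
    return datetime(**parts, tzinfo=tz)
```

With this sketch, `2025-03-30T`, `2025-03-30T01`, `2025-03-30   01` and `2025-03-30+00:00` all parse, while `2025-30-03` raises `ValueError` because the month is out of range.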
Pandas has been around since 2012 dealing with customer data. They have seen it all and you can learn a lot from them. ISOs and RFCs don't mean squat when it comes to timestamps. If possible, try to make Whenever useful rather than fast or pure. I'd rather use a slimmer, faster alternative to pandas for parsing timestamps if one were available, but there aren't any at the moment.
If time permits, I'll try to compile a non-exhaustive list of real-world timestamp formats and post it in the issue.
Thank you for your work!
P.S. seeing BurntSushi in the GitHub issue gives me imposter syndrome :)