Python Enhancement Proposal 495: Local Time Disambiguation

(python.org)

51 points · philipn · 10y ago · 35 comments

27 comments · 6 top-level

kbenson · 10y ago · 10 in thread

I'm not sure of the rationale behind this. That is, the rationale section of the proposal does a poor job of explaining any case where this is actually a problem.

In every case where I've seen this problem, it's a matter of people either not storing the timezone along with the local time, or not storing in UTC time. A local time with a timezone is a unique time, it does not occur twice. A UTC time additionally does not occur twice. Store a time zone along with the date and time or store in UTC and convert on use.

Note: If there are instances where a second is repeated, it's a rare special occurrence, and developing a formalized interface for it seems like overkill.

deathanatos · 10y ago

> A local time with a timezone is a unique time, it does not occur twice.

If you define timezone as an IANA timezone, this is incorrect: a whole slew of local times repeat during a DST fallback event: you'll have a (1:30 AM (dst=True), America/New_York), and then a (1:30 AM (dst=False), America/New_York); that "dst=True|False" bit is the only difference, and that needs to get stored. If you consider "America/New_York" to be the TZ, then storing that bit on the TZ isn't appropriate, as it depends on a particular timestamp.
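To make the fallback case concrete, here is a minimal Python sketch. It uses two fixed-offset `timezone` objects, labelled EDT and EST purely for illustration, as stand-ins for the two states of America/New_York on 1 November 2015. A real application would look the offset up in an IANA database rather than hard-coding it.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets hard-coded for illustration only; real code
# would consult an IANA tz database for America/New_York.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

# The wall clock reads 1:30 AM twice on 2015-11-01.
first_pass = datetime(2015, 11, 1, 1, 30, tzinfo=EDT)
second_pass = datetime(2015, 11, 1, 1, 30, tzinfo=EST)

# Identical wall-clock reading once the offset is dropped...
same_wall_clock = (
    first_pass.replace(tzinfo=None) == second_pass.replace(tzinfo=None)
)

# ...but two different instants, one hour apart.
gap = second_pass - first_pass
```

The only thing separating the two values is the offset, which is the extra bit of information the comment says has to be stored somewhere.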

If you've ever worked with PyTZ, there's a sort of rule of "just call normalize() always"; otherwise, you'll get funny answers to some introspections on the datetime instance: things like the offset being not what a local would say the offset should be. My understanding is that pytz stores the dst flag on the timezone instance itself; things get funny because the timezone instance is not given a chance to update after arithmetic on the datetime instance.

(Really, I feel like the whole thing would work better if there was a separate class for "instant in time" and a function for, "convert this instant in time to Gregorian year/month/day/etc. in this TZ", which then returned a broken-out-type. (And a reverse, of course, for building "Instant" instances.))

UTC datetime + IANA TZ (if relevant) is the way to go. Alas, not all data is so nice.

kbenson · 10y ago

Really I was thinking of it as a distinct timezone that must be tracked. Whether the DST and non-DST versions label themselves as such, the representation used to track time must distinguish whether DST is active or not to display the correct local time. Really, when I say store local time + timezone, I mean local time plus identifier that gets you to the same unique timezone representation in your medium (python, in this case).

Personally, I just always convert to UTC and store that. It changes the problem from one of data fidelity to display or computation annoyance, and annoyances are easy to reduce or eliminate with tooling.
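A rough sketch of the "convert to UTC and store that" approach might look like the following. The helper names `to_storage` and `to_display` are hypothetical, and the fixed-offset `EDT` zone is an assumption standing in for whatever zone the user is actually in.

```python
from datetime import datetime, timedelta, timezone

def to_storage(local_dt):
    """Convert an aware datetime to UTC for storage; reject naive input."""
    if local_dt.tzinfo is None or local_dt.utcoffset() is None:
        raise ValueError("refusing to store a naive datetime")
    return local_dt.astimezone(timezone.utc)

def to_display(stored_utc, display_tz):
    """Convert a stored UTC datetime back to a zone at display time."""
    return stored_utc.astimezone(display_tz)

# Assumption: a hard-coded offset used only for this example.
EDT = timezone(timedelta(hours=-4), "EDT")

entered = datetime(2015, 11, 1, 1, 30, tzinfo=EDT)
stored = to_storage(entered)      # 05:30 UTC, unambiguous
shown = to_display(stored, EDT)   # back to 01:30 for the user
```

The stored value never repeats; only the display step has to know about offsets.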

mark-r · 10y ago

Note that the proposal doesn't fix arithmetic. Adding a 60 minute timedelta to 1:30 on the transition day will give the same result (2:30, first=True) whether first=True or first=False.
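The arithmetic point can be checked in a short sketch. Note the assumption: in the version of the PEP that later shipped in Python 3.6 the flag is spelled `fold` (0 for the first occurrence, 1 for the second) rather than `first`, so the code below uses that spelling. The fixed offsets in the second half are hard-coded for illustration.

```python
from datetime import datetime, timedelta, timezone

hour = timedelta(minutes=60)

# Naive 1:30 on the transition day, once per occurrence.
first_pass = datetime(2015, 11, 1, 1, 30, fold=0)
second_pass = datetime(2015, 11, 1, 1, 30, fold=1)

# Local-time arithmetic ignores the flag: both results are 02:30.
a = first_pass + hour
b = second_pass + hour

# Doing the addition in UTC keeps the two instants apart.
# Assumption: fixed offsets standing in for the real zone rules.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")
utc_a = datetime(2015, 11, 1, 1, 30, tzinfo=EDT).astimezone(timezone.utc) + hour
utc_b = datetime(2015, 11, 1, 1, 30, tzinfo=EST).astimezone(timezone.utc) + hour
```

Here `a` and `b` compare equal, while `utc_a` and `utc_b` stay an hour apart.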

msm23 · 10y ago

I believe the use case is for scheduling systems that need to do something at (say) 1:30am. On every day at 1:30am, please run the accounting job.

In the US, once a year you have to have some locking system set up to avoid the second run (leaving a trace that the job has already been started for the day). I believe the person proposing this thinks that it would make this determination easier.

The problem is that it only solves half the trouble caused by dual timezones. There's another day of the year when 2:30am (in the US) doesn't happen at all. If something is scheduled to occur once per day at 2:30am, then that day it is not going to happen.

There are other workarounds available such as avoiding the magic hours around 2am (in the US). But it seems to be a common problem that everyone seems to keep re-solving.

Storing the timezone with the time doesn't really solve the above issue.

kbenson · 10y ago

> I believe the use case is for scheduling systems that need to do something at (say) 1:30am. On every day at 1:30am, please run the accounting job.

In the instance where someone wants something to happen at 1:30 AM, if they aren't specific in the specification, then they should expect that it may happen twice or not at all at certain times of the year. This is an imprecise specification problem, not a problem in representing time in structures that can and do contain timezones. That is, it's a failure of cron, or the user specifying the time, take your pick. What it's not is a failure of dates, times and timezones, which specifically address this problem. Timezones or UTC (which is just timezone offset 0) are what we have to deal with this specific problem.

For example, specifying the originating timezone would disambiguate the time, as would specifying it in UTC, (or automatically converting to and using the UTC equivalent on entry).

> Storing the timezone with the time doesn't really solve the above issue.

It doesn't address the "do this thing at this local time daily" problem when someone chooses a time that has special behavior, but it does address the "do this thing at this offset from UTC daily" problem, which may be the best you can expect when specifying a time for a recurring action and not taking into account timezones. If you want to use local time, you have to deal with either the actual time the job runs possibly shifting slightly throughout the year in some locations, or the job possibly running twice or not at all.

My real problem with the proposal is that adding a .first() method doesn't solve anything. It really just handles half the problem (it doesn't work if the time doesn't exist), and in a way that's already easily solved, since you can't get a valid result from .first without knowing the timezone already.

robochat42 · 10y ago

This whole working in UTC and transforming the time 'at the last minute' into the timezone is only the current accepted practise precisely because of this problem of the time changing. Ideally, we should be able to just set tzinfo to the timezone location that we want when we construct the time and then we work with that object. Yes, even if we do arithmetic in local time, it should just work.

As far as I read it, this PEP should eventually lead to a better pytz API. Currently, we have to use the normalize() and localize() functions in order to handle the ambiguous times that happen twice a year when the clocks change; this is ugly and hard to remember to do. I think that the 'first' flag should eventually allow us (once suitable timezones are created in pytz) to do arithmetic with local times and automatically transform between the summer and winter timezones.

mark-r · 10y ago

I don't think this is going to fix pytz. There's a difference in philosophy - Python expects a tzinfo object that's intelligent and responds to the datetime that's passed into its calls, while pytz expects to be configured once then provides a fixed offset after that.

murbard2 · 10y ago

Seconds are not repeated AFAIK. Leap seconds are only added and the "seconds" field goes 58, 59, 60, 00, 01...

I also completely second your point. Naive datetime objects should only ever be used to describe UTC. Anything else should have an explicit timezone attached to it. Doing otherwise is asking for trouble.

deathanatos · 10y ago

> Seconds are not repeated AFAIK.

In POSIX time, seconds can repeat[1]:

> [POSIX] is neither a linear representation of time nor a true representation of UTC […] The Unix time number increases by exactly 86400 each day […] Observe that when a positive leap second occurs (i.e., when a leap second is inserted) the Unix time numbers repeat themselves.

The upside to this is that days are always 86400 seconds "long": computing the start of the next day is simple. (Computing the length is not so much, and computing to-the-second elapsed time is also harder.)

This is somewhat relevant to Python, as the datetime module "ignores" (which I interpret to mean, "repeats the prior second"; I've never watched to see what really happens) leap seconds.

[1]: https://en.wikipedia.org/wiki/Unix_time#Leap_seconds
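A small sketch of both points, using the real leap second at 2015-06-30 23:59:60 UTC. One assumption to flag: `calendar.timegm` is used here because it does plain field arithmetic with no leap-second table, which mirrors the POSIX formula; it is not itself a POSIX interface.

```python
import calendar
from datetime import datetime

# POSIX-style arithmetic: 23:59:60 maps to the same number as the
# following midnight, so that Unix time value occurs twice.
leap = calendar.timegm((2015, 6, 30, 23, 59, 60))
next_midnight = calendar.timegm((2015, 7, 1, 0, 0, 0))
prev_midnight = calendar.timegm((2015, 6, 30, 0, 0, 0))

# Every day is exactly 86400 seconds long in this representation.
day_length = next_midnight - prev_midnight

# The datetime constructor does not accept second=60 at all.
try:
    datetime(2015, 6, 30, 23, 59, 60)
    accepts_leap_second = True
except ValueError:
    accepts_leap_second = False
```

So for the constructor at least, "ignores" means the leap second cannot be represented, rather than the prior second being repeated.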

lifeisstillgood · 10y ago

Agreed - and more succinctly put than I managed - clearly it's the wrong time for me :-)

IanCal · 10y ago · 7 in thread

When the clocks change, don't you shift timezone? There aren't duplicate times in BST, we just switch from GMT to BST and then back.

> In these situations, the information displayed on a local clock (or stored in a Python datetime instance) is insufficient to identify a particular moment in time.

Does the datetime instance not store the timezone?

danbruc · 10y ago

You are always in the same time zone but the zone time changes. BST and GMT are not time zones but zone times; time zones are (usually) named after geographic features like large cities, and yours is Europe/London. Therefore when entering or leaving daylight saving time you remain in the same time zone but the zone time of your time zone changes.

wyldfire · 10y ago

It would be nice if those two things which have an important but very subtle distinction had names which were not just the reverse of each other. It seems like despite you explaining this, it would be very easy for one to mistake "zone time" for "time zone".

Tehnix · 10y ago

Isn't it about summer and winter time? I'm pretty sure you're still in the same timezone in those cases.

That said, I don't see the reason for this PEP. You should only store a value in the DB that doesn't change like that, and then when displayed you can change it to fit the local time of the user.

Marazan · 10y ago

In Britain during the summer we use the BST timezone and during winter we use GMT.

But once again this is one of those "assumptions about time that everyone gets wrong": just because we shift timezones in Britain doesn't necessarily mean that people in other countries do the same. They might just change what the timezone offset is and keep the timezone the same.

msm23 · 10y ago

Yes, you could get the information from the timezone, but how would one do that in code?

The only time that one has the time fold is when you turn the clock backward (let's just call that shifting from daylight savings to standard timezone). And this would only affect code which used wall clock time (time as it's read, e.g. 1:30am PDT), and would also only affect code which wanted to run something only once at a time within that fold (e.g. 1:30am ... not on both 1:30am's).

So, using the timezone method, just check to see if your current 1:30am is in your daylight savings timezone. Hurray! You're in the clear. Go ahead and do that thing you wanted to do only on the first 1:30am.

But the next day you're going to run into a problem. The only 1:30am you're going to get is in the standard timezone. So now you have to check for this timezone change only on the day of the change, which is yet another piece of data you have to keep track of. On the day of the change, do this timezone comparison, and on every other day don't worry about it.

When the clock hits your interesting time of 1:30am, just check to see if today is the day of the change, check what the current timezone is, check what the daylight savings time zone is, check to see if those two values are the same, and now do your thing. Otherwise, just do your thing.

All of the above also ignores that people change times at different times (11pm, 1am, 2am, 3am), some don't change a full hour, and some don't change at all.

The proposal gets rid of all of that convoluted logic in everyone's programs, and instead it provides a single boolean value: is this the second time I've seen this time because of daylight savings shenanigans?

IanCal · 10y ago

> The proposal gets rid of all of that convoluted logic in everyone's programs

Does it? It doesn't cover the scheduling problem the other half of the year when the clocks move the other direction.

> So now you have to check for this timezone change only on the day of the change, which is yet another piece of data you have to keep track of.

If running a job twice is a problem, then why not check that the job has not already been run?

> is this the second time I've seen this time

Is this unambiguous? If it's 2015-10-25-01-30-00 GMT, have I seen that time before? In the UK, yes, in Mali no.
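The "check that the job has not already been run" idea can be sketched in a few lines. `DailyJobGuard` is a hypothetical helper, and keeping state in an in-memory set is an assumption made for brevity; a real scheduler would need to persist this somewhere that survives restarts.

```python
from datetime import date

class DailyJobGuard:
    """Hypothetical helper: remembers which local dates the job ran on."""

    def __init__(self):
        self._done = set()

    def try_start(self, local_date):
        """Return True only the first time a given date is seen."""
        if local_date in self._done:
            return False
        self._done.add(local_date)
        return True

guard = DailyJobGuard()
change_day = date(2015, 10, 25)  # UK clocks go back on this date

first_attempt = guard.try_start(change_day)    # first 01:30
second_attempt = guard.try_start(change_day)   # repeated 01:30
next_day = guard.try_start(date(2015, 10, 26))
```

This only covers the repeated hour. A time that is skipped when the clocks go forward still needs separate handling, since the trigger never fires on that day.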

ars · 10y ago

You don't need to do any of that. Just get a unixtime, and move on.

Why are you doing time math in local time?

Simply do all the time math in universal time and be done with it.

Marazan · 10y ago · 3 in thread

In general any solution to datetimes that doesn't involve the time being in UTC is solving the wrong problem.

akvadrako · 10y ago

I would disagree. UTC is ambiguous and unpredictable. Computers should really be based on GPST (seconds since 1980-Jan-6 UTC).

ubernostrum · 10y ago

If you want to be pedantic about getting rid of unpredictability in your time measurement, use TAI.

heinrich5991 · 10y ago

That's defined as TAI minus some seconds, just go for TAI. :)

toyg · 10y ago · 1 in thread

Terrible PEP, I hope it gets rejected. One-off flags like this are hacks that shouldn't be in stdlib. It simply stinks, in an area (time handling) where the stdlib does not really smell like roses already.

Dealing with time adjustments is the OS's job, not userland. If your job has to be scheduled exactly and cannot rely on the OS, and you refuse to deal with UTC, it's your own damn fault and you can always just use a long-running process with timers.

bliti · 10y ago

I agree. It's library baggage.

lifeisstillgood · 10y ago

So:

Time in the UK is currently UTC+1 (BST). At 2am on 25 Oct we will return to GMT / UTC. It will therefore become 1am, and for the next hour all times will have happened before.

The idea is to put a bit flag that says "alreadyseenthistime"

It seems to me this is a solution to the wrong problem.

Store all strings as bytes, assuming UTF-8, store all times as longs assuming UTC

If we convert all Python datetimes to non-naive (i.e. embedded with a TZ), then we are forced always to choose an encoding, just like with strings. The right encoding is to always assume incoming dates are UTC, to throw an error if they are non-naive, and to assume that local clocks are set correctly (which we do anyway).
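Read literally, that policy could be sketched as below. The function names `to_epoch_seconds` and `from_epoch_seconds` are hypothetical, and truncating to whole seconds is an assumption made to match "store all times as longs".

```python
from datetime import datetime, timezone

def to_epoch_seconds(dt):
    """Treat a naive datetime as UTC and return whole seconds since 1970."""
    if dt.tzinfo is not None:
        raise ValueError("expected a naive datetime that is already in UTC")
    return int(dt.replace(tzinfo=timezone.utc).timestamp())

def from_epoch_seconds(seconds):
    """Inverse of to_epoch_seconds: return a naive datetime in UTC."""
    aware = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return aware.replace(tzinfo=None)

stamp = to_epoch_seconds(datetime(2015, 10, 25, 1, 0))
round_trip = from_epoch_seconds(stamp)
```

Attaching `timezone.utc` before calling `timestamp()` matters: on a naive datetime, `timestamp()` would interpret the value in the machine's local zone instead.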

I need to read it more carefully - but it seems the wrong solution

mayoff · 10y ago

The more I'm exposed to other date/time libraries, the more impressed I become with Apple's date/time library.

https://developer.apple.com/library/mac/documentation/Cocoa/...
