[Datetime-SIG] PEP-495 - Strict Invalid Time Checking

Tue Aug 25 19:56:07 CEST 2015

On 25 August 2015 at 21:51, Alexander Belopolsky
<alexander.belopolsky at gmail.com> wrote:

> On Tue, Aug 25, 2015 at 9:30 AM, Stuart Bishop <stuart at stuartbishop.net>
> wrote:
>>
>> As mentioned elsewhere, pytz requires strict checking to remain
>> backwards compatible.
>
> Can you provide the specific examples where strict checking is required?

Systems where it is better to fail than continue with incorrect
results. For example, ingesting transaction logs. It is more desirable
for the script parsing the log files to fail with a traceback than to
feed incorrect results into the rest of the system, where some poor
DBA is going to have to repair the cascade of damage months or years
later.

Yes, pytz already has a disambiguation solution. I would love to be
able to deprecate it, and encourage users to use stdlib as much as
possible.

> Since pytz already has a disambiguation solution, PEP 495 is not as
> indispensable for it as it is for datetime or dateutil.  However, there is
> one case I can think of where pytz will benefit: with the PEP, it will be
> possible to make say Eastern.localize(datetime.now()) work correctly at all
> times.  If for backward compatibility, you want to continue raising
> AmbiguousTimeError during one hour each year, I am sure you will figure out
> how to make Eastern.localize(datetime.now(), is_dst=None).  (Hint: Don't
> change anything in this branch of your code.)

Yes, that is nice.

What would be even nicer is if users didn't have to use localize at all:

    datetime.now(tz=pytz.timezone('US/Eastern'))

> I assume you refer to "Another suggestion was to use first=-1 or first=None
> to indicate that
> the program truly has no means to deal with the folds and gaps and
> dt.utcoffset() should raise an error whenever dt represents an
> ambiguous or missing local time."
>
> It looks like you want to make it impossible to construct invalid dt
> instances.  In other words, you want to make dt.replace(fold=-1) or
> dt.replace(tzinfo=Eastern) raise an error under certain circumstances.  Is
> this right?

Right. When a user requests that exceptions are raised, it becomes
impossible to construct invalid dt instances. This does require
calling pytz Python code from the datetime constructor, which you
discuss below.

>> pytz users need to optionally have exceptions raised
>> when they try to construct an invalid or ambiguous datetime instance,
>
> This is a legitimate need, but why does it need to be done in datetime rather
> than in pytz itself?  You already ignore the datetime(..., tzinfo=...)
> constructor and require your users to call localize() instead.  What stops
> you from providing a function strict_datetime() that will perform any checks
> that you or your users desire?

My goal is to have no pytz specific API, or at least minimize it and
make it unnecessary for the most common use cases. There will be much
less confusion, and it will be easier for large projects to migrate to
stdlib if it meets their needs.
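
For reference, the kind of check under discussion can be sketched using
only the fold attribute that PEP 495 proposes, by comparing the UTC
offsets on either side of a possible transition. This is a rough sketch
under stated assumptions, not pytz or stdlib API: strict_datetime() is
the name suggested above, while ToyEastern2015, AmbiguousTimeError and
MissingTimeError are hypothetical stand-ins defined here. The toy zone
hard-codes only the 2015 US/Eastern transitions, and the code needs a
Python with fold support.

```python
from datetime import datetime, timedelta, tzinfo

EST, EDT = timedelta(hours=-5), timedelta(hours=-4)


class AmbiguousTimeError(ValueError):
    """Hypothetical: the wall time occurs twice (a fold)."""


class MissingTimeError(ValueError):
    """Hypothetical: the wall time never occurs (a gap)."""


class ToyEastern2015(tzinfo):
    """Illustrative only: hard-codes the 2015 US/Eastern transitions."""

    GAP = (datetime(2015, 3, 8, 2), datetime(2015, 3, 8, 3))     # skipped wall times
    FOLD = (datetime(2015, 11, 1, 1), datetime(2015, 11, 1, 2))  # repeated wall times

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if self.GAP[0] <= wall < self.GAP[1]:
            # fold=0 applies the rule before the transition, fold=1 the rule after.
            return EDT if dt.fold else EST
        if self.FOLD[0] <= wall < self.FOLD[1]:
            return EST if dt.fold else EDT
        return EDT if self.GAP[1] <= wall < self.FOLD[0] else EST

    def dst(self, dt):
        return self.utcoffset(dt) - EST

    def tzname(self, dt):
        return "EDT" if self.utcoffset(dt) == EDT else "EST"


def strict_datetime(*args, tzinfo, **kwargs):
    """Build an aware datetime, refusing ambiguous or missing wall times."""
    dt = datetime(*args, tzinfo=tzinfo, **kwargs)
    before = dt.replace(fold=0).utcoffset()
    after = dt.replace(fold=1).utcoffset()
    if before == after:
        return dt  # no transition nearby: the wall time is unambiguous
    wall = dt.replace(tzinfo=None)
    if before > after:
        # The offset decreases across the transition, so the clock was set back.
        raise AmbiguousTimeError(f"{wall} occurs twice")
    raise MissingTimeError(f"{wall} does not exist")
```

With this, strict_datetime(2015, 11, 1, 1, 30, tzinfo=ToyEastern2015())
raises AmbiguousTimeError instead of silently picking one side of the
fold. It still leaves users with a separate function to call, which is
the part I would like to avoid.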

>> directly via __new__ or indirectly with something like  dt.replace().
>
> __new__ and .replace() are low level methods called in many performance
> critical places.  We cannot afford to call arbitrary python code in those
> methods.

This seems to be the crux of the issue.

The datetime constructor needs to call into the tzinfo implementation,
in the same way that converting from UTC invokes tzinfo.fromutc(dt).
If it did this, pytz could swap in the correct fixed-offset tzinfo and,
if requested, raise an exception. The localize method would be gone
entirely, and with it the larger half of pytz's problematic API. The
datetimes filtered through pytz would always be valid, and
dt.utcoffset() would never raise an exception and cause confusing
failures.
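
To make the idea concrete, here is a rough sketch of what such a hook
could look like. Everything named here is an assumption for
illustration: construct() stands in for the datetime constructor,
fromlocal() is a hypothetical hook name (no such method exists on
tzinfo today), and HookedEastern2015 is a toy zone that only knows the
2015 US/Eastern transitions.

```python
from datetime import datetime, timedelta, timezone, tzinfo

EST, EDT = timedelta(hours=-5), timedelta(hours=-4)


class HookedEastern2015(tzinfo):
    """Toy zone with a hypothetical fromlocal() hook (2015 rules only)."""

    GAP = (datetime(2015, 3, 8, 2), datetime(2015, 3, 8, 3))
    FOLD = (datetime(2015, 11, 1, 1), datetime(2015, 11, 1, 2))

    def __init__(self, strict=False):
        self.strict = strict

    def _offsets(self, wall):
        """Return the (before, after) offsets that could apply to a wall time."""
        if self.GAP[0] <= wall < self.GAP[1]:
            return EST, EDT
        if self.FOLD[0] <= wall < self.FOLD[1]:
            return EDT, EST
        offset = EDT if self.GAP[1] <= wall < self.FOLD[0] else EST
        return offset, offset

    def fromlocal(self, dt):
        """Hypothetical hook: swap in a fixed-offset tzinfo, or raise if strict."""
        wall = dt.replace(tzinfo=None)
        before, after = self._offsets(wall)
        if self.strict and before != after:
            raise ValueError(f"{wall} is ambiguous or missing in this zone")
        offset = after if dt.fold else before
        name = "EDT" if offset == EDT else "EST"
        return dt.replace(tzinfo=timezone(offset, name))

    # utcoffset(), dst() and tzname() are deliberately left unimplemented:
    # every datetime that passes through the hook carries a fixed offset.


def construct(*args, tzinfo=None, **kwargs):
    """Stand-in for a datetime constructor that consults the tzinfo."""
    dt = datetime(*args, tzinfo=tzinfo, **kwargs)
    hook = getattr(tzinfo, "fromlocal", None)  # absent on ordinary tzinfo objects
    return dt if hook is None else hook(dt)
```

A tzinfo without the hook costs one attribute lookup in this sketch,
and construct(2015, 11, 1, 1, 30, tzinfo=HookedEastern2015(strict=True))
fails at construction time rather than later in dt.utcoffset().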

Is the overhead of calling a method on the tzinfo that bad? If the
tzinfo implementation does not override it, it should still be fast.
pytz users are already paying the overhead in the form of the localize
method, which seems about 10x slower than just using datetime.now()
unwrapped (20usec vs 2usec according to timeit). But if you cared, you
would be using time.time() at 0.08usec.
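
A comparison of this sort can be reproduced with a few lines of timeit.
This is only a sketch using stdlib calls; the helper name usec_per_call
is made up here, the pytz localize call is left as a comment because it
needs pytz installed, and the numbers will vary by machine and Python
version.

```python
import time
import timeit
from datetime import datetime, timezone


def usec_per_call(func, number=100_000):
    """Average cost of one call to func, in microseconds."""
    return timeit.timeit(func, number=number) / number * 1e6


results = {
    "time.time()": usec_per_call(time.time),
    "datetime.now()": usec_per_call(datetime.now),
    "datetime.now(timezone.utc)": usec_per_call(lambda: datetime.now(timezone.utc)),
    # With pytz installed, a localize call could be timed the same way, e.g.
    # "localize": usec_per_call(lambda: eastern.localize(datetime.now())),
    # where eastern = pytz.timezone('US/Eastern').
}

for label, usec in results.items():
    print(f"{label:28s} {usec:8.3f} usec")
```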

Is there some way for me to provide a hook that doesn't cause major
slowdowns for non-pytz users?

For what it's worth, the only complaints I've ever had about
performance with pytz have been about how long it took to import the
package. I'm not sure what sort of application instantiates so many
timezone aware datetime instances that constructor overhead becomes
noticeable. All examples I can come up with that create timestamps so
rapidly would never add timezone information, and would be using a
float or long for further optimization in any case.

>> If these methods are called with first=None, it will be passed through
>> to dt.utcoffset() and it may raise an exception.
>
> This part I don't understand.  If __new__ raises an exception - you will
> have no instance to "be passed through to dt.utcoffset()."

Sorry. I'm mixing up tzinfo.utcoffset() and dt.utcoffset() here.

>  .astimezone() does not and need not "allow the user to specify which side
> of the fold in the target zone."  As long as it knows how to interpret the
> time that it is given (disambiguate the fold and "normalize" the gap) it
> should be able to set the fold=1 attribute correctly in the result if it
> happens to fall into the repeated hour and should never produce a time that
> is in the gap of the target zone.

Yes. I got this backwards, sorry.

-- 
Stuart Bishop <stuart at stuartbishop.net>
http://www.stuartbishop.net/