It's been a while since I compiled Python and ran the test suite.
Now that I'm starting to work on a patch, I wanted to do that as a
sanity check, and even though "./configure" finishes OK, "make" fails :/
(I'm on Ubuntu Trusty, all packages updated, all dependencies installed.)
Full trace here:
Before going into a deep debug, I thought about sending a mail to see
if anybody else has hit this issue, if it's a common problem, or if there's a
known workaround.
In issue23903, I've created a script that will produce PC/python3.def
by scraping the header files in Include. There are many
discrepancies between what my script generates and what is currently
in the repository (diff below), but in every case I've checked the
script has been right: what the script finds is actually exported as
part of the limited API, but due to not being in the .def file it's
not actually exported from python3.dll. Almost all of the differences
are things that the script found that weren't present, but there are a
couple things going the other way.
The point of this message is to ask everybody who maintains anything
in C to take a look through and make sure everything in their area is
properly guarded (or not) by Py_LIMITED_API. Alternately, if somebody
can find a bug in my script (or in my reasoning) that makes it find too
much stuff, that would be great too.
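For the curious, the core of the scraping approach can be sketched like this (a much-simplified, hypothetical version; the real python3defgen.py in issue23903 has to handle far more preprocessor detail, and the names here are made up for illustration):

```python
import re

# CPython headers mark exported functions and data with these macros.
FUNC_RE = re.compile(r'PyAPI_FUNC\([^)]*\)\s*(\w+)')
DATA_RE = re.compile(r'PyAPI_DATA\([^)]*\)\s*\*?\s*(\w+)')

def limited_api_symbols(header_text):
    """Collect (name, is_data) pairs visible under Py_LIMITED_API.

    Simplified: tracks '#ifndef Py_LIMITED_API' regions (whose contents
    are excluded from the limited API) and ignores #else/#elif, which a
    real scraper would have to handle.
    """
    symbols = []
    if_depth = 0    # current preprocessor conditional nesting
    skip_depth = 0  # depth at which an '#ifndef Py_LIMITED_API' opened
    for line in header_text.splitlines():
        stripped = line.strip()
        if stripped.startswith('#if'):
            if_depth += 1
            if skip_depth == 0 and stripped.startswith('#ifndef Py_LIMITED_API'):
                skip_depth = if_depth
        elif stripped.startswith('#endif'):
            if if_depth == skip_depth:
                skip_depth = 0
            if_depth -= 1
        elif skip_depth == 0:
            m = FUNC_RE.search(line)
            if m:
                symbols.append((m.group(1), False))
            m = DATA_RE.search(line)
            if m:
                symbols.append((m.group(1), True))  # gets ' DATA' in the .def
    return symbols
```

Data symbols then get the `=python35.Name DATA` forwarding form seen in the diff below, functions the plain form.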
Ideally, after this is all settled I'd like to add the script to both
the Makefile and the Windows build system, such that PC/python3.def is
always kept up to date and flags changes that weren't meant to be there.
(I'm afraid Gmail might mangle this beyond recognition, you can find the diff at
if it does.)
diff -r 24f2c0279120 PC/python3.def
--- a/PC/python3.def Mon Apr 13 15:51:59 2015 -0500
+++ b/PC/python3.def Mon Apr 13 16:10:34 2015 -0500
@@ -1,13 +1,15 @@
; This file specifies the import forwarding for python3.dll
; It is used when building python3dll.vcxproj
+; Generated by python3defgen.py, DO NOT modify directly!
@@ -39,7 +41,6 @@
@@ -58,6 +59,7 @@
+ PyCmpWrapper_Type=python35.PyCmpWrapper_Type DATA
@@ -68,6 +70,7 @@
@@ -122,6 +125,7 @@
@@ -132,14 +136,25 @@
@@ -171,12 +186,21 @@
+ PyExc_BlockingIOError=python35.PyExc_BlockingIOError DATA
+ PyExc_BrokenPipeError=python35.PyExc_BrokenPipeError DATA
+ PyExc_ChildProcessError=python35.PyExc_ChildProcessError DATA
+ PyExc_ConnectionAbortedError=python35.PyExc_ConnectionAbortedError DATA
+ PyExc_ConnectionError=python35.PyExc_ConnectionError DATA
+ PyExc_ConnectionRefusedError=python35.PyExc_ConnectionRefusedError DATA
+ PyExc_ConnectionResetError=python35.PyExc_ConnectionResetError DATA
+ PyExc_FileExistsError=python35.PyExc_FileExistsError DATA
+ PyExc_FileNotFoundError=python35.PyExc_FileNotFoundError DATA
@@ -185,18 +209,23 @@
+ PyExc_InterruptedError=python35.PyExc_InterruptedError DATA
+ PyExc_IsADirectoryError=python35.PyExc_IsADirectoryError DATA
- PyExc_MemoryErrorInst=python35.PyExc_MemoryErrorInst DATA
+ PyExc_NotADirectoryError=python35.PyExc_NotADirectoryError DATA
+ PyExc_PermissionError=python35.PyExc_PermissionError DATA
+ PyExc_ProcessLookupError=python35.PyExc_ProcessLookupError DATA
+ PyExc_ResourceWarning=python35.PyExc_ResourceWarning DATA
@@ -205,6 +234,7 @@
+ PyExc_TimeoutError=python35.PyExc_TimeoutError DATA
@@ -215,6 +245,7 @@
+ PyExc_WindowsError=python35.PyExc_WindowsError DATA
@@ -242,10 +273,12 @@
@@ -253,8 +286,10 @@
@@ -310,10 +345,18 @@
@@ -327,9 +370,15 @@
@@ -343,6 +392,7 @@
@@ -355,6 +405,7 @@
@@ -367,6 +418,7 @@
@@ -393,6 +445,7 @@
@@ -405,6 +458,7 @@
@@ -431,9 +485,10 @@
- PyObject_Type=python35.PyObject_Type DATA
@@ -474,8 +529,8 @@
@@ -484,9 +539,11 @@
@@ -503,6 +560,24 @@
@@ -561,34 +636,51 @@
@@ -599,30 +691,28 @@
- PyWeakref_GetObject=python35.PyWeakref_GetObject DATA
@@ -633,6 +723,8 @@
@@ -660,44 +752,95 @@
+ Py_hexdigits=python35.Py_hexdigits DATA
+ _PyMethodWrapper_Type=python35._PyMethodWrapper_Type DATA
+ _PyNamespace_Type=python35._PyNamespace_Type DATA
+ _PyNone_Type=python35._PyNone_Type DATA
+ _PyNotImplemented_Type=python35._PyNotImplemented_Type DATA
+ _Py_DumpTraceback=python35._Py_DumpTraceback DATA
+ _Py_DumpTracebackThreads=python35._Py_DumpTracebackThreads DATA
+ _Py_HashSecret_Initialized=python35._Py_HashSecret_Initialized DATA
+ _Py_RefTotal=python35._Py_RefTotal DATA
- _Py_TrueStruct=python35._Py_TrueStruct DATA
I wrote PEP-431 two years ago, and never got around to implementing it.
This year I got some renewed motivation after Berker Peksağ made an
effort to implement it.
I'm planning to work more on this during the PyCon sprints, and also
have a BoF session or similar during the conference.
Anyone interested in a session on this, mail me and we'll set up a
time and place!
If anyone is interested in the details of the problem, this is it.
The big problem is the ambiguous times, like 02:30 on a day when you
move the clock back one hour, as there are two different 02:30's that
day. I wrote down my experiences with looking into and trying to
implement several different solutions. The problem there is
actually how to tell the datetime whether it is before or after the
changeover.
== How others have solved it ==
=== dateutil.tz: Ignore the problem ===
dateutil.tz simply ignores the problems with ambiguous datetimes, keeping
them ambiguous.
=== pytz: One timezone instance per changeover ===
pytz implements ambiguous datetimes by having one class per timezone. Each
change in the UTC offset, whether because of a DST changeover or because
the base offset of the timezone changes, is represented as one instance of
the class.
All instances are held in a list which is a class attribute of the timezone
class. You flag which DST changeover you are in by using different instances
as the datetime's tzinfo. Since the timezone this way knows if it is DST or not,
the datetime as a whole knows if it's DST or not.
- Only known possible implementation without modifying stdlib, which of course
was a requirement, as pytz is a third-party library.
- DST offset can be quickly returned, as it does not need to be calculated.
- A complex and highly magical implementation of timezones that is hard to
understand.
- Required new normalize()/localize() functions on the timezone, and hence
the API is not stdlib's API.
- Hundreds of instances per timezone means slightly more memory usage.
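As an illustration, the one-instance-per-changeover idea looks roughly like this (a toy model with hypothetical class names, not pytz's actual code, and only two hard-coded periods instead of pytz's full changeover list):

```python
from datetime import datetime, timedelta, tzinfo

class Changeover(tzinfo):
    """One tzinfo instance per (utcoffset, dst, name) period, pytz-style."""
    def __init__(self, offset_hours, dst_hours, name):
        self._offset = timedelta(hours=offset_hours)
        self._dst = timedelta(hours=dst_hours)
        self._name = name
    def utcoffset(self, dt): return self._offset
    def dst(self, dt): return self._dst
    def tzname(self, dt): return self._name

class ToyZone:
    """Toy timezone holding one Changeover instance per offset period."""
    CST = Changeover(-6, 0, 'CST')   # made-up US-central-like zone
    CDT = Changeover(-5, 1, 'CDT')

    def localize(self, naive, is_dst=False):
        # choose the per-period instance and attach it, like pytz.localize()
        info = self.CDT if is_dst else self.CST
        return naive.replace(tzinfo=info)

zone = ToyZone()
ambiguous = datetime(2015, 11, 1, 1, 30)     # occurs twice on fall-back day
first = zone.localize(ambiguous, is_dst=True)    # the earlier 01:30
second = zone.localize(ambiguous, is_dst=False)  # the later 01:30
```

Because the tzinfo instance itself encodes which side of the changeover you are on, dst() is a trivial attribute lookup, which is where the "fast dst() call" pro comes from.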
== Options for PEP 431 ==
=== Stdlib option 0: Ignore it ===
I don't think this is an option, really. Listed for completeness.
=== Stdlib option 1: One timezone instance per changeover ===
Option 1 is to do it like pytz, have one timezone instance per changeover.
However, this is likely not possible to do without fundamentally changing the
datetime API, or making it very hard to use.
For example, when creating a datetime instance and passing in a tzinfo today
this tzinfo is just attached to the datetime. But when having multiple
instances of tzinfos this means you have to select the correct one to pass in.
pytz solves this with the .localize() method, which lets the timezone
class choose which instance to pass in.
We can't pass in the timezone class into datetime(), because that would
require datetime.__new__ to create new datetimes as a part of the timezone
arithmetic. These in turn, would create new datetimes in __new__ as a part of
the timezone arithmetic, which in turn, yeah, you get it...
I haven't been able to solve that issue without either changing the API/usage,
or getting infinite recursions.
- Proven solution through pytz.
- Fast dst() call.
- Trying to use this technique with the current API tends to create
infinite recursions. It seems to require big API changes.
- Slow datetime() instance creation.
=== Stdlib option 2: A datetime _is_dst flag ===
By having a flag on the datetime instance that says whether it is in DST
or not, the timezone implementation can be kept simpler.
You then have to calculate whether the datetime is in DST or not, either
when creating it, which demands datetime object creation and causes infinite
recursions, or when needed, which means you can get "ambiguous datetime"
errors at unexpected times later.
Also, when trying to implement this, I get bogged down in the complexities
of how tzinfo and datetime call each other back and forth, and when
to pass in the current is_dst and when to pass in the desired is_dst, etc.
The API and current implementation is not designed with this case in mind,
and it gets very tricky.
- Simpler tzinfo() implementations.
- It seems likely that we must change some APIs.
- This in turn may affect the pytz implementation. Or not, hard to say.
- The DST offset must use slow timezone calculations. However, since datetimes
are immutable, this can be a cached, lazy, one-time operation.
=== Stdlib option 3: UTC internal representation ===
Having UTC as the internal representation makes the whole issue go away.
Datetimes are no longer ambiguous, except when being created, so checks need
to be done at creation time, but that should be possible without creating new
datetimes in this case, resolving the infinite recursion problem.
- Problem solved.
- Minimal API changes.
- Backwards compatibility with pickles.
- Possible other backwards incompatibility problems.
- Both the DST offset and the local-time display representation must use slow
timezone calculations. However, since datetimes are immutable, this can be a
cached, lazy, one-time operation.
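A rough sketch of what option 3 could look like (a hypothetical class, using a fixed-offset tz as a stand-in for real zoneinfo data; the real implementation would live inside datetime itself):

```python
from datetime import datetime, timedelta, timezone

class UTCBackedDateTime:
    """Sketch of option 3: store UTC internally, derive local time lazily."""

    def __init__(self, utc_dt, tz):
        assert utc_dt.tzinfo is timezone.utc   # internal value is always UTC
        self._utc = utc_dt
        self._tz = tz
        self._local = None                     # cached local representation

    @property
    def local(self):
        if self._local is None:                # slow tz calculation, done once
            self._local = self._utc.astimezone(self._tz)
        return self._local

    def __add__(self, delta):
        # arithmetic happens on the unambiguous UTC value, so no
        # datetime-creating tz lookups are needed here
        return UTCBackedDateTime(self._utc + delta, self._tz)

tz = timezone(timedelta(hours=-6))             # stand-in for a zoneinfo tz
t = UTCBackedDateTime(datetime(2015, 11, 1, 7, 30, tzinfo=timezone.utc), tz)
t2 = t + timedelta(hours=24)
```

Since the stored value is UTC, ambiguity only has to be resolved once, at construction, which is what makes the recursion problem go away.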
I'm currently trying to implement solution #2 above. Feedback is welcome.
On 15-04-15, Akira Li <4kir4.1i(a)gmail.com> wrote:
> Isaac Schwabacher <ischwabacher(a)wisc.edu> writes:
> > ...
> > I know that you can do datetime.now(tz), and you can do datetime(2013,
> > 11, 3, 1, 30, tzinfo=zoneinfo('America/Chicago')), but not being able
> > to add a time zone to an existing naive datetime is painful (and
> > strptime doesn't even let you pass in a time zone).
> `.now(tz)` is correct. `datetime(..., tzinfo=tz)` is wrong: if tz is a
> pytz timezone then you may get a wrong tzinfo (LMT), you should use
> `tz.localize(naive_dt, is_dst=False|True|None)` instead.
The whole point of this thread is to finalize PEP 431, which fixes the problem for which `localize()` and `normalize()` are workarounds. When this is done, `datetime(..., tzinfo=tz)` will be correct.
On 15-04-10, Stuart Bishop wrote:
> On 10 April 2015 at 17:12, Nick Coghlan <ncoghlan(a)gmail.com> wrote:
> > The question of "store the DST flag" vs "store the offset" is essentially a
> > data normalisation one - there's only a single bit of additional information
> > actually needed (whether the time is DST or not in the annual hour of
> > ambiguity), which can then be combined with the local time, the location and
> > the zone info database to get the *actual* offset.
One thing that hasn't been mentioned is that is_dst and offset are not parallel on the datetime.time class, since you have no information about when in the history of the time zone a time was recorded. If you only record the time zone and is_dst flag, then updating zoneinfo can change the UTC time corresponding to existing aware time objects. Whereas if you only store offsets, it can change whether a time is interpreted as being DST or not (or whether the offset is considered valid at all). Which is mostly to say that aware time objects that don't also store dates are probably just a bad idea in general.
> The dst flag must be stored in the datetime, either as a boolean or
> encoded in the timezone structure. If you only have the offset, you
> lose interoperability with the time module (and the standard IANA
> zoneinfo library, posix etc., which it wraps). (dt +
> timedelta(hours=1)).ctime() will give an incorrect answer when you
> cross a DST transition.
From local time + offset you can compute UTC time and from there you can look up in the tzinfo whether it's DST or not. But yes, I'm starting to be less enamored of this idea. I get that it's basically the same as doing everything in UTC and caching local time, but I wasn't thinking about the fact that so many existing APIs need (local time, is_dst) that the friction is a problem.
> > A question the PEP perhaps *should* consider is whether or not to offer an
> > API allowing datetime objects to be built from a naive datetime, a fixed
> > offset and a location, throwing NonExistentTimeError if the given date, time
> > and offset doesn't match either the DST or non-DST times at that location.
> I don't think you need a specific API, apart from being able to
> construct a tzinfo using nothing but an offset (lots of people require
> this for things like parsing email headers, which is why pytz has the
> FixedOffset class).
This doesn't work:
>>> from datetime import *
ValueError: astimezone() cannot be applied to a naive datetime
But also, how is this different from needing to know the offset in order to construct an aware datetime?
I know that you can do datetime.now(tz), and you can do datetime(2013, 11, 3, 1, 30, tzinfo=zoneinfo('America/Chicago')), but not being able to add a time zone to an existing naive datetime is painful (and strptime doesn't even let you pass in a time zone). Some of the people who designed the instruments that we use to collect data understood why we would care when it was collected, and some of them didn't, and I need to be able to handle their data regardless. (The makers of one device with subsecond resolution opted for maximum compatibility with Microsoft Excel by writing a CSV with times as naive days since Dec. 30, 1899 to five decimal digits, but I doubt there's anything to be done for them.)
In addition, it would be nice to be able to say that going back and forth between aware and naive datetimes is evil and you should never do it, but it's a necessity if you want to be able to implement relative timedeltas in some form. Unless the stdlib wants to step up with a be-all, end-all implementation of "this time tomorrow" and friends, there shouldn't be unnecessary friction here.
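Even a minimal "this time tomorrow" helper illustrates the point (a hypothetical sketch using a fixed-offset tz; with a real zoneinfo tz you would have to re-localize rather than blindly re-attach the tzinfo):

```python
from datetime import datetime, timedelta, timezone

def same_time_tomorrow(aware_dt):
    # The aware<->naive round trip: strip the tz, step the calendar day
    # on the naive wall time, then re-attach the original tzinfo.
    naive = aware_dt.replace(tzinfo=None)
    return (naive + timedelta(days=1)).replace(tzinfo=aware_dt.tzinfo)

cdt = timezone(timedelta(hours=-5), 'CDT')   # stand-in fixed-offset zone
d = datetime(2015, 4, 15, 9, 0, tzinfo=cdt)
nxt = same_time_tomorrow(d)
```

With a fixed offset the relative step and the absolute 24-hour step happen to coincide; across a DST changeover they would not, which is exactly why a relative-delta facility needs this naive detour.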
> > P.S. The description of NonExistentTimeError in the PEP doesn't seem quite
> > right, as it currently says it will only be thrown if "is_dst=None", which
> > seems like a copy and paste error from the definition of AmbiguousTimeError.
> The PEP is correct here. If you explicitly specify is_dst, you know
> which side of the transition you are on and which offset to use. You
> can calculate datetime(2000, 12, 31, 2, 0, 0, 0).astimezone(foo,
> is_dst=False) using the non-DST offset and get an answer. It might not
> have ever appeared on the clock on your wall, but it is better than a
> punch in the face. If you want a punch in the face, is_dst=None
> refuses to guess and you get the exception.
Returning a DST time precisely when is_dst=False is passed isn't a punch in the face? I mean, I get what you're saying and I agree that "given that this time is skipped, which offset do I want to interpret it in?" is the right question to ask, but this is horribly counterintuitive to anyone who hasn't spent a lot of time pondering it. If nonexistent times are going to behave that way, the API needs to have a better solution to Naming Things.
> (Except for those cases where the timezone offset is changed without a
> DST transition, but that is rare enough that everyone pretends they don't
> exist.)
As far as method parameters go, you might as well just say the first time is DST and the second is STD and explain in the docs that "is_dst" is just a mnemonic. But if you have an is_dst flag on your datetimes, suddenly this is an issue.
On 15-04-08, Lennart Regebro wrote:
> === Stdlib option 2: A datetime _is_dst flag ===
> By having a flag on the datetime instance that says "this is in DST or not"
> the timezone implementation can be kept simpler.
Storing the offset itself instead of a flag makes things conceptually cleaner. You get a representation that's slightly harder to construct from the sorts of information you have lying around (tz name, naive datetime, is_dst flag) but is no harder to construct *and* validate, and then is easier to work with and harder to misuse. As an added bonus, you get a representation that's still meaningful when time changes happen for political rather than DST reasons.
- tzinfo.utcoffset() and local->UTC conversions don't require zoneinfo access.
- it's harder to represent "I know this time is DST but I don't know what tz it's in" [I put this in pro because I don't see how this kind of ambiguity can lead to anything but trouble, but ymmv]
- this representation is meaningful regardless of whether a time zone has DST
- this representation meaningfully disambiguates across time changes not resulting from DST
- tzinfo.dst() requires zoneinfo access
- tzinfo.tzname() requires zoneinfo access IF you want those horrible ambiguous abbreviations (is CST America/Chicago or America/Havana?) [I really wanted to put this in pro]
- (datetime, offset, tz) triples require validation [but so do (datetime, is_dst, tz) triples, even though it's easy to pretend they don't]
- constructing an aware datetime from a naive one, an is_dst flag, and a time zone requires zoneinfo access [but this is needed for the validation step anyway]
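The stdlib's fixed-offset timezone class can stand in for "a datetime that stores its offset" and makes the first pro concrete (a sketch; a full implementation would pair the offset with the zone name as in the triples above):

```python
from datetime import datetime, timedelta, timezone

# The offset is captured in the tzinfo at construction time...
cdt = timezone(timedelta(hours=-5), 'CDT')
dt = datetime(2015, 11, 1, 1, 30, tzinfo=cdt)

# ...so utcoffset() and the local->UTC conversion are pure arithmetic,
# with no zoneinfo database access at all.
utc = dt.astimezone(timezone.utc)

# What the stored offset alone cannot tell you is the DST flag or an
# unambiguous zone name for an arbitrary location; recovering those
# needs zoneinfo access, as the cons above note.
```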
On 15-04-08, Alexander Belopolsky wrote:
> With datetime, we also have a problem that POSIX APIs don't have to deal with: local time
> arithmetic. What is t + timedelta(1) when t falls on the day before DST change? How would
> you set the isdst flag in the result?
It's whatever time comes 60*60*24 seconds after t in the same time zone, because the timedelta class isn't expressive enough to represent anything but absolute time differences (nor should it be, IMO). But it might make sense to import dateutil.relativedelta or mxDateTime.RelativeDateTime into the stdlib to make relative time differences (same local time on the next day, for instance) possible.
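A sketch of that reading of t + timedelta(1), using two fixed offsets to play the before/after-changeover zones by hand (a real tzinfo would choose the offset itself):

```python
from datetime import datetime, timedelta, timezone

t = datetime(2015, 10, 31, 12, 0, tzinfo=timezone.utc)  # day before fall-back
later = t + timedelta(1)           # exactly 86400 elapsed seconds later

cdt = timezone(timedelta(hours=-5), 'CDT')   # local offset before the change
cst = timezone(timedelta(hours=-6), 'CST')   # local offset after the change
local_before = t.astimezone(cdt)
local_after = later.astimezone(cst)
# The elapsed time is a full day, but the local wall clock has advanced
# only 23 hours, because the zone fell back an hour in between.
```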
On 15-04-15, Lennart Regebro wrote:
> On Wed, Apr 15, 2015 at 12:40 PM, Isaac Schwabacher
> <ischwabacher(a)wisc.edu> wrote:
> > I am on the fence about EPOCH + offset as an internal representation. On the one hand, it is conceptually cleaner than the horrible byte-packing that exists today, but on the other hand, it has implications for future implementations of leap-second-awareness. For instance, offset measures the difference between the local time and UTC. But is it honest-to-goodness leap-second-aware UTC, or is it really Unix time? This could be solved by giving each tzinfo a pointer to the UTC from which it is offset, but that sounds like a can of worms we don't want to open. But if we don't, and we store EPOCH + offset, we can't add leap second awareness to UTC without corrupting all persisted aware datetimes.
> That's true, with separate values like there is now we can easily
> allow 23:59:60 as a timestamp during leap seconds. I'm not entirely
> sure it makes a big difference though; I don't think we ever want to
> deal with leap seconds by default.
> I don't think we ever want the standard arithmetic to deal with leap
> seconds anyway.
> datetime(2012, 6, 30, 23, 30) + timedelta(seconds=3600) returning
> datetime(2012, 7, 1, 0, 29, 59)
> I guess leap second implementations should rather have special
> functions for arithmetic that deal with this.
You need relative timedeltas to mitigate the pain of leap seconds, yes. But as soon as you have timedeltas that are capable of representing "this number of seconds into the next minute" ("one minute") as opposed to "sixty seconds", this isn't so much of a problem. Though of course subtraction will (and should!) continue to yield timedelta(seconds=3601).
From my perspective, the issue is more that the stdlib shouldn't rule out leap seconds. It's reasonable enough to expect users who actually want them to write appropriate tzinfo and timedelta classes, but we don't want to make that impossible the way the old tzinfo interface made DST-aware time zones impossible, by insisting that implementers implement a function that wasn't mathematically a function.
I need to think about this more before I can get a real rant going.
On 15-04-15, Lennart Regebro wrote:
> On Wed, Apr 15, 2015 at 11:10 AM, Chris Angelico <rosuav(a)gmail.com> wrote:
> > Bikeshed: Would arithmetic be based on UTC time or Unix time? It'd be
> > more logical to describe it as "adding six hours means adding six
> > hours to the UTC time", but it'd look extremely odd when there's a
> > leap second.
> It would ignore leap seconds. If you want to call that unix time or
> not is a matter of opinion. Hm. I guess the internal representation
> *could* be EPOCH + offset, and local times could be calculated
> properties, which could be cached (or possibly calculated at
I am on the fence about EPOCH + offset as an internal representation. On the one hand, it is conceptually cleaner than the horrible byte-packing that exists today, but on the other hand, it has implications for future implementations of leap-second-awareness. For instance, offset measures the difference between the local time and UTC. But is it honest-to-goodness leap-second-aware UTC, or is it really Unix time? This could be solved by giving each tzinfo a pointer to the UTC from which it is offset, but that sounds like a can of worms we don't want to open. But if we don't, and we store EPOCH + offset, we can't add leap second awareness to UTC without corrupting all persisted aware datetimes.
Also, I didn't mention this before because I figured people were getting sick of my dumb idea, but another advantage of always caching the offset is that you can detect future datetimes that have been corrupted by zoneinfo changes. You need both absolute time and offset to be able to do this.
I'm working on adding a numeric_owner parameter to some tarfile methods.
In a review, Berker suggested making the parameter keyword-only. I agree
that you'd likely never want to pass just "True", and that
"numeric_owner=True" would be a better usage.
But, I don't see a lot of keyword-only parameters being added to stdlib
code. Is there some position we've taken on this? Barring someone saying
"stdlib APIs shouldn't contain keyword-only params", I'm inclined to
make numeric_owner keyword-only.
Is there anything stopping me from making it keyword-only?
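Mechanically it's just a bare * in the signature; everything after it must be passed by keyword (a hypothetical standalone function for illustration, not tarfile's actual method):

```python
def extractall(path=".", *, numeric_owner=False):
    """Sketch only -- not tarfile's real signature."""
    # parameters after the bare * can only be supplied by keyword
    return (path, numeric_owner)

ok = extractall("/tmp", numeric_owner=True)   # keyword form: accepted
try:
    extractall("/tmp", True)                  # positional form: rejected
    rejected = False
except TypeError:
    rejected = True
```

This is exactly the property wanted here: callers can never write a bare, unreadable `True`.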