Pretty much a year ago I wrote about the optimizations I did for my
PhD thesis, which target the Python 3 series of interpreters. While I
got some replies, the discussion never really picked up, and no
explicit final conclusion was reached. AFAICT, my optimizations were
not that interesting for inclusion with the distribution at that time
because of the following two factors:
a) Unladen Swallow was targeting Python 3, too.
b) My prototype did not pass the regression tests.
As of November 2010 (IIRC), Google is no longer supporting work on
Unladen Swallow, and the project has stalled. (If I am wrong and there
is still activity and there are plans for the corresponding PEP, please
let me know.) Which is why I recently spent some time fixing issues so
that I can run the regression tests. There is still some work to be
done, but by and large it should be possible to pass all regression
tests in reasonable time. (With the actual infrastructure in place,
enabling the optimizations later on is not a problem at all, either.)
So, the two big issues aside, is there any interest in incorporating
these optimizations into Python 3?
Have a nice day,
PS: It probably is unusual, but on a part of my home page I have
created a link to indicate interest (it makes both counting and voting
easier: http://www.ics.uci.edu/~sbruntha/). There were also links for
indicating interest in funding the work; I have disabled these so as
not to upset anybody or give the impression of begging for money...
I would like to get some opinions on a possible os.walk improvement.
For the sake of simplicity, let's assume I would like to skip all .svn
and tmp directories.
The current solution looks like this:

for root, dirs, files in os.walk(somedir):
    dirs[:] = [d for d in dirs if d not in ('.svn', 'tmp')]
    ... do something

This is a very clever hack, but... it relies on the internal
implementation: with topdown=True, os.walk descends only into the
entries left in the very list object it handed to the caller.
An alternative is adding an os.walk parameter, e.g. like this:

def walk(top, topdown=True, onerror=None, followlinks=False, walkfilter=None):
    ...
    if walkfilter is not None:
        ...

and removing .svn and tmp in the walkfilter definition.
What I do not like here is that followlinks becomes redundant: it is
easily implementable through walkfilter.
A simpler option, though one breaking backward compatibility, would be:

def walk(top, topdown=True, onerror=None, skipdirs=islink)

- if followlinks or not islink(new_path):
-     for x in walk(new_path, topdown, onerror, followlinks):
+ if not skipdirs(new_path):
+     for x in walk(new_path, topdown, onerror, skipdirs):

The user-given skipdirs function should then return true for any
new_path ending in .svn or tmp.
Nothing is redundant, and it works fine with topdown=False!
What do you think? Shall we:
a) do nothing and use the implicit hack;
b) make the option explicit, keeping backward compatibility but with
redundancy and a topdown=False incompatibility; or
c) make the option explicit, breaking backward compatibility but with
no redundancy?
I released the first package of two and PyPI went down while I was
preparing to release the second. I hope it wasn't me?
Oleg Broytman http://phdru.name/ phd(a)phdru.name
Programmers don't die, they just GOSUB without RETURN.
On Tue, 30 Aug 2011 16:22:14 +0200
eric.araujo <python-checkins(a)python.org> wrote:
> changeset: 72127:af0bcccb935b
> user: Éric Araujo <merwok(a)netwok.org>
> date: Tue Aug 30 00:55:02 2011 +0200
> Remove display options (--name, etc.) from the Distribution class.
> These options were used to implement “setup.py --name”,
> “setup.py --version”, etc. which are now handled by the pysetup metadata
> action or direct parsing of the setup.cfg file.
> As a side effect, the Distribution class no longer accepts a 'url' key
> in its *attrs* argument: it has to be 'home-page' to be recognized as a
> valid metadata field and passed down to the dist.metadata object.
I don't want to sound nitpicky, but this is the first time I've seen
"home-page" hyphenated. How about "homepage"?
Guido has agreed to eventually pronounce on PEP 393. Before that can
happen, I'd like to collect feedback on it. There have been a number
of voices supporting the PEP in principle, so I'm now interested in
comments in the following areas:
- objections in principle; I'll list them in the PEP.
- issues to be considered (unclarities, bugs, limitations, ...)
- conditions you would like to pose on the implementation before
acceptance; I'll see which of these can be resolved, and list
the ones that remain open.
Unless I hear any objections, I plan to adjust the current PEP
statuses as follows some time this weekend:
Move from Accepted to Finished:
389 argparse - New Command Line Parsing Module Bethard
391 Dictionary-Based Configuration For Logging Sajip
3108 Standard Library Reorganization Cannon
3135 New Super
Spealman, Delaney, Ryan
Move from Accepted to Withdrawn (with a reference to Reid Kleckner's blog post)
3146 Merging Unladen Swallow into CPython
Winter, Yasskin, Kleckner
The PEP 3118 enhanced buffer protocol has some ongoing semantic and
implementation issues still to be worked out, so I plan to leave that
at Accepted. Ditto for PEP 3121 (extension module finalisation), since
that doesn't play nicely with the current 'set everything to None'
approach to breaking cycles during module finalisation.
The other Accepted PEPs are either packaging standards related or
genuinely not implemented yet.
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
I am sending this review as the BDFOP for PEP 3151. I've read the PEP and
reviewed the python-dev discussion via Gmane. I have not reviewed the hg
branch where Antoine has implemented it.
I'm not quite ready to pronounce, but I do have some questions and comments.
First off, thanks to Antoine for taking this issue on, and for his well
written and well reasoned PEP. There's definitely a problem here and I think
Python will be better off for having addressed it. I, for one, will be very
happy when I can eliminate the majority of `import errno`s from my code. ;)
One guiding principle for me is that we should keep the abstraction as
thin as possible. In particular, I'm concerned about mapping multiple
errnos into a single Error. For example, both EPIPE and ESHUTDOWN map
to BrokenPipeError, and both EACCES and EPERM map to PermissionError.
I think we should resist this, so that one errno maps to exactly one
Error. Where grouping is desired, Python already has mechanisms to
deal with that, e.g. superclasses and multiple inheritance. Therefore,
I think it would be better to have
+ AccessError (EACCES)
+ PermissionError (EPERM)
Yes, it makes the hierarchy deeper, and means you have to come up with a few
more names, but I think it will also make it easier for the programmer to use
and debug. Also, some of the artificial hierarchy introduced in the PEP may
not be necessary (e.g. the faux superclass FileSystemPermissionError above).
This might lead to the elimination of FileSystemError as some have suggested
(I too question its utility).
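To make the one-errno-one-exception idea concrete, the grouping could
look something like this (a sketch only; the class names follow the
suggestion above and are not part of the PEP, and the PermissionError
here is purely illustrative, not the builtin Python 3.3 later grew):

```python
import errno

class FileSystemPermissionError(IOError):
    """Grouping superclass: catch this to handle either errno."""

class AccessError(FileSystemPermissionError):
    """Raised for EACCES only."""

class PermissionError(FileSystemPermissionError):
    """Raised for EPERM only (shadows the later builtin of the same name)."""

# grouped handling still works, and the exact errno is preserved:
try:
    raise AccessError(errno.EACCES, "Permission denied")
except FileSystemPermissionError as e:
    pass  # e is an AccessError and e.errno == errno.EACCES
```

Code that cares about the distinction catches AccessError or
PermissionError directly; code that does not catches the superclass.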
Similarly, I think it would be helpful to have the errno name (e.g. ENOENT) in
the error message string. That way, it won't get in the way for most code,
but would be usefully printed out for uncaught exceptions.
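As an illustration of what that could look like (just a sketch;
format_oserror is my name, not a proposed API):

```python
import errno
import os

def format_oserror(exc):
    """Prepend the symbolic errno name (e.g. ENOENT) to the message."""
    name = errno.errorcode.get(exc.errno, str(exc.errno))
    return "[%s] %s" % (name, exc.strerror or os.strerror(exc.errno))

# format_oserror(OSError(errno.ENOENT, "No such file or directory"))
# -> "[ENOENT] No such file or directory"
```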
A second guiding principle should be that careful code that works in Python
3.2 must continue to work in Python 3.3 once PEP 3151 is accepted, but also
for Python 2 code ported straight to Python 3.3. Given the PEP's emphasis on
"useful compatibility", I think this will be the case. Do be prepared
for complaints about compatibility from careless code, though - there's
a ton of that out in the wild, and people will always complain when
their "working" code breaks due to an upgrade. Be *very* explicit about
this in the release notes and NEWS file, and put your asbestos underoos
on. On the plus side,
there's not so much Python 3 code to break :). Also, do clearly explain any
required migration strategy for existing code, probably in this PEP.
Have you considered the impact of this PEP on other Python implementations?
My hazy memory of Jython tells me that errnos don't really leak into Java and
thus Jython much, but what about PyPy and IronPython? E.g. step 1's
deprecation strategy seems pretty CPython-centric.
As for step 1 (coalescing the errors): this makes sense and I'm
generally agreeable, but I'm wondering whether it's best to re-use
IOError for this
rather than introduce a new exception. Not that I can think of a good name
for that. I'm just not totally convinced that existing code when upgrading to
Python 3.3 won't introduce silent failures. If an existing error is to be
re-used for this, I'm torn on whether IOError or OSError is a better choice.
Popularity aside, OSError *feels* more right.
What is the impact of the PEP on tools such as 2to3 and 3to2?
Just to be clear, am I right that (on POSIX systems at least) IOError and its
subclasses will always have an errno attribute still? And that anything
raising an exception (e.g. via PyErr_SetFromErrno) other than the new ones
will raise IOError?
I also think that, rather than transforming the exception when raised
from Python, i.e. via __new__ hackery, perhaps it should be a
ValueError in its own right to raise IOError with an errno
corresponding to one of the subclasses. Chained exceptions would mean
that the original exception needn't get lost.
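That suggestion could be sketched roughly like this (entirely
hypothetical; StrictIOError and the errno set are illustrative, not the
PEP's design):

```python
import errno

# errnos that, in this sketch, have dedicated subclasses
_SUBCLASSED_ERRNOS = {errno.ENOENT, errno.EACCES, errno.EPERM}

class StrictIOError(IOError):
    """Refuse construction of the base class with an errno that belongs
    to a dedicated subclass, instead of transmuting it via __new__."""
    def __init__(self, err, msg):
        if err in _SUBCLASSED_ERRNOS:
            raise ValueError(
                "errno %d has a dedicated subclass; raise that instead" % err)
        super().__init__(err, msg)
```

Raising the ValueError inside a handler for the original OSError would
chain the two, so nothing is lost.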
I surveyed some of my own code and observed (as others have) that EISDIR and
ENOTDIR are pretty rare. I found more examples of ECHILD and ESRCH than the
former two. How'd you like to add those two to make your BDFOP happy? :)
What follows are some crazier ideas. I'm just throwing them out there, not
necessarily suggesting they should go into the PEP.
The new syntax (e.g. if clause on except) is certainly appealing at first
glance, and might be of more general use for Python, but I agree with the
problems as stated in the PEP. However, there might be a few things that
*can* be done to make even the uncommon cases easier. E.g.
What if all the errno symbolic names were mapped as attributes on IOError?
The only advantage of that would be to eliminate the need to import
errno, and the ugly `e.errno == errno.ENOENT` stuff. That would then be
rewritten as `e.errno == IOError.ENOENT`. A mild savings, to be sure,
but still.
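Since the builtin IOError type doesn't accept new attributes at
runtime, here is roughly how that idea could be prototyped on a
subclass (IOErrorWithNames is an illustrative name):

```python
import errno

class IOErrorWithNames(IOError):
    """Expose every symbolic errno name known on this platform as a class
    attribute, so handlers can write `e.errno == IOErrorWithNames.ENOENT`."""

# errno.errorcode maps error numbers back to their symbolic names
for _code, _name in errno.errorcode.items():
    setattr(IOErrorWithNames, _name, _code)
```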
How dumb/useless/unworkable would it be to add a __future__ import to
switch from the old hierarchy to the new one? Probably pretty. ;)
What about an api that applications/libraries could use to add additional
exceptions based on other errnos they cared about? This could be consulted in
PyErr_SetFromErrno() and raised instead of IOError. Okay, yeah, that's
probably pretty dumb too.
Anyway, that's all I have. I certainly feel like this PEP is pretty close to
being accepted. Good work!
On Mon, Aug 29, 2011 at 12:27 PM, Guido van Rossum <guido(a)python.org> wrote:
> I wonder if for
> this particular purpose SWIG isn't the better match. (If SWIG weren't
> universally hated, even by its original author. :-)
Hate is probably a strong word, but as the author of Swig, let me chime in here ;-). I think there are probably some lessons to be learned from Swig.
As Nick noted, Swig is best suited when you have control over both sides (C/C++ and Python) of whatever code you're working with. In fact, the original motivation for Swig was to give application programmers (scientists in my case) a means for automatically generating the Python bindings to their code. However, there was one other important assumption--and that was the fact that all of your "real code" was going to be written in C/C++ and that the Python scripting interface was just an optional add-on (perhaps even just a throw-away thing). Keep in mind, Swig was first created in 1995 and at that time, the use of Python (or any similar language) was a pretty radical idea in the sciences. Moreover, there was a lot of legacy code that people just weren't going to abandon. Thus, I always viewed Swig as a kind of transitional vehicle for getting people to use Python who might otherwise not even consider it. Getting back to Nick's point though, to really use Swig effectively, it was always known that you might have to reorganize or refactor your C/C++ code to make it more Python friendly. However, due to the automatic wrapper generation, you didn't have to do it all at once. Basically your code could organically evolve and Swig would just keep up with whatever you were doing. In my projects, we'd usually just tuck Swig away in some Makefile somewhere and forget about it.
One of the major complexities of Swig is the fact that it attempts to parse C/C++ header files. This very notion is actually a dangerous trap waiting for anyone who wants to wander into it. You might look at a header file and say, well how hard could it be to just grab a few definitions out of there? I'll just write a few regexs or come up with some simple hack for recognizing function definitions or something. Yes, you can do that, but you're immediately going to find that whatever approach you take starts to break down into horrible corner cases. Swig started out like this and quickly turned into a quagmire of esoteric bug reports. All sorts of problems with preprocessor macros, typedefs, missing headers, and other things. For a while, I would get these bug reports that would go something like "I had this C++ class inside a namespace with an abstract method taking a typedef'd const reference to this smart pointer ..... and Swig broke." Hell, I can't even understand the bug report let alone know how to fix it. Almost all of these bugs were due to the fact that Swig started out as a hack and didn't really have any kind of solid conceptual foundation for how it should be put together.
If you flash forward a bit, from about 2001-2004 there was a very serious push to fix these kinds of issues. Although it was not a complete rewrite of Swig, there were a huge number of changes to how it worked during this time. Swig grew a fully compatible C++ preprocessor that fully supported macros. A complete C++ type system was implemented, including support for namespaces, templates, and even such things as template partial specialization. Swig evolved into a multi-pass compiler that was doing all sorts of global analysis of the interface. Just to give you an idea, Swig would do things such as automatically detect/wrap C++ smart pointers. It could wrap overloaded C++ methods/functions. Also, if you had a C++ class with virtual methods, it would only make one Python wrapper function and then reuse it across all wrapped subclasses.
Under the covers of all of this, the implementation basically evolved into a sophisticated macro preprocessor coupled with a pattern matching engine built on top of the C++ type system. For example, you could write patterns that matched specific C++ types (the much hated "typemap" feature) and you could write patterns that matched entire C++ declarations. This whole pattern matching approach had a huge power if you knew what you were doing. For example, I had a graduate student working on adding "contracts" to Swig--something that was being funded by a NSF grant. It was cool and mind boggling all at once.
In hindsight, however, I think the complexity of Swig has exceeded anyone's ability to fully understand it (including my own). For example, to even make sense of what's happening, you have to have a pretty solid grasp of the C/C++ type system (easier said than done). Couple that with all sorts of crazy pattern matching, low-level code fragments, and a ton of macro definitions, and your head will literally explode if you try to figure out what's happening. So far as I know, recent versions of Swig have even combined all of this type-pattern matching with regular expressions. I can't even fathom it.
Sadly, my involvement with Swig was an unfortunate casualty of my academic career biting the dust. By 2005, I was so burned out of working on it and so sick of what I was doing that I quite literally put all of my computer stuff aside to go play in a band for a few years. After a few years, I came back to programming (obviously), but not to keep working on the same stuff. In particular, I will die quite happy if I never have to look at another line of C++ code ever again. No, I would much rather fling my toddlers, ride my bike, play piano, or do just about anything than ever do that again. Although I still subscribe to the Swig mailing lists and watch what's happening, I'm not active with it at the moment.
I've sometimes thought it might be interesting to create a Swig replacement purely in Python. When I work on the PLY project, this is often what I think about. In that project, I've actually built a number of the parsing tools that would be useful in creating such a thing. The only catch is that when I start thinking along these lines, I usually reach a point where I say "nah, I'll just write the whole application in Python."
Anyways, this is probably way more than anyone wants to know about Swig. Getting back to the original topic of using it to make standard library modules, I just don't know. I think you probably could have some success with an automatic code generator of some kind. I'm just not sure it should take the Swig approach of parsing C++ headers. I think you could do better.
P.S. By the way, if people want to know a lot more about Swig internals, they should check out the PyCon 2008 presentation I gave about it. http://www.dabeaz.com/SwigMaster/