[Python-Dev] DRAFT: python-dev Summary for 2005-09-16 to 2005-09-30

Thu Nov 17 01:36:32 CET 2005

It's been some time (all that concurrency discussion didn't help ;)  
but here's the second half of September.  Many apologies for the  
delay; hopefully you agree with Guido's 'better late than never', and  
I promise to try harder in the future.  Note that the delay is all my  
bad, and epithets should be directed at me and not Steve.  As usual,  
please read over if you have a chance, and direct comments/ 
corrections to tony.meyer at gmail.com or steven.bethard at gmail.com.   
(One particular question is whether the concurrency summary is too  
long).

=============
Announcements
=============

-----------------------------
QOTF: Quotes of the fortnight
-----------------------------

We have two quotes this week, one each from the two biggest threads  
of this fortnight: concurrency and conditional expressions.  The  
first quote, from Donovan Barda, puts Python's approach to threading  
into perspective:

     The reality is threads were invented as a low overhead way of  
easily implementing concurrent applications... ON A SINGLE PROCESSOR.  
Taking into account threading's limitations and objectives, Python's  
GIL is the best way to support threads. When hardware (seriously)  
moves to multiple processors, other concurrency models will start to  
shine.

Our second QOTF, by yours truly (hey, who could refuse a nomination  
from Guido?), is a not-so-subtle reminder to leave syntax decisions  
to Guido:

     Please no more syntax proposals! ... We need to leave the syntax  
to Guido.  We've already proved that ... we can't as a community  
agree on a syntax.  That's what we have a BDFL for. =)

Contributing threads:

- `GIL, Python 3, and MP vs. UP <http://mail.python.org/pipermail/ 
python-dev/2005-September/056609.html>`__
- `Adding a conditional expression in Py3.0 <http://mail.python.org/ 
pipermail/python-dev/2005-September/056617.html>`__

[SJB]

-------------------
Compressed MSI file
-------------------

Martin v. Lˆwis discovered that a little more than a `MiB`_ in the  
Python installer by using LZX:21 instead of the standard MSZIP when  
compressing the CAB file.  After confirmation from several testers  
that the new format worked, the change (for Python 2.4.2 and beyond)  
was made.

.. _MiB: http://en.wikipedia.org/wiki/Mibibyte

Contributing thread:

- `Compressing MSI files: 2.4.2 candidate? <http://mail.python.org/ 
pipermail/python-dev/2005-September/056694.html>`__

[TAM]

=========
Summaries
=========

-----------------------
Conditional expressions
-----------------------

Raymond Hettinger proposed that the ``and`` and ``or`` operators be  
modified in Python 3.0 to produce only booleans instead of producing  
objects, motivating this proposal in part by the common (mis-)use of  
``<cond> and <true-expr> or <false-expr>`` to emulate a conditional  
expression.  In response, Guido suggested that that the conditional  
expression discussion of `PEP 308`_ be reopened.  This time around,  
people seemed almost unanimously in support of adding a conditional  
expression, though as before they disagreed on syntax.  Fortunately,  
this time Guido cut the discussion short and pronounced a new syntax:  
``<true-expr> if <cond> else <false-expr>``.  Although it has not  
been implemented yet, the plan is for it to appear in Python 2.5.

.. _PEP 308: http://www.python.org/peps/pep-0308.html

Contributing threads:

- `"and" and "or" operators in Py3.0 <http://mail.python.org/ 
pipermail/python-dev/2005-September/056510.html>`__
- `Adding a conditional expression in Py3.0 <http://mail.python.org/ 
pipermail/python-dev/2005-September/056546.html>`__
- `Conditional Expression Resolution <http://mail.python.org/ 
pipermail/python-dev/2005-September/056846.html>`__

[SJB]

---------------------
Concurrency in Python
---------------------

Once again, the subject of removing the global interpreter lock (GIL)  
came up.  Sokolov Yura suggested that the GIL be replaced with a  
system where there are thread-local GILs that cooperate to share  
writing; Martin v. Lˆwis suggested that he try to implement his  
ideas, and predicted that he would find that doing so would be a lot  
of work, would require changes to all extension modules (likely to  
introduce new bugs, particularly race conditions), and possibly  
decrease performance.  This kicked off several long threads about  
multi-processor coding.

A long time ago (circa Python 1.5), Greg Ward experimented with free  
threading, which did yield around a 1.6 times speedup on a dual- 
processor machine.  To avoid the overhead of multi-processor locking  
on a uniprocessor machine, a separate binary could be distributed.   
Some of the code apparently did make it into Python 1.5, but the  
issue died off because no-one provided working code, or a strategy  
for what to do with existing extension modules.

Guido pointed out that it is not clear at this time how multiple  
processors will be used as they become the norm.  With the treaded  
programming model (e.g. in Java) there are problems with concurrent  
modification errors (without locking) or deadlocks and livelocks  
(with locking).  Guido's hunch (and mine, FWIW) is that instead of  
writing massively parallel applications, we will continue to write  
single-threaded applications that are tied together at the process  
level rather than at the thread level.  He also pointed out that it's  
likely that most problems get little benefit out of multiple processors.

Guido threw down the gauntlet: rather than the endless discussion  
about this topic, someone should come up with a GIL-free Python (not  
necessarily CPython) and demonstrate its worth.  Phillip J. Eby  
reminded everyone that Jython, IronPython, and PyPy exist, and that  
someone could, for example, create a multiprocessor-friendly backend  
for PyPy.

Guido also pointed out that fast threading benefits from fast context  
switches, which benefits from small register sets, and that the  
current trend in chips is towards larger register sets.  In addition,  
multiple processors with shared memory don't scale all that well  
(multiple processors with explicit interprocess communication (IPC)  
channels scale much better).  These all favour multi-processing over  
multi-threading.  Donovan Baarda went so far as to say (a QOTF, as  
above), that Python's GIL is the best way to support threads, which  
are for single-processor use, and that when multiple-processor  
platforms have matured more other concurrency models will likewise  
mature.  OTOH, Bob Ippolito pointed out that (in many operating  
systems) there isn't a lot of difference between threads and  
processes, and that threads can typically still use IPC.  Bob argued  
that the biggest argument for threading is that lots of existing C/C+ 
+ code uses threads.

Simon Percivall argued that the problem is that Python offers ("out  
of the box") some support for multi-threaded programming, but little  
for multi-process programming beyond the basics (e.g. data sharing,  
communication, control over running processes, dealing out tasks to  
be handled).  Simon suggested that the best way to stop people  
complaining about the GIL is to provide solid, standardized support  
for multi-process programming.  The idea of a "multiprocess" module  
gained a reasonable amount of support.

Phillip J. Eby outlined an idea he is considering PEPifying, in which  
one could switch all context variables (such as the Decimal context  
and the sys.* variables) simulaneously and instantaneously when  
changing execution contexts (like switching between coroutines).  He  
has a prototype implementation of the basic idea, which is less than  
200 lines of Python and very fast.  However, he pointed out that it's  
not completely PEP-ready at this point, and he needs to continue  
considering various parts of the concept.
Bruce Eckel joined the thread, and suggested that low-level threads  
people are only now catching up to objects, but as far as concurrency  
goes their brains still think in terms of threads, so they naturally  
apply thread concepts to objects.  He believes that pthread-style  
thinking is two steps backwards: you effectively throw open the  
innards of the object that you just spent time decoupling from the  
rest of your system, and the coupling is not unpredictable.

Bruce and Guido had discussed offlist "active objects": defining a  
class as "active" would install a worker thread and concurrent queue  
in each object of that class, automatically turn method calls into  
tasks and enqueue them, and prevent any other interaction other than  
enqueued messages.  Guido felt that if multiple active objects could  
co-exist in the same process, but be prevented (by the language  
implementation) from sharing data except via channels, and dynamic  
reallocation of active objects across multiple CPUs were possible,  
then this might be a solution.  He pointed out that an implementation  
would really be needed to prove this.

Phillip and Martin pointed out that preventing any other interacton  
other than enqueued messages is the difficult part; each active  
object would, for example, have to have its own sys.modules.  Phillip  
felt that such a solution (which Bruce posed as "a" solution, not  
"the" solution) wouldn't help with GIL removal, but would help with  
effective use of multiprocessor machines on platforms where fork() is  
available, if the API works across processes as well as threads.

Bruce then restarted the discussion, putting forth eight criteria  
that he felt would be necessary for the "pythonic" solution to  
concurrency.  Items on the list were discussed further, with some  
disagreement about what was possible.  The concurrency discussion  
continues next month...

Contributing threads:

- `Variant of removing GIL. <http://mail.python.org/pipermail/python- 
dev/2005-September/056423.html>`__
- `GIL, Python 3, and MP vs. UP (was Re: Variant of removing GIL.)  
<http://mail.python.org/pipermail/python-dev/2005-September/ 
056458.html>`__
- `GIL, Python 3, and MP vs. UP <http://mail.python.org/pipermail/ 
python-dev/2005-September/056498.html>`__
- `Active Objects in Python <http://mail.python.org/pipermail/python- 
dev/2005-September/056752.html>`__
- `Pythonic concurrency <http://mail.python.org/pipermail/python-dev/ 
2005-September/056801.html>`__
- `Pythonic concurrency - cooperative MT <http://mail.python.org/ 
pipermail/python-dev/2005-September/056860.html>`__

[TAM]

-----------------------------------
Removing nested function parameters
-----------------------------------

Brett Cannon proposed removing support for nested function parameters  
so that instead of being able to write::

     def f((x, y)):
         print x, y

you'd have to write something like::

     def f(arg):
         x, y = arg
         print x, y

Brett (with help from Guido) motivated this removal (for Python 3.0)  
by a few factors:

(1) The feature has low visibility: "For every user who is fond of  
them there are probably ten who have never even heard of it." - Guido
(2) The feature can be difficult to read for some people.
(3) The feature doesn't add any power to the language; the above  
functions emit essentially the same byte-code.
(4) The feature makes function parameter introspection difficult  
because tuple unpacking information is not stored in the function  
object.

In general, people were undecided on this proposal.  While a number  
of people said they used the feature and would miss it, many of them  
also said that their code wouldn't suffer that much if the feature  
was removed.  No decision had been made at the time of the summary.

Contributing thread:

- `removing nested tuple function parameters <http://mail.python.org/ 
pipermail/python-dev/2005-September/056459.html>`__

[SJB]

-----------------------------------------
Evaluating iterators in a boolean context
-----------------------------------------

In Python 2.4 some builtin iterators gained __len__ methods when the  
number of remaining items could be made available.  This broke some  
of Guido's code that tested iterators for their boolean value (to  
distinguish them from None).  Raymond Hettinger (who supplied the  
original patch) argued that `testing for None`_ using boolean tests  
was in general a bad idea, and that knowing the length of an  
iterator, when possible, had a number of use cases and allowed for  
some performance gains.  However, Guido felt strongly that iterators  
should not supply __len__ methods, as this would lead to some people  
writing code expecting this method, which would then break when it  
received an iterator which could not determine its own length.  The  
feature will be rolled back in Python 2.5, and Raymond will likely  
move the __len__ methods to private methods in order to maintain the  
performance gains.

.. _testing for None: http://www.python.org/peps/ 
pep-0290.html#testing-for-none

Contributing threads:

- `bool(iter([])) changed between 2.3 and 2.4 <http://mail.python.org/ 
pipermail/python-dev/2005-September/056576.html>`__
- `bool(container) [was bool(iter([])) changed between 2.3 and 2.4]  
<http://mail.python.org/pipermail/python-dev/2005-September/ 
056879.html>`__

[SJB]

--------------------------------------------------
Properties that only call the getter function once
--------------------------------------------------

Jim Fulton proposed adding a new builtin for a property-like  
descriptor that would only call the getter method once, so that  
something like::

    class Spam(object):

        @readproperty
        def eggs(self):
            ... expensive computation of eggs

            self.eggs = result
            return result

would only do the eggs computation once.  Currently, you can't do  
this with a property() because the ``self.eggs = result`` statement  
tries to call the property's ``fset`` method instead of replacing the  
property with the result of the eggs() call.  A few other people  
commented that they'd needed similar functionality at times, and  
Guido seemed moderately interested in the idea, but there was no  
final resolution.

Contributing thread:

- `RFC: readproperty <http://mail.python.org/pipermail/python-dev/ 
2005-September/056769.html>`__

[SJB]

--------
Codetags
--------

Micah Elliott submitted his `Codetags PEP 350`_ (after revisions  
following the comp.lang.python discussion) to python-dev for  
comment.  A common feeling was that this (particularly synonyms) was  
over-engineering; Guido pointed out that he only uses XXX, and this  
is certainly the most common (although not only) example in the  
Python source itself.  Some suggestions were made, many of which  
Micah integrated into the PEP.

The suggestion was made that an implementation should precede  
approval of the PEP.  Micah indicated that he would continue  
development on the tools, and that he encourages anyone interested in  
using a standard set of codetages to give these a try.

.. _Codetags PEP 350: http://python.org/peps/pep-0350.html

- `PEP 350: Codetags <http://mail.python.org/pipermail/python-dev/ 
2005-September/056744.html>`__

[TAM]

----------------------------
Improving set implementation
----------------------------

Raymond Hettinger suggested a "small, but interesting, C project" to  
determine whether the setobject.c implementation would be improved by  
recoding the set_lookkey() function to optimize key insertion order  
using Brent's variation of Algorithm D (c.f. Knuth vol. III, section  
6.4, p525).  It has the potential to boost performance for  
uniquification applications with duplicate keys being identified more  
quickly, and possibly also more frequent retirement of dummy entires  
during insertion operations.

Andrew Durdin pointed out that Brent's variation depends on the next  
probe position for a key being derivation from the key and it current  
position, which is incompatible with the current perturbation system;  
Raymond replaced perturbation with a secondary hash with linear  
probing.  Antoine Pitrou did some `experimenting with this`_,  
resulting in a -5% to 2% speedup with various benchmarks.

Raymond has also been experimenting with a simpler approach: whenever  
there are more than three probes, always swap the new key into the  
first position and then unconditionally re-insert the swapped-out  
key.  He reported that, most of the time, this gives an improvement,  
and it doesn't require changing the perturbation logic.  This simpler  
approach is cheap to implement, but the benefits are also smaller,  
with it improving only the worse collisions.

.. _experimenting with this: http://pitrou.net/python/sets

- `C coding experiment <http://mail.python.org/pipermail/python-dev/ 
2005-September/055965.html>`__

[TAM]

--------------
Relative paths
--------------

Nathan Bullock suggested a ''relpath(path_a, path_b)'' addition to  
os.path that returns a relative path from path_a to path_b.  Trent  
Mick pointed out that there are a `couple of`_ `recipes for this`_,  
as well as `Jason Orendorff's Path module`_.  Several people  
supported this idea, and hopefully either Nathan or one of the recipe  
authors will submit a patch with this functionality.

.. _couple of: http://aspn.activestate.com/ASPN/Cookbook/Python/ 
Recipe/302594
.. _recipes for this: http://aspn.activestate.com/ASPN/Cookbook/ 
Python/Recipe/208993
.. _Jason Orendorff's Path module: http://www.jorendorff.com/articles/ 
python/path/

Contributing threads:

- `os.path.diff(path1, path2) <http://mail.python.org/pipermail/ 
python-dev/2005-September/056391.html>`__
- `os.path.diff(path1, path2) (and a first post) <http:// 
mail.python.org/pipermail/python-dev/2005-September/056703.html>`__

[TAM]

----------------------------------
Adding a vendor-packages directory
----------------------------------

Rich Burridge followed up a `comp.lang.python thread`_ about a  
"vendor-packages" directory for Python by submitting a `patch`_ and  
asking for comments about the proposal on python-dev.  General  
consensus was that the proposal needed a better rationale, explaining  
why this improved on simply adding a .pth file to the site-packages  
directory.

Rich explained that the rationale is that Python files supplied by  
the vendor (Sun, Apple, RedHat, Microsoft) with their operating  
system software should go in a separate base directory to  
differentiate them from Python files installed specifically at the  
site.  However, Bob Ippolito pointed out that, as of OS X 10.4  
("Tiger") Apple already does this via a .pth file ("Extras.pth"),  
which points to ''/System/Library/Frameworks/Python.framework/ 
Versions/2.3/Extras/lib/python'' and includes wxPython by default.

Bob also pointed out that such a "vendor-packages.pth" should look  
like ''import site; site.addsitedir('/usr/lib/python2.4/vendor- 
packages')'' so that packages like Numeric, PIL, and PyObjC, which  
take advantage of .pth files themselves, work when installed to the  
vendor-packages location.

Phillip J. Eby pointed out that it would be good to have a document  
for "Python Distributors" that explained these kind of things, and  
suggested that perhaps a volunteer or two could be found within the  
distutils-SIG to do this.

.. _comp.lang.python thread: http://mail.python.org/pipermail/python- 
list/2005-September/300029.html
.. _patch: http://sourceforge.net/tracker/index.php? 
func=detail&aid=1298835&group_id=5470&atid=305470

Contributing thread:

- `vendor-packages directory <http://mail.python.org/pipermail/python- 
dev/2005-September/056682.html>`__

[TAM]

=======================
Version numbers on OS X
=======================

Guido asked if platform.system_alias() could be improved on OS X by  
mapping uname()'s ''Darwin x.y'' to ''OS X 10.(x-4).y''.  Bob  
Ippolito and others pointed out that this was not a good idea,  
because uname() only reports on the kernel version number and not the  
Cocoa API, which is really what OS X 10.x.y refers to.  He pointed  
out that the correct way to do it using a public API is to used  
gestalt, which is what platform.mac_ver() does.

On further inspection, it was discovered that parsing the /System/ 
Library/CoreServices/SystemVersion.plist property list is also a  
supported API, and would not rely on access to the Carbon API set.   
Bob and Wilfredo S·nchez Vega provided sample code that would parse  
this plist; Marc-Andre Lemburg suggested that a patch be written for  
system_alias() that would use this method (if possible) for Mac OS.

Contributing thread:

- `Mapping Darwin 8.2.0 to Mac OS X 10.4.2 in platform.py <http:// 
mail.python.org/pipermail/python-dev/2005-September/056651.html>`__

[TAM]

================
Deferred Threads
================

- `Python 2.5a1, ast-branch and PEP 342 and 343 <http:// 
mail.python.org/pipermail/python-dev/2005-September/056449.html>`__

===============
Skipped Threads
===============

- `Visibility scope for "for/while/if" statements <http:// 
mail.python.org/pipermail/python-dev/2005-September/056669.html>`__
- `inplace operators and __setitem__ <http://mail.python.org/ 
pipermail/python-dev/2005-September/056766.html>`__
- `Repository for python developers <http://mail.python.org/pipermail/ 
python-dev/2005-September/056717.html>`__
- `For/while/if statements/comprehension/generator expressions  
unification <http://mail.python.org/pipermail/python-dev/2005- 
September/056508.html>`__
- `list splicing <http://mail.python.org/pipermail/python-dev/2005- 
September/056472.html>`__
- `Compatibility between Python 2.3.x and Python 2.4.x <http:// 
mail.python.org/pipermail/python-dev/2005-September/056437.html>`__
- `python optimization <http://mail.python.org/pipermail/python-dev/ 
2005-September/056441.html>`__
- `test__locale on Mac OS X <http://mail.python.org/pipermail/python- 
dev/2005-September/056463.html>`__
- `possible memory leak on windows (valgrind report) <http:// 
mail.python.org/pipermail/python-dev/2005-September/056478.html>`__
- `Mixins. <http://mail.python.org/pipermail/python-dev/2005- 
September/056481.html>`__
- `2.4.2c1 fails test_unicode on HP-UX ia64 <http://mail.python.org/ 
pipermail/python-dev/2005-September/056551.html>`__
- `2.4.2c1: test_macfs failing on Tiger (Mac OS X 10.4.2) <http:// 
mail.python.org/pipermail/python-dev/2005-September/056558.html>`__
- `test_ossaudiodev hangs <http://mail.python.org/pipermail/python- 
dev/2005-September/056559.html>`__
- `unintentional and unsafe use of realpath() <http://mail.python.org/ 
pipermail/python-dev/2005-September/056616.html>`__
- `Alternative name for str.partition() <http://mail.python.org/ 
pipermail/python-dev/2005-September/056630.html>`__
- `Weekly Python Patch/Bug Summary <http://mail.python.org/pipermail/ 
python-dev/2005-September/056713.html>`__
- `Possible bug in urllib.urljoin <http://mail.python.org/pipermail/ 
python-dev/2005-September/056736.html>`__
- `Trasvesal thought on syntax features <http://mail.python.org/ 
pipermail/python-dev/2005-September/056741.html>`__
- `Fixing pty.spawn() <http://mail.python.org/pipermail/python-dev/ 
2005-September/056750.html>`__
- `64-bit bytecode compatibility (was Re: [PEAK] ez_setup on 64-bit  
linux problem) <http://mail.python.org/pipermail/python-dev/2005- 
September/056811.html>`__
- `C API doc fix <http://mail.python.org/pipermail/python-dev/2005- 
September/056827.html>`__
- `David Mertz on CA state e-voting panel <http://mail.python.org/ 
pipermail/python-dev/2005-September/056840.html>`__
- `[PATCH][BUG] Segmentation Fault in xml.dom.minidom.parse <http:// 
mail.python.org/pipermail/python-dev/2005-September/056844.html>`__
- `linecache problem <http://mail.python.org/pipermail/python-dev/ 
2005-September/056856.html>`__