[Python-Dev] [draft] python-dev Summary for 2005-08-16 through 2005-08-31

Tony Meyer t-meyer at ihug.co.nz
Sat Sep 10 05:41:22 CEST 2005

If anyone would like to take a break from all this Py3k discussion, please
feel free to read through the following draft for the second August summary.
Checking over the "O(N**2) behaviour in StreamReader.readline" summary would
be particularly appreciated.  As always, any corrections/suggestions should
be sent to me or Steve (steven.bethard at gmail.com).  Thanks!


PyPy release 0.7.0

PyPy_ has a new release, 0.7.0, which is now a fully self-contained Python
implementation. It includes whole-program type inference and translation to
both C and LLVM, a choice of refcounting or Boehm garbage collectors,
language-level compliancy with CPython 2.4.1 and much more. If you haven't
already, now's the time to check it out!

.. _PyPy: http://codespeak.net/pypy/

Contributing thread:

- `PyPy release 0.7.0


New mailbox module

Gregory K. Johnson, who's been working with A.M. Kuchling for Google's
Summer of Code, has completed a new version of the mailbox module that
allows the adding and removing of messages. It will be double-checked for
code quality, complete documentation, and full backwards-compatibility and
then hopefully checked in.

Contributing thread:

- `New mailbox module




Terry Reedy suggested that str.find() be removed in Python 3.0, in favour of
str.index().  The main issue with str.find() is that it returns -1 on
failure, leading to the common "if s.find(sub):" bug (which should be "if
sub in s:"); -1 is also a valid index into a string.  Guido agreed that
removal would be a good idea; however, Tim Peters pointed out that the
requirement to use a try/except clause can lead to another kind of sloppy

Raymond Hettinger suggested that the ideal solution would be to replace
str.find() with new methods, str.partition() and str.rpartition(), which
work like::

    >>> s = 'http://www.python.org'
    >>> s.partition('://')
    ('http', '://', 'www.python.org')
    >>> s.rpartition('.')
    ('http://www.python', '.', 'org')
    >>> s.partition('?')
    ('http://www.python.org', '', '')

Replacing str.find() with str.partition() in the standard library generally
resulted in much cleaner and clearer code, without requiring the addition of
try/except blocks.  Comments were overwhelmingly in favour of this new

"part" and "cut" were suggested as alternative names to "partition",
although Raymond is very attached to the "partition" name.
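
As a rough illustration of the cleanup described above, the following sketch
compares the two styles. The function names are hypothetical, and it assumes
str.partition() behaves as in the example shown earlier (returning an empty
separator when nothing is found)::

```python
def after_scheme_find(url):
    """Return the part after '://' using str.find() and an explicit -1 check."""
    i = url.find('://')
    # Writing "if url.find('://'):" here would be the bug described above:
    # -1 (not found) is truthy, while 0 (found at the start) is falsy.
    if i == -1:
        return url
    return url[i + len('://'):]


def after_scheme_partition(url):
    """Same result with str.partition(); an empty separator means 'not found'."""
    head, sep, tail = url.partition('://')
    return tail if sep else head


with_scheme = after_scheme_partition('http://www.python.org')
without_scheme = after_scheme_partition('www.python.org')
```

Neither version needs a try/except block, but the partition() version also
avoids index arithmetic and the -1 sentinel altogether.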

Contributing threads:

- `Remove str.find in 3.0?
- `partition() (was: Remove str.find in 3.0?)
- `Proof of the pudding: str.partition()
- `partition()
- `Alternative name for str.partition()


PEP 348: Exception Reorganization for Python 3.0

This fortnight saw the final demise of `PEP 348`_. This began with `Guido's
agreement`_ to remove bare "except:" from Python 3.0 entirely. Introducing a
transition plan for this change in Python 3.0 proved problematic, however.
To quote Michael Chermside, "no syntax will work in BOTH 2.5 and 3.0". For
example, the proposed Python 3.0 code::

    try:
        my_result = call_some_library(my_data)
    except Exception: # doesn't catch KeyboardInterrupt or SystemExit
        pass # handle the error

would need to be written in Python 2.5 as::

    try:
        my_result = call_some_library(my_data)
    except (KeyboardInterrupt, SystemExit):
        raise
    except:
        pass # handle the error

Note that the final ``except:`` in the 2.5 code can't be written as ``except
Exception:`` - Python 2.5 will still allow exceptions that do not derive
from Exception (e.g. string exceptions). Thus deprecating bare ``except:``
would mean that some code would produce warnings, and yet not have any way
to be rewritten that would be upwards-compatible.
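
For illustration only: in present-day Python, KeyboardInterrupt and
SystemExit sit outside Exception (directly under BaseException), which is the
behaviour the proposed 3.0 code above relies on. The classify() helper below
is hypothetical and exists only to show which clause catches what::

```python
def classify(exc_type):
    """Raise exc_type and report which except clause handled it."""
    try:
        raise exc_type()
    except Exception:
        return "caught by 'except Exception'"
    except BaseException:
        # Reached only for exceptions that do not derive from Exception.
        return "not caught by 'except Exception'"


results = {
    'ValueError': classify(ValueError),
    'KeyboardInterrupt': classify(KeyboardInterrupt),
    'SystemExit': classify(SystemExit),
}
```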

As a result, Guido rejected the entire PEP.

.. _PEP 348: http://www.python.org/peps/pep-0348.html
.. _Guido's agreement:

Contributing threads:

- `Bare except clauses in PEP 348
- `FW: Bare except clauses in PEP 348
- `rev. 1.9 of PEP 348: Raymond tested, Guido approved


PEP 347: Migrating the Python CVS to Subversion

Discussion about the conversion to Subversion and subsequent move of the
Python source repository to svn.python.org (outlined in `PEP 347`_)
continued this fortnight.  Discussion particularly covered the means of
authentication that would be used to access svn.python.org, how names would
appear in revision logs, and other minor details like that.  Martin v. Löwis has set
up a test installation (for current Python developers; there is no anonymous
access) on svn.python.org, to check that the system will work as described
in the PEP.  Assuming that things go well with this test installation, it
seems likely that the PEP will be accepted and the migration will take place
at some point in the future.

.. _PEP 347: http://www.python.org/peps/pep-0347.html

Contributing threads:

- `PEP 347: Migration to Subversion
- `Admin access using svn+ssh
- `Collecting SSH keys
- `On distributed vs centralised SCM for Python
- `Fwd: Distributed RCS
- `wush.net details
- `Subversion instructions


Partial method application

Ian Bicking suggested a partialmethod() function along the lines of the
operator module's itemgetter() and attrgetter().  The partialmethod()
function would allow the "self" argument of a method to be supplied later,

    lst = ['A', 'b', 'C']

Martin v. Löwis argued (convincing Guido at least) that a better style for
delayed method calls would look something like::


where the "virtual" object would serve as a virtual instance so that the
"self" argument to the "lower" method could be supplied later.
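
A minimal sketch of how the two styles might look, assuming the semantics
described above. Both partialmethod() and the "virtual" object here are
hypothetical stand-ins written for this illustration (this partialmethod() is
unrelated to the later functools.partialmethod), not the code proposed in the
thread::

```python
def partialmethod(name, *args, **kwargs):
    """Hypothetical helper: defer a method call until 'self' is supplied."""
    def call(obj):
        return getattr(obj, name)(*args, **kwargs)
    return call


class _Virtual:
    """Hypothetical 'virtual instance': attribute access yields a deferred call."""
    def __getattr__(self, name):
        return partialmethod(name)


virtual = _Virtual()

lst = ['b', 'C', 'A']
by_name = sorted(lst, key=partialmethod('lower'))   # method named by a string
by_attr = sorted(lst, key=virtual.lower)            # method named by an attribute
by_func = sorted(lst, key=lambda s: s.lower())      # plain function for key=
```

All three spellings produce the same case-insensitive ordering; they differ
only in how the method to call is named.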

There was a brief discussion about consistency between this proposal and the
operator module's itemgetter() and attrgetter() which, unlike Martin's
proposal, use argument strings instead of attributes to determine the
appropriate function to produce. Additionally, in Python 2.5, both
itemgetter() and attrgetter() will allow multiple arguments, while none of
the method-calling solutions above extended reasonably to multiple methods.
However, people seemed in general agreement that the use case was a single
method call, and that supporting multiple method calls was unnecessary.

The thread concluded without coming to a full resolution. For the moment at
least, it seems that defining a regular Python function for the key=
argument is still considered the best style.

Contributing thread:

- `PEP 309: Partial method application


Moving id() to the sys module

Christian Robottom Reis suggested that the built-in function id() should be
moved into the sys module: "id" is a useful and common variable name, and
the move would avoid shadowing a built-in.  Having id() as a built-in is
also listed as one of `Guido's regrets`_.  Christian asked whether it would
be possible in 2.5 to add sys.id() and a deprecation warning to
__builtins__.id() (to be removed in Python 3.0).

This gathered quite a lot of support, and few comments against the proposal.
However, Anthony Baxter warned that using the warnings module is expensive,
and so issuing a deprecation warning might not be a good idea.

Interestingly, Guido's opinion (not universally shared) is that shadowing at
least some built-in names is perfectly acceptable.  It wasn't clear whether
this meant that he would be against the move, or that his reasons for the
move were different (e.g. simply a more appropriate place, reducing the
number of built-ins).

.. _Guido's regrets:

Contributing thread:

- `Deprecating builtin id (and moving it to sys())


O(N**2) behavior in StreamReader.readline

Keir Mierle reported that _PyUnicodeUCS2_IsLinebreak was being called
excessively, resulting in a huge slowdown of a CGI script.  The slowdown
appeared after adding the encoding line "# -*- coding: iso8859-1 -*-" to the
script.  The cause lies in changes made to the codecs in 2.4: they no longer
rely on C's readline() to do line splitting, but use unicode.splitlines()
instead.  In addition, StreamReader.readline performs splitlines on the
entire input only to fetch the first line, and also uses splitlines on a
single line to remove any trailing line breaks.  As a result, for a file
with N lines, IsLinebreak is invoked up to N*N/2 times per character.

Walter Dörwald and Martin v. Löwis worked on solutions to this problem.
Martin's `eventual solution`_ keeps the splitlines result and only invokes
IsLinebreak once per character, and copies each character only once.  This
should be much faster than the current code.
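
The difference between the two approaches can be sketched roughly as follows.
This is a simplified model, not the codecs code itself: the function names
are hypothetical, and the "scanned" counter merely assumes that each
splitlines() call examines every character it is given::

```python
def read_lines_resplitting(text):
    """Re-split the whole remaining input just to take the first line."""
    lines, scanned = [], 0
    remaining = text
    while remaining:
        scanned += len(remaining)   # every remaining character is examined again
        first = remaining.splitlines(keepends=True)[0]
        lines.append(first)
        remaining = remaining[len(first):]
    return lines, scanned


def read_lines_split_once(text):
    """Split once, keep the result, and hand out the stored lines."""
    return text.splitlines(keepends=True), len(text)


sample = "ab\n" * 100                      # 100 lines, 300 characters
slow_lines, slow_scanned = read_lines_resplitting(sample)
fast_lines, fast_scanned = read_lines_split_once(sample)
```

Both functions return the same lines, but the first examines a number of
characters that grows quadratically with the number of lines, while the
second grows linearly.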

.. _eventual solution: http://www.python.org/sf/1268314

Contributing threads:

- `51 Million calls to _PyUnicodeUCS2_IsLinebreak() (???)
- `[Argon] Re: 51 Million calls to _PyUnicodeUCS2_IsLinebreak() (???)


PEP 349: Allow str() to return unicode strings

Neil Schemenauer updated `PEP 349`_, which had previously proposed a text()
builtin, to instead propose that str() be allowed to return unicode strings.
The new str() would remove two types of UnicodeEncodeErrors that str() had
previously raised:

* No UnicodeEncodeError would be raised if the argument to str() is unicode.
  Instead, the unicode object would be returned unmodified.
* No UnicodeEncodeError would be raised if the argument's __str__() method
  returns a unicode object. Instead, this returned object would be in turn
  returned from the str() call.

In the following brief discussion, it was suggested that unicode.__str__
should be changed to return "self" instead of trying to encode itself into
ascii. (Otherwise subclasses of unicode would likely get UnicodeEncodeErrors
when their __str__() methods were called.)

There was a little feedback on the proposal, with a few people wanting to go
back to the text() builtin instead of changing str(), but no final decisions
were made.

.. _PEP 349:

Contributing thread:

- `Revised PEP 349: Allow str() to return unicode strings


One argument form of setdefault()

Tim Peters asked if anyone remembered why setdefault's second argument is
optional, given that it doesn't seem at all useful, and that he wasn't able
to find any use cases outside of the test suite.  The likely explanation
seemed to be that setdefault() was following the behaviour of dict.get().
Tim suggested dropping the optional nature of the second argument for
Python 3.0; Raymond upped this by suggesting that it could be done earlier
(e.g. with a deprecation warning in 2.5 and gone in 2.6).

Tim did later find a use (in Zope), but the author of the code, David
Goodger, indicated that it would probably be better written in other ways,
and that if dict.pop() could be used (it was introduced in Python 2.3) then
that would be preferable, so it still seems likely that the second argument
will be made mandatory.
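
For reference, a short sketch of what the one-argument form does in current
Python, compared with dict.get(), which it resembles. The variable names are
only for illustration::

```python
d = {}

got = d.get('missing')              # returns None and leaves d unchanged
size_after_get = len(d)

stored = d.setdefault('missing')    # returns None and also inserts 'missing': None
size_after_setdefault = len(d)

explicit = d.setdefault('other', [])    # the usual two-argument form
explicit.append(1)
```

The one-argument call silently stores None under the key, which is rarely
what is wanted.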

Contributing thread:

- `setdefault's second argument


dir() returning non-strings

As a result of a suggested patch by Michael Krasnyk, the question of whether
dir() should only return strings was raised.  Guido's position was that
dir() should hide non-strings, as these are not attributes if you use the
definition that an attribute name is a valid parameter to a getattr() or
setattr() call.  Guido suggested that a useful relationship (excluding where
__getattr__ or __getattribute__ is overridden) is::

  name in dir(x) <==> getattr(x, name) is valid
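
A small check of this relationship for an ordinary object, as a sketch only;
the Point class is hypothetical and does not override __getattr__ or
__getattribute__::

```python
class Point:
    def __init__(self):
        self.x = 1
        self.y = 2


p = Point()
names = dir(p)

# Every name reported by dir() is a string and can be passed to getattr().
all_strings = all(isinstance(name, str) for name in names)
all_gettable = all(hasattr(p, name) for name in names)

# A name that is not reported cannot be fetched.
missing_not_listed = 'z' not in names and not hasattr(p, 'z')
```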

Contributing thread:

- `SWIG and rlcompleter



Raymond Hettinger wanted to move away from the current empty-string API that
file objects use for indicating that the end of the file has been reached.
To cover at least some of the use-cases, he suggested a readblocks() method,
so that code like::

    while 1:
        block = f.read(20)
        if block == '':
            break
        ...

could be instead written as::

    for block in f.readblocks(20):
        ...

Guido couldn't see a use case for this though, and suggested that there were
other issues with files/streams that were more important (e.g. buffering
transparency and character set encodings) some of which he'd been working on
in the sandbox_.
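
The readblocks() idea can be approximated today without a new file method.
The sketch below uses a hypothetical free function rather than a method, and
an in-memory io.StringIO object as a stand-in for a real file::

```python
import io
from functools import partial


def readblocks(f, size):
    """Hypothetical helper: yield successive blocks until read() returns ''."""
    while True:
        block = f.read(size)
        if not block:       # the empty string still signals end of file
            return
        yield block


blocks = list(readblocks(io.StringIO("abcdefghij" * 5), 20))

# The two-argument form of iter() gives the same result with no helper at all.
f = io.StringIO("abcdefghij" * 5)
blocks_via_iter = list(iter(partial(f.read, 20), ''))
```

Either way, the empty-string check is written once instead of being repeated
in every read loop.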

.. _sandbox:

Contributing thread:

- `empty string api for files


Python 3.0 design principles

Raymond Hettinger is planning to put together a draft list of Python design
principles.  For example, "don't let the *type* of the return value depend
on the *value* of the arguments".  These will complement the Zen of Python,
and provide a document to refer people to when proposing new/changed

The design principles discussion also covered the Python 2.x->Python 3.0
transition, and whether it will be possible to write code that runs in both
Python 2.x and 3.0.


Contributing threads:

- `Design Principles
- `Python 3 design principles

Skipped Threads

- `Extension to dl module to allow passing strings from native function
- `implementation of copy standard lib
- `dev listinfo page (was: Re: Python + Ping)
- `remote debugging with pdb
- `A testing challenge
- `On decorators implementation
- `[Python-checkins] python/dist/src setup.py, 1.219, 1.220
- `Weekly Python Patch/Bug Summary
- `[Python-checkins] python/dist/src/Modules _hashopenssl.c, NONE, 2.1
  sha256module.c, NONE, 2.1 sha512module.c, NONE, 2.1 md5module.c, 2.35, 2.36
  shamodule.c, 2.22, 2.23
- `PEP 342 Implementation
- `Modules _hashopenssl, sha256, sha512 compile in MinGW, test_hmac.py
- `python/dist/src/Doc/tut tut.tex,1.276,1.277
- `Docs/Pointer to Tools/scripts?
- `python-dev Summary for 2005-08-01 through 2005-08-15 [draft]
- `Style for raising exceptions (python-dev Summary for 2005-08-01 through
  2005-08-15 [draft])
- `PEP 342: simple example, closure alternative
- `operator.c for release24-maint and test_bz2 on Python 2.4.1
- `test_bz2 on Python 2.4.1
- `[Python-checkins] python/dist/src/Lib/test test_bz2.py, 1.18, 1.19
- `test_bz2 fails on Python 2.4.1 from CVS, passes on same from source
- `Python 3.0 blocks?
- `Any detail list of change between version 2.1-2.2-2.3-2.4 of Python?
- `info/advices about python readline implementation
- `test_bz2 and Python 2.4.1
- `[Python-checkins] python/dist/src/Doc/whatsnew whatsnew25.tex, 1.18, 1.19
- `Revising RE docs (was: partition() (was: Remove str.find in 3.0?))
- `Switching re and sre
