[Python-Dev] python-dev Summary for 2005-04-01 through 2005-04-15 [draft]

Tim Lesher tlesher at gmail.com
Mon Apr 18 19:19:04 CEST 2005

Here's the first draft of the python-dev summary for the first half of
April.  Please send any corrections or suggestions to the summarizers.

Summary Announcements

New python-dev summary team

This summary marks the first by the team of Steve Bethard, Tim Lesher,
and Tony Meyer.  We're trying a collaborative approach to the
summaries: each fortnight, we'll be getting together in a virtual
smoke-filled back room to divide up the interesting threads.  Then
we'll stitch together the summaries in roughly the same form as you've
seen in the past. We'll mark each editor's entries with his initials.

Thanks to Brett Cannon for sixty-one excellent python-dev summaries.
Also, thanks for providing scripts to help get the new summaries off
the ground! We're looking forward to the contributions you'll make to
the Python core, now that the summaries aren't taking up all your



Right Operator Methods

Greg Ewing explored an issue with new-style classes that define only
right operator methods (__radd__, __rmul__, etc.)  Instances of such
a class cannot be added/multiplied/etc. together as Python raises a
TypeError. Armin Rigo explained the rule: if the instances on both sides
of an operator are of the same class, only the non-reversed method is
ever called. Armin also explained that an __add__ or __mul__ method that
returns NotImplemented may be called twice when Python attempts to
differentiate between numeric and sequence operations.

Contributing threads:

- `New style classes and operator methods


Hierarchical groups in regular expressions

Chris Ottrey demoed his `pyre2 project`_ that can extract a hierarchy of
strings when nested groups match in a regular expression.  The current
re module (in the stdlib) only matches the last occurrence of a group in
the string, throwing away any preceding matches. People discussed some
of pyre2's proposed API, with the main suggestion being to extend the
API to support unnamed (positional) groups in addition to named groups.

Though a number of people expressed interest in the idea, it was not
clear whether the functionality should be included in the standard
library. However, most agreed that if it was included, it should
be integrated with the existing re module. Gustavo Niemeyer offered to
perform this integration if an API could be agreed upon. Further
discussion was moved to the pyre2 `development wiki`_ and `mailing

Contributing threads:

- `hierarchicial named groups extension to the re library

.. _pyre2 project: http://pyre2.sourceforge.net/

.. _development wiki: http://py.redsoft.be/pyre2/wiki/

.. _mailing list: http://lists.sourceforge.net/lists/listinfo/pyre2-devel


Security capabilities in Python

The issue of security came up again, and Ka-Ping Yee suggested that in
Python's restricted execution mode secure proxies can be created by
using lexical scoping.  He posted `some code`_ for revealing only
certain "facets" of an object by using a function to declare a proxy
class that used function local variables to build the proxy. Thus to
access the attributes used in the proxy class, you need to access
things like im_func or func_closure, which are not accessible in
restricted execution mode.

James Y Knight illustrated how strategic overriding of __eq__ in a
subclass of str could allow access to the hidden "facets". Eyal Lotem
suggested that such an attack could be countered by implementing
"facets" in C, but having to turn to C every time you needed a
particular security construct seemed unappealing.

Contributing threads:

- `Security capabilities in Python

.. _some code: http://zesty.ca/python/facet.py


Improving GilState API Robustness

Michael Hudson noted that his changes to thread handling in the
readline module appeared to trigger `bug 1176893`_ ("Readline
segfault"). However, he believed the problem lay in the GilState API,
rather than in his changes: PyGilState_Release crashes if
PyEval_InitThreads wasn't called, even if the code you're writing
doesn't use multiple threads.

He proposed several solutions, none of which met with resounding
approbation, and Tim Peters noted that `PEP 311`_, Simplified Global
Interpreter Lock Acquisition for Extensions, "specifically disowns
responsibility for worrying about whether Py_Initialize and
PyEval_InitThreads have been called."

Bob Ippolito wondered whether just calling PyEval_InitThreads directly
in Py_Initialize might be a better idea.  No objections were raised,
so long as the underlying OS locking mechanisms weren't overly
expensive; some initial benchmarks indicated that this approach was
viable, at least on Linux and OS X.

Contributing threads:

- `threading (GilState) question

.. _bug 1176893:

.. _PEP 311: http://www.python.org/peps/pep-0311.html


Unicode byte order mark decoding

Evan Jones saw that the UTF-16 decoder discards the byte-order mark
(BOM) from Unicode files, while the UTF-8 decoder doesn't. Although
the BOM isn't really required in UTF-8 files, many Unicode-generating
applications, especially on Microsoft platforms, add it.

Walter Dörwald created a patch_ to add a UTF-8-Sig codec that generates
a BOM on writing and skips it on reading, but after a long discussion
on the history of the Unicode, Microsoft's influence over its
evolution, the consensus was that BOM and signature handling belong at
a higher level (for example, a stream API) than the codec.

Contributing threads:

- `Unicode byte order mark decoding

.. _patch: http://sourceforge.net/tracker/index.php?func=detail&aid=1177307&group_id=5470&atid=305470


Developers List

Raymond Hettinger has started a `project to track developers`_ and the
(tracker and commit) privileges they have, and who gave them the privileges,
and why (for example, was it for a one-shot project). Removing inactive
developers should improve clarity, institutional memory, security, and makes
everything tidier.  Raymond has begun contacting recently inactive
developers to check whether they still require the privileges they have.

Contributing threads:

 - `Developer list update

.. _project to track developers:


Marshalling Infinity

Scott David Daniels kicked off a very long thread by asking what (un)marshal
should do with floating point NaNs.  The current behaviour (as with any NaN,
infinity, or signed zero) is undefined: a platform-dependant accident,
because Python is written to C89, which has no such concepts.  Tim Peters
pointed out all code for (de)serialing C doubles should go through
_PyFloat_Pack8()/_PyFloat_Unpack8(), and that the current implementation
suggests that the routines could simply copy bytes on platforms that use the
standard IEEE-754 single and double formats natively.  Michael Hudson
obliged by creating a `patch to implement this`_.

The consensus was that the correct behaviour is that packing a NaN or
infinity shouldn't cause an exception.  When unpacking, an IEEE-754 platform
shouldn't cause an exception, but a non-754 platform should, since there's
no sensible value that it can be unpacked to, and errors should never pass

Contributing threads:

 - `marshal / unmarshal

.. _patch to implement this: http://python.org/sf/1181301


Location of the sign bit in longs

Michael Hudson asked about the possibility of longs storing the sign bit
somewhere other than the current location, suggesting the top bit of
ob_digit[0].  Tim Peters suggested that it would be better to give struct
_longobject a distinct sign member.  This simplifies code, costs no extra
bytes for some longs, and 8 extra bytes for others, and shouldn't hurt
binary compatibility.

Michael coughed up a `longobject patch`_, which seems likely to be checked

Contributing threads:

 - `marshal / unmarshal

.. _longobject patch: http://python.org/sf/1177779


Acceptable diff formats

Nick Coghlan asked if context diffs are still favoured for patches.
Historically, context diffs were preferred, but it appears that unified
diffs are the today's choice.  Raymond Hettinger made the sensible
suggestion that whichever is most informative for the particular patch
should be used, and Bob Ippolito pointed out that if CVS is replaced with
subversion, unified diffs will have better support.  The `patch submission
guidelines`_ will be updated at some point to reflect the preference for
unified diffs, although if your diff program doesn't support '-u', then
context diffs are ok - plain patches are, of course, not.

Contributing threads:

 - `Unified or context diffs?

.. _patch submission guidelines: http://www.python.org/patches/


Skipped Threads

- python-dev Summary for 2005-03-16 through 2005-03-31 [draft]
- [Python-checkins] python/dist/src/Lib/logging handlers.py, 1.19,
- [Python-checkins] python/dist/src/Modules mathmodule.c, 2.74, 2.75
- Weekly Python Patch/Bug Summary
- Mail.python.org
- New bug, directly assigned, okay?
- inconsistency when swapping obj.__dict__ with a dict-like object...
- Pickling instances of nested classes
- args attribute of Exception objects

Tim Lesher <tlesher at gmail.com>

More information about the Python-Dev mailing list