python-dev Summary for 2005-04-01 through 2005-04-15 [draft]

Here's the first draft of the python-dev summary for the first half of April. Please send any corrections or suggestions to the summarizers.

======================
Summary Announcements
======================

---------------------------
New python-dev summary team
---------------------------

This summary marks the first by the team of Steve Bethard, Tim Lesher, and Tony Meyer. We're trying a collaborative approach to the summaries: each fortnight, we'll be getting together in a virtual smoke-filled back room to divide up the interesting threads. Then we'll stitch together the summaries in roughly the same form as you've seen in the past. We'll mark each editor's entries with his initials.

Thanks to Brett Cannon for sixty-one excellent python-dev summaries. Also, thanks for providing scripts to help get the new summaries off the ground! We're looking forward to the contributions you'll make to the Python core, now that the summaries aren't taking up all your time.

[TDL]

=========
Summaries
=========

----------------------
Right Operator Methods
----------------------

Greg Ewing explored an issue with new-style classes that define only right operator methods (__radd__, __rmul__, etc.). Instances of such a class cannot be added, multiplied, etc. together: Python raises a TypeError. Armin Rigo explained the rule: if the instances on both sides of an operator are of the same class, only the non-reversed method is ever called. Armin also explained that an __add__ or __mul__ method that returns NotImplemented may be called twice when Python attempts to differentiate between numeric and sequence operations.
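
The rule Armin described can be seen in a small sketch. The class name ``OnlyRight`` and the helper ``try_add`` are made up for illustration; the behaviour shown is that of current Python, where every class is new-style.

```python
class OnlyRight:
    """Hypothetical class that defines only the reflected addition method."""

    def __radd__(self, other):
        return ("radd", other)


def try_add(left, right):
    """Return left + right, or the exception's type name if the addition fails."""
    try:
        return left + right
    except TypeError as exc:
        return type(exc).__name__


# Different classes: int.__add__ gives up, so OnlyRight.__radd__ is called.
mixed = try_add(1, OnlyRight())

# Same class on both sides: only __add__ is looked for, __radd__ is never tried.
same_class = try_add(OnlyRight(), OnlyRight())
```

Here ``mixed`` is the tuple built by ``__radd__``, while ``same_class`` records a TypeError.
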

Contributing threads:

- `New style classes and operator methods <http://mail.python.org/pipermail/python-dev/2005-April/052577.html>`__

[SJB]

------------------------------------------
Hierarchical groups in regular expressions
------------------------------------------

Chris Ottrey demoed his `pyre2 project`_ that can extract a hierarchy of strings when nested groups match in a regular expression. The current re module (in the stdlib) only matches the last occurrence of a group in the string, throwing away any preceding matches. People discussed some of pyre2's proposed API, with the main suggestion being to extend the API to support unnamed (positional) groups in addition to named groups.

Though a number of people expressed interest in the idea, it was not clear whether the functionality should be included in the standard library. However, most agreed that if it was included, it should be integrated with the existing re module. Gustavo Niemeyer offered to perform this integration if an API could be agreed upon.

Further discussion was moved to the pyre2 `development wiki`_ and `mailing list`_.

Contributing threads:

- `hierarchicial named groups extension to the re library <http://mail.python.org/pipermail/python-dev/2005-April/052508.html>`__

.. _pyre2 project: http://pyre2.sourceforge.net/
.. _development wiki: http://py.redsoft.be/pyre2/wiki/
.. _mailing list: http://lists.sourceforge.net/lists/listinfo/pyre2-devel

[SJB]

-------------------------------
Security capabilities in Python
-------------------------------

The issue of security came up again, and Ka-Ping Yee suggested that in Python's restricted execution mode, secure proxies can be created by using lexical scoping. He posted `some code`_ for revealing only certain "facets" of an object by using a function to declare a proxy class that used a function's local variables to build the proxy.
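
A minimal sketch of the closure-based idea follows. It is not Ka-Ping Yee's code: ``make_facet``, ``Facet``, and ``Account`` are hypothetical names, and it is written for current Python, which has no restricted execution mode.

```python
def make_facet(target, allowed_names):
    """Return a proxy exposing only `allowed_names` of `target` (illustrative only)."""
    allowed = frozenset(allowed_names)

    class Facet:
        def __getattr__(self, name):
            # `target` and `allowed` live only in the enclosing function's
            # local scope; they are not stored on the proxy instance.
            if name in allowed:
                return getattr(target, name)
            raise AttributeError(name)

    return Facet()


class Account:
    """Hypothetical object with one harmless and one sensitive method."""

    def __init__(self):
        self.balance = 100

    def read_balance(self):
        return self.balance

    def withdraw(self, amount):
        self.balance -= amount


read_only = make_facet(Account(), ["read_balance"])
```

The proxy's instance dictionary is empty, so the wrapped object is reachable only through the closure. In current Python that closure is exposed as ``__closure__`` on the method's function object, so this illustrates the technique and is not a security boundary.
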
Thus to access the attributes used in the proxy class, you need to access things like im_func or func_closure, which are not accessible in restricted execution mode. James Y Knight illustrated how strategic overriding of __eq__ in a subclass of str could allow access to the hidden "facets". Eyal Lotem suggested that such an attack could be countered by implementing "facets" in C, but having to turn to C every time you needed a particular security construct seemed unappealing.

Contributing threads:

- `Security capabilities in Python <http://mail.python.org/pipermail/python-dev/2005-April/052580.html>`__

.. _some code: http://zesty.ca/python/facet.py

[SJB]

---------------------------------
Improving GilState API Robustness
---------------------------------

Michael Hudson noted that his changes to thread handling in the readline module appeared to trigger `bug 1176893`_ ("Readline segfault"). However, he believed the problem lay in the GilState API, rather than in his changes: PyGilState_Release crashes if PyEval_InitThreads wasn't called, even if the code you're writing doesn't use multiple threads.

He proposed several solutions, none of which met with resounding approbation, and Tim Peters noted that `PEP 311`_, Simplified Global Interpreter Lock Acquisition for Extensions, "specifically disowns responsibility for worrying about whether Py_Initialize and PyEval_InitThreads have been called."

Bob Ippolito wondered whether just calling PyEval_InitThreads directly in Py_Initialize might be a better idea. No objections were raised, so long as the underlying OS locking mechanisms weren't overly expensive; some initial benchmarks indicated that this approach was viable, at least on Linux and OS X.

Contributing threads:

- `threading (GilState) question <http://mail.python.org/pipermail/python-dev/2005-April/052562.html>`__

.. _bug 1176893: http://sourceforge.net/tracker/index.php?func=detail&aid=1176893&group_id=5470&atid=105470
.. _PEP 311: http://www.python.org/peps/pep-0311.html

[TDL]

----------------------------------------
Unicode byte order mark decoding
----------------------------------------

Evan Jones saw that the UTF-16 decoder discards the byte-order mark (BOM) from Unicode files, while the UTF-8 decoder doesn't. Although the BOM isn't really required in UTF-8 files, many Unicode-generating applications, especially on Microsoft platforms, add it.

Walter Dörwald created a patch_ to add a UTF-8-Sig codec that generates a BOM on writing and skips it on reading, but after a long discussion on the history of Unicode and Microsoft's influence over its evolution, the consensus was that BOM and signature handling belong at a higher level (for example, a stream API) than the codec.

Contributing threads:

- `Unicode byte order mark decoding <http://mail.python.org/pipermail/python-dev/2005-April/052501.html>`__

.. _patch: http://sourceforge.net/tracker/index.php?func=detail&aid=1177307&group_id=5470&atid=305470

[TDL]

---------------
Developers List
---------------

Raymond Hettinger has started a `project to track developers`_ and the (tracker and commit) privileges they have, and who gave them the privileges, and why (for example, was it for a one-shot project). Removing inactive developers should improve clarity, institutional memory, and security, and make everything tidier. Raymond has begun contacting recently inactive developers to check whether they still require the privileges they have.

Contributing threads:

- `Developer list update <http://mail.python.org/pipermail/python-dev/2005-April/052540.html>`__

.. _project to track developers: http://cvs.sourceforge.net/viewcvs.py/*checkout*/python/python/dist/src/Misc...

[TAM]

--------------------
Marshalling Infinity
--------------------

Scott David Daniels kicked off a very long thread by asking what (un)marshal should do with floating point NaNs.
The current behaviour (as with any NaN, infinity, or signed zero) is undefined: a platform-dependent accident, because Python is written to C89, which has no such concepts. Tim Peters pointed out all code for (de)serializing C doubles should go through _PyFloat_Pack8()/_PyFloat_Unpack8(), and that the current implementation suggests that the routines could simply copy bytes on platforms that use the standard IEEE-754 single and double formats natively. Michael Hudson obliged by creating a `patch to implement this`_.

The consensus was that the correct behaviour is that packing a NaN or infinity shouldn't cause an exception. When unpacking, an IEEE-754 platform shouldn't cause an exception, but a non-754 platform should, since there's no sensible value that it can be unpacked to, and errors should never pass silently.

Contributing threads:

- `marshal / unmarshal <http://mail.python.org/pipermail/python-dev/2005-April/052593.html>`__

.. _patch to implement this: http://python.org/sf/1181301

[TAM]

---------------------------------
Location of the sign bit in longs
---------------------------------

Michael Hudson asked about the possibility of longs storing the sign bit somewhere other than the current location, suggesting the top bit of ob_digit[0]. Tim Peters suggested that it would be better to give struct _longobject a distinct sign member. This simplifies code, costs no extra bytes for some longs, and 8 extra bytes for others, and shouldn't hurt binary compatibility.

Michael coughed up a `longobject patch`_, which seems likely to be checked in.

Contributing threads:

- `marshal / unmarshal <http://mail.python.org/pipermail/python-dev/2005-April/052593.html>`__

.. _longobject patch: http://python.org/sf/1177779

[TAM]

-----------------------
Acceptable diff formats
-----------------------

Nick Coghlan asked if context diffs are still favoured for patches. Historically, context diffs were preferred, but it appears that unified diffs are today's choice.
Raymond Hettinger made the sensible suggestion that whichever is most informative for the particular patch should be used, and Bob Ippolito pointed out that if CVS is replaced with Subversion, unified diffs will have better support.

The `patch submission guidelines`_ will be updated at some point to reflect the preference for unified diffs, although if your diff program doesn't support '-u', then context diffs are ok; plain patches are, of course, not.

Contributing threads:

- `Unified or context diffs? <http://mail.python.org/pipermail/python-dev/2005-April/052657.html>`__

.. _patch submission guidelines: http://www.python.org/patches/

[TAM]

===============
Skipped Threads
===============

- python-dev Summary for 2005-03-16 through 2005-03-31 [draft]
- [Python-checkins] python/dist/src/Lib/logging handlers.py, 1.19, 1.19.2.1
- [Python-checkins] python/dist/src/Modules mathmodule.c, 2.74, 2.75
- Weekly Python Patch/Bug Summary
- Mail.python.org
- New bug, directly assigned, okay?
- inconsistency when swapping obj.__dict__ with a dict-like object...
- Pickling instances of nested classes
- args attribute of Exception objects

--
Tim Lesher <tlesher@gmail.com>

======================
Summary Announcements
======================
Executive summary: Hudson goes wild fixing obscure bugs.
---------------------------
New python-dev summary team
---------------------------
This summary marks the first by the team of Steve Bethard, Tim Lesher, and Tony Meyer. We're trying a collaborative approach to the summaries: each fortnight, we'll be getting together in a virtual smoke-filled back room to divide up the interesting threads.
Both your process and results are excellent. Raymond Hettinger

Tim Lesher <tlesher@gmail.com> writes:
Here's the first draft of the python-dev summary for the first half of April. Please send any corrections or suggestions to the summarizers.
======================
Summary Announcements
======================

---------------------------
New python-dev summary team
---------------------------
This summary marks the first by the team of Steve Bethard, Tim Lesher, and Tony Meyer.
Nice work! An update:
---------------------------------
Improving GilState API Robustness
---------------------------------
Michael Hudson noted that his changes to thread handling in the readline module appeared to trigger `bug 1176893`_ ("Readline segfault"). However, he believed the problem lay in the GilState API, rather than in his changes: PyGilState_Release crashes if PyEval_InitThreads wasn't called, even if the code you're writing doesn't use multiple threads.
He proposed several solutions, none of which met with resounding approbation,
Nevertheless, I've checked one of them in :) After reading a fair bit of code, and docs, I went for option 2) in the linked mail.
and Tim Peters noted that `PEP 311`_, Simplified Global Interpreter Lock Acquisition for Extensions, "specifically disowns responsibility for worrying about whether Py_Initialize and PyEval_InitThreads have been called."
I think this reading is a bit of a stretch of the wording of the PEP. It also contradicts the documentation ("regardless of the current state of Python"). Finally, the current behaviour has a strong whiff of being accidental.
--------------------
Marshalling Infinity
--------------------
Scott David Daniels kicked off a very long thread by asking what (un)marshal should do with floating point NaNs. The current behaviour (as with any NaN, infinity, or signed zero) is undefined: a platform-dependent accident, because Python is written to C89, which has no such concepts. Tim Peters pointed out all code for (de)serializing C doubles should go through _PyFloat_Pack8()/_PyFloat_Unpack8(), and that the current implementation suggests that the routines could simply copy bytes on platforms that use the standard IEEE-754 single and double formats natively. Michael Hudson obliged by creating a `patch to implement this`_.
I hope to check this in soon. Note that the patch is in two pieces, one to marshal floats in binary format and one ...
The consensus was that the correct behaviour is that packing a NaN or infinity shouldn't cause an exception. When unpacking, an IEEE-754 platform shouldn't cause an exception, but a non-754 platform should, since there's no sensible value that it can be unpacked to, and errors should never pass silently.
... to do this bit.
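
The behaviour agreed on above can be sketched with the public ``struct`` and ``marshal`` modules in current Python. The sketch assumes an IEEE-754 platform and does not touch the C-level routines named in the summary; the helper names are hypothetical.

```python
import marshal
import math
import struct


def roundtrip_struct(value):
    """Pack a float as an 8-byte little-endian double and unpack it again."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


def roundtrip_marshal(value):
    """Serialize a float with marshal and load it back."""
    return marshal.loads(marshal.dumps(value))


# Special values the thread was concerned with. None of them should raise
# on an IEEE-754 platform (an assumption of this sketch).
for special in (float("inf"), float("-inf"), -0.0):
    assert roundtrip_struct(special) == special
    assert roundtrip_marshal(special) == special

# NaN never compares equal to itself, so check it with math.isnan instead.
nan_survives = math.isnan(roundtrip_struct(float("nan"))) and math.isnan(
    roundtrip_marshal(float("nan"))
)

# The sign of negative zero is kept, which == alone cannot show.
zero_sign = math.copysign(1.0, roundtrip_marshal(-0.0))
```
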
---------------------------------
Location of the sign bit in longs
---------------------------------
Michael Hudson asked about the possibility of longs storing the sign bit somewhere other than the current location, suggesting the top bit of ob_digit[0]. Tim Peters suggested that it would be better to give struct _longobject a distinct sign member. This simplifies code, costs no extra bytes for some longs, and 8 extra bytes for others, and shouldn't hurt binary compatibility.
Michael coughed up a `longobject patch`_, which seems likely to be checked in.
I'm actually in less of a rush to get this one in :) (Hmm, had a busy couple of weeks, didn't I? :)
Contributing threads:
- `marshal / unmarshal <http://mail.python.org/pipermail/python-dev/2005-April/052593.html>`__
?

Cheers,
mwh

--
<wzZzy> we should write an os
<itamar> YES
* itamar starts a sourceforge project
-- from Twisted.Quotes

On Mon, Apr 18, 2005, Tim Lesher wrote:
Here's the first draft of the python-dev summary for the first half of April. Please send any corrections or suggestions to the summarizers.
<applause!> Good show! One suggestion: might want to order threads in order of relevance to random python-dev readers (the bit that triggered this comment was seeing the unified vs. context diffs thread so far down).

--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable classes that express a lot of action in a small amount of clear code -- not in reams of trivial code that bores the reader to death." --GvR

Tim Lesher wrote:
Here's the first draft of the python-dev summary for the first half of April. Please send any corrections or suggestions to the summarizers.
======================
Summary Announcements
======================

---------------------------
New python-dev summary team
---------------------------
This summary marks the first by the team of Steve Bethard, Tim Lesher, and Tony Meyer. We're trying a collaborative approach to the summaries: each fortnight, we'll be getting together in a virtual smoke-filled back room to divide up the interesting threads. Then we'll stitch together the summaries in roughly the same form as you've seen in the past. We'll mark each editor's entries with his initials.
Woohoo! Once again, thanks for doing this guys.
Thanks to Brett Cannon for sixty-one excellent python-dev summaries. Also, thanks for providing scripts to help get the new summaries off the ground! We're looking forward to the contributions you'll make to the Python core, now that the summaries aren't taking up all your time.
Gee, no pressure. =) [SNIP]
-------------------------------
Security capabilities in Python
-------------------------------
The issue of security came up again, and Ka-Ping Yee suggested that in Python's restricted execution mode secure proxies can be created by using lexical scoping. He posted `some code`_ for revealing only certain "facets" of an object by using a function to declare a proxy class that used function local variables to build the proxy. Thus to
"... that used a function's local variables ..." [SNIP]
---------------------------------
Improving GilState API Robustness
---------------------------------
Michael Hudson noted that his changes to thread handling in the readline module appeared to trigger `bug 1176893`_ ("Readline segfault"). However, he believed the problem lay in the GilState API, rather than in his changes: PyGilState_Release crashes if PyEval_InitThreads wasn't called, even if the code you're writing doesn't use multiple threads.
He proposed several solutions, none of which met with resounding approbation, and Tim Peters noted that `PEP 311`_, Simplified Global Interpreter Lock Acquisition for Extensions, "specifically disowns responsibility for worrying about whether Py_Initialize and PyEval_InitThreads have been called."
Bob Ippolito wondered whether just calling PyEval_InitThreads directly in Py_Initialize might be a better idea. No objections were raised, so long as the underlying OS locking mechanisms weren't overly expensive; some initial benchmarks indicated that this approach was viable, at least on Linux and OS X.
Contributing threads:
- `threading (GilState) question <http://mail.python.org/pipermail/python-dev/2005-April/052562.html>`__
.. _bug 1176893: http://sourceforge.net/tracker/index.php?func=detail&aid=1176893&group_id=5470&atid=105470
For any tracker item, the easiest way to do a URL is to use the python.org shortcut: http://www.python.org/sf/##### . So the above would be http://www.python.org/sf/1176893 .
.. _PEP 311: http://www.python.org/peps/pep-0311.html
[TDL]
----------------------------------------
Unicode byte order mark decoding
----------------------------------------
Evan Jones saw that the UTF-16 decoder discards the byte-order mark (BOM) from Unicode files, while the UTF-8 decoder doesn't. Although the BOM isn't really required in UTF-8 files, many Unicode-generating applications, especially on Microsoft platforms, add it.
Walter Dörwald created a patch_ to add a UTF-8-Sig codec that generates a BOM on writing and skips it on reading, but after a long discussion on the history of the Unicode, Microsoft's influence over its
"... of Unicode and Microsoft's influence ..." [SNIP]
---------------
Developers List
---------------
Raymond Hettinger has started a `project to track developers`_ and the (tracker and commit) privileges they have, and who gave them the privileges, and why (for example, was it for a one-shot project). Removing inactive developers should improve clarity, institutional memory, and security, and make everything tidier. Raymond has begun contacting recently inactive developers to check whether they still require the privileges they have.
Contributing threads:
- `Developer list update <http://mail.python.org/pipermail/python-dev/2005-April/052540.html>`__
.. _project to track developers: http://cvs.sourceforge.net/viewcvs.py/*checkout*/python/python/dist/src/Misc...
[TAM]
--------------------
Marshalling Infinity
--------------------
Scott David Daniels kicked off a very long thread by asking what (un)marshal should do with floating point NaNs. The current behaviour (as with any NaN, infinity, or signed zero) is undefined: a platform-dependent accident, because Python is written to C89, which has no such concepts. Tim Peters pointed out all code for (de)serializing C doubles should go through _PyFloat_Pack8()/_PyFloat_Unpack8(), and that the current implementation suggests that the routines could simply copy bytes on platforms that use the standard IEEE-754 single and double formats natively. Michael Hudson obliged by creating a `patch to implement this`_.
The consensus was that the correct behaviour is that packing a NaN or
"... behaviour of packing a NaN ..."

[SNIP]

Well done guys! Very impressed; succinct, clear, and a ton fewer errors than I used to put into the first draft. =)

When you are happy with the draft just email me the plaintext and I will get it up on python.org for you.

-Brett

Tim Lesher said:
Here's the first draft of the python-dev summary for the first half of April. Please send any corrections or suggestions to the summarizers.

[...]

----------------------------------------
Unicode byte order mark decoding
----------------------------------------
Evan Jones saw that the UTF-16 decoder discards the byte-order mark (BOM) from Unicode files, while the UTF-8 decoder doesn't. Although the BOM isn't really required in UTF-8 files, many Unicode-generating applications, especially on Microsoft platforms, add it.
Walter Dörwald created a patch_ to add a UTF-8-Sig codec that generates a BOM on writing and skips it on reading, but after a long discussion on the history of Unicode and Microsoft's influence over its evolution, the consensus was that BOM and signature handling belong at a higher level (for example, a stream API) than the codec.
All codecs provide a stream API, so there is no higher level. Bye, Walter Dörwald
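
The BOM behaviour discussed in this thread can be sketched as follows. Current Python's standard library has a codec registered under the name ``utf-8-sig``; that it matches the patch discussed here in every detail is an assumption. The variable names are illustrative.

```python
import codecs
import io

# A UTF-8 byte string that starts with the BOM, as many tools write it.
raw = codecs.BOM_UTF8 + "abc".encode("utf-8")

plain = raw.decode("utf-8")          # the BOM survives as U+FEFF
stripped = raw.decode("utf-8-sig")   # the BOM is skipped on reading
written = "abc".encode("utf-8-sig")  # the BOM is generated on writing

# The UTF-16 decoder, by contrast, consumes its BOM.
utf16 = (codecs.BOM_UTF16_LE + "abc".encode("utf-16-le")).decode("utf-16")

# The same codec is also available through the codec-level stream API.
reader = codecs.getreader("utf-8-sig")(io.BytesIO(raw))
streamed = reader.read()
```

Only the plain ``utf-8`` decode keeps the BOM as a leading U+FEFF character; the other reads drop it, whether through ``bytes.decode`` or the stream reader.
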
participants (6): Aahz, Brett C., Michael Hudson, Raymond Hettinger, Tim Lesher, Walter Dörwald