[Python-Dev] DRAFT: python-dev summary for 2006-11-01 to 2006-11-15

Steven Bethard steven.bethard at gmail.com
Thu Nov 23 07:48:44 CET 2006

Here's the summary for the first half of November. Try not to spend it
all in one place! ;-)

As always, corrections and comments are greatly appreciated.


Python 2.5 malloc families

Just a reminder that if you find your extension module is crashing
with Python 2.5 in malloc/free, there is a high chance that you have a
mismatch in malloc "families". Unlike previous versions, Python 2.5 no
longer allows sloppiness here -- if you allocate with the ``PyMem_*``
functions, you must free with the ``PyMem_*`` functions, and
similarly, if you allocate with the ``PyObject_*`` functions, you must
free with the ``PyObject_*`` functions.

Contributing thread:

- 2.5 portability problems


Path algebra and related functions

Mike Orr started work on a replacement for `PEP 355`_ that would
better group the path-related functions currently in ``os``,
``os.path``, ``shutil`` and other modules. He proposed to start with a
`directory-tuple Path class`_ that would have allowed code like::

    # equivalent to
    # os.path.join(os.path.dirname(os.path.dirname(__file__)), "lib")
    os.path.Path(__file__)[:-2] + "lib"

where a Path object would act like a tuple of directories, and could
be easily sliced and reordered as such.
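
A minimal sketch of the directory-tuple idea, assuming POSIX-style
paths. The ``TuplePath`` class below is hypothetical and written only
for illustration; it is not the class from Mike's proposal.

```python
import posixpath


class TuplePath(tuple):
    """Hypothetical path type: a tuple with one element per path component."""

    def __new__(cls, path_or_parts):
        if isinstance(path_or_parts, str):
            # Split a POSIX-style string into components, keeping the root.
            parts = [p for p in path_or_parts.split("/") if p]
            if path_or_parts.startswith("/"):
                parts.insert(0, "/")
        else:
            parts = list(path_or_parts)
        return super().__new__(cls, parts)

    def __getitem__(self, index):
        result = super().__getitem__(index)
        # Slices stay paths; single items are plain component strings.
        return TuplePath(result) if isinstance(index, slice) else result

    def __add__(self, other):
        if isinstance(other, str):
            other = (other,)
        return TuplePath(tuple(self) + tuple(other))

    def __str__(self):
        return posixpath.join(*self) if self else ""


script = TuplePath("/usr/local/share/app/main.py")
lib_dir = script[:-2] + "lib"  # drop "app/main.py", then append "lib"
print(lib_dir)
```

Slicing drops trailing components and ``+`` appends new ones, which is
the kind of manipulation the tuple analogy is meant to make easy.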

As an alternative, glyph proposed using `Twisted's filepath module`_,
which was already being used in a large body of code. He pointed out
some common pitfalls, such as the fact that on Windows the names "CON"
and "NUL" exist in *every* directory and can make paths invalid, and
indicated how FilePath solved these problems.

Fredrik Lundh suggested a reorganization where functions that
manipulate path *names* would reside in ``os.path``, and functions
that manipulate *objects* identified by a path would reside in ``os``.
The ``os.path`` module would gain a path wrapper object, which would
allow "path algebra" manipulations, e.g. ``path1 + path2``. The ``os``
module would gain some of the ``os.path`` and ``shutil`` functions
that were manipulating real filesystem objects and not just the path
names. Most people seemed to like this approach, because it correctly
targeted the "algebraic" features at the areas where chained
operations were most common: path name operations, not filesystem
operations.

Some of the conversation moved on to the `Python 3000 list`_.

.. _PEP 355: http://www.python.org/dev/peps/pep-0355/
.. _directory-tuple Path class: http://wiki.python.org/moin/AlternativePathClass
.. _Twisted's filepath module:
.. _Python 3000 list: http://mail.python.org/mailman/listinfo/python-3000

Contributing threads:

- Path object design
- Mini Path object
- [Python-3000] Mini Path object

Replacing urlparse

A few more bugs in ``urlparse`` were turned up, and `earlier
discussions about replacing urlparse`_ were briefly revisited. Paul
Jimenez asked about his `uriparse module`_ and was told that, because
of the constant problems with ``urlparse``, people were wary of
including another "incorrect" library, so the requirements were
somewhat stringent. Martin v. Löwis gave him some guidance on a few specific
points, and Nick Coghlan promised to try to post his `urischemes
module`_ (a derivative of Paul's `uriparse module`_) to the `Python
Package Index`_.

.. _earlier discussions about replacing urlparse:
.. _uriparse module: http://bugs.python.org/1462525
.. _urischemes module: http://bugs.python.org/1500504
.. _Python Package Index: http://www.python.org/pypi

Contributing threads:

- patch 1462525 or similar solution?
- Path object design

Importing .py, .pyc and .pyo files

Martin v. Löwis brought up `Osvaldo Santana's patch`_ which would have
made Python search for both .pyc and .pyo files regardless of whether
or not the optimize flag, "-OO", was set (like zipimporter does).
Without this patch, when "-OO" was given, Python never looked for .pyc
files. Some people thought that an extra ``stat()`` call or directory
listing to check for the other file would be too expensive, but no one
profiled the various versions of the code so the cost was unclear.
People were leaning towards removing the extra functionality from
zipimporter so that at least it was consistent with the rest of Python.
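
The lookup rule under discussion can be sketched as follows. The
helper ``bytecode_suffixes`` is hypothetical, and the fallback order it
uses when both suffixes are searched is an assumption; the summary does
not say which file the patch would have preferred.

```python
def bytecode_suffixes(optimize, search_both=False):
    """Return the bytecode suffixes to try, in order (hypothetical helper)."""
    preferred, other = (".pyo", ".pyc") if optimize else (".pyc", ".pyo")
    # Without the patch, only the suffix matching the optimize flag is tried.
    # With the patch, the other suffix is also tried; trying it second is an
    # assumption made for this sketch.
    return [preferred, other] if search_both else [preferred]


print(bytecode_suffixes(optimize=True))
print(bytecode_suffixes(optimize=True, search_both=True))
```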

Giovanni Bajo suggested that .pyo file support should be dropped
completely, with .pyc files being compiled at various levels of
optimization depending on the command line flags. To make sure all
your .pyc files were compiled at the same level of optimization, you'd
use a new "-I" flag to indicate that all files should be recompiled,
e.g. ``python -I -OO app.py``.

Armin Rigo suggested only loading files with a .py extension. Python
would still generate .pyc files as a means of caching bytecode for
speed reasons, but it would never import them without a corresponding
.py file around. For people wanting to ship just bytecode, the cached
.pyc files could be renamed to .py files and then those could be
shipped and imported.

There was some support for Armin's solution, but it was not overwhelming.

.. _Osvaldo Santana's patch: http://bugs.python.org/1346572

Contributing thread:

- Importing .pyc in -O mode and vice versa

The buffer protocol and communicating binary format information

The discussion of extending the buffer protocol to more binary formats
continued this fortnight. Though the PIL_ had been used as an example
of a library that could benefit from an extended buffer protocol,
Fredrik Lundh indicated that future versions of the PIL_ would make
the binary data model completely opaque, and instead provide a
view-style API like::

    view = object.acquire_view(region, supported formats)
    ... access data in view ...

Along these lines, the discussion turned away from the particular C
formats used in ``ctypes``, ``numpy``, ``array``, etc. and more
towards the best way to communicate format information between these
modules. Though it seemed like people were not completely happy with
the proposed API of the new buffer protocol, the discussion seemed to
skirt around any concrete suggestions for better APIs.

In the end, the only thing that seemed certain was that a new buffer
protocol could only be successful if it were implemented on all of the
appropriate stdlib modules: ``ctypes``, ``array``, ``struct``, etc.
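
For illustration only: later Python versions expose this kind of
format information through ``memoryview``. The sketch below is not an
API proposed in the threads summarized here; it just shows an
``array`` describing its element layout with a ``struct``-style format
code that another module can consume.

```python
import array
import struct

values = array.array("i", [1, 2, 3])
view = memoryview(values)

# The view reports the element layout as a struct-style format code,
# along with the item size in bytes and the shape of the data.
print(view.format, view.itemsize, view.shape)

# A consumer can use that code to unpack the raw bytes without knowing
# which module produced them.
unpacked = struct.unpack(view.format * len(view), view.tobytes())
print(unpacked)
```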

.. _PIL: http://www.pythonware.com/products/pil/

Contributing threads:

- PEP: Adding data-type objects to Python
- PEP: Extending the buffer protocol to share array information.
- idea for data-type (data-format) PEP

__dir__, part 2

Tomer Filiba continued his `previous investigations`_ into adding a
``__dir__()`` method to allow customization of the ``dir()`` builtin.
He moved most of the current ``dir()`` logic into
``object.__dir__()``, with some additional logic necessary for modules
and types being moved to ``ModuleType.__dir__()`` and
``type.__dir__()`` respectively. He posted a `patch for his
implementation`_ and it got approval for Python 2.6.

There was a brief discussion about whether or not it was okay for an
object to lie about its members, with Fredrik Lundh suggesting that
you should only be allowed to *add* to the result that ``dir()``
produces. Nick Coghlan pointed out that when a class overrides
``__getattribute__()``, attributes that the default ``dir()``
implementation sees can be blocked, in which case removing members
from the result of ``dir()`` might be quite appropriate.
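
Nick's point can be sketched with a small example. The ``Restricted``
class is hypothetical; it blocks one attribute in
``__getattribute__()`` and removes the same name from ``dir()`` so the
two stay consistent.

```python
class Restricted:
    """Hypothetical class that hides one attribute from callers."""

    secret = 42
    visible = 1

    def __getattribute__(self, name):
        # Block access to "secret"; everything else behaves normally.
        if name == "secret":
            raise AttributeError(name)
        return object.__getattribute__(self, name)

    def __dir__(self):
        # Start from the default listing and drop the blocked name.
        return [n for n in object.__dir__(self) if n != "secret"]


obj = Restricted()
print("secret" in dir(obj), hasattr(obj, "secret"))
print("visible" in dir(obj), hasattr(obj, "visible"))
```

Without the ``__dir__()`` override, ``dir(obj)`` would still list
``secret`` even though reading it always fails.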

.. _previous investigations:
.. _patch for his implementation: http://bugs.python.org/1591665

Contributing thread:

- __dir__, part 2

Invalid read errors and valgrind

Using valgrind, Herman Geza found that he was getting some "Invalid
read" errors in PyObject_Free which weren't identified as
acceptable in Misc/README.valgrind. Tim Peters and Martin v. Löwis
explained that these are okay if they are reads from
Py_ADDRESS_IN_RANGE. If the address given is Python's own memory, a
valid arena index is read. Otherwise, garbage is read (though this
read will never fail since Python always reads from the page where the
about-to-be-freed block is located). The arenas are then checked to
see whether the result was garbage or not.
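
A simplified Python model of the check described above. The real
Py_ADDRESS_IN_RANGE is a C macro inside the allocator; the function
name, its arguments and the arena size used here are assumptions
chosen for illustration.

```python
ARENA_SIZE = 256 * 1024  # assumed arena size for this model


def address_in_range(address, claimed_index, arena_bases):
    """Model of validating a possibly-garbage arena index.

    `claimed_index` stands for the value read from the page holding the
    block about to be freed. It is only trusted if the arena it names
    really contains `address`.
    """
    if not 0 <= claimed_index < len(arena_bases):
        return False  # garbage: no such arena
    base = arena_bases[claimed_index]
    return base != 0 and 0 <= address - base < ARENA_SIZE


arena_bases = [0x100000, 0x200000]
print(address_in_range(0x100010, 0, arena_bases))  # index matches the arena
print(address_in_range(0x900000, 1, arena_bases))  # index valid, wrong arena
print(address_in_range(0x100010, 57, arena_bases))  # index out of bounds
```

The read of the index may return garbage, but the follow-up comparison
against the arena table is what decides whether the memory belongs to
Python.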

Neal Norwitz promised to try to update Misc/README.valgrind with this
information.

Contributing thread:

- `valgrind <http://mail.python.org/pipermail/python-dev/2006-November/069884.html>`__

SCons and cross-compilation

Martin v. Löwis reviewed a `patch for cross-compilation`_ which
proposed to use SCons_ instead of distutils because updating distutils
to work for cross-compilation would have involved some fairly major
changes. Distutils had certain notions of where to look for header
files and how to invoke the compiler which were incorrect for
cross-compilation, and which were difficult to change. While accepting
the patch would not have required SCons_ to be added to Python proper
(which a number of people opposed), people didn't like the idea of
having to update SCons configuration in addition to already having to
update setup.py, Modules/Setup and the PCbuild area. The patch was
therefore rejected.

.. _patch for cross-compilation: http://bugs.python.org/841454
.. _SCons: http://www.scons.org/

Contributing thread:

- Using SCons for cross-compilation

Individual interpreter locks

Robert asked about having a separate lock for each interpreter
instance instead of the global interpreter lock (GIL). Brett Cannon
and Martin v. Löwis explained that a variety of objects are shared
between interpreters, including:

* extension modules
* type objects (including exception types)
* singletons like ``None``, ``True``, ``()``, strings of length 1, etc.
* many things in the sys module

A single lock for each interpreter would not be sufficient for
handling access to such shared objects.

Contributing thread:

- Feature Request: Py_NewInterpreter to create separate GIL (branch)

Passing floats to file.seek

Python's implementation of ``file.seek`` was converting floats to
ints. `Robert Church suggested a patch`_ that would convert floats to
long longs and thus support files larger than 2GiB. Martin v. Löwis
proposed instead to use the ``__index__()`` API to support the large
files and to raise an exception for float arguments. Martin's approach
was approved, with a warning instead of an exception for Python 2.6.
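
A short sketch of how ``__index__()`` separates integer-like offsets
from floats, using ``operator.index()``. The ``BlockOffset`` class and
the ``to_seek_offset`` helper are hypothetical names invented for this
example; they are not part of the patch.

```python
import operator


class BlockOffset:
    """Hypothetical integer-like offset measured in fixed-size blocks."""

    def __init__(self, blocks, block_size=4096):
        self.blocks = blocks
        self.block_size = block_size

    def __index__(self):
        # Lossless conversion to an int of arbitrary size.
        return self.blocks * self.block_size


def to_seek_offset(value):
    """Convert a seek argument the way an __index__-based check would."""
    return operator.index(value)


big = to_seek_offset(BlockOffset(2**20))  # 4 GiB, well past 2 GiB
print(big)

try:
    to_seek_offset(10.0)
    float_rejected = False
except TypeError:
    float_rejected = True  # floats do not implement __index__()
print(float_rejected)
```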

.. _Robert Church suggested a patch: http://bugs.python.org/1067760

Contributing thread:

- Passing floats to file.seek

The datetime module and timezone objects

Fredrik Lundh asked about including a ``tzinfo`` object implementation
for the ``datetime`` module, along the lines of the ``UTC``,
``FixedOffset`` and ``LocalTimezone`` classes from the `library
reference`_. A number of people reported having copied those classes
into their own code repeatedly, and so Fredrik got the go-ahead to put
them into Python 2.6.
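
A minimal sketch of a fixed-offset ``tzinfo`` subclass in the spirit
of the ``FixedOffset`` class mentioned above. The constructor
signature is an assumption made for this example and may differ from
the version in the library reference.

```python
from datetime import datetime, timedelta, tzinfo


class FixedOffset(tzinfo):
    """Time zone with a constant offset from UTC and no DST."""

    def __init__(self, offset_minutes, name):
        self._offset = timedelta(minutes=offset_minutes)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def tzname(self, dt):
        return self._name

    def dst(self, dt):
        return timedelta(0)


utc = FixedOffset(0, "UTC")
cet = FixedOffset(60, "CET")

local_noon = datetime(2006, 11, 15, 12, 0, tzinfo=cet)
print(local_noon.astimezone(utc))
```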

.. _library reference: http://docs.python.org/lib/datetime-tzinfo.html

Contributing thread:

- ready-made timezones for the datetime module

Deferred Threads

- Summer of Code: zipfile?
- Results of the SOC projects

Previous Summaries

- The "lazy strings" patch [was: PATCH submitted: Speed up + for
  string concatenation, now as fast as "".join(x) idiom]

Skipped Threads

- RELEASED Python 2.3.6, FINAL
- [Tracker-discuss] Getting Started
- Status of pairing_heap.py?
- Inconvenient filename in sandbox/decimal-c/new_dt
- test_ucn fails for trunk on x86 Ubuntu Edgy
- Weekly Python Patch/Bug Summary
- Last chance to join the Summer of PyPy!
- `[Python-checkins] r52692 - in python/trunk: Lib/mailbox.py
Misc/NEWS <http://mail.python.org/pipermail/python-dev/2006-November/069919.html>`__
- PyFAQ: help wanted with thread article
- Arlington sprint this Saturday
- Suggestion/ feature request
