The current memory layout for dictionaries is
unnecessarily inefficient. It has a sparse table of
24-byte entries containing the hash value, key pointer,
and value pointer.
Instead, the 24-byte entries should be stored in a
dense table referenced by a sparse table of indices.
For example, the dictionary:
d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}
is currently stored as:
entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]
Instead, the data should be organized as follows:
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]
Only the data layout needs to change. The hash table
algorithms would stay the same. All of the current
optimizations would be kept, including key-sharing
dicts and custom lookup functions for string-only
dicts. There is no change to the hash functions, the
table search order, or collision statistics.
The memory savings are significant (from 30% to 95%
compression depending on how full the table is).
Small dicts (size 0, 1, or 2) get the most benefit.
For a sparse table of size t with n entries, the sizes are:
curr_size = 24 * t
new_size = 24 * n + sizeof(index) * t
In the above timmy/barry/guido example, the current
size is 192 bytes (eight 24-byte entries) and the new
size is 80 bytes (three 24-byte entries plus eight
1-byte indices). That gives 58% compression.
Note, the sizeof(index) can be as small as a single
byte for small dicts, two bytes for bigger dicts, and
up to sizeof(Py_ssize_t) for huge dicts.
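The arithmetic can be sketched as follows (the exact index-width cutoffs
below are an assumption for illustration; only the 1-byte/2-byte/8-byte
cases are mentioned above):

```python
def dict_sizes(t, n):
    """Sizes in bytes for a sparse table of t slots holding n entries,
    on a 64-bit build with 24-byte hash/key/value entries."""
    if t <= 0xff:
        index_size = 1            # single-byte indices for small dicts
    elif t <= 0xffff:
        index_size = 2
    elif t <= 0xffffffff:
        index_size = 4
    else:
        index_size = 8            # up to sizeof(Py_ssize_t)
    curr_size = 24 * t                    # hash/key/value per sparse slot
    new_size = 24 * n + index_size * t    # dense entries + sparse indices
    return curr_size, new_size

curr, new = dict_sizes(8, 3)      # the timmy/barry/guido example
print(curr, new)                  # 192 80
```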
In addition to space savings, the new memory layout
makes iteration faster. Currently, keys(), values(), and
items() loop over the sparse table, skipping over free
slots in the hash table. Now, keys/values/items can
loop directly over the dense table, using fewer memory
accesses.
Another benefit is that resizing is faster and
touches fewer pieces of memory. Currently, every
hash/key/value entry is moved or copied during a
resize. In the new layout, only the indices are
updated. For the most part, the hash/key/value entries
never move (except for an occasional swap to fill a
hole left by a deletion).
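Such a resize can be sketched in pure Python (again with naive linear
probing rather than CPython's real probe sequence; the point is that
only the small index table is rebuilt):

```python
def rebuild_indices(entries, new_size):
    # Resizing touches only the index table; the hash/key/value
    # entries stay exactly where they are in the dense table.
    new_indices = [None] * new_size
    for i, (h, key, value) in enumerate(entries):
        slot = h % new_size                  # naive linear probing
        while new_indices[slot] is not None:
            slot = (slot + 1) % new_size
        new_indices[slot] = i
    return new_indices
```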
With the reduced memory footprint, we can also expect
better cache utilization.
For those wanting to experiment with the design,
there is a pure Python proof-of-concept here:
http://code.activestate.com/recipes/578375
YMMV: Keep in mind that the above size statistics assume a
build with 64-bit Py_ssize_t and 64-bit pointers. The
space savings percentages are a bit different on other
builds. Also, note that in many applications, the size
of the data dominates the size of the container (i.e.
the weight of a bucket of water is mostly the water,
not the bucket).
Raymond
I've received some enthusiastic emails from someone who wants to
revive restricted mode. He started out with a bunch of patches to the
CPython runtime using ctypes, which he attached to an App Engine bug:
http://code.google.com/p/googleappengine/issues/detail?id=671
Based on his code (the file secure.py is all you need, included in
secure.tar.gz) it seems he believes the only security leaks are
__subclasses__, gi_frame and gi_code. (I have since convinced him that
if we add "restricted" guards to these attributes, he doesn't need the
functions added to sys.)
I don't recall the exploits that Samuele once posted that caused the
death of rexec.py -- does anyone recall, or have a pointer to the
threads?
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
It appears there's no obvious link from bugs.python.org to the contributor
agreement - you need to go via the unintuitive link Foundation ->
Contribution Forms (and from what I've read, you're prompted when you add a
patch to the tracker).
I'd suggest that if the "Contributor Form Received" field is "No" in user
details, there be a link to http://www.python.org/psf/contrib/.
Tim Delaney
Hello.
I would like to discuss on the language summit a potential inclusion
of cffi [1] into the stdlib. This is a project Armin Rigo has been
working on for a while, with some input from other developers. It seems
that the main reason people would prefer ctypes over cffi these days is
"because it's included in the stdlib", which is not generally the reason
I would like to hear. Our calls to not use C extensions and to use an
FFI instead have seen very limited success with ctypes and quite a lot
more since cffi got released. The API is fairly stable right now with
minor changes going in, and it'll definitely stabilize before the
Python 3.4 release. Notable projects using it:
* pypycore - gevent main loop ported to cffi
* pgsql2cffi
* sdl-cffi bindings
* tls-cffi bindings
* lxml-cffi port
* pyzmq
* cairo-cffi
* a bunch of others
So, relatively a lot, given that the project is not even a year old (it
got its 0.1 release in June). As per the documentation, the advantages over
ctypes:
* The goal is to call C code from Python. You should be able to do so
without learning a 3rd language: every alternative requires you to
learn their own language (Cython, SWIG) or API (ctypes). So we tried
to assume that you know Python and C and minimize the extra bits of
API that you need to learn.
* Keep all the Python-related logic in Python so that you don’t need
to write much C code (unlike CPython native C extensions).
* Work either at the level of the ABI (Application Binary Interface)
or the API (Application Programming Interface). Usually, C libraries
have a specified C API but often not an ABI (e.g. they may document a
“struct” as having at least these fields, but maybe more). (ctypes
works at the ABI level, whereas Cython and native C extensions work at
the API level.)
* We try to be complete. For now some C99 constructs are not
supported, but all C89 should be, including macros (and including
macro “abuses”, which you can manually wrap in saner-looking C
functions).
* We attempt to support both PyPy and CPython, with a reasonable path
for other Python implementations like IronPython and Jython.
* Note that this project is not about embedding executable C code in
Python, unlike Weave. This is about calling existing C libraries from
Python.
So, among other things, making a cffi extension gives you the same
level of security as writing C (unlike ctypes) and brings quite a
bit more flexibility (the API vs. ABI issue) that lets you wrap arbitrary
libraries, even those full of macros.
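For illustration, here is what an ABI-level call looks like with ctypes
today, with the rough cffi equivalent sketched in comments (cffi is not
assumed to be installed here, and the snippet relies on POSIX dlopen
semantics):

```python
import ctypes

# ABI level with ctypes: every C type is spelled in ctypes' own API.
libc = ctypes.CDLL(None)   # handle to the running process (POSIX only)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
print(libc.strlen(b"hello"))   # 5

# With cffi you instead declare the function in plain C, e.g.:
#   from cffi import FFI
#   ffi = FFI()
#   ffi.cdef("size_t strlen(const char *s);")
#   C = ffi.dlopen(None)
#   C.strlen(b"hello")
```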
Cheers,
fijal
.. [1]: http://cffi.readthedocs.org/en/release-0.5/
hi, everyone:
I want to compile Python 3.3 with bz2 support on RedHat 5.5 but fail to do that. Here is how I do it:
1. download bzip2 and compile it (make, make -f Makefile_libbz2_so, make install)
2. change to the Python 3.3 source directory: ./configure --with-bz2=/usr/local/include
3. make
4. make install
after installation complete, I test it:
[root@localhost Python-3.3.0]# python3 -c "import bz2"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python3.3/bz2.py", line 21, in <module>
from _bz2 import BZ2Compressor, BZ2Decompressor
ImportError: No module named '_bz2'
By the way, RedHat 5.5 has a built-in python 2.4.3. Would it be a problem?
Hi all,
I've been looking for a Github mirror for Python, and found two:
* https://github.com/python-git/python has a lot of forks/watches/stars
but seems to be very out of date (last updated 4 years ago)
* https://github.com/python-mirror/python doesn't appear to be very popular
but is updated daily
Are some of you the owners of these repositories? Should we consolidate to
a single "semi-official" mirror?
Eli
Hi,
This PEP proposes to add a __locallookup__ slot to type objects,
which is used by _PyType_Lookup and super_getattro instead of peeking
in the tp_dict of classes. The PEP text explains why this is needed.
Differences with the previous version:
* Better explanation of why this is a useful addition
* type.__locallookup__ is no longer optional.
* I've added benchmarking results using pybench.
(using the patch attached to issue 18181)
Ronald
PEP: 447
Title: Add __locallookup__ method to metaclass
Version: $Revision$
Last-Modified: $Date$
Author: Ronald Oussoren <ronaldoussoren(a)mac.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Jun-2013
Post-History: 2-Jul-2013, 15-Jul-2013, 29-Jul-2013
Abstract
========
Currently ``object.__getattribute__`` and ``super.__getattribute__`` peek
in the ``__dict__`` of classes on the MRO for a class when looking for
an attribute. This PEP adds an optional ``__locallookup__`` method to
a metaclass that can be used to override this behavior.
Rationale
=========
It is currently not possible to influence how the `super class`_ looks
up attributes (that is, ``super.__getattribute__`` unconditionally
peeks in the class ``__dict__``), and that can be problematic for
dynamic classes that can grow new methods on demand.
The ``__locallookup__`` method makes it possible to dynamically add
attributes even when looking them up using the `super class`_.
The new method affects ``object.__getattribute__`` (and
`PyObject_GenericGetAttr`_) as well for consistency.
Background
----------
The current behavior of ``super.__getattribute__`` causes problems for
classes that are dynamic proxies for other (non-Python) classes or types,
an example of which is `PyObjC`_. PyObjC creates a Python class for every
class in the Objective-C runtime, and looks up methods in the Objective-C
runtime when they are used. This works fine for normal access, but doesn't
work for access with ``super`` objects. Because of this PyObjC currently
includes a custom ``super`` that must be used with its classes.
The API in this PEP makes it possible to remove the custom ``super`` and
simplifies the implementation because the custom lookup behavior can be
added in a central location.
The superclass attribute lookup hook
====================================
Both ``super.__getattribute__`` and ``object.__getattribute__`` (or
`PyObject_GenericGetAttr`_ in C code) walk an object's MRO and peek in the
class' ``__dict__`` to look up attributes. A way to affect this lookup is
using a method on the meta class for the type, that by default looks up
the name in the class ``__dict__``.
In Python code
--------------
A meta type can define a method ``__locallookup__`` that is called during
attribute resolution by both ``super.__getattribute__`` and
``object.__getattribute__``::
    class MetaType(type):
        def __locallookup__(cls, name):
            try:
                return cls.__dict__[name]
            except KeyError:
                raise AttributeError(name) from None
The ``__locallookup__`` method has as its arguments a class and the name of the attribute
that is looked up. It should return the value of the attribute without invoking descriptors,
or raise `AttributeError`_ when the name cannot be found.
The `type`_ class provides a default implementation for ``__locallookup__``, that
looks up the name in the class dictionary.
Example usage
.............
The code below implements a silly metaclass that redirects attribute lookup to uppercase
versions of names::
    class UpperCaseAccess(type):
        def __locallookup__(cls, name):
            return cls.__dict__[name.upper()]

    class SillyObject(metaclass=UpperCaseAccess):
        def m(self):
            return 42

        def M(self):
            return "fourtytwo"

    obj = SillyObject()
    assert obj.m() == "fourtytwo"
In C code
---------
A new slot ``tp_locallookup`` is added to the ``PyTypeObject`` struct; this slot
corresponds to the ``__locallookup__`` method on `type`_.
The slot has the following prototype::
    PyObject* (*locallookupfunc)(PyTypeObject* cls, PyObject* name);
This method should look up *name* in the namespace of *cls*, without looking at
superclasses, and should not invoke descriptors. The method returns ``NULL``
without setting an exception when *name* cannot be found, and returns a new
reference otherwise (not a borrowed reference).
Use of this hook by the interpreter
-----------------------------------
The new method is required for metatypes and as such is defined on `type`_. Both
``super.__getattribute__`` and ``object.__getattribute__``/`PyObject_GenericGetAttr`_
(through ``_PyType_Lookup``) use this ``__locallookup__`` method when walking
the MRO.
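A rough pure-Python model of that MRO walk (the names and structure here
are illustrative, not the actual C implementation)::

```python
def type_lookup(cls, name):
    # Model of _PyType_Lookup: ask each class's metaclass for the name
    # via __locallookup__ instead of peeking in __dict__ directly.
    for klass in cls.__mro__:
        try:
            return type(klass).__locallookup__(klass, name)
        except AttributeError:
            continue
    raise AttributeError(name)

class Meta(type):
    def __locallookup__(cls, name):
        try:
            return cls.__dict__[name]
        except KeyError:
            raise AttributeError(name) from None

class Base(metaclass=Meta):
    def greet(self):
        return "hello"

class Child(Base):
    pass

# 'greet' is found on Base through the hook, without invoking descriptors:
print(type_lookup(Child, 'greet') is Base.__dict__['greet'])  # True
```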
Other changes to the implementation
-----------------------------------
The change for `PyObject_GenericGetAttr`_ will be done by changing the private function
``_PyType_Lookup``. This currently returns a borrowed reference, but must return a new
reference when the ``__locallookup__`` method is present. Because of this ``_PyType_Lookup``
will be renamed to ``_PyType_LookupName``, this will cause compile-time errors for all out-of-tree
users of this private API.
The attribute lookup cache in ``Objects/typeobject.c`` is disabled for classes that have a
metaclass that overrides ``__locallookup__``, because using the cache might not be valid
for such classes.
Performance impact
------------------
The pybench output below compares an implementation of this PEP with the regular
source tree, both based on changeset a5681f50bae2, run on an idle machine with a
Core i7 processor running CentOS 6.4.
Even though the machine was idle there were clear differences between runs;
I've seen the difference in "minimum time" vary from -0.1% to +1.5%, with similar
(but slightly smaller) differences in the "average time".
.. ::
-------------------------------------------------------------------------------
PYBENCH 2.1
-------------------------------------------------------------------------------
* using CPython 3.4.0a0 (default, Jul 29 2013, 13:01:34) [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)]
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.perf_counter
* timer: resolution=1e-09, implementation=clock_gettime(CLOCK_MONOTONIC)
-------------------------------------------------------------------------------
Benchmark: pep447.pybench
-------------------------------------------------------------------------------
Rounds: 10
Warp: 10
Timer: time.perf_counter
Machine Details:
Platform ID: Linux-2.6.32-358.114.1.openstack.el6.x86_64-x86_64-with-centos-6.4-Final
Processor: x86_64
Python:
Implementation: CPython
Executable: /tmp/default-pep447/bin/python3
Version: 3.4.0a0
Compiler: GCC 4.4.7 20120313 (Red Hat 4.4.7-3)
Bits: 64bit
Build: Jul 29 2013 14:09:12 (#default)
Unicode: UCS4
-------------------------------------------------------------------------------
Comparing with: default.pybench
-------------------------------------------------------------------------------
Rounds: 10
Warp: 10
Timer: time.perf_counter
Machine Details:
Platform ID: Linux-2.6.32-358.114.1.openstack.el6.x86_64-x86_64-with-centos-6.4-Final
Processor: x86_64
Python:
Implementation: CPython
Executable: /tmp/default/bin/python3
Version: 3.4.0a0
Compiler: GCC 4.4.7 20120313 (Red Hat 4.4.7-3)
Bits: 64bit
Build: Jul 29 2013 13:01:34 (#default)
Unicode: UCS4
Test minimum run-time average run-time
this other diff this other diff
-------------------------------------------------------------------------------
BuiltinFunctionCalls: 45ms 44ms +1.3% 45ms 44ms +1.3%
BuiltinMethodLookup: 26ms 27ms -2.4% 27ms 27ms -2.2%
CompareFloats: 33ms 34ms -0.7% 33ms 34ms -1.1%
CompareFloatsIntegers: 66ms 67ms -0.9% 66ms 67ms -0.8%
CompareIntegers: 51ms 50ms +0.9% 51ms 50ms +0.8%
CompareInternedStrings: 34ms 33ms +0.4% 34ms 34ms -0.4%
CompareLongs: 29ms 29ms -0.1% 29ms 29ms -0.0%
CompareStrings: 43ms 44ms -1.8% 44ms 44ms -1.8%
ComplexPythonFunctionCalls: 44ms 42ms +3.9% 44ms 42ms +4.1%
ConcatStrings: 33ms 33ms -0.4% 33ms 33ms -1.0%
CreateInstances: 47ms 48ms -2.9% 47ms 49ms -3.4%
CreateNewInstances: 35ms 36ms -2.5% 36ms 36ms -2.5%
CreateStringsWithConcat: 69ms 70ms -0.7% 69ms 70ms -0.9%
DictCreation: 52ms 50ms +3.1% 52ms 50ms +3.0%
DictWithFloatKeys: 40ms 44ms -10.1% 43ms 45ms -5.8%
DictWithIntegerKeys: 32ms 36ms -11.2% 35ms 37ms -4.6%
DictWithStringKeys: 29ms 34ms -15.7% 35ms 40ms -11.0%
ForLoops: 30ms 29ms +2.2% 30ms 29ms +2.2%
IfThenElse: 38ms 41ms -6.7% 38ms 41ms -6.9%
ListSlicing: 36ms 36ms -0.7% 36ms 37ms -1.3%
NestedForLoops: 43ms 45ms -3.1% 43ms 45ms -3.2%
NestedListComprehensions: 39ms 40ms -1.7% 39ms 40ms -2.1%
NormalClassAttribute: 86ms 82ms +5.1% 86ms 82ms +5.0%
NormalInstanceAttribute: 42ms 42ms +0.3% 42ms 42ms +0.0%
PythonFunctionCalls: 39ms 38ms +3.5% 39ms 38ms +2.8%
PythonMethodCalls: 51ms 49ms +3.0% 51ms 50ms +2.8%
Recursion: 67ms 68ms -1.4% 67ms 68ms -1.4%
SecondImport: 41ms 36ms +12.5% 41ms 36ms +12.6%
SecondPackageImport: 45ms 40ms +13.1% 45ms 40ms +13.2%
SecondSubmoduleImport: 92ms 95ms -2.4% 95ms 98ms -3.6%
SimpleComplexArithmetic: 28ms 28ms -0.1% 28ms 28ms -0.2%
SimpleDictManipulation: 57ms 57ms -1.0% 57ms 58ms -1.0%
SimpleFloatArithmetic: 29ms 28ms +4.7% 29ms 28ms +4.9%
SimpleIntFloatArithmetic: 37ms 41ms -8.5% 37ms 41ms -8.7%
SimpleIntegerArithmetic: 37ms 41ms -9.4% 37ms 42ms -10.2%
SimpleListComprehensions: 33ms 33ms -1.9% 33ms 34ms -2.9%
SimpleListManipulation: 28ms 30ms -4.3% 29ms 30ms -4.1%
SimpleLongArithmetic: 26ms 26ms +0.5% 26ms 26ms +0.5%
SmallLists: 40ms 40ms +0.1% 40ms 40ms +0.1%
SmallTuples: 46ms 47ms -2.4% 46ms 48ms -3.0%
SpecialClassAttribute: 126ms 120ms +4.7% 126ms 121ms +4.4%
SpecialInstanceAttribute: 42ms 42ms +0.6% 42ms 42ms +0.8%
StringMappings: 94ms 91ms +3.9% 94ms 91ms +3.8%
StringPredicates: 48ms 49ms -1.7% 48ms 49ms -2.1%
StringSlicing: 45ms 45ms +1.4% 46ms 45ms +1.5%
TryExcept: 23ms 22ms +4.9% 23ms 22ms +4.8%
TryFinally: 32ms 32ms -0.1% 32ms 32ms +0.1%
TryRaiseExcept: 17ms 17ms +0.9% 17ms 17ms +0.5%
TupleSlicing: 49ms 48ms +1.1% 49ms 49ms +1.0%
WithFinally: 48ms 47ms +2.3% 48ms 47ms +2.4%
WithRaiseExcept: 45ms 44ms +0.8% 45ms 45ms +0.5%
-------------------------------------------------------------------------------
Totals: 2284ms 2287ms -0.1% 2306ms 2308ms -0.1%
(this=pep447.pybench, other=default.pybench)
Alternative proposals
---------------------
``__getattribute_super__``
..........................
An earlier version of this PEP used the following static method on classes::
    def __getattribute_super__(cls, name, object, owner): pass
This method performed name lookup as well as invoking descriptors, and was
necessarily limited to working only with ``super.__getattribute__``.
Reuse ``tp_getattro``
.....................
It would be nice to avoid adding a new slot, thus keeping the API simpler and
easier to understand. A comment on `Issue 18181`_ asked about reusing the
``tp_getattro`` slot; that is, super could call the ``tp_getattro`` slot of all
classes along the MRO.
That won't work because ``tp_getattro`` will look in the instance
``__dict__`` before it tries to resolve attributes using classes in the MRO.
This would mean that using ``tp_getattro`` instead of peeking the class
dictionaries changes the semantics of the `super class`_.
References
==========
* `Issue 18181`_ contains a prototype implementation
Copyright
=========
This document has been placed in the public domain.
.. _`Issue 18181`: http://bugs.python.org/issue18181
.. _`super class`: http://docs.python.org/3/library/functions.html#super
.. _`NotImplemented`: http://docs.python.org/3/library/constants.html#NotImplemented
.. _`PyObject_GenericGetAttr`: http://docs.python.org/3/c-api/object.html#PyObject_GenericGetAttr
.. _`type`: http://docs.python.org/3/library/functions.html#type
.. _`AttributeError`: http://docs.python.org/3/library/exceptions.html#AttributeError
.. _`PyObjC`: http://pyobjc.sourceforge.net/
.. _`classmethod`: http://docs.python.org/3/library/functions.html#classmethod
Hi all,
I have raised a tracker item and PEP for adding a statistics module to the standard library:
http://bugs.python.org/issue18606
http://www.python.org/dev/peps/pep-0450/
There has been considerable discussion on python-ideas, which is now reflected by the PEP. I've signed the Contributor Agreement, and submitted a patch containing updated code and tests. The tests aren't yet integrated with the test runner but are runnable manually.
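For a quick taste, the core functions described in the PEP can be used
like this (assuming the module is importable as ``statistics``, the name
used in the PEP):

```python
import statistics  # the module proposed by PEP 450

data = [1.5, 2.5, 2.5, 2.75, 3.25, 4.75]
print(statistics.mean(data))      # 2.875
print(statistics.median(data))    # 2.625
print(statistics.variance(data))  # sample variance
```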
Can I request that people please look at this issue, with an aim to ruling on the PEP and (hopefully) adding the module to 3.4 before feature freeze? If it is accepted, I am willing to be primary maintainer for this module in the future.
Thanks,
--
Steven
Dear Python developers community,
Recently I found out that it is not possible to debug Python code if it
is part of a zip-module.
Python version being used is 3.3.0
Well known GUI debuggers like Eclipse+PyDev or PyCharm are unable to
start debugging and give the following warning:
---
pydev debugger: CRITICAL WARNING: This version of python seems to be
incorrectly compiled (internal generated filenames are not absolute)
pydev debugger: The debugger may still function, but it will work
slower and may miss breakpoints.
---
So I started my own investigation of this issue, and the results are the following.
At first I took the traditional Python debugger 'pdb' to analyze how it
behaves during debugging of a zipped module.
'pdb' showed me some backtraces, and the filename part of the stack entries
looks malformed.
I expected something like
'full-path-to-zip-dir/my_zipped_module.zip/subdir/test_module.py'
but really it looks like 'full-path-to-current-dir/subdir/test_module.py'
The source code in pdb.py and bdb.py (which are part of the Python
stdlib) gave me the answer why it happens.
The root cause is inside Bdb.format_stack_entry() + Bdb.canonic()
Please take a look at the following line inside 'format_stack_entry' method:
filename = self.canonic(frame.f_code.co_filename)
For a zipped module the variable 'frame.f_code.co_filename' holds a _relative_
file path starting from the root of the zip archive, like
'subdir/test_module.py'
And as a result the Bdb.canonic() method gives what we have -
'full-path-to-current-dir/subdir/test_module.py'
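A minimal reproduction of the effect (this simplified canonic() mirrors
what bdb.Bdb.canonic() does to a relative path, minus its cache):

```python
import os

def canonic(filename):
    # Simplified bdb.Bdb.canonic(): a relative co_filename is resolved
    # against the current working directory, not the zip archive.
    return os.path.normcase(os.path.abspath(filename))

# A zipped module's co_filename like 'subdir/test_module.py' becomes:
print(canonic("subdir/test_module.py"))
# -> full-path-to-current-dir/subdir/test_module.py (archive path lost)
```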
---
So my final question:
could anyone confirm that this is a bug in the Python core subsystem
responsible for loading zipped modules,
or is something wrong with my zipped module?
If it is a bug, could you please submit it to the official Python bug tracker?
I have never done it before and am afraid of doing something wrong.
Thanks,
Vitaly Murashev
Hi,
this has been subject to a couple of threads on python-dev already, for
example:
http://thread.gmane.org/gmane.comp.python.devel/135764/focus=140986
http://thread.gmane.org/gmane.comp.python.devel/141037/focus=141046
It originally came out of issues 13429 and 16392.
http://bugs.python.org/issue13429
http://bugs.python.org/issue16392
Here's an initial attempt at a PEP for it. It is based on the (unfinished)
ModuleSpec PEP, which is being discussed on the import-sig mailing list.
http://mail.python.org/pipermail/import-sig/2013-August/000688.html
Stefan
PEP: 4XX
Title: Redesigning extension modules
Version: $Revision$
Last-Modified: $Date$
Author: Stefan Behnel <stefan_ml at behnel.de>
BDFL-Delegate: ???
Discussions-To: ???
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.4
Post-History: 23-Aug-2013
Resolution:
Abstract
========
This PEP proposes a redesign of the way in which extension modules interact
with the interpreter runtime. This was last revised for Python 3.0 in PEP
3121, but did not solve all problems at the time. The goal is to solve them
by bringing extension modules closer to the way Python modules behave.
An implication of this PEP is that extension modules can use arbitrary
types for their module implementation and are no longer restricted to
types.ModuleType. This makes it easy to support properties at the module
level and to safely store arbitrary global state in the module that is
covered by normal garbage collection and supports reloading and
sub-interpreters.
Motivation
==========
Python modules and extension modules are not being set up in the same way.
For Python modules, the module is created and set up first, then the module
code is being executed. For extensions, i.e. shared libraries, the module
init function is executed straight away and does both the creation and
initialisation. This means that it knows neither the __file__ it is being
loaded from nor its package (i.e. its fully qualified module name, FQMN).
This hinders relative imports and resource loading. In Py3, it's also not
being added to sys.modules, which means that a (potentially transitive)
re-import of the module will really try to reimport it and thus run into an
infinite loop when it executes the module init function again. And without
the FQMN, it is not trivial to correctly add the module to sys.modules either.
This is specifically a problem for Cython generated modules, for which it's
not uncommon that the module init code has the same level of complexity as
that of any 'regular' Python module. Also, the lack of a FQMN and correct
file path hinders the compilation of __init__.py modules, i.e. packages,
especially when relative imports are being used at module init time.
Furthermore, the majority of currently existing extension modules have
problems with sub-interpreter support and/or reloading, and it is neither
easy nor efficient with the current infrastructure to support these
features. This PEP also addresses these issues.
The current process
===================
Currently, extension modules export an initialisation function named
"PyInit_modulename", named after the file name of the shared library. This
function is executed by the import machinery and must return either NULL in
the case of an exception, or a fully initialised module object. The
function receives no arguments, so it has no way of knowing about its
import context.
During its execution, the module init function creates a module object
based on a PyModuleDef struct. It then continues to initialise it by adding
attributes to the module dict, creating types, etc.
In the back, the shared library loader keeps a note of the fully qualified
module name of the last module that it loaded, and when a module gets
created that has a matching name, this global variable is used to determine
the FQMN of the module object. This is not entirely safe as it relies on
the module init function creating its own module object first, but this
assumption usually holds in practice.
The main problem in this process is the missing support for passing state
into the module init function, and for safely passing state through to the
module creation code.
The proposal
============
The current extension module initialisation will be deprecated in favour of
a new initialisation scheme. Since the current scheme will continue to be
available, existing code will continue to work unchanged, including binary
compatibility.
Extension modules that support the new initialisation scheme must export a
new public symbol "PyModuleCreate_modulename", where "modulename" is the
name of the shared library. This mimics the previous naming convention for
the "PyInit_modulename" function.
This symbol must resolve to a C function with the following signature::
    PyObject* (*PyModuleTypeCreateFunction)(PyObject* module_spec)
The "module_spec" argument receives a "ModuleSpec" instance, as defined in
PEP 4XX (FIXME). (All names are obviously up for debate and bike-shedding
at this point.)
When called, this function must create and return a type object, either a
Python class or an extension type that is allocated on the heap. This type
will be instantiated as the module instance by the importer.
There is no requirement for this type to be types.ModuleType or a
subtype of it. Any type can be returned. This follows the current
support for allowing arbitrary objects in sys.modules and makes it easier
for extension modules to define a type that exactly matches their needs for
holding module state.
The constructor of this type must have the following signature::
    def __init__(self, module_spec):
The "module_spec" argument receives the same object as the one passed into
the module type creation function.
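A pure-Python model of the whole dance (every name here is illustrative;
the real create function would be exported by a shared library, and the
ModuleSpec class stands in for the one from the companion PEP):

```python
class ModuleSpec:
    """Stand-in for the ModuleSpec object of the companion PEP."""
    def __init__(self, name, origin):
        self.name = name
        self.origin = origin

def PyModuleCreate_example(module_spec):
    # The create function returns a *type*, not a module instance.
    class ExampleModule:
        def __init__(self, spec):
            self.__name__ = spec.name    # FQMN known before execution
            self.__file__ = spec.origin
        @property
        def answer(self):                # module-level property, which a
            return 42                    # plain types.ModuleType can't have
    return ExampleModule

spec = ModuleSpec("example", "/path/to/example.so")
module = PyModuleCreate_example(spec)(spec)  # importer instantiates the type
print(module.answer)       # 42
print(module.__name__)     # example
```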
Implementation
==============
XXX - not started
Reloading and Sub-Interpreters
==============================
To "reload" an extension module, the module create function is executed
again and returns a new module type. This type is then instantiated by
the original module loader and replaces the previous entry in sys.modules.
Once the last references to the previous module and its type are gone, both
will be subject to normal garbage collection.
Sub-interpreter support is an inherent property of the design. During
import in the sub-interpreter, the module create function is executed
and returns a new module type that is local to the sub-interpreter. Both
the type and its module instance are subject to garbage collection in the
sub-interpreter.
Open questions
==============
It is not immediately obvious how extensions should be handled that want to
register more than one module in their module init function, e.g. compiled
packages. One possibility would be to leave the setup to the user, who
would have to know all FQMNs anyway in this case (or could construct them
from the module spec of the current module), although not the import
file path. A C-API could be provided to register new module types in the
current interpreter, given a user provided ModuleSpec.
There is no inherent requirement for the module creation function to
actually return a type. It could return an arbitrary callable that creates a
'modulish' object when called. Should there be a type check in place that
makes sure that what it returns is a type? I don't currently see a need for
this.
Copyright
=========
This document has been placed in the public domain.