[Python-Dev] Rewrite of import in Python source (sans docs) is complete

Sat Jan 6 01:34:18 CET 2007

After a few months' worth of work, I have finally gotten far enough in my
import rewrite that I am willing to stick my neck out and say it is
semantically complete!  You can find it in the sandbox under import_in_py.

So, details of this implementation.  I implemented PEP 302 importers/loaders
for built-in, frozen, extension, .py, and .pyc files along with rewriting
the steps __import__ goes through to do an import.  I also developed an API
for .py/.pyc file handling so that there is a generic filesystem
importer/loader and a separate handler for .py/.pyc files.  This should
allow for (relatively) easy selective overriding of just how .py/.pyc files
are stored (e.g., introducing a database backend) or how variants on
.py/.pyc files are handled (e.g., Quixote's .ptl format).
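
For readers unfamiliar with the PEP 302 protocol mentioned above, here is a
minimal sketch of an importer/loader pair whose storage backend is a plain
dict of source strings.  It is not the code in the sandbox: DictImporter and
simple_import are hypothetical names made up for illustration, and the
importer is driven by hand rather than installed into the interpreter.  Only
the method names find_module and load_module come from PEP 302.

```python
import types


class DictImporter:
    """PEP 302-style importer/loader (hypothetical example).

    The "storage backend" is a dict mapping module names to source text.
    """

    def __init__(self, sources):
        self.sources = sources

    def find_module(self, fullname, path=None):
        # PEP 302: return a loader if this importer handles the module,
        # otherwise None.  Here the importer acts as its own loader.
        return self if fullname in self.sources else None

    def load_module(self, fullname):
        # Simplification: PEP 302 also asks load_module to register the
        # module in sys.modules before running its code.  This sketch
        # leaves caching to the caller so it never touches sys.modules.
        module = types.ModuleType(fullname)
        module.__loader__ = self
        module.__file__ = "<dict:%s>" % fullname
        code = compile(self.sources[fullname], module.__file__, "exec")
        exec(code, module.__dict__)
        return module


def simple_import(name, importers, cache):
    """Check the cache first, then ask each importer in turn."""
    if name in cache:
        return cache[name]
    for importer in importers:
        loader = importer.find_module(name)
        if loader is not None:
            module = loader.load_module(name)
            cache[name] = module
            return module
    raise ImportError("no importer could find %r" % name)


if __name__ == "__main__":
    cache = {}
    importer = DictImporter({"hello": "x = 40 + 2"})
    module = simple_import("hello", [importer], cache)
    print(module.x)
```

Swapping the dict for a database lookup, or changing how the source text is
turned into code, only touches the one class; the driver loop stays the same.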

This code has extensive tests and so I am fairly confident that it does what
is expected of an import rewrite.  There are actually more lines in the test
file than the implementation.  There is also a mock implementation used for
testing.  It was interesting doing this in such a test-driven, XP style of
only coding what I needed.

I have run this code through the entire regression test suite and that is
where you find out subtle differences between this implementation and the
built-in import (you can see for yourself with the regrtest.sh shell
script).  First, test_pkg will fail because the new import currently adds a
__loader__ attribute to all modules (that might change for security reasons)
and test_pkg is an old, stdout-comparing test.  Second, test_runpy fails
because I have not implemented get_code on the filesystem loader, which
runpy requires.  Both are shallow issues that can be dealt with.

Third, and the hardest difference to deal with, is that some warnings get
printed that you normally don't see.  This is because warnings.warn and its
stacklevel argument don't have the effect people are used to when importing
a deprecated module.  Before, you could set stacklevel to 2 and the warning
would look like it came from the import statement.  But now import is
written in Python and thus sits on the call stack, whereas the C version did
not show up there, so two levels back is still inside the import code.  I
really don't know how this should be dealt with short of saying that the
rule of thumb of going 2 stack levels back for a warning does not work when
done at the import level.
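
The frame-counting problem can be reproduced without any real import
machinery.  In the sketch below, every function name is hypothetical: a
direct call models the C import (no Python frame in between), and
import_in_python models import code written in Python.  The blamed() helper
is only there to translate the line number recorded on the warning back into
a function name.

```python
import warnings


def deprecated_module_body():
    # Stands in for the top-level code of a deprecated module.
    warnings.warn("module is deprecated", DeprecationWarning, stacklevel=2)


def import_in_python():
    # Stands in for import machinery written in Python: it adds a frame
    # between the user's code and the module being executed.
    deprecated_module_body()


def user_with_c_import():
    # A C-level import leaves no Python frame, modelled as a direct call.
    deprecated_module_body()


def user_with_python_import():
    import_in_python()


def blamed(func_to_run):
    """Run func_to_run; return the name of the function the warning blames."""
    candidates = [deprecated_module_body, import_in_python,
                  user_with_c_import, user_with_python_import]
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func_to_run()
    lineno = caught[0].lineno
    # The blamed function is the candidate with the latest 'def' line that
    # is still at or before the recorded line number.
    started = [f for f in candidates if f.__code__.co_firstlineno <= lineno]
    return max(started, key=lambda f: f.__code__.co_firstlineno).__name__


if __name__ == "__main__":
    print(blamed(user_with_c_import))       # user_with_c_import
    print(blamed(user_with_python_import))  # import_in_python
```

With the extra Python frame in the way, the same stacklevel=2 lands on the
import code instead of on the user's code.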

It is not blazing fast at the moment.  Some things, like the built-in and
frozen importers/loaders, could be rewritten in C without huge issue.  I am
also sure I have made some stupid design decisions at various points in the
code.  There is benchmarking code in the sandbox called importbench, and it
showed a 10x slowdown on a Linux box I was using in mid to late December
when doing a fresh import of certain types (I don't remember exactly which
kind off the top of my head).
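
For anyone who wants a rough feel for the two cases being compared, here is
a small timing sketch.  It is an assumption-laden stand-in and not how
importbench works: the function names are hypothetical, colorsys is just a
small stdlib module picked because re-importing it is harmless, and the
numbers depend entirely on the machine.

```python
import sys
import timeit


def fresh_import(name):
    # Drop any cached copy so the full find-and-load path runs again.
    sys.modules.pop(name, None)
    return __import__(name)


def cached_import(name):
    # Repeat import: served straight out of sys.modules.
    return __import__(name)


def time_imports(name, number=200):
    """Return (fresh_seconds, cached_seconds) for `number` imports each."""
    fresh = timeit.timeit(lambda: fresh_import(name), number=number)
    # The last fresh import left the module in sys.modules, so every
    # import timed below is a cache hit.
    cached = timeit.timeit(lambda: cached_import(name), number=number)
    return fresh, cached


if __name__ == "__main__":
    fresh, cached = time_imports("colorsys")
    print("fresh: %.6fs  cached: %.6fs" % (fresh, cached))
```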

Because of this current slowness I don't know if people want to rush into
trying to make this the default import implementation quite yet or if this
is not too big of a thing since the common case of pulling out of
sys.modules is not that much slower.  I know I am currently not planning on
devoting the time to bootstrap it in as I have my security work to finish
first along with other Python stuff that seems more pressing to me.  And
since (I think) I don't need to bootstrap it in order to finish my security
work I can't justify spending work time on it.  But I can rearrange
priorities if people really want to pursue this (especially if I can get
some help with it).

As for the module's name, it is currently named 'importer', but that is bad
since it conflicts with the idea of importers from PEP 302.  I was thinking
importlib, but I wanted to wait and see what other people thought.

Don't know if you guys are okay with me checking this in without having it
vetted by the community first, as we prefer for all new modules.  I have
not done the LaTeX docs yet.

I think that covers all of the details.  I am still working
towards implementing the security needed so that an application that embeds
Python can execute arbitrary code securely.  Giving a talk at PyCon on the
topic for anyone interested.

Special thanks need to go to Paul Moore, whom I talked to through most of
the design of the code.  Nick Coghlan also provided some handy feedback.
And Neal Norwitz for bugging me about wanting something like this done.
Plus thanks to everyone who has shown support.

-Brett