[Python-Dev] Rewrite of import in Python source (sans docs) is complete

Calvin Spealman ironfroggy at gmail.com
Sun Jan 14 22:23:09 CET 2007


I am really looking to get into hacking on CPython, and I'm keenly
interested in your security work (my top reason for hoping I can make
PyCon; keeping fingers crossed!), so if you need help with this in order
to focus on other things, I'd be delighted to try my hand at the task. Do
you have any docs up anywhere on what direction you hope this will go in
from here?

On 1/5/07, Brett Cannon <brett at python.org> wrote:
> After a few months' worth of work, I have finally gotten far enough
> in my import rewrite that I am willing to stick my neck out and say it is
> semantically complete!  You can find it in the sandbox under import_in_py.
>
> So, details of this implementation.  I implemented PEP 302 importers/loaders
> for built-in, frozen, extension, .py, and .pyc files along with rewriting
> the steps __import__ goes through to do an import.  I also developed an API
> for .py/.pyc file handling so that there is a generic filesystem
> importer/loader and a separate handler for .py/.pyc files.  This should
> allow for (relatively) easy selective overriding of just how .py/.pyc files
> are stored (e.g., introducing a database backend) or how variants on
> .py/.pyc files are handled (e.g., Quixote's .ptl format).
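
A minimal sketch of the PEP 302 importer/loader protocol mentioned above, assuming a plain dict stands in for the "database backend". DictImporter, demo_mod, and the "<dict:...>" file name are hypothetical names for illustration, not code from the sandbox. The methods are called by hand rather than through sys.meta_path, since later Python versions replaced find_module/load_module with a different protocol.

```python
import sys
import types


class DictImporter:
    """Hypothetical PEP 302-style importer/loader backed by a dict of source."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> source text

    def find_module(self, fullname, path=None):
        # PEP 302: return a loader if this importer handles the name, else None.
        return self if fullname in self.sources else None

    def load_module(self, fullname):
        # PEP 302: reuse the existing entry in sys.modules if there is one.
        if fullname in sys.modules:
            return sys.modules[fullname]
        module = types.ModuleType(fullname)
        module.__file__ = "<dict:%s>" % fullname
        module.__loader__ = self
        # The module has to be in sys.modules before its code runs.
        sys.modules[fullname] = module
        try:
            exec(self.sources[fullname], module.__dict__)
        except Exception:
            # Do not leave a half-initialized module behind.
            del sys.modules[fullname]
            raise
        return module


importer = DictImporter({"demo_mod": "ANSWER = 6 * 7\n"})
loader = importer.find_module("demo_mod")
demo = loader.load_module("demo_mod")
print(demo.ANSWER)  # prints 42
```

Swapping the dict for another storage layer would only touch the lookup in find_module and the source fetch in load_module, which is the kind of selective overriding the paragraph describes.
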
>
> This code has extensive tests and so I am fairly confident that it does what
> is expected of an import rewrite.  There are actually more lines in the test
> file than the implementation.  There is also a mock implementation used for
> testing.  It was interesting doing this in such a test-driven, XP style of
> only coding what I needed.
>
> I have run this code through the entire regression test suite and that is
> where you find out subtle differences between this implementation and the
> built-in import (you can see for yourself with the regrtest.sh shell
> script).  First, test_pkg will fail because currently the new import adds a
> __loader__ attribute on all modules (that might change for security reasons)
> and test_pkg is an old, stdout comparing test.  Second, test_runpy fails
> because I have not implemented get_code on the filesystem loader which is
> required by runpy.  Both are shallow issues that can be dealt with.
>
> Third, and the hardest difference to deal with, is that you will get some
> warnings printed that you normally don't see.  This is because
> warnings.warn and its stacklevel argument don't have the effect people are
> used to when importing a deprecated module.  Before, you could set
> stacklevel to 2 and it would look like the warning came from the import
> statement.  But now, with import written in Python and thus on the call
> stack (the C implementation does not show up there), two levels back is
> still in the import code.  I really don't know how this should be dealt
> with short of saying that the rule of thumb of going 2 stack levels back
> for a warning does not work when done at the import level.
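
The stack-depth effect can be reproduced without any import machinery. This is only an analogy, assuming an ordinary Python function (python_level_wrapper, a hypothetical name) plays the role of the Python-level import code sitting between the user's code and the code that warns:

```python
import warnings


def deprecated_thing():
    # stacklevel=2 asks for the warning to be attributed to our caller.
    warnings.warn("deprecated_thing is deprecated",
                  DeprecationWarning, stacklevel=2)


# Stand-in for import machinery written in Python: one extra frame
# between the user's code and the code that warns.
def python_level_wrapper():
    deprecated_thing()


def user_code_direct():
    deprecated_thing()


def user_code_via_wrapper():
    python_level_wrapper()


def blamed_line(func):
    """Run func and return the line number its warning is attributed to."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func()
    return caught[0].lineno


def first_body_line(func):
    # Each one-line function above has its body on the line after "def".
    return func.__code__.co_firstlineno + 1


# Called directly, the warning points at the user's own line.
print(blamed_line(user_code_direct) == first_body_line(user_code_direct))
# Through the wrapper, it points into the wrapper instead.
print(blamed_line(user_code_via_wrapper) == first_body_line(python_level_wrapper))
```

With the extra frame in place, stacklevel=2 lands on the wrapper rather than on the user's code, which is the situation described for a deprecated module imported through Python-level import code.
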
>
> It is not blazing fast at the moment.  Some things, like the built-in and
> frozen importers/loaders, could be rewritten in C without huge issue.  I am
> also sure I have made some stupid design decisions at various points in the
> code.  But there is benchmarking code in the sandbox called importbench and
> it showed a 10x slowdown on a Linux box I was using in mid to late
> December when doing a fresh import of certain types (I don't remember
> exactly which kind off the top of my head).
>
> Because of this current slowness I don't know if people want to rush into
> trying to make this the default import implementation quite yet or if this
> is not too big of a thing since the common case of pulling out of
> sys.modules is not that much slower.  I know I am currently not planning on
> devoting the time to bootstrap it in as I have my security work to finish
> first along with other Python stuff that seems more pressing to me.  And
> since (I think) I don't need to bootstrap it in order to finish my security
> work I can't justify spending work time on it.  But I can rearrange
> priorities if people really want to pursue this (especially if I can get
> some help with it).
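
A minimal sketch of why the common case stays cheap, assuming the import entry point checks sys.modules before doing anything else. import_module_sketch, slow_load, and the module name are hypothetical and are not taken from the sandbox code:

```python
import sys
import types

load_count = 0  # counts how often the expensive path runs


def slow_load(name):
    """Hypothetical stand-in for the full find-and-load path."""
    global load_count
    load_count += 1
    module = types.ModuleType(name)
    sys.modules[name] = module
    return module


def import_module_sketch(name):
    # Common case: the module is already imported, so this is one dict lookup.
    try:
        return sys.modules[name]
    except KeyError:
        # Fresh import: fall through to the slow path.
        return slow_load(name)


first = import_module_sketch("sketch_demo_module")
second = import_module_sketch("sketch_demo_module")
print(first is second, load_count)
```

Only the first call pays for the slow path, so a slowdown in fresh imports does not carry over to repeated imports of the same module.
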
>
> As for the module's name, it is currently named 'importer', but that is bad
> since it conflicts with the idea of importers from PEP 302.  I was thinking
> importlib, but I wanted to wait and see what other people thought.
>
> I don't know if you guys are okay with me checking this in without having
> it vetted by the community first, as we prefer for all new modules.  I have
> not done the LaTeX docs yet.
>
> I think that is all of the details that I can think of.  I am still working
> towards implementing the security needed so that an application that embeds
> Python can execute arbitrary code securely.  Giving a talk at PyCon on the
> topic for anyone interested.
>
> Special thanks need to go to Paul Moore, whom I talked to through most of
> the design of the code.  Nick Coghlan also provided some handy feedback.
> And Neal Norwitz for bugging me about wanting something like this done.
> Plus thanks to everyone who has shown support.
>
> -Brett


-- 
Read my blog! I depend on your acceptance of my opinion! I am interesting!
http://ironfroggy-code.blogspot.com/