requirements for moving __import__ over to importlib?

I'm going to start this off with the caveat that hg.python.org/sandbox/bcannon#bootstrap_importlib is not completely at feature parity, but getting there shouldn't be hard. There is a FAILING file that has a list of the tests that are not passing because of importlib bootstrapping, and a comment as to why (I think) they are failing. But no switch would ever happen until the test suite passes. Anyway, to start this conversation I'm going to open with why I think removing most of the C code in Python/import.c and replacing it with importlib/_bootstrap.py is a positive thing. One is maintainability. Antoine mentioned how, if a change occurs, everyone is going to have to be able to fix code in importlib, and that's the point! I don't know about the rest of you, but I find Python code easier to work with than C code (and if you don't you might be subscribed to the wrong mailing list =). I would assume the ability to make changes or to fix bugs will be a lot easier with importlib than import.c. So maintainability should be easier when it comes to imports. Two is APIs. PEP 302 introduced the idea of an API for objects that can perform imports so that people can control it, enhance it, introspect it, etc. But as it stands right now, import.c implements none of PEP 302 for any built-in import mechanism. This mostly stems from positive thing #1 I just mentioned. But since I was able to do this code from scratch I was able to design for (and extend) PEP 302 compliance in order to make sure the entire import system was exposed cleanly. This means it is much easier now to write a custom importer for quirky syntax, a different storage mechanism, etc. Three is multi-VM support. IronPython, Jython, and PyPy have all said they would love importlib to become the default import implementation so that all VMs have the same implementation.
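[Editor's aside: to make point two concrete, a custom importer under this model can be quite small. A hedged sketch follows; the module name ``fake_mod`` and the in-memory registry are invented for illustration, and the hook names shown are the spec-based ones importlib eventually settled on (PEP 302's original find_module()/load_module() pair later became find_spec()/exec_module()).]

```python
import sys
import importlib.abc
import importlib.util

class StringLoader(importlib.abc.Loader):
    """Loads modules from an in-memory mapping of name -> source string."""

    def __init__(self, registry):
        self.registry = registry

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        # Execute the stored source in the fresh module's namespace.
        exec(self.registry[module.__name__], module.__dict__)

class StringFinder(importlib.abc.MetaPathFinder):
    """Answers 'can you import this name?' for the registry's modules."""

    def __init__(self, registry):
        self.loader = StringLoader(registry)
        self.registry = registry

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.registry:
            return importlib.util.spec_from_loader(fullname, self.loader)
        return None  # not ours; let the next finder try

sys.meta_path.append(StringFinder({'fake_mod': 'x = 42'}))
import fake_mod
print(fake_mod.x)  # prints 42
```

The whole "different storage mechanism" case reduces to swapping out where exec_module gets its source from.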
Some people have even said they will use importlib regardless of what CPython does simply to ease their coding burden, but obviously that still leads to the possibility of subtle semantic differences that would go away if all VMs used the same implementation. So switching would lead to one less possible semantic difference between the various VMs. So, that is the positives. What are the negatives? Performance, of course. Now I'm going to be upfront and say I really did not want to have this performance conversation now as I have done *NO* profiling or analysis of the algorithms used in importlib in order to tune performance (e.g. the function that handles case-sensitivity, which is on the critical path for importing source code, has a platform check which could go away if I instead had platform-specific versions of the function that were assigned to a global variable at startup). I also know that people have a bad habit of latching on to micro-benchmark numbers, especially for something like import which involves startup or can easily be measured. I mean I wrote importlib.test.benchmark to help measure performance changes in any algorithmic changes I might make, but it isn't a real-world benchmark like what Unladen Swallow gave us (e.g. the two start-up benchmarks that use real-world apps -- hg and bzr -- aren't available on Python 3 so only normal_startup and nosite_startup can be used ATM). IOW I really do not look forward to someone saying "importlib is so much slower at importing a module containing ``pass``" when (a) that never happens, and (b) most programs do not spend their time importing but instead doing interesting work. For instance, right now importlib runs ``python -c "import decimal"`` (which, BTW, is the largest module in the stdlib) 25% slower on my machine with a pydebug build (a non-debug build would probably be in my favor as I have more Python objects being used in importlib and thus more sanity checks).
But if you do something (very) slightly more interesting like ``python -m calendar``, where there is a slight amount of work, then importlib is currently only 16% slower. So it all depends on how we measure (as usual). So, if there is going to be some baseline performance target I need to hit to make people happy I would prefer to know what that (real-world) benchmark is and what the performance target is going to be on a non-debug build. And if people are not worried about the performance then I'm happy with that as well. =)
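[Editor's aside: for anyone wanting to reproduce numbers like the 25%/16% above, here is a rough sketch of the measurement being described — timing a full interpreter start-up plus the import via a subprocess. The helper name is made up; keeping the best of a few runs just reduces noise.]

```python
import subprocess
import sys
import time

def best_startup_time(stmt, repeat=5):
    """Time ``python -c <stmt>`` end to end; return the best of `repeat` runs."""
    best = float('inf')
    for _ in range(repeat):
        start = time.perf_counter()
        subprocess.run([sys.executable, '-c', stmt], check=True)
        best = min(best, time.perf_counter() - start)
    return best

baseline = best_startup_time('pass')
with_decimal = best_startup_time('import decimal')
# Ratio shows how much the import adds on top of bare interpreter startup.
print(with_decimal / baseline)
```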

Brett, thanks for persevering on importlib! Given how complicated imports are in Python, I really appreciate you pushing this forward. I've been knee deep in both import.c and importlib at various times. ;) On Feb 07, 2012, at 03:07 PM, Brett Cannon wrote:
I think it's *really* critical that importlib be well-documented. Not just its API, but also design documents (what classes are there, and why it's decomposed that way), descriptions of how to extend and subclass, maybe even examples for doing some typical hooks. Maybe even a guided tour or tutorial for people digging into importlib for the first time.
So, that is the positives. What are the negatives? Performance, of course.
That's okay. Get it complete, right, and usable first and then unleash the Pythonic hordes to bang on performance.
Identifying the use cases is important here. For example, even if it were a lot slower, Mailman wouldn't care (*I* might care because it takes longer to run my tests, but my users wouldn't). But Bazaar or Mercurial users would care a lot. -Barry

On Tue, Feb 7, 2012 at 21:24, Barry Warsaw <barry@python.org> wrote:
Yeah, startup performance getting worse kinda sucks for command-line apps. And IIRC it's been getting worse over the past few releases... Anyway, I think there was enough of a python3 port for Mercurial (from various GSoC students) that you can probably run some of the very simple commands (like hg parents or hg id), which should be enough for your purposes, right? Cheers, Dirkjan

Le 07/02/2012 23:21, Brett Cannon a écrit :
    hg clone http://selenic.com/repo/hg/
    cd hg
    python3 contrib/setup3k.py build

Hi Brett, I think this message went unanswered, so here’s a late reply: Le 07/02/2012 23:21, Brett Cannon a écrit :
    # get Mercurial from a repo or tarball
    hg clone http://selenic.com/repo/hg/
    cd hg
    # convert files in place (don’t commit after this :)
    python3.2 contrib/setup3k.py
    # the makefile is not py3k-aware, need to run manually
    # the current stable head fails with a TypeError for me
    PYTHONPATH=. python3.2 build/scripts-3.2

Cheers

I just tried this and I get a str/bytes issue. I also think your setup3k.py command is missing ``build`` and your build/scripts-3.2 is missing ``/hg``. On Wed, Feb 22, 2012 at 19:26, Éric Araujo <merwok@netwok.org> wrote:

On Tue, Feb 7, 2012 at 15:24, Barry Warsaw <barry@python.org> wrote:
That's fine and not difficult to do.
Right, which is why I'm looking for some agreed upon, concrete benchmark I can use which isn't fluff. -Brett

On Tue, 7 Feb 2012 15:07:24 -0500 Brett Cannon <brett@python.org> wrote:
From a cursory look, I think you're gonna have to break (special-case) some abstractions and have some inner loop coded in C for the common cases. That said, I think profiling and solving performance issues is critical *before* integrating this work. It doesn't need to be done by you, but the python-dev community shouldn't feel strong-armed to solve the issue.
Well, import time is so important that the Mercurial developers have written an on-demand import mechanism, to reduce the latency of command-line operations. But it's not only important for Mercurial and the like. Even if you're developing a Web app, making imports slower will make restarts slower, and development more tedious in the first place.
- No significant slowdown in startup time. - Within 25% of current performance when importing, say, the "struct" module (Lib/struct.py) from bytecode. Regards Antoine.

On 7 February 2012 20:49, Antoine Pitrou <solipsis@pitrou.net> wrote:
One question here, I guess - does the importlib integration do anything to make writing on-demand import mechanisms easier (I'd suspect not, but you never know...) If it did, then performance issues might be somewhat less of a sticking point, as usual depending on use cases. Paul.

On Tue, Feb 7, 2012 at 16:19, Paul Moore <p.f.moore@gmail.com> wrote:
Depends on what your feature set is. I have a fully working mixin you can add to any loader which makes it lazy if you trigger the import on reading an attribute from the module: http://code.google.com/p/importers/source/browse/importers/lazy.py . But if you want to trigger the import on *writing* an attribute then I have yet to make that work in Python source (maybe people have an idea on how to make that work since __setattr__ doesn't mix well with __getattribute__).

On Tue, Feb 7, 2012 at 15:49, Antoine Pitrou <solipsis@pitrou.net> wrote:
Wouldn't shock me if it came to that, but obviously I would like to try to avoid it.
That part of the discussion I'm staying out of since I want to see this in so I'm biased.
Sure, but they are a somewhat extreme case.
Fine, startup cost from a hard crash I can buy when you are getting 1000 QPS, but development more tedious?
What's significant and measuring what exactly? I mean startup already has a ton of imports as it is, so this would wash out the point of measuring practically anything else for anything small. This is why I said I want a benchmark to target which does actual work since flat-out startup time measures nothing meaningful but busy work. I would get more out of code that just stat'ed every file in Lib since at least that did some work.
- Within 25% of current performance when importing, say, the "struct" module (Lib/struct.py) from bytecode.
Why struct? It's such a small module that it isn't really a typical module. The median file size of Lib is 11K (e.g. tabnanny.py), not 238 bytes (which is barely past Hello World). And is this just importing struct or is this from startup, e.g. ``python -c "import struct"``?

On Tue, 7 Feb 2012 17:16:18 -0500 Brett Cannon <brett@python.org> wrote:
I don't think Mercurial is extreme. Any command-line tool written in Python applies. For example, yum (Fedora's apt-get) is written in Python. And I'm sure many people do small administration scripts in Python. These tools may then be run in a loop by whatever other script.
Well, waiting several seconds when reloading a development server is tedious. Anyway, my point was that other cases (than command-line tools) can be negatively impacted by import time.
I don't understand your sentence. Yes, startup has a ton of imports and that's why I'm fearing it may be negatively impacted :) ("a ton" being a bit less than 50 currently)
"Actual work" can be very small in some cases. For example, if you run "hg branch" I'm quite sure it doesn't do a lot of work except importing many modules and then reading a single file in .hg (the one named ".hg/branch" probably, but I'm not a Mercurial dev). In the absence of more "real world" benchmarks, I think the startup benchmarks in the benchmarks repo are a good baseline. That said you could also install my 3.x port of Twisted here: https://bitbucket.org/pitrou/t3k/ and then run e.g. "python3 bin/trial -h".
I would get more out of code that just stat'ed every file in Lib since at least that did some work.
stat()ing files is not really representative of import work. There are many indirections in the import machinery. (actually, even import.c appears quite slower than a bunch of stat() calls would imply)
Precisely to measure the overhead. Typical module size will vary depending on development style. Some people may prefer writing many small modules. Or they may be using many small libraries, or using libraries that have adopted such a development style. Measuring the overhead on small modules will make sure we aren't overly confident.
Just importing struct, as with the timeit snippets in the other thread. Regards Antoine.
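[Editor's aside: the timeit snippets referenced here aren't reproduced in this thread, so the following is a guessed reconstruction of that kind of measurement. Timing a *cold* import requires evicting the module from sys.modules each round; the cached case is a pure sys.modules hit.]

```python
import importlib
import sys
import timeit

def cold_import():
    # Evict struct so the full import machinery runs every time.
    sys.modules.pop('struct', None)
    importlib.import_module('struct')

def cached_import():
    # Module stays in sys.modules: this is the fast path under discussion.
    importlib.import_module('struct')

print('cold:  ', timeit.timeit(cold_import, number=1000))
print('cached:', timeit.timeit(cached_import, number=1000))
```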

On Tue, Feb 7, 2012 at 18:08, Antoine Pitrou <solipsis@pitrou.net> wrote:
So you want less than a 50% startup cost on the standard startup benchmarks?
OK, so less than 25% slowdown when importing a module with pre-existing bytecode that is very small. And here I was worrying you were going to suggest easy goals to reach for. ;)

On Wed, 8 Feb 2012 11:07:10 -0500 Brett Cannon <brett@python.org> wrote:
No, ~50 is the number of imports at startup. I think startup time should grow by less than 10%. (even better if it shrinks of course :))
And here I was worrying you were going to suggest easy goals to reach for. ;)
Heh. Well, if importlib enabled user-level functionality, I guess it could be attractive to trade a slice of performance against it. But from a user's point of view, bootstrapping importlib is mostly an implementation detail with not much of a positive impact. Regards Antoine.

On Tue, Feb 7, 2012 at 3:07 PM, Brett Cannon <brett@python.org> wrote:
One thing I'm a bit worried about is repeated imports, especially ones that are inside frequently-called functions. In today's versions of Python, this is a performance win for "command-line tool platform" systems like Mercurial and PEAK, where you want to delay importing as long as possible, in case the code that needs the import is never called at all... but, if it *is* used, you may still need to use it a lot of times. When writing that kind of code, I usually just unconditionally import inside the function, because the C code check for an already-imported module is faster than the Python "if" statement I'd have to clutter up my otherwise-clean function with. So, in addition to the things other people have mentioned as performance targets, I'd like to keep the slowdown factor low for this type of scenario as well. Specifically, the slowdown shouldn't be so much as to motivate lazy importers like Mercurial and PEAK to need to rewrite in-function imports to do the already-imported check ourselves. ;-) (Disclaimer: I haven't actually seen Mercurial's delayed/dynamic import code, so I can't say for 100% sure if they'd be affected the same way.)
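[Editor's aside: a sketch of the two styles being contrasted here; the function and variable names are invented. The first relies on __import__'s fast already-imported path; the second adds the Python-level check PJ would rather not clutter functions with.]

```python
def uses_json_unconditional(data):
    # Unconditional in-function import: after the first call this is
    # essentially just __import__'s sys.modules lookup in C.
    import json
    return json.dumps(data)

_json = None

def uses_json_checked(data):
    # The explicit already-imported check that clutters the function.
    global _json
    if _json is None:
        import json as _mod
        _json = _mod
    return _json.dumps(data)

print(uses_json_unconditional([1, 2]))  # prints [1, 2]
print(uses_json_checked([1, 2]))        # prints [1, 2]
```

The worry above is exactly about the per-call cost of the first style once __import__ is Python code.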

On Tue, 7 Feb 2012 17:24:21 -0500 Brett Cannon <brett@python.org> wrote:
IOW you want the sys.modules case fast, which I will never be able to match compared to C code since that is pure execution with no I/O.
Why wouldn't continue using C code for that? It's trivial (just a dict lookup). Regards Antoine.

On Tue, Feb 7, 2012 at 17:42, Antoine Pitrou <solipsis@pitrou.net> wrote:
Sure, but it's all the code between the function call and hitting sys.modules which would also need to get shoved into the C code. As I said, I have not tried to optimize anything yet (and unfortunately a lot of the upfront costs are over stupid things like checking if __import__ is being called with a string for the module name).

Le mercredi 08 février 2012 à 11:01 -0500, Brett Cannon a écrit :
I guess my point was: why is there a function call in that case? The "import" statement could look up sys.modules directly. Or the built-in __import__ could still be written in C, and only defer to importlib when the module isn't found in sys.modules. Practicality beats purity. Regards Antoine.

On Wed, Feb 8, 2012 at 11:09, Antoine Pitrou <solipsis@pitrou.net> wrote:
Because people like to do wacky stuff with their imports and so fully bypassing __import__ would be bad.
It's a possibility, although that would require every function call to fetch the PyInterpreterState to get at the cached __import__ (so the proper sys and imp modules are used) and I don't know how expensive that would be (probably not as expensive as calling out to Python code, but I'm thinking out loud).

On Thu, Feb 9, 2012 at 2:09 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I quite like the idea of having builtin __import__ be a *very* thin veneer around importlib that just does the "is this in sys.modules already so we can just return it from there?" checks and delegates other more complex cases to Python code in importlib. Poking around in importlib.__import__ [1] (as well as importlib._gcd_import), I'm thinking what we may want to do is break up the logic a bit so that there are multiple helper functions that a C version can call back into so that we can optimise certain simple code paths to not call back into Python at all, and others to only do so selectively.

Step 1: separate out the "fromlist" processing from __import__ into a separate helper function

    def _process_fromlist(module, fromlist):
        # Perform any required imports as per existing code:
        # http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l...

Step 2: separate out the relative import resolution from _gcd_import into a separate helper function.

    def _resolve_relative_name(name, package, level):
        assert hasattr(name, 'rpartition')
        assert hasattr(package, 'rpartition')
        assert level > 0
        name = ...  # Recalculate as per the existing code:
        # http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l...
        return name

Step 3: Implement builtin __import__ in C (pseudo-code below):

    def __import__(name, globals={}, locals={}, fromlist=[], level=0):
        if level > 0:
            name = importlib._resolve_relative_import(name)
        try:
            module = sys.modules[name]
        except KeyError:
            # Not cached yet, need to invoke the full import machinery
            # We already resolved any relative imports though, so
            # treat it as an absolute import
            return importlib.__import__(name, globals, locals, fromlist, 0)
        # Got a hit in the cache, see if there's any more work to do
        if not fromlist:
            # Duplicate relevant importlib.__import__ logic as C code
            # to find the right module to return from sys.modules
            pass
        elif hasattr(module, "__path__"):
            importlib._process_fromlist(module, fromlist)
        return module

This would then be similar to the way main.c already works when it interacts with runpy - simple cases are handled directly in C, more complex cases get handed over to the Python module. Cheers, Nick. [1] http://hg.python.org/cpython/file/default/Lib/importlib/_bootstrap.py#l950 -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Feb 8, 2012 at 20:26, Nick Coghlan <ncoghlan@gmail.com> wrote:
Fine by me.
I was actually already thinking of exposing this as importlib.resolve_name() so breaking it out makes sense. I also think it might be possible to expose a sort of importlib.find_module() that does nothing more than find the loader for a module (if available).
I suspect that if people want the case where you load from bytecode to be fast then this will have to expand beyond this to include C functions and/or classes which can be used as accelerators; while this accelerates the common case of sys.modules, this (probably) won't make Antoine happy enough for importing a small module from bytecode (importing large modules like decimal is already fast enough).

On Fri, Feb 10, 2012 at 1:05 AM, Brett Cannon <brett@python.org> wrote:
No, my suggestion of keeping a de minimis C implementation for the builtin __import__ is purely about ensuring the case of repeated imports (especially those nested inside functions) remains as fast as it is today. To speed up *first time* imports (regardless of their origin), I think it makes a lot more sense to use better algorithms at the importlib level, and that's much easier in Python than it is in C. It's not like we've ever been philosophically *opposed* to smarter approaches, it's just that import.c was already hairy enough and we had grave doubts about messing with it too much (I still have immense respect for the effort that Victor put in to sorting out most of its problems with Unicode handling). Not having that millstone hanging around our necks should open up *lots* of avenues for improvement without breaking backwards compatibility (since we can really do what we like, so long as the PEP 302 APIs are still invoked in the right order and the various public APIs remain backwards compatible). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Brett Cannon <brett <at> python.org> writes:
IOW you want the sys.modules case fast, which I will never be able to match compared to C code since that is pure execution with no I/O.
Sure you can: have a really fast Python VM. Constructive: if you can run this code under PyPy it'd be easy to just: $ pypy -mtimeit "import struct" $ pypy -mtimeit -s "import importlib" "importlib.import_module('struct')" Or whatever the right API is. Alex

On Tue, Feb 7, 2012 at 18:26, Alex Gaynor <alex.gaynor@gmail.com> wrote:
I'm not worried about PyPy. =) I assume you will just flat-out use importlib regardless of what happens with CPython since it is/will be fully compatible and is already written for you.

On Tue, Feb 7, 2012 at 5:24 PM, Brett Cannon <brett@python.org> wrote:
Couldn't you just prefix the __import__ function with something like this:

    ...
    try:
        module = sys.modules[name]
    except KeyError:
        # slow code path

(Admittedly, the import lock is still a problem; initially I thought you could just skip it for this case, but the problem is that another thread could be in the middle of executing the module.)

On Tue, Feb 7, 2012 at 21:27, PJ Eby <pje@telecommunity.com> wrote:
I practically do already. As of right now there are some 'if' checks that come ahead of it that I could shift around to fast path this even more (since who cares about types and such if the module name happens to be in sys.modules), but it isn't that far off as-is.

On 2/7/2012 4:51 PM, PJ Eby wrote:
importlib could provide a parameterized decorator for functions that are the only consumers of an import. It could operate much like this:

    def imps(mod):
        def makewrap(f):
            def wrapped(*args, **kwds):
                print('first/only call to wrapper')
                g = globals()
                g[mod] = __import__(mod)
                g[f.__name__] = f
                f(*args, **kwds)
            wrapped.__name__ = f.__name__
            return wrapped
        return makewrap

    @imps('itertools')
    def ic():
        print(itertools.count)

    ic()
    ic()

which prints:

    first/only call to wrapper
    <class 'itertools.count'>
    <class 'itertools.count'>

-- Terry Jan Reedy

On Tue, Feb 7, 2012 at 6:40 PM, Terry Reedy <tjreedy@udel.edu> wrote:
If I were going to rewrite code, I'd just use lazy imports (see http://pypi.python.org/pypi/Importing ). They're even faster than this approach (or using plain import statements), as they have zero per-call function call overhead. It's just that not everything I write can depend on Importing. Throw an equivalent into the stdlib, though, and I guess I wouldn't have to worry about dependencies... (To be clearer; I'm talking about the http://peak.telecommunity.com/DevCenter/Importing#lazy-imports feature, which sticks a dummy module subclass instance into sys.modules, whose __getattribute__ does a reload() of the module, forcing the normal import process to run, after first changing the dummy object's type to something that doesn't have the __getattribute__ any more. This ensures that all accesses after the first one are at normal module attribute access speed. That, and the "whenImported" decorator from Importing would probably be of general stdlib usefulness too.)
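[Editor's aside: a sketch of that technique reconstructed from the description above rather than from Importing's actual source. ``textwrap`` is just a convenient stand-in module, import_module stands in for the reload() step, and the dunder pass-through is needed because the import machinery itself may peek at attributes like __spec__.]

```python
import importlib
import sys
import types

class _RealModule(types.ModuleType):
    """After the swap: a plain module, attribute access at normal speed."""

class _LazyModule(types.ModuleType):
    """Dummy placed in sys.modules; the first real attribute *read*
    triggers the actual import, then the instance's type is changed so
    later reads never pay for this __getattribute__ again."""

    # Attributes the import machinery may poke at; serve these from the
    # dummy itself without triggering the import.
    _PASSTHROUGH = frozenset({'__name__', '__spec__', '__loader__',
                              '__package__', '__dict__', '__class__'})

    def __getattribute__(self, attr):
        if attr in _LazyModule._PASSTHROUGH:
            return object.__getattribute__(self, attr)
        name = object.__getattribute__(self, '__name__')
        # Swap the type first so ordinary attribute access works below.
        object.__setattr__(self, '__class__', _RealModule)
        del sys.modules[name]
        real = importlib.import_module(name)  # stand-in for reload()
        self.__dict__.update(real.__dict__)
        sys.modules[name] = self              # preserve object identity
        return getattr(self, attr)

sys.modules.pop('textwrap', None)
sys.modules['textwrap'] = _LazyModule('textwrap')
import textwrap                     # cheap: just a sys.modules hit
print(type(textwrap).__name__)      # still the dummy type at this point
print(textwrap.dedent('    hi'))    # first read does the real import; prints hi
print(type(textwrap).__name__)      # prints _RealModule
```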

On 2/7/2012 9:35 PM, PJ Eby wrote:
My code above and Importing, as I understand it, both delay imports until needed by using a dummy object that gets replaced at first access. (Now that I am reminded, sys.modules is the better place for the dummy objects. I just wanted to show that there is a simple solution (though more specialized) even for existing code.) The cost of delay, which might mean never, is a bit of one-time extra overhead. Both have no extra overhead after the first call. Unless delayed importing is made standard, both require a bit of extra code somewhere.
And that is what I think (agree?) should be done to counteract the likely slowdown from using importlib.
-- Terry Jan Reedy

On Wed, Feb 8, 2012 at 12:54 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Yeah, this is one frequently reinvented wheel that could definitely do with a standard implementation. Christian Heimes made an initial attempt at such a thing years ago with PEP 369, but an importlib based __import__ would let the implementation largely be pure Python (with all the increase in power and flexibility that implies). I'm not sure such an addition would help much with the base interpreter start up time though - most of the modules we bring in are because we're actually using them for some reason. The other thing that shouldn't be underrated here is the value in making the builtin import system PEP 302 compliant from a *documentation* perspective. I've made occasional attempts at fully documenting the import system over the years, and I always end up giving up because the combination of the pre-PEP 302 builtin mechanisms in import.c and the PEP 302 compliant mechanisms for things like zipimport just degenerate into a mess of special cases that are impossible to justify beyond "nobody got around to fixing this yet". The fact that we have an undocumented PEP 302 based reimplementation of imports squirrelled away in pkgutil to make pkgutil and runpy work is sheer insanity (replacing *that* with importlib might actually be a good first step towards full integration). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Feb 7, 2012 at 22:47, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'll see if I can come up with a pure Python way to handle setting attributes on the module since that is the one case that my importers project code can't handle.
It wouldn't. This would be for third-parties only.
I actually have never bothered to explain import as it is currently implemented in any of my PyCon import talks precisely because it is such a mess. It's just easier to explain from a PEP 302 perspective since you can actually comprehend that.

On 2/8/2012 11:13 AM, Brett Cannon wrote:
On Tue, Feb 7, 2012 at 22:47, Nick Coghlan <ncoghlan@gmail.com
It wouldn't. This would be for third-parties only.
such as hg. That is what I had in mind. Would the following work? Treat a function as a 'loop' in that it may be executed repeatedly. Treat 'import x' in a function as what it is, an __import__ call plus a local assignment. Apply a version of the usual optimization: put a sys.modules-based lazy import outside of the function (at the top of the module?) and leave the local assignment "x = sys.modules['x']" in the function. Change sys.modules.__delitem__ to replace a module with a dummy, so the function will still work after a deletion, as it does now. -- Terry Jan Reedy

On 2/8/2012 3:16 PM, Brett Cannon wrote:
The intent of what I proposed it to be transparent for imports within functions. It would be a minor optimization if anything, but it would mean that there is a lazy mechanism in place. For top-level imports, unless *all* are made lazy, then there *must* be some indication in the code of whether to make it lazy or not. -- Terry Jan Reedy

On Wed, Feb 8, 2012 at 4:08 PM, Brett Cannon <brett@python.org> wrote:
There's actually only a few things stopping all imports from being lazy. "from x import y" immediately de-lazies them, after all. ;-) The main two reasons you wouldn't want imports to *always* be lazy are: 1. Changing sys.path or other parameters between the import statement and the actual import 2. ImportErrors are likewise deferred until point-of-use, so conditional importing with try/except would break.
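[Editor's aside: point 2 refers to the common conditional-import idiom shown below. With eager imports the ImportError fires at the import statement, so the fallback is chosen right there; if imports were always lazy, the error would only surface at first *use*, after the except clause had already been skipped. cElementTree here stands in for any optional accelerated variant — it existed for Python 2, so on Python 3 the except branch runs.]

```python
try:
    import cElementTree as ET          # optional fast variant
except ImportError:
    import xml.etree.ElementTree as ET  # stdlib fallback

# Works the same whichever branch was taken.
print(ET.fromstring('<a>hi</a>').text)  # prints hi
```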

On Thu, Feb 9, 2012 at 11:28 AM, PJ Eby <pje@telecommunity.com> wrote:
3. Module level code may have non-local side effects (e.g. installing codecs, pickle handlers, atexit handlers) A white-listing based approach to lazy imports would let you manage all those issues without having to change all the code that actually *does* the imports. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Feb 8, 2012 at 20:28, PJ Eby <pje@telecommunity.com> wrote:
This actually depends on the type of ImportError. My current solution actually would trigger an ImportError at the import statement if no finder could locate the module. But if some ImportError was raised because of some other issue during load then that would come up at first use.

On Tue, Feb 7, 2012 at 22:47, Nick Coghlan <ncoghlan@gmail.com> wrote [SNIP]
It easily goes beyond runpy. You could ditch much of imp's C code (e.g. load_module()), you could write py_compile and compileall using importlib, you could rewrite zipimport, etc. Anything that touches import could be refactored to (a) use just Python code, and (b) reshare code so as to not re-invent the wheel constantly.

On Wed, Feb 8, 2012 at 11:15, Brett Cannon <brett@python.org> wrote:
And taking it even farther, all of the blackbox aspects of import go away. For instance, the implicit, hidden importers for built-in modules, frozen modules, extensions, and source could actually be set on sys.path_hooks. The Meta path importer that handles sys.path could actually exist on sys.meta_path.
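[Editor's aside: an illustration of what that exposure looks like in an interpreter where importlib is bootstrapped — the once-hidden importers show up directly on sys.meta_path and sys.path_hooks, so they can be inspected, reordered, or replaced like any other hook.]

```python
import sys

# The formerly implicit finders (built-in, frozen, path-based) are
# visible as ordinary objects on sys.meta_path.
for finder in sys.meta_path:
    print(finder)

# Path entry hooks (e.g. for zip archives and filesystem directories)
# live on sys.path_hooks.
for hook in sys.path_hooks:
    print(hook)
```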

Brett, thanks for persevering on importlib! Given how complicated imports are in Python, I really appreciate you pushing this forward. I've been knee deep in both import.c and importlib at various times. ;) On Feb 07, 2012, at 03:07 PM, Brett Cannon wrote:
I think it's *really* critical that importlib be well-documented. Not just its API, but also design documents (what classes are there, and why it's decomposed that way), descriptions of how to extend and subclass, maybe even examples for doing some typical hooks. Maybe even a guided tour or tutorial for people digging into importlib for the first time.
So, that is the positives. What are the negatives? Performance, of course.
That's okay. Get it complete, right, and usable first and then unleash the Pythonic hoards to bang on performance.
Identifying the use cases are important here. For example, even if it were a lot slower, Mailman wouldn't care (*I* might care because it takes longer to run my test, but my users wouldn't). But Bazaar or Mercurial users would care a lot. -Barry

On Tue, Feb 7, 2012 at 21:24, Barry Warsaw <barry@python.org> wrote:
Yeah, startup performance getting worse kinda sucks for command-line apps. And IIRC it's been getting worse over the past few releases... Anyway, I think there was enough of a python3 port for Mercurial (from various GSoC students) that you can probably run some of the very simple commands (like hg parents or hg id), which should be enough for your purposes, right? Cheers, Dirkjan

Le 07/02/2012 23:21, Brett Cannon a écrit :
hg clone http://selenic.com/repo/hg/ cd hg python3 contrib/setup3k.py build

Hi Brett, I think this message went unanswered, so here’s a late reply: Le 07/02/2012 23:21, Brett Cannon a écrit :
# get Mercurial from a repo or tarball hg clone http://selenic.com/repo/hg/ cd hg # convert files in place (don’t commit after this :) python3.2 contrib/setup3k.py # the makefile is not py3k-aware, need to run manually # the current stable head fails with a TypeError for me PYTHONPATH=. python3.2 build/scripts-3.2 Cheers

I just tried this and I get a str/bytes issue. I also think your setup3k.py command is missing ``build`` and your build/scripts-3.2 is missing ``/hg``. On Wed, Feb 22, 2012 at 19:26, Éric Araujo <merwok@netwok.org> wrote:

On Tue, Feb 7, 2012 at 15:24, Barry Warsaw <barry@python.org> wrote:
That's fine and not difficult to do.
Right, which is why I'm looking for some agreed upon, concrete benchmark I can use which isn't fluff. -Brett

On Tue, 7 Feb 2012 15:07:24 -0500 Brett Cannon <brett@python.org> wrote:
From a cursory look, I think you're gonna have to break (special-case) some abstractions and have some inner loop coded in C for the common cases. That said, I think profiling and solving performance issues is critical *before* integrating this work. It doesn't need to be done by you, but the python-dev community shouldn't feel strong-armed to solve the issue.
Well, import time is so important that the Mercurial developers have written an on-demand import mechanism, to reduce the latency of command-line operations. But it's not only important for Mercurial and the like. Even if you're developing a Web app, making imports slower will make restarts slower, and development more tedious in the first place.
- No significant slowdown in startup time. - Within 25% of current performance when importing, say, the "struct" module (Lib/struct.py) from bytecode. Regards Antoine.

On 7 February 2012 20:49, Antoine Pitrou <solipsis@pitrou.net> wrote:
One question here, I guess - does the importlib integration do anything to make writing on-demand import mechanisms easier (I'd suspect not, but you never know...) If it did, then performance issues might be somewhat less of a sticking point, as usual depending on use cases. Paul.

On Tue, Feb 7, 2012 at 16:19, Paul Moore <p.f.moore@gmail.com> wrote:
Depends on what your feature set is. I have a fully working mixin you can add to any loader which makes it lazy if you trigger the import on reading an attribute from the module: http://code.google.com/p/importers/source/browse/importers/lazy.py . But if you want to trigger the import on *writing* an attribute then I have yet to make that work in Python source (maybe people have an idea on how to make that work since __setattr__ doesn't mix well with __getattribute__).
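The read-triggered approach Brett describes can be sketched in a few lines. This is an illustrative reconstruction, not the code from his importers project; _LazyModule, lazy_import, and the choice of colorsys are all invented for the example. A placeholder module sits in sys.modules, and its __getattribute__ performs the real import on first use:

```python
import importlib
import sys
import types

class _LazyModule(types.ModuleType):
    """Placeholder that performs the real import on first attribute read."""
    def __getattribute__(self, attr):
        name = object.__getattribute__(self, "__name__")
        # Drop the placeholder first; otherwise import_module would just
        # hand it straight back out of sys.modules.
        del sys.modules[name]
        module = importlib.import_module(name)
        return getattr(module, attr)

def lazy_import(name):
    """Install a placeholder for `name` without importing it yet."""
    if name not in sys.modules:
        sys.modules[name] = _LazyModule(name)
    return sys.modules[name]

mod = lazy_import("colorsys")          # nothing has been imported yet
print(mod.rgb_to_hsv(0.0, 0.0, 1.0))  # first attribute read imports for real
```

It also shows the asymmetry Brett mentions: reading an attribute triggers the import, but *writing* one just lands on the placeholder, since there is no __setattr__ hook that mixes cleanly with this scheme.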

On Tue, Feb 7, 2012 at 15:49, Antoine Pitrou <solipsis@pitrou.net> wrote:
Wouldn't shock me if it came to that, but obviously I would like to try to avoid it.
That part of the discussion I'm staying out of since I want to see this in so I'm biased.
Sure, but they are a somewhat extreme case.
Fine, startup cost from a hard crash I can buy when you are getting 1000 QPS, but development more tedious?
What's significant and measuring what exactly? I mean startup already has a ton of imports as it is, so this would wash out the point of measuring practically anything else for anything small. This is why I said I want a benchmark to target which does actual work since flat-out startup time measures nothing meaningful but busy work. I would get more out of code that just stat'ed every file in Lib since at least that did some work.
- Within 25% of current performance when importing, say, the "struct" module (Lib/struct.py) from bytecode.
Why struct? It's such a small module that it isn't really a typical module. The median file size of Lib is 11K (e.g. tabnanny.py), not 238 bytes (which is barely past Hello World). And is this just importing struct or is this from startup, e.g. ``python -c "import struct"``?

On Tue, 7 Feb 2012 17:16:18 -0500 Brett Cannon <brett@python.org> wrote:
I don't think Mercurial is extreme. Any command-line tool written in Python applies. For example, yum (Fedora's apt-get) is written in Python. And I'm sure many people do small administration scripts in Python. These tools may then be run in a loop by whatever other script.
Well, waiting several seconds when reloading a development server is tedious. Anyway, my point was that other cases (than command-line tools) can be negatively impacted by import time.
I don't understand your sentence. Yes, startup has a ton of imports and that's why I'm fearing it may be negatively impacted :) ("a ton" being a bit less than 50 currently)
"Actual work" can be very small in some cases. For example, if you run "hg branch" I'm quite sure it doesn't do a lot of work except importing many modules and then reading a single file in .hg (the one named ".hg/branch" probably, but I'm not a Mercurial dev). In the absence of more "real world" benchmarks, I think the startup benchmarks in the benchmarks repo are a good baseline. That said you could also install my 3.x port of Twisted here: https://bitbucket.org/pitrou/t3k/ and then run e.g. "python3 bin/trial -h".
I would get more out of code that just stat'ed every file in Lib since at least that did some work.
stat()ing files is not really representative of import work. There are many indirections in the import machinery. (actually, even import.c appears quite slower than a bunch of stat() calls would imply)
Precisely to measure the overhead. Typical module size will vary depending on development style. Some people may prefer writing many small modules. Or they may be using many small libraries, or using libraries that have adopted such a development style. Measuring the overhead on small modules will make sure we aren't overly confident.
Just importing struct, as with the timeit snippets in the other thread. Regards Antoine.
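For reference, the sort of timeit measurement being traded back and forth might look like the following sketch (the exact snippets from the other thread aren't reproduced here). Evicting the sys.modules entry is what forces the import machinery to run on each pass, and a pre-existing .pyc is assumed so that the "from bytecode" path is what gets timed:

```python
import sys
import timeit

def fresh_import():
    sys.modules.pop("struct", None)  # evict so the full machinery runs
    import struct                    # noqa: F401

fresh_import()  # warm-up: any .pyc writing happens up front, not in the timing
per_call = timeit.timeit(fresh_import, number=200) / 200
print("fresh import of struct: %.1f us" % (per_call * 1e6))
```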

On Tue, Feb 7, 2012 at 18:08, Antoine Pitrou <solipsis@pitrou.net> wrote:
So you want less than a 50% startup cost on the standard startup benchmarks?
OK, so less than 25% slowdown when importing a module with pre-existing bytecode that is very small. And here I was worrying you were going to suggest easy goals to reach for. ;)

On Wed, 8 Feb 2012 11:07:10 -0500 Brett Cannon <brett@python.org> wrote:
No, ~50 is the number of imports at startup. I think startup time should grow by less than 10%. (even better if it shrinks of course :))
And here I was worrying you were going to suggest easy goals to reach for. ;)
Heh. Well, if importlib enabled user-level functionality, I guess it could be attractive to trade a slice of performance against it. But from a user's point of view, bootstrapping importlib is mostly an implementation detail with not much of a positive impact. Regards Antoine.

On Tue, Feb 7, 2012 at 3:07 PM, Brett Cannon <brett@python.org> wrote:
One thing I'm a bit worried about is repeated imports, especially ones that are inside frequently-called functions. In today's versions of Python, this is a performance win for "command-line tool platform" systems like Mercurial and PEAK, where you want to delay importing as long as possible, in case the code that needs the import is never called at all... but, if it *is* used, you may still need to use it a lot of times. When writing that kind of code, I usually just unconditionally import inside the function, because the C code check for an already-imported module is faster than the Python "if" statement I'd have to clutter up my otherwise-clean function with. So, in addition to the things other people have mentioned as performance targets, I'd like to keep the slowdown factor low for this type of scenario as well. Specifically, the slowdown shouldn't be so much as to motivate lazy importers like Mercurial and PEAK to rewrite in-function imports to do the already-imported check themselves. ;-) (Disclaimer: I haven't actually seen Mercurial's delayed/dynamic import code, so I can't say for 100% sure if they'd be affected the same way.)
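The contrast PJ Eby draws can be made concrete with a small sketch (json stands in for whatever module the function actually needs; both helper functions are invented for the example):

```python
def parse_unconditional(data):
    # The idiom he prefers: the repeated import is cheap because the
    # already-imported check is a C-level sys.modules hit.
    import json
    return json.loads(data)

_json = None

def parse_manual_check(data):
    # The Python-level "if" clutter he'd rather not write by hand.
    global _json
    if _json is None:
        import json
        _json = json
    return _json.loads(data)

print(parse_unconditional('{"a": 1}') == parse_manual_check('{"a": 1}'))  # prints True
```

His concern is that if __import__ gets slower, the first version stops being the cheap option and everyone ends up writing the second by hand.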

On Tue, 7 Feb 2012 17:24:21 -0500 Brett Cannon <brett@python.org> wrote:
IOW you want the sys.modules case fast, which I will never be able to match compared to C code since that is pure execution with no I/O.
Why wouldn't continue using C code for that? It's trivial (just a dict lookup). Regards Antoine.

On Tue, Feb 7, 2012 at 17:42, Antoine Pitrou <solipsis@pitrou.net> wrote:
Sure, but it's all the code between the function call and hitting sys.modules which would also need to get shoved into the C code. As I said, I have not tried to optimize anything yet (and unfortunately a lot of the upfront costs are over stupid things like checking if __import__ is being called with a string for the module name).

On Wednesday, 8 February 2012 at 11:01 -0500, Brett Cannon wrote:
I guess my point was: why is there a function call in that case? The "import" statement could look up sys.modules directly. Or the built-in __import__ could still be written in C, and only defer to importlib when the module isn't found in sys.modules. Practicality beats purity. Regards Antoine.

On Wed, Feb 8, 2012 at 11:09, Antoine Pitrou <solipsis@pitrou.net> wrote:
Because people like to do wacky stuff with their imports and so fully bypassing __import__ would be bad.
It's a possibility, although that would require every function call to fetch the PyInterpreterState to get at the cached __import__ (so the proper sys and imp modules are used) and I don't know how expensive that would be (probably not as expensive as calling out to Python code, but I'm thinking out loud).

On Thu, Feb 9, 2012 at 2:09 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I quite like the idea of having builtin __import__ be a *very* thin veneer around importlib that just does the "is this in sys.modules already so we can just return it from there?" checks and delegates other more complex cases to Python code in importlib. Poking around in importlib.__import__ [1] (as well as importlib._gcd_import), I'm thinking what we may want to do is break up the logic a bit so that there are multiple helper functions that a C version can call back into, so that we can optimise certain simple code paths to not call back into Python at all, and others to only do so selectively.

Step 1: separate out the "fromlist" processing from __import__ into a separate helper function:

def _process_fromlist(module, fromlist):
    # Perform any required imports as per existing code:
    # http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l...
    ...

Step 2: separate out the relative import resolution from _gcd_import into a separate helper function:

def _resolve_relative_name(name, package, level):
    assert hasattr(name, 'rpartition')
    assert hasattr(package, 'rpartition')
    assert level > 0
    # Recalculate name as per the existing code:
    # http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l...
    return name

Step 3: implement builtin __import__ in C (pseudo-code below):

def __import__(name, globals={}, locals={}, fromlist=[], level=0):
    if level > 0:
        name = importlib._resolve_relative_import(name)
    try:
        module = sys.modules[name]
    except KeyError:
        # Not cached yet, need to invoke the full import machinery.
        # We already resolved any relative imports though, so
        # treat it as an absolute import.
        return importlib.__import__(name, globals, locals, fromlist, 0)
    # Got a hit in the cache, see if there's any more work to do
    if not fromlist:
        # Duplicate relevant importlib.__import__ logic as C code
        # to find the right module to return from sys.modules
        pass
    elif hasattr(module, "__path__"):
        importlib._process_fromlist(module, fromlist)
    return module

This would then be similar to the way main.c already works when it interacts with runpy - simple cases are handled directly in C, more complex cases get handed over to the Python module. Cheers, Nick. [1] http://hg.python.org/cpython/file/default/Lib/importlib/_bootstrap.py#l950 -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Feb 8, 2012 at 20:26, Nick Coghlan <ncoghlan@gmail.com> wrote:
Fine by me.
I was actually already thinking of exposing this as importlib.resolve_name() so breaking it out makes sense. I also think it might be possible to expose a sort of importlib.find_module() that does nothing more than find the loader for a module (if available).
I suspect that if people want the case where you load from bytecode to be fast, then this will have to expand beyond this to include C functions and/or classes which can be used as accelerators; while this accelerates the common case of sys.modules, this (probably) won't make Antoine happy enough for importing a small module from bytecode (importing large modules like decimal is already fast enough).

On Fri, Feb 10, 2012 at 1:05 AM, Brett Cannon <brett@python.org> wrote:
No, my suggestion of keeping a de minimis C implementation for the builtin __import__ is purely about ensuring the case of repeated imports (especially those nested inside functions) remains as fast as it is today. To speed up *first time* imports (regardless of their origin), I think it makes a lot more sense to use better algorithms at the importlib level, and that's much easier in Python than it is in C. It's not like we've ever been philosophically *opposed* to smarter approaches, it's just that import.c was already hairy enough and we had grave doubts about messing with it too much (I still have immense respect for the effort that Victor put in to sorting out most of its problems with Unicode handling). Not having that millstone hanging around our necks should open up *lots* of avenues for improvement without breaking backwards compatibility (since we can really do what we like, so long as the PEP 302 APIs are still invoked in the right order and the various public APIs remain backwards compatible). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Brett Cannon <brett <at> python.org> writes:
IOW you want the sys.modules case fast, which I will never be able to match compared to C code since that is pure execution with no I/O.
Sure you can: have a really fast Python VM. Constructive: if you can run this code under PyPy it'd be easy to just:

$ pypy -mtimeit "import struct"
$ pypy -mtimeit -s "import importlib" "importlib.import_module('struct')"

Or whatever the right API is. Alex

On Tue, Feb 7, 2012 at 18:26, Alex Gaynor <alex.gaynor@gmail.com> wrote:
I'm not worried about PyPy. =) I assume you will just flat-out use importlib regardless of what happens with CPython since it is/will be fully compatible and is already written for you.

On Tue, Feb 7, 2012 at 5:24 PM, Brett Cannon <brett@python.org> wrote:
Couldn't you just prefix the __import__ function with something like this:

...
try:
    module = sys.modules[name]
except KeyError:
    # slow code path

(Admittedly, the import lock is still a problem; initially I thought you could just skip it for this case, but the problem is that another thread could be in the middle of executing the module.)

On Tue, Feb 7, 2012 at 21:27, PJ Eby <pje@telecommunity.com> wrote:
I practically do already. As of right now there are some 'if' checks that come ahead of it that I could shift around to fast path this even more (since who cares about types and such if the module name happens to be in sys.modules), but it isn't that far off as-is.

On 2/7/2012 4:51 PM, PJ Eby wrote:
importlib could provide a parameterized decorator for functions that are the only consumers of an import. It could operate much like this:

def imps(mod):
    def makewrap(f):
        def wrapped(*args, **kwds):
            print('first/only call to wrapper')
            g = globals()
            g[mod] = __import__(mod)
            g[f.__name__] = f
            f(*args, **kwds)
        wrapped.__name__ = f.__name__
        return wrapped
    return makewrap

@imps('itertools')
def ic():
    print(itertools.count)

ic()
ic()

# first/only call to wrapper
# <class 'itertools.count'>
# <class 'itertools.count'>

-- Terry Jan Reedy

On Tue, Feb 7, 2012 at 6:40 PM, Terry Reedy <tjreedy@udel.edu> wrote:
If I were going to rewrite code, I'd just use lazy imports (see http://pypi.python.org/pypi/Importing ). They're even faster than this approach (or using plain import statements), as they have zero per-call function call overhead. It's just that not everything I write can depend on Importing. Throw an equivalent into the stdlib, though, and I guess I wouldn't have to worry about dependencies... (To be clearer, I'm talking about the http://peak.telecommunity.com/DevCenter/Importing#lazy-imports feature, which sticks a dummy module subclass instance into sys.modules, whose __getattribute__ does a reload() of the module, forcing the normal import process to run, after first changing the dummy object's type to something that doesn't have the __getattribute__ any more. This ensures that all accesses after the first one are at normal module attribute access speed. That, and the "whenImported" decorator from Importing would probably be of general stdlib usefulness too.)
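The mechanism described can be roughly emulated as follows. Note this is not the real Importing code: that package reloads the module's code into the dummy in place, whereas this portable sketch swaps in a freshly imported module instead, and _DummyModule and the choice of fractions are invented for the example. The type downgrade is the key trick: after the first access, the hook is gone.

```python
import importlib
import sys
import types

class _DummyModule(types.ModuleType):
    def __getattribute__(self, attr):
        # Downgrade our type first: once this object is a plain ModuleType
        # the hook is gone, so later attribute access runs at full speed.
        self.__class__ = types.ModuleType
        name = self.__name__
        del sys.modules[name]                 # make way for the real import
        real = importlib.import_module(name)  # runs the normal machinery
        vars(self).update(vars(real))         # old references keep working
        return getattr(real, attr)

sys.modules.pop("fractions", None)
sys.modules["fractions"] = _DummyModule("fractions")
import fractions                 # just binds the dummy; no module code runs yet
print(fractions.Fraction(1, 2))  # first use triggers the real import; prints 1/2
```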

On 2/7/2012 9:35 PM, PJ Eby wrote:
My code above and Importing, as I understand it, both delay imports until needed by using a dummy object that gets replaced at first access. (Now that I am reminded, sys.modules is the better place for the dummy objects. I just wanted to show that there is a simple solution (though more specialized) even for existing code.) The cost of delay, which might mean never, is a bit of one-time extra overhead. Both have no extra overhead after the first call. Unless delayed importing is made standard, both require a bit of extra code somewhere.
And that is what I think (agree?) should be done to counteract the likely slowdown from using importlib.
-- Terry Jan Reedy

On Wed, Feb 8, 2012 at 12:54 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Yeah, this is one frequently reinvented wheel that could definitely do with a standard implementation. Christian Heimes made an initial attempt at such a thing years ago with PEP 369, but an importlib based __import__ would let the implementation largely be pure Python (with all the increase in power and flexibility that implies). I'm not sure such an addition would help much with the base interpreter start up time though - most of the modules we bring in are because we're actually using them for some reason. The other thing that shouldn't be underrated here is the value in making the builtin import system PEP 302 compliant from a *documentation* perspective. I've made occasional attempts at fully documenting the import system over the years, and I always end up giving up because the combination of the pre-PEP 302 builtin mechanisms in import.c and the PEP 302 compliant mechanisms for things like zipimport just degenerate into a mess of special cases that are impossible to justify beyond "nobody got around to fixing this yet". The fact that we have an undocumented PEP 302 based reimplementation of imports squirrelled away in pkgutil to make pkgutil and runpy work is sheer insanity (replacing *that* with importlib might actually be a good first step towards full integration). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Feb 7, 2012 at 22:47, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'll see if I can come up with a pure Python way to handle setting attributes on the module since that is the one case that my importers project code can't handle.
It wouldn't. This would be for third-parties only.
I actually have never bothered to explain import as it is currently implemented in any of my PyCon import talks precisely because it is such a mess. It's just easier to explain from a PEP 302 perspective since you can actually comprehend that.

On 2/8/2012 11:13 AM, Brett Cannon wrote:
On Tue, Feb 7, 2012 at 22:47, Nick Coghlan <ncoghlan@gmail.com> wrote:
It wouldn't. This would be for third-parties only.
such as hg. That is what I had in mind. Would the following work? Treat a function as a 'loop' in that it may be executed repeatedly. Treat 'import x' in a function as what it is, an __import__ call plus a local assignment. Apply a version of the usual optimization: put a sys.modules-based lazy import outside of the function (at the top of the module?) and leave the local assignment "x = sys.modules['x']" in the function. Change sys.modules.__delitem__ to replace a module with a dummy, so the function will still work after a deletion, as it does now. -- Terry Jan Reedy

On 2/8/2012 3:16 PM, Brett Cannon wrote:
The intent of what I proposed it to be transparent for imports within functions. It would be a minor optimization if anything, but it would mean that there is a lazy mechanism in place. For top-level imports, unless *all* are made lazy, then there *must* be some indication in the code of whether to make it lazy or not. -- Terry Jan Reedy

On Wed, Feb 8, 2012 at 4:08 PM, Brett Cannon <brett@python.org> wrote:
There's actually only a few things stopping all imports from being lazy. "from x import y" immediately de-lazies them, after all. ;-) The main two reasons you wouldn't want imports to *always* be lazy are: 1. Changing sys.path or other parameters between the import statement and the actual import 2. ImportErrors are likewise deferred until point-of-use, so conditional importing with try/except would break.

On Thu, Feb 9, 2012 at 11:28 AM, PJ Eby <pje@telecommunity.com> wrote:
3. Module level code may have non-local side effects (e.g. installing codecs, pickle handlers, atexit handlers) A white-listing based approach to lazy imports would let you manage all those issues without having to change all the code that actually *does* the imports. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Feb 8, 2012 at 20:28, PJ Eby <pje@telecommunity.com> wrote:
This actually depends on the type of ImportError. My current solution actually would trigger an ImportError at the import statement if no finder could locate the module. But if some ImportError was raised because of some other issue during load then that would come up at first use.

On Tue, Feb 7, 2012 at 22:47, Nick Coghlan <ncoghlan@gmail.com> wrote [SNIP]
It easily goes beyond runpy. You could ditch much of imp's C code (e.g. load_module()), you could write py_compile and compileall using importlib, you could rewrite zipimport, etc. Anything that touches import could be refactored to (a) use just Python code, and (b) reshare code so as to not re-invent the wheel constantly.

On Wed, Feb 8, 2012 at 11:15, Brett Cannon <brett@python.org> wrote:
And taking it even farther, all of the blackbox aspects of import go away. For instance, the implicit, hidden importers for built-in modules, frozen modules, extensions, and source could actually be set on sys.path_hooks. The meta path importer that handles sys.path could actually exist on sys.meta_path.
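To make the uniformity Brett is pointing at concrete, here is a minimal finder of the kind that could sit on sys.meta_path alongside the (currently hidden) built-in ones. A caveat on the API: find_spec() is the later importlib spelling; PEP 302 as originally written specified find_module()/load_module(). TracingFinder and the use of decimal are invented for the example.

```python
import sys

class TracingFinder:
    """Records every module lookup that reaches the meta path."""
    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path, target=None):
        self.seen.append(fullname)
        return None  # "not mine": defer to the remaining finders

finder = TracingFinder()
sys.meta_path.insert(0, finder)
try:
    import decimal  # noqa: F401 -- any not-yet-imported module will do
finally:
    sys.meta_path.remove(finder)

print(finder.seen[0] if finder.seen else "decimal was already cached")
```

Because every finder on sys.meta_path is consulted in order, a tracing (or caching, or restricting) hook like this sees the same lookups the built-in machinery does, which is exactly the introspection story a fully PEP 302 based import system buys.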
participants (11)
- Alex Gaynor
- Antoine Pitrou
- Barry Warsaw
- Brett Cannon
- Dirkjan Ochtman
- Eric Snow
- Nick Coghlan
- Paul Moore
- PJ Eby
- Terry Reedy
- Éric Araujo