[Python-Dev] __file__

Brett Cannon brett at python.org
Sun Feb 28 21:21:47 CET 2010


On Sun, Feb 28, 2010 at 05:07, Nick Coghlan <ncoghlan at gmail.com> wrote:

> Michael Foord wrote:
> >> Can't it look for a .py file in the source directory first (1st stat)?
> >> When it's there check for the .pyc in the cache directory (2nd stat,
> >> magic number encoded in filename), if it's not check for .pyc in the
> >> source directory (2nd stat + read for magic number check).  Or am I
> >> missing a subtlety?
> >
> > The problem is doing this little dance for every path on sys.path.
>
> To unpack this a little bit for those not quite as familiar with the
> import system (and to make it clear for my own benefit!): for a
> top-level module/package, each path on sys.path needs to be eliminated
> as a possible location before the interpreter can move on to check the
> next path in the list.
>
> So the important number is the number of stat calls on a "miss" (i.e.
> when the requested module/package is not present in a directory).
> Currently, with builtin support for bytecode only files, there are 3
> checks (package directory, py source file, pyc/pyo bytecode file) to be
> made for each path entry.
>

Actually it's four: name/__init__.py, name/__init__.pyc, name.py, and then
name.pyc. And just so people have terminology to go with all of this, this
search is how the finder decides whether or not it can handle the
requested module.
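
To make the four checks concrete, here is a minimal sketch of that search
for a single directory. It is not the actual import machinery: the names
candidate_paths and find_module are made up for illustration, and
os.path.isfile stands in for the raw stat call.

```python
import os


def candidate_paths(directory, name):
    """Return the four paths probed for `name`, in the order listed above."""
    return [
        os.path.join(directory, name, "__init__.py"),
        os.path.join(directory, name, "__init__.pyc"),
        os.path.join(directory, name + ".py"),
        os.path.join(directory, name + ".pyc"),
    ]


def find_module(directory, name):
    """Hypothetical finder search: return (path, stat_count).

    `path` is None on a miss, in which case all four candidates were checked.
    """
    stats = 0
    for path in candidate_paths(directory, name):
        stats += 1  # one filesystem check per candidate
        if os.path.isfile(path):
            return path, stats
    return None, stats
```

On a miss every candidate is probed, so the cost is paid in full for each
sys.path entry that does not contain the module.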


>
> The PEP proposes to reduce that to only two in the case of a miss, by
> checking for the cached pyc only if the source file is present (there
> would still be three checks for a "hit", but that only happens at most
> once per module lookup).
>

Just to be explicit, Nick is talking about name/__init__.py and name.py
(note the skipping of looking for any .pyc files). At that point only the
loader needs to check for the bytecode in the __pycache__ directory.
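
A rough sketch of that split, under some assumptions: find_source and
locate_bytecode are hypothetical names, and importlib.util.cache_from_source
(from later Python 3 releases) stands in for whatever cache naming the PEP
ends up with.

```python
import importlib.util
import os


def find_source(directory, name):
    """Finder step (hypothetical): probe source files only.

    Returns (source_path, stat_count); a miss costs exactly two checks.
    """
    stats = 0
    for rel in (os.path.join(name, "__init__.py"), name + ".py"):
        stats += 1
        path = os.path.join(directory, rel)
        if os.path.isfile(path):
            return path, stats
    return None, stats


def locate_bytecode(source_path):
    """Loader step (hypothetical): look for cached bytecode only after a hit."""
    cached = importlib.util.cache_from_source(source_path)
    return cached if os.path.isfile(cached) else None
```

The bytecode check moves out of the per-directory search, so it runs at
most once per module rather than once per sys.path entry.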


>
> While the PEP is right in saying that a bytecode-only import hook could
> be added, I believe it would actually be a little tricky to write one
> that didn't severely degrade the performance of either normal imports or
> bytecode-only imports. Keeping it in the core import, but turning it off
> by default seems much less likely to have unintended performance
> consequences when it is switched back on.
>

It all depends on how it is implemented. If the bytecode-only importer stats
a directory to check whether it contains any source, and declines to handle
the directory when it does, that is an extra stat call. But the path hook
makes that call only once per sys.path/__path__ location, not on every
attempted import.
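
One way such a path hook could look, as a sketch only. The names
BytecodeOnlyFinder and bytecode_only_hook are hypothetical, and listing the
directory for *.py files is my assumption about how "check for any source"
might be done; it ignores source inside package subdirectories.

```python
import os


class BytecodeOnlyFinder:
    """Hypothetical finder for a directory that holds no source files."""

    def __init__(self, directory):
        self.directory = directory

    def find_spec(self, name, target=None):
        # Placeholder: a real finder would look for name.pyc here.
        return None


def bytecode_only_hook(directory):
    """Path hook: decide once per path entry whether to handle it.

    Raising ImportError tells the import system to try the next hook.
    """
    try:
        entries = os.listdir(directory)
    except OSError:
        raise ImportError("not a readable directory: %r" % (directory,))
    if any(entry.endswith(".py") for entry in entries):
        raise ImportError("source present in %r" % (directory,))
    return BytecodeOnlyFinder(directory)
```

Because the import system caches the result of a path hook per path entry
(in sys.path_importer_cache), the directory check is not repeated for each
import.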

Now if I ever manage to find the time to break up the default importers and
expose them, then it should take no more than adding the bytecode-only
importer to the chained finder that already exists (it essentially chains
the source and extension module finders).
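
The chaining idea can be sketched as below. This is not the existing
implementation; ChainedFinder and DictFinder are illustrative names, and
the dict-backed finders only stand in for real source, extension, and
bytecode-only finders.

```python
class ChainedFinder:
    """Ask each sub-finder in turn; the first non-None answer wins."""

    def __init__(self, *finders):
        self.finders = list(finders)

    def find_module(self, name):
        for finder in self.finders:
            result = finder.find_module(name)
            if result is not None:
                return result
        return None


class DictFinder:
    """Stand-in finder backed by a dict, for illustration only."""

    def __init__(self, known):
        self.known = known

    def find_module(self, name):
        return self.known.get(name)


# Source and extension finders chained, with bytecode-only appended last.
chain = ChainedFinder(DictFinder({"a": "source"}), DictFinder({"b": "extension"}))
chain.finders.append(DictFinder({"a": "bytecode", "c": "bytecode"}))
```

Appending the bytecode-only finder last means it is consulted only when
neither source nor an extension module was found.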


>
> Another option is to remove bytecode-only support from the default
> filesystem importer, but keep it for zipimport (since the stat call
> savings don't apply in the latter case).
>

That's a very nice option. That would isolate it into a single importer that
doesn't impact general performance for everyone else.

-Brett




>
> Cheers,
> Nick.
>