[Python-Dev] Proto-PEP regarding writing bytecode files

Thu, 23 Jan 2003 20:41:34 +0000

Skip Montanaro <skip@pobox.com> writes:

>     Paul> This is of particular interest to me, as it will potentially
>     Paul> affect how people write import hooks. For example, I can't
>     Paul> immediately see how this proposal will interact with the new
>     Paul> zipimport module. 
>
> I thought the consensus was that zip files shouldn't normally contain
> bytecode anyway.  If so, then perhaps it could always be treated as a
> read-only directory.  (Nothing about zip files is in the pep at present.
> Feel free to suggest text.  I'm largely unfamiliar with them and haven't
> done much to keep abreast of the saga of the import hook.)

Zip files can and usually should contain bytecode. If bytecode is
there, it will be read as normal.

> In the absence of this proposal, what happens today if a source file is
> located in a zip file and there is no bytecode with it?  Does a bytecode
> file get written?  If so, is it actually inserted into the zip file?

Nothing gets written. Zip files are treated as read-only, so a module
with no bytecode is simply recompiled from source on each import.
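
To make the read-only behaviour concrete, here is a minimal sketch,
assuming a current Python 3 with the standard zipimport support. The
archive name bundle.zip and the module name zipdemo_mod are made up for
illustration. It puts a source-only module in a zip file, imports it,
and then checks that neither the archive nor its directory gained a
bytecode file.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_zip():
    """Import a source-only module from a zip and report what changed on disk."""
    workdir = tempfile.mkdtemp()
    archive = os.path.join(workdir, "bundle.zip")  # hypothetical archive name
    with zipfile.ZipFile(archive, "w") as zf:
        # Source only: no bytecode is stored alongside it.
        zf.writestr("zipdemo_mod.py", "VALUE = 42\n")
    with zipfile.ZipFile(archive) as zf:
        before = zf.namelist()

    # Putting the archive itself on sys.path lets zipimport handle it.
    sys.path.insert(0, archive)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("zipdemo_mod")
    finally:
        sys.path.remove(archive)

    with zipfile.ZipFile(archive) as zf:
        after = zf.namelist()
    # Also list the directory, to show no cache file appeared next to the zip.
    return module, before, after, sorted(os.listdir(workdir))
```

Comparing the two name lists is only a coarse check, but it is enough
to show that the import did not add a compiled file to the archive.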

>     Paul> In particular, the PEP needs to say something about what, if any,
>     Paul> requirements it places on the writers of import hooks ("If an
>     Paul> import hook attempts to cache compiled bytecode, in a similar way
>     Paul> to the builtin filesystem support, then it needs to check the
>     Paul> environment variable, and...")
>
> I'll save your message, but I'd like to defer this issue for the time being.

That's entirely reasonable, but it needs to be looked at eventually.
Can I suggest an "Open issues" section for now?

> I don't understand.  Are you saying imp.find_module()'s results are
> undefined or that it simply doesn't exist?

PEP 302 says that imp.find_module() will be documented as acting as if
hooks did not exist at all. Specifically, the wording becomes "they
expose the basic *unhooked* built-in import mechanism". Read the PEP
for details, but basically, the imp module API isn't really correct in
the presence of hooks, so the PEP proposes a new hook-aware API, and
redefines the existing API as ignoring hooks.

It's not ideal, but the idea was that it's better than deprecating the
existing API, just to cater for the (rare) case where hooks are
involved.

> I still believe the module's __file__ attribute should reflect where the
> bytecode was found.

I don't disagree; I'm just pointing out that for a hook, setting
__file__ appropriately is the hook's own responsibility.

>     Paul> So you need to expand on this: "If people want to locate a
>     Paul> module's source code, they will not be able to except by using
>     Paul> imp.find_module(module), which does not take account of import
>     Paul> hooks.  Whether this is good enough will depend upon the
>     Paul> application".
>
> Feel free. ;-) I'll get to it eventually, but if you can come up with
> something plausible I'll gladly use it.

I'll see what I can do.

In all honesty, though, the whole issue of locating a module's source
code is an open one. There's a new module in 2.3a1 (can't recall the
name) which is designed to help with the problem. Guido wrote it when
the problem became evident with zip imports. It should probably be
updated to cope with this change, and the PEP should refer to it.

The usual reason for looking for the module source is so that people
can find data files stored relative to the module. I don't do this, so
I'm not sure what the requirements might be - relative to the bytecode
or the source?

>     Paul> [But in any case, PEP 302 points out about __file__ that "This
>     Paul> must be a string, but it may be a dummy value, for example
>     Paul> "<frozen>"" so anyone using __file__ already needs to be aware of
>     Paul> border cases involving non-filesystem modules]
>
> I think this is wrong.  If you can provide a valid way for the user to
> locate the bytecode you are obligated to do so.  The only situation where I
> think a dummy value is appropriate is if the source code came from a string.

Again I don't disagree, but it depends on what you mean by
"locate". If the code gets loaded from a SQL database, a value of
__file__ which goes database_name/table_name/id_code may be an
entirely reasonable "location", and indeed the only reasonable thing
to put in __file__, but you wouldn't get very far doing
os.listdir(os.path.dirname(__file__)) on it!
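
As a rough illustration of that point, here is a sketch of a hook that
loads a module from a pretend database and sets __file__ itself. It
assumes a current Python 3 and uses the importlib finder/loader
protocol rather than anything in this proposal. FAKE_TABLE,
DatabaseImporter, dbdemo_mod and the row id 17 are all invented for the
example; a dict stands in for the SQL table.

```python
import importlib
import importlib.abc
import importlib.util
import os
import sys

# Hypothetical stand-in for a SQL table: module name -> (row id, source text).
FAKE_TABLE = {"dbdemo_mod": (17, "ANSWER = 42\n")}


class DatabaseImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Toy import hook that serves modules out of FAKE_TABLE."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in FAKE_TABLE:
            return None  # let the normal import machinery handle it
        row_id, _source = FAKE_TABLE[fullname]
        origin = "database_name/table_name/%d" % row_id
        return importlib.util.spec_from_loader(fullname, self, origin=origin)

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        _row_id, source = FAKE_TABLE[module.__name__]
        # The hook, not the core machinery, decides what __file__ says.
        module.__file__ = module.__spec__.origin
        exec(compile(source, module.__file__, "exec"), module.__dict__)


def load_from_database():
    """Import dbdemo_mod through the hook and inspect its 'location'."""
    finder = DatabaseImporter()
    sys.meta_path.insert(0, finder)
    try:
        module = importlib.import_module("dbdemo_mod")
    finally:
        sys.meta_path.remove(finder)
    parent = os.path.dirname(module.__file__)
    # The parent "directory" is a perfectly good string, just not a real path.
    return module, parent, os.path.isdir(parent)
```

The string in __file__ identifies where the code came from, yet
os.path.isdir() on its parent is false (assuming no directory of that
name happens to exist), so any code that tries to list it will fail.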

None of this is news for import hooks, though. The Python language
definition makes no guarantees that modules exist in filesystems, or
that things like sys.path, module.__file__, or whatever, are
filenames. But it's a common assumption which does no harm in 99% of
cases (and in the other 1%, you're probably not going to do better
than silently ignoring the issue).

> I'm not sure I see a lot of conflict, but I'm more likely to be one of those
> people who ignores import hooks.  I haven't used one yet, so it's unlikely
> I'll start tomorrow. ;-)

Oh, go on. Feel the seduction of the dark side :-)

Paul.
-- 
This signature intentionally left blank