[Python-Dev] PEP 338 issue finalisation (was Re: 2.5 PEP)
Guido van Rossum
guido at python.org
Thu Feb 16 16:07:41 CET 2006
On 2/16/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Guido van Rossum wrote:
> > Do you have unit tests for everything? I believe I fixed a bug in the
> > code that reads a bytecode file (it wasn't skipping the timestamp).
[Hey, I thought I sent that just to you. Is python-dev really
interested in this?]
> I haven't worked the filesystem based tests into the unit tests yet, and even
> the manual tests I was using managed to leave out compiled bytecode files (as
> you noticed). I'll fix that.
>
> Given that I do my testing on Linux, I probably still would have forgotten the
> 'rb' mode on the relevant calls to open() though...
But running the unit tests on Windows would have revealed the problem.
> > +++ pep-0338.txt (working copy)
> > - The optional argument ``init_globals`` may be used to pre-populate
> > + The optional argument ``init_globals`` may be a dictionary used to pre-populate
> > the globals dictionary before the code is executed. The supplied
> > dictionary will not be modified.
>
> I just realised that anything that's a legal argument to "dict.update" will
> work. I'll fix the function description in the PEP (and the docs patch as well).
I'm not sure that's a good idea -- you'll never be able to switch to a
different implementation then.
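(For context: Nick's observation is that dict.update() is quite liberal in
what it accepts, so documenting init_globals as "anything dict.update takes"
commits to all of the following. A quick sketch in plain Python, nothing
PEP-338-specific:)

```python
# dict.update() accepts a mapping, an iterable of key/value pairs,
# or keyword arguments -- documenting init_globals this way would
# lock in all three forms, which is Guido's concern.
globals_dict = {}
globals_dict.update({"a": 1})     # a mapping
globals_dict.update([("b", 2)])   # an iterable of key/value pairs
globals_dict.update(c=3)          # keyword arguments
print(sorted(globals_dict))       # ['a', 'b', 'c']
```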
> > --- runpy.py Wed Feb 15 15:56:07 2006
> > def get_data(self, pathname):
> > ! # XXX Unfortunately PEP 302 assumes text data :-(
> > ! return open(pathname).read()
>
> Hmm.
>
> The PEP itself requests that a string be returned from get_data(), but doesn't
> require that the file be opened in text mode. Perhaps the PEP 302 emulation
> should use binary mode here? Otherwise there could be strange data corruption
> bugs on Windows.
But PEP 302 shows as its only example reading from a file with a .txt
extension. Adding spurious \r characters is also data corruption. We
should probably post to python-dev a request for clarification of PEP
302, but in the meantime I vote for text mode.
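(The corruption both sides are worried about is newline translation. In the
Python 2 of this thread it only happened on Windows; modern Python 3 text
mode translates universally, but the effect is the same and easy to see:)

```python
import tempfile

# Write raw bytes containing a \r\n sequence.
payload = b"line1\r\nline2"
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

# Binary mode returns the stored bytes untouched -- safe for bytecode.
with open(path, "rb") as f:
    raw = f.read()

# Text mode applies newline translation: \r\n silently becomes \n,
# which is exactly the "data corruption" risk for non-text get_data().
with open(path, "r") as f:
    text = f.read()

print(raw == payload)   # True
print("\r" in text)     # False
```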
> > --- 337,349 ----
> >
> > # This helper is needed as both the PEP 302 emulation and the
> > # main file execution functions want to read compiled files
> > + # XXX marshal can also raise EOFError; perhaps that should be
> > + # turned into ValueError? Some callers expect ValueError.
> > def _read_compiled_file(compiled_file):
> > magic = compiled_file.read(4)
> > if magic != imp.get_magic():
> > raise ValueError("File not compiled for this Python version")
> > + compiled_file.read(4) # Throw away timestamp
> > return marshal.load(compiled_file)
>
> I'm happy to convert EOFError to ValueError here if you'd prefer (using the
> string representation of the EOFError as the message in the ValueError).
>
> Or did you mean changing the behaviour in marshal itself?
No -- the alternative is to catch EOFError in _read_compiled_file()'s
caller, but that seems worse. You should check marshal.c if it can
raise any *other* errors (perhaps OverflowError?).
Also, *perhaps* it makes more sense to return None instead of raising
ValueError? Since you're always catching it? (Or are you?)
> > --- 392,407 ----
> > loader = _get_loader(mod_name)
> > if loader is None:
> > raise ImportError("No module named " + mod_name)
> > + # XXX get_code() is an *optional* loader feature. Is that okay?
> > code = loader.get_code(mod_name)
>
> If the loader doesn't provide access to the code object or the source code,
> then runpy can't really do anything useful with that module (e.g. if it's a C
> builtin module). Given that PEP 302 states that if you provide get_source()
> you should also provide get_code(), this check should be sufficient to let
> runpy.run_module get to everything it can handle.
OK. But a loader could return None from get_code() -- do you check for
that? (I don't have the source handy here.)
> A case could be made for converting the attribute error to an ImportError, I
> guess...
I'm generally not keen on that; leave it.
> > filename = _get_filename(loader, mod_name)
> > if run_name is None:
> > run_name = mod_name
> > + # else:
> > + # XXX Should we also set sys.modules[run_name] = sys.modules[mod_name]?
> > + # I know of code that does "import __main__". It should probably
> > + # get the substitute __main__ rather than the original __main__,
> > + # if run_name != mod_name
> > return run_module_code(code, init_globals, run_name,
> > filename, loader, as_script)
>
> Hmm, good point. How about a different solution, though: in run_module_code, I
> could create a new module object and put it temporarily in sys.modules, and
> then remove it when done (restoring the original module if necessary).
That might work too.
What happens when you execute "foo.py" as __main__ and then (perhaps
indirectly) something does "import foo"? Does a second copy of foo.py
get loaded by the regular loader?
> That would mean any module with code that looks up "sys.modules[__name__]"
> would still work when run via runpy.run_module or runpy.run_file.
Yeah, except if they do that, they're likely to also *assign* to that.
Well, maybe that would just work, too...
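(Nick's temporary-substitution idea could be sketched as a context manager;
temp_module is a hypothetical name, not the eventual runpy API. Install a
fresh module under the run name for the duration of the execution, then
restore whatever was there before, or remove the entry entirely:)

```python
import sys
import types
from contextlib import contextmanager

@contextmanager
def temp_module(mod_name):
    # Remember any existing module so "import __main__"-style lookups
    # see the substitute during the run but the original afterwards.
    saved = sys.modules.get(mod_name)
    temp = types.ModuleType(mod_name)
    sys.modules[mod_name] = temp
    try:
        yield temp
    finally:
        if saved is not None:
            sys.modules[mod_name] = saved
        else:
            del sys.modules[mod_name]

with temp_module("__demo_main__") as mod:
    mod.answer = 42
    print("__demo_main__" in sys.modules)  # True
print("__demo_main__" in sys.modules)      # False
```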
> I also realised that sys.argv[0] should be restored to its original value, too.
Yup.
> I'd then change the "as_script" flag to "alter_sys" and have it affect both of
> the above operations (and grab the import lock to avoid other import or
> run_module_code operations seeing the altered version of sys.modules).
Makes sense.
I do wonder if runpy.py isn't getting a bit over-engineered -- it
seems a lot of the functionality isn't actually necessary to implement
-m foo.bar, and the usefulness in other circumstances is as yet
unproven. What do you think of taking a dose of YAGNI here?
(Especially since I notice that most of the public APIs are very thin
layers over exec or execfile -- people can just use those directly.)
> > --- 439,457 ----
> >
> > Returns the resulting top level namespace dictionary
> > First tries to run as a compiled file, then as a source file
> > + XXX That's not the same algorithm as used by regular import;
> > + if the timestamp in the compiled file is not equal to the
> > + source file's mtime, the compiled file is ignored
> > + (unless there is no source file -- then the timestamp
> > + is ignored)
>
> They're doing different things though - the import process uses that algorithm
> to decide which filename to use (.pyo, .pyc or .py). This code in run_file is
> trying to decide whether the supplied filename points to a compiled file or a
> source file without a tight coupling to the specific file extension used (e.g.
> so it works for Unix Python scripts that rely on the shebang line to identify
> which interpreter to use to run them).
>
> I'll add a comment to that effect.
Ah, good point. So you never go from foo.py to foo.pyc, right?
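(The sniffing Nick describes, deciding by content rather than by extension,
amounts to a four-byte peek; looks_compiled is an illustrative name, and
importlib.util.MAGIC_NUMBER stands in for the era's imp.get_magic():)

```python
import tempfile
import importlib.util

def looks_compiled(path):
    # Peek at the first four bytes and compare against the running
    # interpreter's magic number, instead of trusting the filename --
    # a #!-style Unix script may have no extension at all.
    with open(path, "rb") as f:
        return f.read(4) == importlib.util.MAGIC_NUMBER

# A shebang script with no extension is correctly treated as source.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"#!/usr/bin/env python\nprint('hi')\n")
    script = f.name

print(looks_compiled(script))  # False
```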
> Another problem that occurred to me is that the module isn't thread safe at
> the moment. The PEP 302 emulation isn't protected by the import lock, and the
> changes to sys.argv in run_module_code will be visible across threads (and may
> clobber each other or the original if multiple threads invoke the function).
Another reason to consider cutting it down to only what's needed by
-m; -m doesn't need thread-safety (I think).
> On that front, I'll change _get_path_loader to acquire and release the import
> lock, and the same for run_module_code when "alter_sys" is set to True.
OK, just be very, very careful. The import lock is not a regular mutex
and if you don't release it you're stuck forever. Just use
try/finally...
--
--Guido van Rossum (home page: http://www.python.org/~guido/)