[Python-Dev] Proto-PEP regarding writing bytecode files
Skip Montanaro
skip@pobox.com
Wed, 22 Jan 2003 14:26:15 -0600
Folks,
Here's a first stab at a PEP about controlling generation of bytecode
files. Feedback appreciated.
Skip
----------------------------------------------------------------------------
PEP: NNN
Title: Controlling generation of bytecode files
Version: $Revision: $
Last-Modified: $Date: $
Author: Skip Montanaro
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Jan-2003
Post-History:
Abstract
========
This PEP outlines a mechanism for controlling the generation and
location of compiled Python bytecode files. This idea originally
arose as a patch request [1]_ and evolved into a discussion thread on
the python-dev mailing list [2]_. The introduction of an environment
variable will allow people installing Python or Python-based
third-party packages to control whether or not bytecode files
should be generated, and if so, where they should be written.
Proposal
========
Add a new environment variable, PYCROOT, to the mix of environment
variables which Python understands. Its interpretation is:
- If not present or present but with an empty string value, Python
bytecode is generated in exactly the same way as is currently done.
- If present and it refers to an existing directory, bytecode
files are written into a directory structure rooted at that
location.
- If present and it does not refer to an existing directory,
generation of bytecode files is suppressed altogether.
sys.path is not modified.
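The three cases above can be sketched as a small classifier. This is only an illustration written in present-day Python, not the proposed interpreter change; the function name pycroot_mode and the mode labels are hypothetical.

```python
import os


def pycroot_mode(environ=os.environ):
    """Classify PYCROOT per the three cases listed above (hypothetical helper)."""
    root = environ.get("PYCROOT", "")
    if not root:
        # Unset or empty: bytecode is generated as it is today.
        return ("default", None)
    if os.path.isdir(root):
        # Existing directory: bytecode goes into a tree rooted there.
        return ("redirect", root)
    # Set, but not an existing directory: write no bytecode at all.
    return ("suppress", None)
```

Passing a plain dict instead of os.environ makes each case easy to try out, e.g. pycroot_mode({"PYCROOT": ""}).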
If PYCROOT is set and valid, during module lookup, the bytecode file
will be looked for first in the same directory as the source file,
then in the directory formed by prefixing the source file's directory
with the PYCROOT directory, e.g., in a Unix environment:
os.path.join(os.environ["PYCROOT"], os.path.split(sourcefile)[0].lstrip(os.sep))
(Under Windows the above operation, while conceptually similar, will
almost certainly differ in detail.)
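A minimal sketch of that lookup order for the Unix case follows. It assumes a hypothetical helper name, bytecode_candidates, and uses posixpath so the result is the same on any platform. The leading "/" is stripped from the source directory because os.path.join discards every earlier component once it meets an absolute one.

```python
import posixpath


def bytecode_candidates(sourcefile, pycroot):
    """Return candidate .pyc paths in the order described above (illustrative only)."""
    source_dir, source_name = posixpath.split(sourcefile)
    pyc_name = posixpath.splitext(source_name)[0] + ".pyc"
    # First choice: alongside the source file.
    candidates = [posixpath.join(source_dir, pyc_name)]
    if pycroot:
        # Second choice: the source directory re-rooted under PYCROOT.
        candidates.append(
            posixpath.join(pycroot, source_dir.lstrip("/"), pyc_name)
        )
    return candidates
```

With sourcefile "/usr/local/lib/python2.3/urllib.py" and pycroot "/tmp", the second candidate is "/tmp/usr/local/lib/python2.3/urllib.pyc".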
Rationale
=========
In many environments it is not possible for non-root users to write
into the directory containing the source file. Most of the time, this
is not a problem except for reduced performance. In some cases it can
be an annoyance, if nothing else. [3]_ In other situations where
bytecode files are writable, it can be a source of file corruption if
multiple processes attempt to write the same bytecode file at the same
time. [4]_
In environments with ramdisks available, it may be desirable from a
performance standpoint to write bytecode files to a directory on such
a disk.
Alternatives
============
The only alternative proposed so far [1]_ seems to be to add a
-R flag to the interpreter to disable writing bytecode files
altogether. This proposal subsumes that.
Issues
======
- When looking for a bytecode file should the directory holding the
source file be considered as well, or just the location implied by
PYCROOT? If both, which should be searched first? It seems to me
that if a module lives in /usr/local/lib/python2.3/mod.py and was
installed by root without PYCROOT set, you'd want to use the
bytecode file there if it was up-to-date without ever considering
os.environ["PYCROOT"] + "/usr/local/lib/python2.3/". Only if you
need to write out a bytecode file would anything turn up there.
- Operation on multi-root file systems (e.g., Windows). On Windows
each drive is fairly independent. If PYCROOT is set to C:\TEMP and
a module is located in D:\PYTHON22\mod.py, where should the bytecode
file be written? I think a scheme similar to what Cygwin uses
(treat drive letters more-or-less as directory names) would work in
practice, but I have no direct experience to draw on. The above
might cause C:\TEMP\D\PYTHON22\mod.pyc to be written.
What if PYCROOT doesn't include a drive letter? Perhaps the current
drive at startup should be assumed.
- Interpretation of a module's __file__ attribute. I believe the
__file__ attribute of a module should reflect the true location of
the bytecode file. If people want to locate a module's source code,
they should use imp.find_module(module).
- Security - What if root has PYCROOT set? Yes, this can present a
security risk, but so can many things the root user does. The root
user should probably not set PYCROOT except during installation.
Still, perhaps this problem can be minimized. When running as root
the interpreter should check to see if PYCROOT refers to a
world-writable directory. If so, it could raise an exception or
warning and reset PYCROOT to the empty string. Or, see the next
item.
- More security - What if PYCROOT refers to a general directory (say,
/tmp)? In this case, perhaps loading of a preexisting bytecode file
should occur only if the file is owned by the current user or root.
(Does this matter on Windows?)
- Runtime control - should there be a variable in sys (say,
sys.pycroot) which takes on the value of PYCROOT (or an empty string
or None) and which is modifiable on-the-fly? Should sys.pycroot be
initialized from PYCROOT and then PYCROOT ignored (that is, what if
they differ)?
- Should there be a command-line flag for the interpreter instead of
or in addition to an environment variable? This seems like it would
be less flexible. During Python installation, the user frequently
doesn't have ready access to the interpreter command line. Using an
environment variable makes it easier to control behavior.
- Should PYCROOT be interpreted differently during installation than
at runtime? I have no idea. (It may be a stupid question, but it
occurred to me, so I thought I'd mention it.)
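Two of the issues above lend themselves to short sketches: the Cygwin-like treatment of drive letters, and the world-writable check suggested for root. Both helper names are hypothetical, the drive mapping ignores UNC paths, and neither function settles the open questions; they only make the ideas concrete.

```python
import ntpath
import stat


def windows_bytecode_dir(pycroot, source_dir):
    """Map a Windows source directory under pycroot, using the drive letter
    as a directory name (one possible scheme, not a decided one)."""
    drive, rest = ntpath.splitdrive(source_dir)
    drive_name = drive.rstrip(":")  # "D:" becomes "D"
    return ntpath.join(pycroot, drive_name, rest.lstrip("\\/"))


def is_world_writable(mode):
    """True if the permission bits in mode grant write access to 'other'."""
    return bool(mode & stat.S_IWOTH)
```

Here windows_bytecode_dir("C:\\TEMP", "D:\\PYTHON22") gives "C:\\TEMP\\D\\PYTHON22", matching the location suggested above. The permission check would be applied to os.stat(pycroot).st_mode; note that it also flags directories such as /tmp that are world-writable with the sticky bit set.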
Examples
========
All of the examples which follow use the urllib module. Unless
otherwise indicated, it lives in /usr/local/lib/python2.3/urllib.py
and /usr/local/lib/python2.3 is not writable by the current, non-root
user.
- PYCROOT is set to /tmp. /usr/local/lib/python2.3/urllib.pyc exists,
but is out-of-date. When urllib is imported, the generated bytecode
file is written to /tmp/usr/local/lib/python2.3/urllib.pyc.
Intermediate directories will be created as needed.
- PYCROOT is not set. No urllib.pyc file is found. When urllib is
imported, no bytecode file is written, because the source directory
is not writable.
- PYCROOT is set to /tmp. No urllib.pyc file is found. When urllib
is imported, the generated bytecode file is written to
/tmp/usr/local/lib/python2.3/urllib.pyc, again, creating
intermediate directories as needed.
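The redirected-write cases above can be approximated with the standard py_compile module in present-day Python. This is a sketch under stated assumptions, not the proposed import machinery: compile_under_root and demo are hypothetical names, and a temporary directory stands in for PYCROOT.

```python
import os
import py_compile
import tempfile


def compile_under_root(sourcefile, pycroot):
    """Compile sourcefile and write the .pyc beneath pycroot (illustrative only)."""
    source_dir, source_name = os.path.split(os.path.abspath(sourcefile))
    # Drop any drive and leading separators so os.path.join keeps pycroot.
    relative_dir = os.path.splitdrive(source_dir)[1].lstrip(os.sep)
    target_dir = os.path.join(pycroot, relative_dir)
    # Create intermediate directories as needed.
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, os.path.splitext(source_name)[0] + ".pyc")
    py_compile.compile(sourcefile, cfile=target, doraise=True)
    return target


def demo():
    """Compile a throwaway module with a temporary directory acting as PYCROOT."""
    with tempfile.TemporaryDirectory() as src_dir, \
            tempfile.TemporaryDirectory() as pycroot:
        source = os.path.join(src_dir, "mod.py")
        with open(source, "w") as f:
            f.write("x = 1\n")
        target = compile_under_root(source, pycroot)
        return os.path.isfile(target), os.path.relpath(target, pycroot)
```

demo() returns whether the bytecode file was written and its path relative to the stand-in PYCROOT, which mirrors the source file's own directory path.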
References
==========
.. [1] patch 602345, Option for not writing .py[co] files, Klose
(http://www.python.org/sf/602345)
.. [2] python-dev thread, Disable writing .py[co], Norwitz
(http://mail.python.org/pipermail/python-dev/2003-January/032270.html)
.. [3] Debian bug report, Mailman is writing to /usr in cron, Wegner
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=96111)
.. [4] python-dev thread, Parallel pyc construction, Dubois
(http://mail.python.org/pipermail/python-dev/2003-January/032060.html)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: