[Python-Dev] Proto-PEP regarding writing bytecode files

Skip Montanaro skip@pobox.com
Wed, 22 Jan 2003 14:26:15 -0600


Folks,

Here's a first stab at a PEP about controlling generation of bytecode
files.  Feedback appreciated.

Skip

----------------------------------------------------------------------------

PEP: NNN
Title: Controlling generation of bytecode files
Version: $Revision: $
Last-Modified: $Date: $
Author: Skip Montanaro
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Jan-2003
Post-History: 


Abstract
========

This PEP outlines a mechanism for controlling the generation and
location of compiled Python bytecode files.  This idea originally
arose as a patch request [1]_ and evolved into a discussion thread on
the python-dev mailing list [2]_.  The introduction of an environment
variable will allow people installing Python or Python-based
third-party packages to control whether or not bytecode files
should be generated, and if so, where they should be written.


Proposal
========

Add a new environment variable, PYCROOT, to the mix of environment
variables which Python understands.  Its interpretation is:

- If not present or present but with an empty string value, Python
  bytecode is generated in exactly the same way as is currently done.

- If present and it refers to an existing directory, bytecode
  files are written into a directory structure rooted at that
  location.

- If present and it does not refer to an existing directory,
  generation of bytecode files is suppressed altogether.

sys.path is not modified.
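
The three PYCROOT cases listed above can be sketched as a small
helper.  This is only an illustration written in present-day Python,
not a proposed implementation; the name ``pycroot_mode`` and its
return values are invented for the example::

    import os

    def pycroot_mode(environ=None):
        """Classify PYCROOT according to the three cases above.

        Returns ("default", None), ("redirect", root) or
        ("suppress", None).  Hypothetical helper, for illustration.
        """
        if environ is None:
            environ = os.environ
        root = environ.get("PYCROOT", "")
        if not root:
            # Unset or empty: behave exactly as Python does today.
            return ("default", None)
        if os.path.isdir(root):
            # Existing directory: write bytecode beneath it.
            return ("redirect", root)
        # Set, but not an existing directory: write no bytecode.
        return ("suppress", None)

Passing a plain dictionary instead of os.environ makes the three
cases easy to exercise without touching the real environment.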

If PYCROOT is set and valid, then during module lookup the bytecode
file is looked for first in the same directory as the source file,
then in the directory formed by prefixing the source file's directory
with the PYCROOT directory.  The leading slash must be dropped from
the source directory first, because os.path.join() discards earlier
components when a later one is absolute.  E.g., in a Unix
environment::

    os.path.join(os.environ["PYCROOT"],
                 os.path.dirname(sourcefile).lstrip("/"))

(Under Windows the above operation, while conceptually similar, will
almost certainly differ in detail.)
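
The lookup order described above might be sketched as follows.  This
is a minimal sketch for Unix-style absolute paths only, written in
present-day Python; ``bytecode_candidates`` is a hypothetical name,
and posixpath is used so the result does not depend on the platform
running the example::

    import posixpath

    def bytecode_candidates(sourcefile, pycroot):
        """Return the .pyc paths to try, in lookup order.

        sourcefile is assumed to be an absolute Unix path.
        """
        srcdir, name = posixpath.split(sourcefile)
        pyc = posixpath.splitext(name)[0] + ".pyc"
        # Strip the leading "/" so that join() keeps pycroot.
        mirrored = posixpath.join(pycroot, srcdir.lstrip("/"))
        return [
            posixpath.join(srcdir, pyc),     # next to the source file
            posixpath.join(mirrored, pyc),   # below PYCROOT
        ]

For /usr/local/lib/python2.3/urllib.py with PYCROOT set to /tmp, the
second candidate is /tmp/usr/local/lib/python2.3/urllib.pyc.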


Rationale
=========

In many environments it is not possible for non-root users to write
into the directory containing the source file.  Most of the time, the
only cost is reduced performance, since the module has to be
recompiled on every import.  In some cases the attempted writes can be
an annoyance, if nothing else. [3]_ In other situations where the
bytecode file is writable, it can be corrupted if multiple processes
attempt to write it at the same time. [4]_

In environments with ramdisks available, it may be desirable from a
performance standpoint to write bytecode files to a directory on such
a disk.


Alternatives
============

The only alternative proposed so far [1]_ seems to be to add a
-R flag to the interpreter to disable writing bytecode files
altogether.  This proposal subsumes that.


Issues
======

- When looking for a bytecode file should the directory holding the
  source file be considered as well, or just the location implied by
  PYCROOT?  If so, which should be searched first?  It seems to me
  that if a module lives in /usr/local/lib/python2.3/mod.py and was
  installed by root without PYCROOT set, you'd want to use the
  bytecode file there if it was up-to-date without ever considering
  os.environ["PYCROOT"] + "/usr/local/lib/python2.3/".  Only if you
  need to write out a bytecode file would anything turn up there.

- Operation on multi-root file systems (e.g., Windows).  On Windows
  each drive is fairly independent.  If PYCROOT is set to C:\TEMP and
  a module is located in D:\PYTHON22\mod.py, where should the bytecode
  file be written?  I think a scheme similar to what Cygwin uses
  (treat drive letters more-or-less as directory names) would work in
  practice, but I have no direct experience to draw on.  The above
  might cause C:\TEMP\D\PYTHON22\mod.pyc to be written.

  What if PYCROOT doesn't include a drive letter?  Perhaps the current
  drive at startup should be assumed.

- Interpretation of a module's __file__ attribute.  I believe the
  __file__ attribute of a module should reflect the true location of
  the bytecode file.  If people want to locate a module's source code,
  they should use imp.find_module(module).

- Security - What if root has PYCROOT set?  Yes, this can present a
  security risk, but so can many things the root user does.  The root
  user should probably not set PYCROOT except during installation.
  Still, perhaps this problem can be minimized.  When running as root
  the interpreter should check to see if PYCROOT refers to a
  world-writable directory.  If so, it could raise an exception or
  warning and reset PYCROOT to the empty string.  Or, see the next
  item.

- More security - What if PYCROOT refers to a general directory (say,
  /tmp)?  In this case, perhaps loading of a preexisting bytecode file
  should occur only if the file is owned by the current user or root.
  (Does this matter on Windows?)

- Runtime control - should there be a variable in sys (say,
  sys.pycroot) which takes on the value of PYCROOT (or an empty string
  or None) and which is modifiable on-the-fly?  Should sys.pycroot be
  initialized from PYCROOT and then PYCROOT ignored (that is, what if
  they differ)?

- Should there be a command-line flag for the interpreter instead of
  or in addition to an environment variable?  This seems like it would
  be less flexible.  During Python installation, the user frequently
  doesn't have ready access to the interpreter command line.  Using an
  environment variable makes it easier to control behavior.

- Should PYCROOT be interpreted differently during installation than
  at runtime?  I have no idea, but the question occurred to me, so it
  seems worth raising.
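
The Cygwin-like idea from the multi-root item, treating the drive
letter as a directory name, could look roughly like this.  It is a
sketch only: ``windows_bytecode_dir`` is a hypothetical name, UNC
paths and a PYCROOT without a drive letter are not handled, and
ntpath is used so the example behaves the same on any platform::

    import ntpath

    def windows_bytecode_dir(source_dir, pycroot):
        """Fold the source drive letter into the path below pycroot."""
        drive, tail = ntpath.splitdrive(source_dir)
        parts = [pycroot]
        if drive:
            parts.append(drive.rstrip(":"))    # "D:" becomes "D"
        # Drop leading separators so join() does not discard pycroot.
        parts.append(tail.lstrip("\\/"))
        return ntpath.join(*parts)

With PYCROOT set to C:\TEMP and a module in D:\PYTHON22, this yields
C:\TEMP\D\PYTHON22, matching the location suggested in that item.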


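The two security checks floated under Issues could be approximated
on POSIX systems as shown here.  Both function names are
hypothetical, the checks rely on Unix permission bits and user ids,
and nothing here answers the open question about Windows::

    import os
    import stat

    def pycroot_is_world_writable(root):
        """True if the "other" write bit is set on root."""
        return bool(os.stat(root).st_mode & stat.S_IWOTH)

    def bytecode_owner_trusted(path):
        """True if path is owned by the current user or by root."""
        return os.stat(path).st_uid in (os.getuid(), 0)

An interpreter running as root might warn and ignore PYCROOT when the
first check is true, and any user might skip a preexisting bytecode
file when the second check is false.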

Examples
========

In all the examples which follow, the urllib module is used as an
example.  Unless otherwise indicated, it lives in
/usr/local/lib/python2.3/urllib.py and /usr/local/lib/python2.3 is not
writable by the current, non-root user.

- PYCROOT is set to /tmp.  /usr/local/lib/python2.3/urllib.pyc exists,
  but is out-of-date.  When urllib is imported, the generated bytecode
  file is written to /tmp/usr/local/lib/python2.3/urllib.pyc.
  Intermediate directories will be created as needed.

- PYCROOT is not set.  No urllib.pyc file is found.  When urllib is
  imported, no bytecode file is written, because the directory
  holding urllib.py is not writable by the current user.

- PYCROOT is set to /tmp.  No urllib.pyc file is found.  When urllib
  is imported, the generated bytecode file is written to
  /tmp/usr/local/lib/python2.3/urllib.pyc, again, creating
  intermediate directories as needed.
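
The write step in these examples, including creation of the
intermediate directories, can be imitated today with the standard
py_compile module.  This is a sketch under the assumption of
Unix-style paths; ``compile_under_root`` is a made-up name and the
code is not the mechanism this PEP would add to the import system::

    import os
    import py_compile

    def compile_under_root(sourcefile, root):
        """Compile sourcefile, writing the .pyc file below root."""
        srcdir, name = os.path.split(os.path.abspath(sourcefile))
        target_dir = os.path.join(root, srcdir.lstrip(os.sep))
        # Create intermediate directories as needed.
        os.makedirs(target_dir, exist_ok=True)
        base = os.path.splitext(name)[0]
        target = os.path.join(target_dir, base + ".pyc")
        py_compile.compile(sourcefile, cfile=target, doraise=True)
        return target

Called with /usr/local/lib/python2.3/urllib.py and a root of /tmp,
the function would write /tmp/usr/local/lib/python2.3/urllib.pyc.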



References
==========

.. [1] patch 602345, Option for not writing .py[co] files, Klose
   (http://www.python.org/sf/602345)

.. [2] python-dev thread, Disable writing .py[co], Norwitz
   (http://mail.python.org/pipermail/python-dev/2003-January/032270.html)

.. [3] Debian bug report, Mailman is writing to /usr in cron, Wegner
   (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=96111)

.. [4] python-dev thread, Parallel pyc construction, Dubois
   (http://mail.python.org/pipermail/python-dev/2003-January/032060.html)


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: