[Python-ideas] Draft PEP for virtualenv in the stdlib

Carl Meyer carl at oddbird.net
Mon Oct 24 20:21:07 CEST 2011



Hi all,

Vinay Sajip and I are working on a PEP for making "virtual Python
environments" a la virtualenv [1] a built-in feature of Python 3.3.

This idea was first proposed on python-dev by Ian Bicking in February
2010 [2]. It was revived at PyCon 2011 and has seen discussion on
distutils-sig [3] and more recently again on python-dev [4] [5].

Given all this (mostly positive) prior discussion, we may be at a point
where further discussion should happen on python-dev rather than
python-ideas. But in order to observe the proper PEP 1 process, I'm
posting the draft PEP here first for pre-review and comment before I
send it to the PEP editors and post it on python-dev.

Full text of the draft PEP is pasted below, and also available on
Bitbucket [6].


[1] http://virtualenv.org
[2] http://mail.python.org/pipermail/python-dev/2010-February/097787.html
[3] http://mail.python.org/pipermail/distutils-sig/2011-March/017498.html
[4] http://mail.python.org/pipermail/python-dev/2011-June/111903.html
[5] http://mail.python.org/pipermail/python-dev/2011-October/113883.html
[6] https://bitbucket.org/carljm/pythonv-pep/src/


PEP: XXX
Title: Python Virtual Environments
Version: $Revision$
Last-Modified: $Date$
Author: Carl Meyer <carl at oddbird.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 13-Jun-2011
Python-Version: 3.3
Post-History: 14-Jun-2011


Abstract
========

This PEP proposes to add to Python a mechanism for lightweight
"virtual environments" with their own site directories, optionally
isolated from system site directories.  Each virtual environment has
its own Python binary (allowing creation of environments with various
Python versions) and can have its own independent set of installed
Python packages in its site directories.


Motivation
==========

The utility of Python virtual environments has already been well
established by the popularity of existing third-party
virtual-environment tools, primarily Ian Bicking's `virtualenv`_.
Virtual environments are already widely used for dependency management
and isolation, ease of installing and using Python packages without
system-administrator access, and automated testing of Python software
across multiple Python versions, among other uses.

Existing virtual environment tools suffer from a lack of support in
Python itself.  Tools such as `rvirtualenv`_, which do
not copy the Python binary into the virtual environment, cannot
provide reliable isolation from system site directories.  Virtualenv,
which does copy the Python binary, is forced to duplicate much of
Python's ``site`` module and manually copy an ever-changing set of
standard-library modules into the virtual environment in order to
perform a delicate boot-strapping dance at every startup. The
``PYTHONHOME`` environment variable, Python's only existing built-in
solution for virtual environments, requires copying the entire
standard library into every environment, which is not a lightweight
solution.

A virtual environment mechanism integrated with Python and drawing on
years of experience with existing third-party tools can be lower
maintenance, more reliable, and more easily available to all Python
users.

.. _virtualenv: http://www.virtualenv.org

.. _rvirtualenv: https://github.com/kvbik/rvirtualenv


Specification
=============

When the Python binary is executed, it attempts to determine its
prefix (which it stores in ``sys.prefix``), which is then used to find
the standard library and other key files, and by the ``site`` module
to determine the location of the site-package directories.  Currently
the prefix is found (assuming ``PYTHONHOME`` is not set) by first
walking up the filesystem tree looking for a marker file (``os.py``)
that signifies the presence of the standard library, and if none is
found, falling back to the build-time prefix hardcoded in the binary.

This PEP proposes to add a new first step to this search.  If an
``env.cfg`` file is found either adjacent to the Python executable, or
one directory above it, this file is scanned for lines of the form
``key = value``. If a ``home`` key is found, this signifies that the
Python binary belongs to a virtual environment, and the value of the
``home`` key is the directory containing the Python executable used to
create this virtual environment.

In this case, prefix-finding continues as normal using the value of
the ``home`` key as the effective Python binary location, which
results in ``sys.prefix`` being set to the system installation prefix,
while ``sys.site_prefix`` is set to the directory containing
``env.cfg``.

(If ``env.cfg`` is not found or does not contain the ``home`` key,
prefix-finding continues normally, and ``sys.site_prefix`` will be
equal to ``sys.prefix``.)
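
The lookup described above can be sketched in Python.  This is only an
illustration under stated assumptions, not the reference
implementation: the function names ``find_env_cfg``, ``parse_env_cfg``
and ``locate_environment`` are hypothetical, and lines without an
``=`` are simply skipped.

```python
import os


def find_env_cfg(executable):
    """Return the path of env.cfg next to, or one level above, the executable."""
    exe_dir = os.path.dirname(os.path.abspath(executable))
    for candidate_dir in (exe_dir, os.path.dirname(exe_dir)):
        candidate = os.path.join(candidate_dir, "env.cfg")
        if os.path.isfile(candidate):
            return candidate
    return None


def parse_env_cfg(path):
    """Collect ``key = value`` lines into a dict; other lines are skipped."""
    settings = {}
    with open(path, encoding="utf-8") as cfg:
        for line in cfg:
            key, sep, value = line.partition("=")
            if sep:
                settings[key.strip()] = value.strip()
    return settings


def locate_environment(executable):
    """Return (home, site_prefix), or (None, None) outside a virtual environment."""
    cfg_path = find_env_cfg(executable)
    if cfg_path is None:
        return None, None
    home = parse_env_cfg(cfg_path).get("home")
    if home is None:
        return None, None
    # The directory containing env.cfg becomes the site prefix.
    return home, os.path.dirname(cfg_path)
```

When ``locate_environment`` returns a ``home`` value, normal
prefix-finding would continue from that directory instead of from the
real location of the binary.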

The ``site`` and ``sysconfig`` standard-library modules are modified
such that site-package directories ("purelib" and "platlib", in
``sysconfig`` terms) are found relative to ``sys.site_prefix``, while
other directories (the standard library, include files) are still
found relative to ``sys.prefix``.

Thus, a Python virtual environment in its simplest form would consist
of nothing more than a copy of the Python binary accompanied by an
``env.cfg`` file and a site-packages directory.  Since the ``env.cfg``
file can be located one directory above the executable, a typical
virtual environment layout, mimicking a system install layout, might
be::

    env.cfg
    bin/python3
    lib/python3.3/site-packages/


Isolation from system site-packages
-----------------------------------

In a virtual environment, the ``site`` module will normally still add
the system site directories to ``sys.path`` after the virtual
environment site directories.  Thus system-installed packages will
still be importable, but a package of the same name installed in the
virtual environment will take precedence.

If the ``env.cfg`` file also contains a key ``include-system-site``
with a value of ``false`` (not case sensitive), the ``site`` module
will omit the system site directories entirely. This allows the
virtual environment to be entirely isolated from system site-packages.
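
As a rough sketch of that ordering rule (the helper name
``build_site_dirs`` and its arguments are hypothetical; the real logic
would live in the ``site`` module), the decision might look like
this:

```python
def build_site_dirs(env_site_dirs, system_site_dirs, settings):
    """Return site directories in the order they would be added to sys.path.

    ``settings`` is the dict of ``key = value`` pairs read from env.cfg.
    """
    # The value is compared case-insensitively; anything but "false"
    # keeps the system site directories.
    value = settings.get("include-system-site", "true")
    include_system = value.lower() != "false"

    site_dirs = list(env_site_dirs)
    if include_system:
        # System directories come after the environment's own, so a
        # package installed in the environment takes precedence.
        site_dirs.extend(d for d in system_site_dirs if d not in site_dirs)
    return site_dirs
```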


Creating virtual environments
-----------------------------

This PEP also proposes adding a new ``venv`` module to the standard
library which implements the creation of virtual environments.  This
module would typically be executed using the ``-m`` flag::

    python3 -m venv /path/to/new/virtual/environment

Running this command creates the target directory (creating any parent
directories that don't exist already) and places an ``env.cfg`` file
in it with a ``home`` key pointing to the Python installation the
command was run from.  It also creates a ``bin/`` (or ``Scripts`` on
Windows) subdirectory containing a copy of the ``python3`` executable,
and the ``pysetup3`` script from the ``packaging`` standard library
module (to facilitate easy installation of packages from PyPI into the
new virtualenv).  Finally, it creates an (initially empty)
``lib/pythonX.Y/site-packages`` subdirectory.

If the target directory already exists, an error will be raised unless
the ``--clear`` option is provided, in which case the target
directory will be deleted and virtual environment creation will
proceed as usual.

If ``venv`` is run with the ``--no-site-packages`` option, the key
``include-system-site = false`` is also included in the created
``env.cfg`` file.

Multiple paths can be given to ``venv``, in which case an identical
virtualenv will be created, according to the given options, at each
provided path.
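
A command-line front end with these options could be sketched with
``argparse`` as follows.  This is an assumption about how such a front
end might be wired up, not the reference implementation; ``main`` and
``build_parser`` are hypothetical names, and ``create`` stands for any
callable with the signature ``create(env_dir, nosite=..., clear=...)``.

```python
import argparse


def build_parser():
    """Build a parser for the options described in this section."""
    parser = argparse.ArgumentParser(prog="venv")
    parser.add_argument("dirs", metavar="ENV_DIR", nargs="+",
                        help="directory in which to create an environment")
    parser.add_argument("--clear", action="store_true",
                        help="delete an existing target directory first")
    parser.add_argument("--no-site-packages", dest="nosite",
                        action="store_true",
                        help="isolate the environment from system "
                             "site-packages")
    return parser


def main(argv, create):
    """Parse argv and call ``create`` once for every path given."""
    options = build_parser().parse_args(argv)
    for env_dir in options.dirs:
        create(env_dir, nosite=options.nosite, clear=options.clear)
```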


API
---

The high-level method described above will make use of a simple API
which provides mechanisms for third-party virtual environment creators
to customize environment creation according to their needs.

The ``venv`` module will contain an ``EnvBuilder`` class which accepts
the following keyword arguments on instantiation:

   * ``nosite`` - A Boolean value indicating that isolation of the
     environment from the system Python is required (defaults to
     ``False``).

   * ``clear`` - A Boolean value which, if ``True``, will delete any
     existing target directory instead of raising an exception
     (defaults to ``False``).

The returned env-builder is an object which is expected to have a
single method, ``create``, which takes as a required argument the path
(absolute or relative to the current directory) of the target
directory which is to contain the virtual environment. The ``create``
method will either create the environment in the specified directory,
or raise an appropriate exception.

Creators of third-party virtual environment tools will be free to use
the provided ``EnvBuilder`` class as a base class.

The ``venv`` module will also provide a module-level function as a
convenience::

    def create(env_dir, nosite=False, clear=False):
        builder = EnvBuilder(nosite=nosite, clear=clear)
        builder.create(env_dir)

The ``create`` method of the ``EnvBuilder`` class illustrates the
hooks available for customization::

    def create(self, env_dir):
        """
        Create a virtualized Python environment in a directory.

        :param env_dir: The target directory to create an environment in.

        """
        env_dir = os.path.abspath(env_dir)
        context = self.create_directories(env_dir)
        self.create_configuration(context)
        self.setup_python(context)
        self.setup_packages(context)
        self.setup_scripts(context)

Each of the methods ``create_directories``, ``create_configuration``,
``setup_python``, ``setup_packages`` and ``setup_scripts`` can be
overridden.  The functions of these methods are:

   * ``create_directories`` - creates the environment directory and
     all necessary directories, and returns a context object. This is
     just a holder for attributes (such as paths), for use by the
     other methods.

   * ``create_configuration`` - creates the ``env.cfg`` configuration
     file in the environment.

   * ``setup_python`` - creates a copy of the Python executable (and,
     under Windows, DLLs) in the environment.

   * ``setup_packages`` - A placeholder method which can be overridden
     in third-party implementations to pre-install packages in the
     virtual environment.

   * ``setup_scripts`` - A placeholder method which can be overridden
     in third-party implementations to pre-install scripts (such as
     activation and deactivation scripts) in the virtual environment.
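
To make the hook structure concrete, here is a minimal stand-in for
the proposed class together with a subclass that overrides one hook.
It is a sketch only: the context attribute names (``env_dir``,
``bin_path`` and so on), the ``ValueError`` raised for an existing
directory, and the ``ReadmeEnvBuilder`` subclass are all assumptions
made for illustration, and copying the interpreter is left out.

```python
import os
import shutil
import sys
import types


class EnvBuilder:
    """Illustrative stand-in for the proposed class, not the real one."""

    def __init__(self, nosite=False, clear=False):
        self.nosite = nosite
        self.clear = clear

    def create(self, env_dir):
        env_dir = os.path.abspath(env_dir)
        context = self.create_directories(env_dir)
        self.create_configuration(context)
        self.setup_python(context)
        self.setup_packages(context)
        self.setup_scripts(context)

    def create_directories(self, env_dir):
        if os.path.exists(env_dir):
            if not self.clear:
                raise ValueError("directory exists: %s" % env_dir)
            shutil.rmtree(env_dir)
        bin_name = "Scripts" if sys.platform == "win32" else "bin"
        # The context is just a holder for paths used by later hooks.
        context = types.SimpleNamespace(
            env_dir=env_dir,
            bin_name=bin_name,
            bin_path=os.path.join(env_dir, bin_name),
            site_packages=os.path.join(
                env_dir, "lib", "python%d.%d" % sys.version_info[:2],
                "site-packages"),
        )
        os.makedirs(context.bin_path)  # also creates missing parents
        os.makedirs(context.site_packages)
        return context

    def create_configuration(self, context):
        home = os.path.dirname(os.path.abspath(sys.executable))
        cfg_path = os.path.join(context.env_dir, "env.cfg")
        with open(cfg_path, "w", encoding="utf-8") as cfg:
            cfg.write("home = %s\n" % home)
            if self.nosite:
                cfg.write("include-system-site = false\n")

    def setup_python(self, context):
        pass  # copying the executable is omitted from this sketch

    def setup_packages(self, context):
        pass  # placeholder hook

    def setup_scripts(self, context):
        pass  # placeholder hook


class ReadmeEnvBuilder(EnvBuilder):
    """Hypothetical third-party subclass that overrides one hook."""

    def setup_scripts(self, context):
        readme_path = os.path.join(context.bin_path, "README.txt")
        with open(readme_path, "w", encoding="utf-8") as readme:
            readme.write("Environment at %s\n" % context.env_dir)
```

Calling ``ReadmeEnvBuilder(nosite=True).create("some/dir")`` would then
run the five steps in order, with only the script step customized.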

The ``DistributeEnvBuilder`` subclass in the reference implementation
illustrates how the ``setup_packages`` and ``setup_scripts`` hooks can
be used in practice. It's
not envisaged that ``DistributeEnvBuilder`` will be actually added to
Python core, but it makes the reference implementation more
immediately useful for testing and exploratory purposes.

   * The ``setup_packages`` method installs Distribute in the target
     environment. This is needed at the moment in order to actually
     install most packages in an environment, since most packages are
     not yet packaging / setup.cfg based.

   * The ``setup_scripts`` method installs activation and pysetup3
     scripts in the environment. This is also done in a configurable
     way: A ``scripts`` property on the builder is expected to provide
     a buffer which is a base64-encoded zip file. The zip file
     contains directories "common", "linux2", "darwin", "win32", each
     containing scripts destined for the bin directory in the
     environment. The contents of "common" and the directory
     corresponding to ``sys.platform`` are copied after doing some
     text replacement of placeholders:

        * ``__VIRTUAL_ENV__`` is replaced with the absolute path of
          the environment directory.

        * ``__VIRTUAL_PROMPT__`` is replaced with the environment
          prompt prefix.

        * ``__BIN_NAME__`` is replaced with the name of the bin
          directory.

        * ``__ENV_PYTHON__`` is replaced with the absolute path of the
          environment's executable.
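
The script-installation step described above might be sketched as
follows.  This is a hedged illustration rather than the code in the
reference implementation: the function names and keyword arguments are
hypothetical, scripts are assumed to be UTF-8 text, and each directory
in the zip file is assumed to be flat (no nested subdirectories).

```python
import base64
import io
import os
import zipfile


def replace_placeholders(text, env_dir, prompt, bin_name, env_python):
    """Substitute the four placeholders listed above in a script's text."""
    replacements = {
        "__VIRTUAL_ENV__": env_dir,
        "__VIRTUAL_PROMPT__": prompt,
        "__BIN_NAME__": bin_name,
        "__ENV_PYTHON__": env_python,
    }
    for placeholder, value in replacements.items():
        text = text.replace(placeholder, value)
    return text


def install_scripts(encoded_zip, platform, bin_path, **values):
    """Copy scripts from "common" and the platform directory to bin_path.

    ``encoded_zip`` is the base64-encoded zip file; ``values`` are the
    keyword arguments expected by ``replace_placeholders``.
    """
    archive = zipfile.ZipFile(io.BytesIO(base64.b64decode(encoded_zip)))
    for name in archive.namelist():
        folder, _, filename = name.partition("/")
        # Skip directory entries, nested paths and other platforms.
        if not filename or "/" in filename:
            continue
        if folder not in ("common", platform):
            continue
        text = archive.read(name).decode("utf-8")
        target = os.path.join(bin_path, filename)
        with open(target, "w", encoding="utf-8") as script:
            script.write(replace_placeholders(text, **values))
```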

No doubt the process of PEP review will show up any customization
requirements which have not yet been considered.


Open Questions
==============

Why not modify sys.prefix?
--------------------------

Any virtual environment tool along these lines is proposing a split
between two different meanings (among others) that are currently both
wrapped up in ``sys.prefix``: the answers to the questions "Where is
the standard library?" and "Where is the site-packages location where
third-party modules should be installed?"

This split could be handled by introducing a new value for either the
former question or the latter question.  Either option potentially
introduces some backwards-incompatibility with software written to
assume the other meaning for ``sys.prefix``.

Since it was unable to modify ``distutils``, `virtualenv`_ has to
re-point ``sys.prefix`` at the virtual environment, which requires
that it also provide a symlink from inside the virtual environment to
the Python header files, and that it copy some portions of the
standard library into the virtual environment.

The `documentation`__ for ``sys.prefix`` describes it as "A string
giving the site-specific directory prefix where the platform
independent Python files are installed," and specifically mentions the
standard library and header files as found under ``sys.prefix``.  It
does not mention ``site-packages``.

__ http://docs.python.org/dev/library/sys.html#sys.prefix

It is more true to this documented definition of ``sys.prefix`` to
leave it pointing to the system installation (which is where the
standard library and header files are found), and introduce a new
value in ``sys`` (``sys.site_prefix``) to point to the prefix for
``site-packages``.

The justification for reversing this choice would be if it can be
demonstrated that the bulk of third-party code referencing
``sys.prefix`` is, in fact, using it to find ``site-packages``, and
not the standard library or header files or anything else.  The most
notable case is probably `setuptools`_ and its fork `distribute`_,
which do use ``sys.prefix`` to build up a list of site directories for
pre-flight checking where ``.pth`` files can usefully be placed.  It
would be trivial to modify these tools (currently only `distribute`_
is Python 3 compatible) to check ``sys.site_prefix`` and fall back to
``sys.prefix`` if it doesn't exist. If Distribute is modified in this
way and released before Python 3.3 is released with the ``venv``
module, there would be no likely reason for an older version of
Distribute to ever be installed in a virtual environment.

In terms of other third-party usage, a `Google Code Search`_ turns up
what appears to be a roughly even mix of usage between packages using
``sys.prefix`` to build up a site-packages path and packages using it
to e.g. eliminate the standard-library from code-execution
tracing. Either choice that's made here will require one or the other
of these uses to be updated.

Another argument for reversing this choice and modifying
``sys.prefix`` to point at the virtual environment is that virtualenv
currently does this, and it doesn't appear to have caused major
problems.

.. _setuptools: http://peak.telecommunity.com/DevCenter/setuptools
.. _distribute: http://packages.python.org/distribute/
.. _Google Code Search:
   http://www.google.com/codesearch#search/&q=sys\.prefix&p=1&type=cs


What about include files?
-------------------------

Some packages install their own header files into the virtual
environment.  For example, ZeroMQ installs ``zmq.h`` and
``zmq_utils.h`` in ``$VE/include``, whereas SIP (part of PyQt4)
installs ``sip.h`` by default in ``$VE/include/pythonX.Y``.  With
virtualenv, everything works because the ``pythonX.Y`` include
directory is symlinked, so everything that's needed is in
``$VE/include``.  At the moment the reference implementation
(pythonv) doesn't do anything with include files besides creating the
include directory; this might need to change, to copy or symlink
``$VE/include/pythonX.Y``.  This would presumably go into
``venv.py``.

Since Python has no abstraction for a site-specific include
directory, other than for platform-specific files, the user
expectation would seem to be that all include files anyone could ever
want should be found in one of just two locations, with sysconfig
labels "include" and "platinclude".

There's another issue: what if includes are Python-version-specific?
For example, SIP installs by default into $VE/include/pythonX.Y rather
than $VE/include, presumably because there's version-specific stuff in
there - but even if that's not the case with SIP, it could be the case
with some other package.  The resulting problem is that you can't
just symlink the include/pythonX.Y directory, but actually have to
provide a writable directory and symlink/copy the contents from the
system include/pythonX.Y.  Of course this is not hard to do, but it
does seem inelegant.  On the other hand, the inelegance really stems
from the lack of a supporting concept in Python/sysconfig.
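
The "writable directory plus symlinked contents" approach could look
roughly like the sketch below.  It assumes a platform where
``os.symlink`` is available without special privileges, and the helper
name ``link_include_contents`` is hypothetical; the reference
implementation does not currently do this.

```python
import os


def link_include_contents(system_include, env_include):
    """Make env_include a real directory whose entries link to system_include.

    Because env_include itself is a normal directory, packages can
    still install their own headers into it.
    """
    os.makedirs(env_include, exist_ok=True)
    for name in os.listdir(system_include):
        link = os.path.join(env_include, name)
        # Leave alone anything the environment already provides.
        if not os.path.lexists(link):
            os.symlink(os.path.join(system_include, name), link)
```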


Interface with packaging tools
------------------------------

Some work will be needed in packaging tools (Python 3.3 packaging,
Distribute) to support implementation of this PEP. For example:

* How Distribute and packaging use sys.prefix and/or sys.site_prefix.
  Clearly, in practice we'll need to use Distribute for a while, until
  packages have migrated over to usage of setup.cfg.

* How packaging and Distribute set up shebang lines in scripts which they
  install in virtual environments.


Add a script?
-------------

Perhaps a ``pyvenv`` script should be added as a more convenient and
discoverable alternative to ``python -m venv``.


Testability and Source Build Issues
-----------------------------------

In order to be able to test the ``venv`` module in the Python
regression test suite, some anomalies in how sysconfig data is
configured in source builds will need to be removed. For example,
``sysconfig.get_paths()`` in a source build gives (partial output)::

    {
     'include': '/home/vinay/tools/pythonv/Include',
     'libdir': '/usr/lib  ; or /usr/lib64 on a multilib system',
     'platinclude': '/home/vinay/tools/pythonv',
     'platlib': '/usr/local/lib/python3.3/site-packages',
     'platstdlib': '/usr/local/lib/python3.3',
     'purelib': '/usr/local/lib/python3.3/site-packages',
     'stdlib': '/usr/local/lib/python3.3'
    }
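
To compare against a given build, the corresponding values can be
inspected with a few lines of Python.  The helper name
``describe_paths`` is just for illustration; the values it returns
depend entirely on the interpreter it is run with.

```python
import sysconfig


def describe_paths(names=("stdlib", "purelib", "platlib", "include")):
    """Return the selected sysconfig paths for the running interpreter."""
    paths = sysconfig.get_paths()
    return {name: paths[name] for name in names}


if __name__ == "__main__":
    for name, path in sorted(describe_paths().items()):
        print("%-10s %s" % (name, path))
```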


Activation and Utility Scripts
------------------------------

Virtualenv currently provides shell "activation" scripts as a user
convenience, to put the virtual environment's Python binary first on
the shell PATH. This is a maintenance burden, as separate activation
scripts need to be provided and maintained for every supported
shell. For this reason, this PEP proposes to leave such scripts to be
provided by third-party extensions; virtual environments created by
the core functionality would be used by directly invoking the
environment's Python binary.

If we are going to rely on external code to provide these
conveniences, we need to check with existing third-party projects in
this space (virtualenv, zc.buildout) and ensure that the proposed API
meets their needs.

(Virtualenv would be fine with the proposed API; it would become a
relatively thin wrapper with a subclass of the env builder that adds
shell activation and automatic installation of ``pip`` inside the
virtual environment).


Ensuring that sys.site_prefix and sys.site_exec_prefix are always set?
----------------------------------------------------------------------

Currently the reference implementation's modifications to standard
library code use the idiom ``getattr(sys, "site_prefix",
sys.prefix)``. Do we want this to be the long-term pattern, or should
the sys module ensure that the ``site_*`` attributes are always set to
something (by default the same as the regular prefix attributes), even
if ``site.py`` does not run?
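
For reference, the fallback idiom in question looks like this when
wrapped in small helpers.  The helper names are hypothetical and the
``lib/pythonX.Y/site-packages`` layout shown is the POSIX one from the
example layout earlier in this PEP; on an interpreter that does not
define ``sys.site_prefix`` the fallback to ``sys.prefix`` is taken.

```python
import os
import sys


def get_site_prefix():
    """Return sys.site_prefix if it is set, otherwise sys.prefix."""
    return getattr(sys, "site_prefix", sys.prefix)


def get_purelib_dir():
    """Build a site-packages path relative to the site prefix."""
    version_dir = "python%d.%d" % sys.version_info[:2]
    return os.path.join(get_site_prefix(), "lib", version_dir,
                        "site-packages")
```

If the ``sys`` module always set the attribute, callers could use
``sys.site_prefix`` directly and the ``getattr`` wrapper would become
unnecessary.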


Reference Implementation
========================

The in-progress reference implementation is found in `a clone of the
CPython Mercurial repository`_.  To test it, build and install it (the
virtual environment tool currently does not run from a source tree).
From the installed Python, run ``bin/python3 -m venv
/path/to/new/virtualenv`` to create a virtual environment.

The reference implementation (like this PEP!) is a work in
progress.

.. _a clone of the CPython Mercurial repository:
   https://bitbucket.org/vinay.sajip/pythonv


References
==========


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:


