[Distutils] Finally making distlib handle spaces

Radon Rosborough radon.neon at gmail.com
Sat May 20 10:45:24 EDT 2017


Hi all,

Since at least 2011, virtualenv has not supported spaces in paths.
This has bitten many people, including myself, and caused numerous
issues over the years [1] [2] [3] [4] [5] [6] [7].

However, as was discussed in [8], the issue lies not with virtualenv
but with distlib, via pip. It would be possible for pip to use the
existing distlib interface to hack around the problem, but I believe
the current behavior of distlib is erroneous when it comes to spaces
in paths. I therefore believe it would be more appropriate to fix the
problem in distlib.

Two separate patches [9] [10] that solve the problem in distlib were
posted in January by Harald Nordgren. However, they were declined
pending a discussion on distutils-sig [11]. As far as I can tell, no
such discussion was ever started. However, the issue remains, and we
have a clear solution proposal to consider, so I'd like to kick it off
now. In the remainder of this email, I'll explain the problem and
surrounding context in detail, and why I think the solution proposed
by Harald (or some variation) is a good path forward for distlib. I
look forward to hearing your thoughts on the matter.

==========================
The behavior of virtualenv
==========================

The following is written for:

    $ python --version
    Python 2.7.13
    $ pip --version
    pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)
    $ virtualenv --version
    15.1.0

Creating a virtualenv is done as follows:

    $ virtualenv venv
    New python executable in /private/tmp/path with spaces/venv/bin/python2.7
    Also creating executable in /private/tmp/path with spaces/venv/bin/python
    Installing setuptools, pip, wheel...done.

This creates a directory structure looking as follows under the venv
directory:

    ├── bin
    │  ├── activate
    │  ├── activate.csh
    │  ├── activate.fish
    │  ├── activate_this.py
    │  ├── easy_install
    │  ├── easy_install-2.7
    │  ├── pip
    │  ├── pip2
    │  ├── pip2.7
    │  ├── python -> ./bin/python2.7
    │  ├── python-config
    │  ├── python2 -> ./bin/python2.7
    │  ├── python2.7
    │  └── wheel
    ├── include
    │  └── ...
    ├── lib
    │  └── ...
    └── ...

The idea is that one can call the pip and python executables
inside the virtualenv, instead of the system ones. Like so:

    $ venv/bin/python --version
    Python 2.7.13
    $ venv/bin/pip --version
    zsh: venv/bin/pip: bad interpreter: "/private/tmp/path: no such
file or directory

Unfortunately, as you can see, pip doesn't work at all! Why does this
happen? While the python executable is a native binary, pip is
actually just a Python script, which is specified to be run by the
accompanying virtualenv python executable. Here are the contents:

    #!"/private/tmp/path with spaces/venv/bin/python2.7"

    # -*- coding: utf-8 -*-
    import re
    import sys

    from pip import main

    if __name__ == '__main__':
        sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
        sys.exit(main())

The issue is that the python binary is specified using a shebang,
which is known [12] [13] to be fragile, OS-dependent, and error-prone
regarding paths that are long or that contain spaces or non-ASCII
characters. In particular, the quoting of the shebang does not work on
most operating systems, including macOS, which is what I ran this test
on.

======================
virtualenv and distlib
======================

The issue is complicated by the fact that there are several different
libraries at play. To perform the installation of pip and wheel into
the virtualenv, virtualenv calls into pip [14]. The 'pip install'
command then uses the subroutine library 'wheel.py', which generates
the stub scripts using distlib's ScriptMaker [15]. It is actually
distlib which generates the shebang, although this can be overridden
by setting the 'executable' property of the ScriptMaker object [16]
[17]. Any patch to fix the virtualenv problem would therefore need to
be in either pip (as a consumer of the shebang-generation interface)
or distlib (as the provider of that interface). The problem cannot be
addressed by virtualenv without doing something like using pip/distlib
to generate the scripts and then fixing them after the fact (this has
been proposed [26] [29], but I consider it a hack).

==================
Proposed solutions
==================

There has been extended discussion about this issue, especially in
[2]. Essentially, the solutions proposed fall into four categories:

(1) Don't change anything; end users can work around the issue. For
    example, they can place their virtualenvs in a different directory
    than their project, or change their username to avoid having
    spaces or non-ASCII characters.

(2) Don't fix the bug, but add a warning to virtualenv. If we
    absolutely can't fix the bug (which I strongly believe not to be
    the case), then this would be the next best thing to do. See [27]
    [30] [31].

(3) Attempt to escape the shebang, for example by using backslashes or
    quotes. The latter has already been implemented. Unfortunately,
    this does not work on most operating systems, since shebangs are
    interpreted by the kernel in a rather primitive way.

(4) Patch either pip or distlib to use a different strategy for
    dispatching the python binary to be used. For example, make the
    pip script into a shell script with '#!/usr/bin/env sh' that then
    invokes the real pip script using the appropriate python binary [9];
    this works because even POSIX sh supports escaping arbitrary
    characters in executable paths. Alternatively, use a clever hack
    to make the pip script executable as both a Python script and as a
    shell script [10]. This idea originated from [19].

I argue that approaches (1), (2), and (3) are inadequate, and that (4)
or some variation thereof is the best path forward for distlib.
Regarding (4), note that in addition to the second pull request [10],
there is another, possibly cleaner implementation of the same idea at
[18], by the same author. Since this issue has been discussed since
2011 without being fixed, I obviously need to justify my position.

===================================
Justifications and counterarguments
===================================

> There's an easy workaround for the end user, so there's no need to
> change things on the pip/distlib side.

The workaround may not necessarily be "easy". For example, there have
been reports of this problem being unavoidable on macOS in certain
circumstances, since the drive name ("Macintosh HD") has a space in it
[1]. People working in corporate environments may not be able to
simply move their virtualenvs to a different directory (possibly one
outside their home directory, if their username has a space or
non-ASCII character in it). These issues really do happen, and really
do inconvenience people [20].

However, setting these cases aside, the workaround is indeed not too
hard for most people. The question is: do we really want virtualenv to
have this behavior? To me, it honestly seems embarrassing that such an
important tool, and a cornerstone of the Python ecosystem, can't
handle something as simple as spaces in directory names, even more
than six years after the bug was first reported. If there were a bug
in the development version of Git that prevented it from working at
all when you had your repository in a path with spaces, nobody would
even consider shipping a release! In 2017, people use Unicode and
spaces in their filesystems, and they expect their tools to handle
this.

In my opinion, end-user workarounds should be reserved for those cases
where it is absolutely impossible for the software to solve the
problem itself without introducing even worse problems. This is simply
not the case with the distlib/pip path bug. Patches that present
elegant ways to solve the issue without breaking backward
compatibility are available [9] [10]. There is no risk of breaking
programs that rely on the previous behavior, since the previous
behavior was a complete inability for virtualenv to function *at all*,
even for 'pip --version'!

> People who use long paths and/or paths with spaces and/or paths with
> non-ASCII characters in them are rare.

No, that's not true [1] [2] [3] [4] [5] [6] [7] [23] [24] [28].
Considering the tiny fraction of people running into bugs who actually
report them, I'd say quite a few people run into this issue. It's
pretty rare that I'm able to scrounge up this number of issues
all leading to a single bug. (I've also run into it, on multiple
occasions over several years, by the way.)

> The issue with spaces is a limitation of the way in which the kernel
> parses shebangs, and this can't be fixed.

This is technically true [13], but irrelevant [21]. The implications
of this fact are that virtualenv cannot solve this issue *if it
continues to directly use shebangs for interpreter dispatch*. However,
there are many other ways to dispatch interpreters [9] [10], none of
which have the same limitations of the shebang. Thus the limitations
of shebangs have no bearing on this issue, unless it turns out that
it's completely impossible to dispatch scripts in any other way while
maintaining compatibility with all supported platforms, which seems
extremely unlikely to me.

> We don't want to make the code more complicated unless there's a
> good reason for it.

I agree. In my opinion, the complete failure of virtualenv to operate
in a common real-world use case is a good reason. However, beyond
this, we can be even more conservative. In Harald's alternate patch
[18], the polyglot script hack is only used in situations where the
path is too long for a shebang (conditional on the OS) or it contains
spaces. Thus, it's only possible for the patch to improve things,
since it is only activated in cases where virtualenv was completely
nonfunctional before.

Look at software like Bash. It has implicit wordsplitting, which makes
dealing with spaces in paths difficult. However, it is *extremely*
well-understood that it is a best practice to disable wordsplitting by
quoting arguments [22]. That is, modern software is expected to handle
spaces and other special characters in paths. Failing to do so leads
to hard-to-trace bugs and potential security vulnerabilities.

> We don't want to change the default behavior of distlib. pip should
> handle its shebang needs using the ScriptMaker.executable property,
> which was provided for this purpose.

Now we are getting somewhere. This is one of two solutions proposed
which would actually fix the bug. However, I feel that it is the wrong
way to go about it.

In my opinion, distlib (and specifically ScriptMaker) are providing an
interface. This interface specifies that when you use distlib to
generate a script, and then run the script, it will invoke the
executable that you told it to. This is not currently the case when
there are spaces in the path leading to the executable. In my opinion,
this is a *bug* in that distlib fails to fulfill its specification.
Yes, other applications that use distlib could work around this bug,
but it is fundamentally a bug in distlib, and should be fixed there.
The ability to support spaces in paths should not be a special feature
that must be hacked in (separately, resulting in code duplication and
increased maintenance burden) to every application that wants to use
distlib to generate executables.

If we really want to go that way, though, Harald has made pull
requests for both pip [25] and virtualenv [26].

==========
Next steps
==========

I'd love to hear people's thoughts on the issues raised here. If we
decide to go forward with patching distlib, I think the logical next
step would be to look at Harald's patches and see if they need to
modified to work correctly on all the systems that Python/pip support.
As I mentioned earlier, the last comment on [9] was a request for a
discussion to be started on distutils-sig, as this email intends. What
do people think about [9], [10], [18]?

Best regards,
Radon Rosborough

[1]: http://stackoverflow.com/q/15472430/3538165
[2]: https://github.com/pypa/virtualenv/issues/53
[3]: https://github.com/pypa/virtualenv/issues/994
[4]: https://github.com/pypa/pip/issues/923
[5]: https://bugs.python.org/issue20622
[6]: https://groups.google.com/forum/#!topic/virtualenvwrapper/vqW3fedTgKc
[7]: https://bugs.launchpad.net/virtualenv/+bug/241581
[8]: https://github.com/pypa/virtualenv/issues/53#issuecomment-302005361
[9]: https://bitbucket.org/pypa/distlib/pull-requests/31
[10]: https://bitbucket.org/pypa/distlib/pull-requests/32
[11]: https://bitbucket.org/pypa/distlib/pull-requests/31/#comment-29795586
[12]: https://bitbucket.org/pypa/distlib/pull-requests/31/#comment-29734103
[13]: https://lists.gnu.org/archive/html/bug-bash/2008-05/msg00052.html
[14]: https://github.com/pypa/virtualenv/blob/c1ef9e29bfda9f5b128476d0c6d865ffe681b3fb/virtualenv.py#L856-L882
[15]: https://github.com/pypa/pip/blob/7c4353e6834a38b36a34234db865e13a589c9e5f/pip/wheel.py#L383-L427
[16]: https://bitbucket.org/pypa/distlib/src/216bd857431686493d37eb081614d91c1356fa79/distlib/scripts.py#-141:143
[17]: http://pythonhosted.org/distlib/tutorial.html#specifying-a-custom-executable-for-shebangs
[18]: https://bitbucket.org/pypa/distlib/pull-requests/33
[19]: https://hg.mozilla.org/mozilla-central/file/tip/mach
[20]: https://bitbucket.org/pypa/distlib/pull-requests/31/#comment-29735496
[21]: https://github.com/pypa/virtualenv/issues/53#issuecomment-302019457
[22]: http://mywiki.wooledge.org/BashGuide/Practices#Quoting
[23]: https://codedump.io/share/7xYRbcZCNihE/1/using-virtualenv-with-spaces-in-a-path
[24]: https://github.com/pypa/virtualenv/issues/997
[25]: https://github.com/pypa/pip/pull/4237
[26]: https://github.com/pypa/virtualenv/pull/1004
[27]: https://github.com/pypa/virtualenv/issues/1039
[28]: https://github.com/pypa/virtualenv/issues/1014
[29]: https://github.com/pypa/virtualenv/pull/910
[30]: http://bugs.python.org/issue28446
[31]: https://github.com/pypa/virtualenv/issues/53#issuecomment-282466994


More information about the Distutils-SIG mailing list