Hi all,

Since at least 2011, virtualenv has not supported spaces in paths. This has bitten many people, including myself, and caused numerous issues over the years [1] [2] [3] [4] [5] [6] [7]. However, as was discussed in [8], the issue lies not with virtualenv but with distlib, via pip. It would be possible for pip to use the existing distlib interface to hack around the problem, but I believe the current behavior of distlib is erroneous when it comes to spaces in paths. I therefore believe it would be more appropriate to fix the problem in distlib.

Two separate patches [9] [10] that solve the problem in distlib were posted in January by Harald Nordgren. However, they were declined pending a discussion on distutils-sig [11]. As far as I can tell, no such discussion was ever started. The issue remains, though, and we have a clear solution proposal to consider, so I'd like to kick it off now.

In the remainder of this email, I'll explain the problem and surrounding context in detail, and why I think the solution proposed by Harald (or some variation) is a good path forward for distlib. I look forward to hearing your thoughts on the matter.

==========================
The behavior of virtualenv
==========================

The following is written for:

    $ python --version
    Python 2.7.13
    $ pip --version
    pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)
    $ virtualenv --version
    15.1.0

Creating a virtualenv inside a directory whose path contains spaces (here, /private/tmp/path with spaces) is done as follows:

    $ virtualenv venv
    New python executable in /private/tmp/path with spaces/venv/bin/python2.7
    Also creating executable in /private/tmp/path with spaces/venv/bin/python
    Installing setuptools, pip, wheel...done.
This creates the following directory structure under the venv directory:

    ├── bin
    │   ├── activate
    │   ├── activate.csh
    │   ├── activate.fish
    │   ├── activate_this.py
    │   ├── easy_install
    │   ├── easy_install-2.7
    │   ├── pip
    │   ├── pip2
    │   ├── pip2.7
    │   ├── python -> ./bin/python2.7
    │   ├── python-config
    │   ├── python2 -> ./bin/python2.7
    │   ├── python2.7
    │   └── wheel
    ├── include
    │   └── ...
    ├── lib
    │   └── ...
    └── ...

The idea is that one can call the pip and python executables inside the virtualenv instead of the system ones, like so:

    $ venv/bin/python --version
    Python 2.7.13
    $ venv/bin/pip --version
    zsh: venv/bin/pip: bad interpreter: "/private/tmp/path: no such file or directory

Unfortunately, as you can see, pip doesn't work at all! Why does this happen? While the python executable is a native binary, pip is actually just a Python script, which is specified to be run by the accompanying virtualenv python executable. Here are its contents:

    #!"/private/tmp/path with spaces/venv/bin/python2.7"

    # -*- coding: utf-8 -*-
    import re
    import sys

    from pip import main

    if __name__ == '__main__':
        sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
        sys.exit(main())

The issue is that the python binary is specified using a shebang, which is known [12] [13] to be fragile, OS-dependent, and error-prone for paths that are long or that contain spaces or non-ASCII characters. In particular, quoting the path in the shebang does not work on most operating systems, including macOS, which is what I ran this test on.

======================
virtualenv and distlib
======================

The issue is complicated by the fact that there are several different libraries at play. To install pip and wheel into the virtualenv, virtualenv calls into pip [14]. The 'pip install' command then uses the subroutine library 'wheel.py', which generates the stub scripts using distlib's ScriptMaker [15].
It is actually distlib that generates the shebang, although this can be overridden by setting the 'executable' property of the ScriptMaker object [16] [17]. Any patch to fix the virtualenv problem would therefore need to be in either pip (as a consumer of the shebang-generation interface) or distlib (as the provider of that interface). The problem cannot be addressed by virtualenv without doing something like using pip/distlib to generate the scripts and then fixing them after the fact (this has been proposed [26] [29], but I consider it a hack).

==================
Proposed solutions
==================

There has been extended discussion about this issue, especially in [2]. Essentially, the proposed solutions fall into four categories:

(1) Don't change anything; end users can work around the issue. For example, they can place their virtualenvs in a different directory than their project, or change their username to avoid having spaces or non-ASCII characters.

(2) Don't fix the bug, but add a warning to virtualenv. If we absolutely can't fix the bug (which I strongly believe not to be the case), then this would be the next best thing to do. See [27] [30] [31].

(3) Attempt to escape the shebang, for example by using backslashes or quotes. The latter has already been implemented. Unfortunately, this does not work on most operating systems, since shebangs are interpreted by the kernel in a rather primitive way.

(4) Patch either pip or distlib to use a different strategy for dispatching the python binary to be used. For example, make the pip script into a shell script with '#!/usr/bin/env sh' that then invokes the real pip script using the appropriate python binary [9]; this works because even POSIX sh supports escaping arbitrary characters in executable paths. Alternatively, use a clever hack to make the pip script executable both as a Python script and as a shell script [10]. This idea originated from [19].
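Option (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from [9] or [10]: the helper names sh_wrapper and polyglot_header are hypothetical, a POSIX sh is assumed to be available, and the polyglot header shown is one common form of the sh/Python trick, which [10] and [19] may implement differently. In both variants the interpreter path is parsed by sh, which honors quoting, rather than by the kernel's shebang parser.

```python
import shlex


def sh_wrapper(python: str, real_script: str) -> str:
    """Strategy A (hypothetical helper): a POSIX sh stub that re-runs the
    real script with the chosen interpreter.

    shlex.quote() emits sh-safe quoting, so spaces (and most other special
    characters) in either path survive as single words.
    """
    return "#!/usr/bin/env sh\nexec {} {} \"$@\"\n".format(
        shlex.quote(python), shlex.quote(real_script)
    )


def polyglot_header(python: str) -> str:
    """Strategy B (hypothetical helper): a header valid as both sh and Python.

    - sh runs line 2 as: exec <python> "$0" "$@"  ('' followed by 'exec'
      is just the word exec), re-running this same file with the right
      interpreter; line 3 is never reached because exec replaces the shell.
    - Python treats line 1 as a comment and lines 2-3 as one unused
      triple-quoted string literal, then continues with the real code.
    """
    # Assumption for this sketch: refuse characters that would complicate
    # the Python string literal (quote, backslash) or break the line.
    if any(c in python for c in ("'", "\\", "\n")):
        raise ValueError("unsupported character in interpreter path: %r" % python)
    return (
        "#!/bin/sh\n"
        + "'''exec' " + shlex.quote(python) + ' "$0" "$@"\n'
        + "' '''\n"
    )


if __name__ == "__main__":
    venv_python = "/private/tmp/path with spaces/venv/bin/python2.7"
    # A complete single-file script: polyglot header plus ordinary Python.
    print(polyglot_header(venv_python) + "import sys\nprint(sys.executable)\n")
```

Strategy A needs a second file for the real script next to the stub; strategy B keeps everything in a single file at the cost of an unusual-looking header.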
I argue that approaches (1), (2), and (3) are inadequate, and that (4) or some variation thereof is the best path forward for distlib. Regarding (4), note that in addition to the second pull request [10], there is another, possibly cleaner implementation of the same idea at [18], by the same author.

Since this issue has been discussed since 2011 without being fixed, I obviously need to justify my position.

===================================
Justifications and counterarguments
===================================
There's an easy workaround for the end user, so there's no need to change things on the pip/distlib side.
The workaround is not necessarily "easy". For example, there have been reports of this problem being unavoidable on macOS in certain circumstances, since the drive name ("Macintosh HD") has a space in it [1]. People working in corporate environments may not be able to simply move their virtualenvs to a different directory (possibly one outside their home directory, if their username has a space or non-ASCII character in it). These issues really do happen, and really do inconvenience people [20].

However, setting these cases aside, the workaround is indeed not too hard for most people. The question is: do we really want virtualenv to behave this way? To me, it honestly seems embarrassing that such an important tool, and a cornerstone of the Python ecosystem, still can't handle something as simple as spaces in directory names more than six years after the bug was first reported. If there were a bug in the development version of Git that prevented it from working at all when your repository was in a path with spaces, nobody would even consider shipping a release! In 2017, people use Unicode and spaces in their filesystems, and they expect their tools to handle them.

In my opinion, end-user workarounds should be reserved for cases where it is absolutely impossible for the software to solve the problem itself without introducing even worse problems. This is simply not the case with the distlib/pip path bug. Patches that solve the issue elegantly without breaking backward compatibility are available [9] [10]. There is no risk of breaking programs that rely on the previous behavior, since the previous behavior was a complete inability of virtualenv to function *at all*, even for 'pip --version'!
People who use long paths and/or paths with spaces and/or paths with non-ASCII characters in them are rare.
No, that's not true [1] [2] [3] [4] [5] [6] [7] [23] [24] [28]. Considering the tiny fraction of people running into bugs who actually report them, I'd say quite a few people run into this issue. It's pretty rare that I'm able to scrounge up this number of issues all leading to a single bug. (I've also run into it, on multiple occasions over several years, by the way.)
The issue with spaces is a limitation of the way in which the kernel parses shebangs, and this can't be fixed.
This is technically true [13], but irrelevant [21]. It only implies that virtualenv cannot solve this issue *if it continues to use shebangs directly for interpreter dispatch*. However, there are many other ways to dispatch interpreters [9] [10], none of which share the limitations of the shebang. Thus the limitations of shebangs have no bearing on this issue, unless it turns out to be completely impossible to dispatch scripts in any other way while maintaining compatibility with all supported platforms, which seems extremely unlikely to me.
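To see why quoting cannot help with a plain shebang, here is a rough Python model of how a kernel splits a '#!' line. It is a hypothetical helper (split_shebang), loosely modeled on Linux's rules as an assumption, not actual kernel code; other kernels differ in details such as how the remainder of the line is handled. Fed the quoted shebang from the pip stub, it yields the same bogus interpreter name that appears in the zsh error message shown earlier.

```python
from typing import Optional, Tuple


def split_shebang(first_line: str) -> Tuple[str, Optional[str]]:
    """Split a '#!' line roughly the way the Linux kernel does (assumption).

    The interpreter is everything up to the first space or tab; whatever
    follows, trimmed, becomes at most ONE optional argument. Quotes and
    backslashes get no special treatment, so they cannot protect a space.
    """
    if not first_line.startswith("#!"):
        raise ValueError("not a shebang line")
    # Drop the newline and surrounding blanks; a stray '\r' is NOT removed,
    # which is why CRLF scripts fail with "bad interpreter" too.
    rest = first_line[2:].rstrip("\n").strip(" \t")
    for i, ch in enumerate(rest):
        if ch in " \t":
            # Everything after the first blank is one argument, spaces and all.
            return rest[:i], rest[i:].strip(" \t")
    return rest, None


if __name__ == "__main__":
    line = '#!"/private/tmp/path with spaces/venv/bin/python2.7"'
    print(split_shebang(line))
    # prints ('"/private/tmp/path', 'with spaces/venv/bin/python2.7"')
```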
We don't want to make the code more complicated unless there's a good reason for it.
I agree. In my opinion, the complete failure of virtualenv to operate in a common real-world use case is a good reason. Beyond this, though, we can be even more conservative. In Harald's alternate patch [18], the polyglot script hack is only used when the path is too long for a shebang (the limit depends on the OS) or contains spaces. Thus the patch can only improve things, since it is only activated in cases where virtualenv was completely nonfunctional before.

Look at software like Bash. It performs implicit word splitting, which makes dealing with spaces in paths difficult. However, it is *extremely* well understood that quoting arguments to prevent word splitting is a best practice [22]. That is, modern software is expected to handle spaces and other special characters in paths. Failing to do so leads to hard-to-trace bugs and potential security vulnerabilities.
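The conservative activation rule attributed to [18] above amounts to a small predicate: keep the ordinary shebang unless the path contains whitespace or is too long. The sketch below is not the logic from [18]; the name needs_fallback is hypothetical, and the length limits (about 127 bytes on older Linux kernels, 512 on macOS) are illustrative assumptions that vary by OS and kernel version.

```python
import sys
from typing import Optional

# Illustrative limits on len("#!" + interpreter path) in bytes.
# Assumption: the real limits depend on the OS and kernel version.
_MAX_SHEBANG = {"darwin": 512}
_DEFAULT_MAX_SHEBANG = 127


def needs_fallback(executable: str, platform: Optional[str] = None) -> bool:
    """Hypothetical check: is a plain '#!<executable>' line unsafe here?

    Only paths containing whitespace, or paths that make the shebang too
    long for the platform, need a fallback such as a polyglot header.
    Every other path keeps today's plain shebang unchanged.
    """
    platform = platform or sys.platform
    limit = _MAX_SHEBANG.get(platform, _DEFAULT_MAX_SHEBANG)
    too_long = len(("#!" + executable).encode("utf-8")) > limit
    has_whitespace = any(ch in " \t" for ch in executable)
    return has_whitespace or too_long


if __name__ == "__main__":
    for path in ("/usr/local/bin/python2.7",
                 "/private/tmp/path with spaces/venv/bin/python2.7"):
        print(path, "->", "fallback" if needs_fallback(path) else "plain shebang")
```

Because the predicate is false for every path that works today, scripts generated for such paths would be byte-for-byte identical to the current output.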
We don't want to change the default behavior of distlib. pip should handle its shebang needs using the ScriptMaker.executable property, which was provided for this purpose.
Now we are getting somewhere. This is one of the two proposed solutions that would actually fix the bug. However, I feel that it is the wrong way to go about it. In my opinion, distlib (and specifically ScriptMaker) provides an interface. This interface specifies that when you use distlib to generate a script and then run the script, it will invoke the executable that you told it to. This is not currently the case when there are spaces in the path leading to the executable. In my opinion, this is a *bug*, in that distlib fails to fulfill its specification.

Yes, other applications that use distlib could work around this bug, but it is fundamentally a bug in distlib and should be fixed there. The ability to support spaces in paths should not be a special feature that must be hacked in separately to every application that wants to use distlib to generate executables, resulting in code duplication and increased maintenance burden. If we really want to go that way, though, Harald has made pull requests for both pip [25] and virtualenv [26].

==========
Next steps
==========

I'd love to hear people's thoughts on the issues raised here. If we decide to go forward with patching distlib, I think the logical next step would be to look at Harald's patches and see whether they need to be modified to work correctly on all the systems that Python/pip support. As I mentioned earlier, the last comment on [9] was a request for a discussion to be started on distutils-sig, which is what this email intends to do. What do people think about [9], [10], and [18]?
Best regards,
Radon Rosborough

[1]: http://stackoverflow.com/q/15472430/3538165
[2]: https://github.com/pypa/virtualenv/issues/53
[3]: https://github.com/pypa/virtualenv/issues/994
[4]: https://github.com/pypa/pip/issues/923
[5]: https://bugs.python.org/issue20622
[6]: https://groups.google.com/forum/#!topic/virtualenvwrapper/vqW3fedTgKc
[7]: https://bugs.launchpad.net/virtualenv/+bug/241581
[8]: https://github.com/pypa/virtualenv/issues/53#issuecomment-302005361
[9]: https://bitbucket.org/pypa/distlib/pull-requests/31
[10]: https://bitbucket.org/pypa/distlib/pull-requests/32
[11]: https://bitbucket.org/pypa/distlib/pull-requests/31/#comment-29795586
[12]: https://bitbucket.org/pypa/distlib/pull-requests/31/#comment-29734103
[13]: https://lists.gnu.org/archive/html/bug-bash/2008-05/msg00052.html
[14]: https://github.com/pypa/virtualenv/blob/c1ef9e29bfda9f5b128476d0c6d865ffe681...
[15]: https://github.com/pypa/pip/blob/7c4353e6834a38b36a34234db865e13a589c9e5f/pi...
[16]: https://bitbucket.org/pypa/distlib/src/216bd857431686493d37eb081614d91c1356f...
[17]: http://pythonhosted.org/distlib/tutorial.html#specifying-a-custom-executable...
[18]: https://bitbucket.org/pypa/distlib/pull-requests/33
[19]: https://hg.mozilla.org/mozilla-central/file/tip/mach
[20]: https://bitbucket.org/pypa/distlib/pull-requests/31/#comment-29735496
[21]: https://github.com/pypa/virtualenv/issues/53#issuecomment-302019457
[22]: http://mywiki.wooledge.org/BashGuide/Practices#Quoting
[23]: https://codedump.io/share/7xYRbcZCNihE/1/using-virtualenv-with-spaces-in-a-p...
[24]: https://github.com/pypa/virtualenv/issues/997
[25]: https://github.com/pypa/pip/pull/4237
[26]: https://github.com/pypa/virtualenv/pull/1004
[27]: https://github.com/pypa/virtualenv/issues/1039
[28]: https://github.com/pypa/virtualenv/issues/1014
[29]: https://github.com/pypa/virtualenv/pull/910
[30]: http://bugs.python.org/issue28446
[31]: https://github.com/pypa/virtualenv/issues/53#issuecomment-282466994