See my comment below.
2011/11/26 Zbigniew Jędrzejewski-Szmek email@example.com:
Hi, I apologize in advance for the length of this mail.
When a script or a module is executed by invoking python with the proper arguments, sys.path is extended. When the path to a script is given, the directory containing the script is prepended. When '-m' or '-c' is used, $CWD is prepended. This is documented in http://docs.python.org/dev/using/cmdline.html, so far so good.
sys.path and $PYTHONPATH are like $PATH -- if you can convince someone to put a directory under your control into any of them, you can execute code as that someone. Therefore, sys.path is dangerous and important. Unfortunately, the sys.path manipulations are described only very briefly, and without any commentary, in the on-line documentation. The python(1) manpage doesn't even mention them.
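To make the risk concrete, here is a small sketch of the shadowing attack (the module name json and the temp directory are illustrative; the sys.path.insert stands in for what Python does implicitly):

```python
# Sketch: how a directory early in sys.path shadows a stdlib module.
# We write a fake "json.py" into a directory we control, put that
# directory at the front of sys.path (which is what Python does
# implicitly for script directories and ''), and import.
import os
import sys
import tempfile

tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "json.py"), "w") as f:
    f.write("HIJACKED = True\n")

sys.path.insert(0, tmpdir)      # stands in for the implicit prepend
sys.modules.pop("json", None)   # make sure no cached stdlib copy wins
import json                     # now resolves to tmpdir/json.py

print(getattr(json, "HIJACKED", False))  # -> True
```

Any code the fake module contains runs with the privileges of whoever did the import.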
The problem: each of the commands below is insecure:
python /tmp/script.py (even when script.py is safe by itself) ('/tmp' is added to sys.path, so an attacker can override any module imported by /tmp/script.py by writing to /tmp/module.py)
cd /tmp && python -mtimeit -s 'import numpy' 'numpy.test()' (UNIX users are accustomed to being able to safely execute programs in any directory, e.g. ls, or gcc. Here '' is added to sys.path, so it is not secure to run python in other-user-writable directories.)
cd /tmp/ && python -c 'import numpy; print(numpy.version.version)' (The same as above, '' is added to sys.path.)
cd /tmp && python (The same as above).
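The first two cases above can be confirmed by asking a subprocess what its sys.path[0] is (a sketch; the temp directory and helper script name are illustrative, and the exact values assume a stock CPython without sys.path-related environment variables set):

```python
# Sketch: observe what Python prepends to sys.path in two invocation modes.
import os
import subprocess
import sys
import tempfile

tmpdir = tempfile.mkdtemp()
script = os.path.join(tmpdir, "show_path0.py")
with open(script, "w") as f:
    f.write("import sys; print(repr(sys.path[0]))\n")

# "python /tmp/script.py": the script's directory is prepended
by_file = subprocess.check_output([sys.executable, script]).decode().strip()

# "cd /tmp && python -c ...": '' (i.e. the current directory) is prepended
by_c = subprocess.check_output(
    [sys.executable, "-c", "import sys; print(repr(sys.path[0]))"],
    cwd=tmpdir).decode().strip()

print(by_file)  # the script's directory (possibly realpath'd)
print(by_c)     # ''
```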
IMHO, if this (long-lived) behaviour is necessary, it should at least be prominently documented. Also in the manpage.
Before adding a directory to sys.path as described above, Python actually runs os.path.realpath over it. This means that if the path to a script given on the command line is actually a symlink, the directory containing the real file is the one that gets prepended. This behaviour is not really documented (the documentation only says "the directory containing that file is added to the start of sys.path"), but since the integrity of sys.path is so important, it should be, IMHO.
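The symlink behaviour can be demonstrated directly (a sketch; file names are illustrative, and it assumes a POSIX platform where CPython resolves the symlink as described above):

```python
# Sketch: a symlinked script sees the directory of the *real* file,
# not of the symlink, as sys.path[0].
import os
import subprocess
import sys
import tempfile

real_dir = tempfile.mkdtemp()
link_dir = tempfile.mkdtemp()
real = os.path.join(real_dir, "real.py")
with open(real, "w") as f:
    f.write("import sys; print(sys.path[0])\n")
link = os.path.join(link_dir, "link.py")
os.symlink(real, link)

out = subprocess.check_output([sys.executable, link]).decode().strip()
print(out)  # the real file's directory, not link_dir
```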
Using realpath instead of the (expected) path specified by the user breaks imports of non-pure-Python (mixed .py and .so) modules from scripts on Debian. This is because Debian installs architecture-independent Python files in /usr/share/pyshared, and symlinks those files into /usr/lib/pymodules/pythonX.Y/. The architecture-dependent .so and python-version-dependent .pyc files are installed in /usr/lib/pymodules/pythonX.Y/. When a script, e.g. /usr/lib/pymodules/pythonX.Y/script.py, is executed, the directory /usr/share/pyshared is prepended to sys.path. If the script tries to import a module which has architecture-dependent parts (e.g. numpy) it first sees the incomplete module in /usr/share/pyshared and fails.
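The Debian failure mode can be reproduced without any real .so, using a pure-Python file as a stand-in for the architecture-dependent part (a sketch; the directory names mimic pyshared/pymodules but are temp directories, and it assumes a POSIX platform with the symlink resolution described above):

```python
# Sketch: reproduce the pyshared/pymodules split. "_native.py" plays the
# role of the architecture-dependent .so that exists only under pymodules.
import os
import subprocess
import sys
import tempfile

top = tempfile.mkdtemp()
pyshared = os.path.join(top, "pyshared")    # arch-independent files
pymodules = os.path.join(top, "pymodules")  # symlinks + "arch-dependent" part
os.makedirs(os.path.join(pyshared, "pkg"))
os.makedirs(os.path.join(pymodules, "pkg"))

# pkg needs a sibling module that only exists under pymodules
with open(os.path.join(pyshared, "pkg", "__init__.py"), "w") as f:
    f.write("from . import _native\n")
with open(os.path.join(pyshared, "script.py"), "w") as f:
    f.write("import pkg; print('ok')\n")
with open(os.path.join(pymodules, "pkg", "_native.py"), "w") as f:
    f.write("")

# Debian-style symlinks into pymodules
os.symlink(os.path.join(pyshared, "pkg", "__init__.py"),
           os.path.join(pymodules, "pkg", "__init__.py"))
os.symlink(os.path.join(pyshared, "script.py"),
           os.path.join(pymodules, "script.py"))

# Running the pymodules script: realpath() puts pyshared, not pymodules,
# at sys.path[0], so "import pkg" finds the incomplete copy and the
# import of _native fails.
proc = subprocess.run([sys.executable, os.path.join(pymodules, "script.py")],
                      capture_output=True, text=True)
print(proc.returncode)  # non-zero: the import of pkg._native failed
```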
This happens for example in parallel python (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=620551) and recently when packaging CellProfiler for Debian.
Again, if this is on purpose, it should be documented.
PEP 395 (Qualified Names for Modules)
PEP 395 proposes another sys.path manipulation. When running a script, the directory tree will be walked upwards as long as __init__.py files are present, and then the first directory without one will be added.
This is of course a fine idea, but it makes a previously safe scenario insecure. More precisely, when executing a script whose directory sits inside a parent directory writable by other users, that parent directory will be added to sys.path.
So the (previously safe) sequence of downloading an archive with a package, unzipping it in /tmp, changing into the created directory, checking that the script doesn't do anything bad, and running it is now insecure if there is an __init__.py in the archive root.
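My paraphrase of the proposed walk (a sketch, not the PEP's reference implementation; the function name is made up):

```python
# Sketch of the parent-walking rule PEP 395 proposes: starting from the
# script's directory, walk upward while __init__.py is present, and use
# the first directory without one as the sys.path entry.
import os

def pep395_path_entry(script_path):
    d = os.path.dirname(os.path.abspath(script_path))
    while os.path.isfile(os.path.join(d, "__init__.py")):
        parent = os.path.dirname(d)
        if parent == d:  # reached the filesystem root; stop
            break
        d = parent
    return d
```

So for /tmp/pkg/sub/script.py with __init__.py in both pkg/ and sub/, the entry would be /tmp -- exactly the other-user-writable directory the scenario above worries about.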
I guess that it would be useful to have an option to turn off those sys.path manipulations.
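In the meantime, a script can defend itself (my own workaround sketch, not an official option; the helper name is made up):

```python
# Defensive sketch: drop the implicitly added entries from sys.path
# before any further imports happen.
import os
import sys

def strip_implicit_entries(path, script_path):
    """Return a copy of `path` without '' and the script's directory."""
    d = os.path.dirname(os.path.abspath(script_path))
    drop = {"", d, os.path.realpath(d)}
    return [p for p in path if p not in drop]

# Usage, at the very top of a script, before other imports:
#   sys.path[:] = strip_implicit_entries(sys.path, __file__)
```

This has to run before anything else is imported, of course, or the damage may already be done.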
Thanks very much for the detailed explanation. Given this, I believe I can safely give up on CellProfiler packaging until this issue is addressed upstream (either in CellProfiler using an indirection, or in Python).