[Python-Dev] ImportError: No module named multiarray (is back)
zbyszek at in.waw.pl
Sat Nov 26 18:54:13 CET 2011
I apologize in advance for the length of this mail.
When a script or a module is executed by invoking python with the
proper arguments, sys.path is extended. When a path to a script is
given, the directory containing the script is prepended. When '-m' or
'-c' is used, $CWD is prepended. This is documented in
http://docs.python.org/dev/using/cmdline.html, so far so good.
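The script-directory rule can be observed directly (a minimal sketch;
the temporary directory and file names are illustrative):

```python
import os
import subprocess
import sys
import tempfile

# Demonstrate the documented rule: when running `python /path/to/script.py`,
# sys.path[0] is the directory containing the script.
with tempfile.TemporaryDirectory() as d:
    script = os.path.join(d, "show_path.py")
    with open(script, "w") as f:
        f.write("import sys; print(sys.path[0])\n")
    out = subprocess.run([sys.executable, script],
                         capture_output=True, text=True).stdout.strip()

# Compare against the resolved directory, since Python may resolve
# symlinks in the script path (see below).
print(os.path.realpath(out) == os.path.realpath(d))
```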
sys.path and $PYTHONPATH are like $PATH -- if you can convince someone
to put a directory under your control into any of them, you can
execute code as that someone. Therefore, sys.path is dangerous and
important.
Unfortunately, these sys.path manipulations are described only very
briefly, and without any commentary, in the on-line documentation. The
python(1) manpage doesn't even mention them.
The problem: each of the commands below is insecure:
python /tmp/script.py (when script.py is safe by itself)
('/tmp' is added to sys.path, so an attacker can override any
module imported in /tmp/script.py by writing to /tmp/module.py)
cd /tmp && python -mtimeit -s 'import numpy' 'numpy.test()'
(UNIX users are accustomed to being able to safely execute
programs in any directory, e.g. ls, or gcc, or something.
Here '' is added to sys.path, so it is not safe to run
python in directories writable by other users.)
cd /tmp/ && python -c 'import numpy; print(numpy.version.version)'
(The same as above, '' is added to sys.path.)
cd /tmp && python
(The same as above).
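The first case can be demonstrated end-to-end (a sketch in a throwaway
directory; json is just a convenient stdlib module to shadow):

```python
import os
import subprocess
import sys
import tempfile

# A rogue json.py placed next to the script shadows the stdlib json,
# because the script's directory is prepended to sys.path.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "json.py"), "w") as f:
        f.write("HIJACKED = True\n")
    with open(os.path.join(d, "script.py"), "w") as f:
        f.write("import json; print(hasattr(json, 'HIJACKED'))\n")
    out = subprocess.run([sys.executable, os.path.join(d, "script.py")],
                         capture_output=True, text=True).stdout.strip()

print(out)  # the rogue module wins: prints "True"
```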
IMHO, if this (long-lived) behaviour is necessary, it should at least be
prominently documented. Also in the manpage.
Before adding a directory to sys.path as described above, Python
actually runs os.path.realpath over it. This means that if the path to
a script given on the command line is actually a symlink, the
directory containing the real file is the one added to sys.path. This
behaviour is not really documented (the documentation only says "the
directory containing that file is added to the start of sys.path"),
but since the integrity of sys.path is so important, IMHO it should
be.
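On a POSIX system the effect can be observed like this (a sketch;
whether realpath is applied depends on the platform build):

```python
import os
import subprocess
import sys
import tempfile

# The script is invoked through a symlink in link_d, but sys.path[0]
# ends up being the directory of the real file (real_d).
with tempfile.TemporaryDirectory() as real_d, \
     tempfile.TemporaryDirectory() as link_d:
    real = os.path.join(real_d, "script.py")
    with open(real, "w") as f:
        f.write("import sys; print(sys.path[0])\n")
    link = os.path.join(link_d, "script.py")
    os.symlink(real, link)
    out = subprocess.run([sys.executable, link],
                         capture_output=True, text=True).stdout.strip()
    resolved = (os.path.realpath(out) == os.path.realpath(real_d))

print(resolved)  # True where realpath is applied to the script path
```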
Using realpath instead of the (expected) path specified by the user
breaks imports of non-pure-python (mixed .py and .so) modules from
modules executed as scripts on Debian. This is because Debian installs
architecture-independent python files in /usr/share/pyshared, and
symlinks those files into /usr/lib/pymodules/pythonX.Y/. The
architecture-dependent .so and python-version-dependent .pyc files are
installed in /usr/lib/pymodules/pythonX.Y/. When a script, e.g.
/usr/lib/pymodules/pythonX.Y/script.py, is executed, the directory
/usr/share/pyshared is prepended to sys.path. If the script tries to
import a module which has architecture-dependent parts (e.g. numpy) it
first sees the incomplete module in /usr/share/pyshared and fails.
This happens for example in parallel python
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=620551) and recently
when packaging CellProfiler for Debian.
Again, if this is on purpose, it should be documented.
PEP 395 (Qualified Names for Modules)
PEP 395 proposes another sys.path manipulation. When running a
script, the directory tree will be walked upwards as long as there are
__init__.py files, and the first directory without one will be added
to sys.path.
This is of course a fine idea, but it makes a previously safe
scenario insecure. More precisely, when executing a script that lives
in a directory whose parent is writable by other users, that parent
directory will be added to sys.path.
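The proposed walk can be sketched roughly like this
(pep395_script_dir is a hypothetical helper, not actual stdlib code):

```python
import os
import tempfile

def pep395_script_dir(script_path):
    # Hypothetical sketch of the PEP 395 proposal: climb past every
    # directory containing __init__.py; the first directory without one
    # is what would be added to sys.path.
    d = os.path.dirname(os.path.abspath(script_path))
    while os.path.isfile(os.path.join(d, "__init__.py")):
        d = os.path.dirname(d)
    return d

with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, "pkg", "sub")
    os.makedirs(sub)
    for pkg_dir in (os.path.join(tmp, "pkg"), sub):
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    script = os.path.join(sub, "script.py")
    open(script, "w").close()
    result = pep395_script_dir(script)

# The package's parent -- possibly writable by other users -- is chosen.
print(result == tmp)  # prints True
```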
So the (previously safe) sequence of downloading an archive with a
package, unzipping it in /tmp, changing into the created directory,
checking that the script doesn't do anything bad, and running the
script becomes insecure if the archive root contains an __init__.py.
I guess that it would be useful to have an option to turn off those
sys.path manipulations.