[Python-Dev] ImportError: No module named multiarray (is back)

Zbigniew Jędrzejewski-Szmek zbyszek at in.waw.pl
Sat Nov 26 18:54:13 CET 2011


Hi,
I apologize in advance for the length of this mail.

sys.path
========
When a script or a module is executed by invoking python with proper 
arguments, sys.path is extended. When a path to a script is given, the 
directory containing the script is prepended. When '-m' or '-c' is used, 
$CWD is prepended. This is documented in 
http://docs.python.org/dev/using/cmdline.html; so far, so good.
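
To make the three cases concrete, here is a minimal sketch that starts 
child interpreters in each of the ways described above and reports what 
each one sees as sys.path[0]. The file name probe_script.py and the 
helper names are made up for this illustration. Depending on the Python 
version, the '-m' case reports either '' or the absolute $CWD.

```python
import os
import subprocess
import sys
import tempfile

# One-liner that reports the first sys.path entry of the child interpreter.
PROBE = "import sys; print(sys.path[0])"


def first_path_entry(args, cwd):
    """Run a child interpreter with `args` in `cwd`; return its sys.path[0]."""
    env = dict(os.environ)
    # Make sure a PYTHONSAFEPATH setting inherited from the caller's
    # environment does not hide the effect being shown.
    env.pop("PYTHONSAFEPATH", None)
    result = subprocess.run(
        [sys.executable] + args,
        cwd=cwd, env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\r\n")


def demo():
    """Return sys.path[0] for script, -c and -m invocations."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)
        script = os.path.join(tmp, "probe_script.py")  # hypothetical name
        with open(script, "w") as f:
            f.write(PROBE + "\n")
        elsewhere = os.path.realpath(tempfile.gettempdir())
        return {
            "tmp": tmp,
            # Script given by path: run from a different working directory.
            "script": first_path_entry([script], cwd=elsewhere),
            # -c and -m: run with the temporary directory as $CWD.
            "-c": first_path_entry(["-c", PROBE], cwd=tmp),
            "-m": first_path_entry(["-m", "probe_script"], cwd=tmp),
        }


if __name__ == "__main__":
    for mode, entry in demo().items():
        print(mode, repr(entry))
```

The script case is run from a different working directory on purpose, to 
show that the entry comes from the script's location and not from $CWD.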

sys.path and $PYTHONPATH are like $PATH -- if you can convince someone to 
put a directory under your control in any of them, you can execute code 
as that person. Therefore, sys.path is both important and dangerous. 
Unfortunately, sys.path manipulations are only described very briefly, 
and without any commentary, in the on-line documentation. The python(1) 
manpage doesn't even mention them.

The problem: each of the commands below is insecure:

python /tmp/script.py                 (when script.py is safe by itself)
         ('/tmp' is added to sys.path, so an attacker can override any
          module imported in /tmp/script.py by writing to /tmp/module.py)

cd /tmp && python -mtimeit -s 'import numpy' 'numpy.test()'
         (UNIX users are accustomed to being able to safely execute
          programs in any directory, e.g. ls, or gcc, or something.

          Here '' is added to sys.path, so it is not secure to run
          python in other-user-writable directories.)

cd /tmp/ && python -c 'import numpy; print(numpy.version.version)'
          (The same as above, '' is added to sys.path.)

cd /tmp && python
          (The same as above).

IMHO, if this (long-lived) behaviour is necessary, it should at least be 
prominently documented, including in the manpage.
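
The first command above can be reproduced with a minimal sketch like the 
one below. It uses a private temporary directory instead of /tmp, and 
colorsys is chosen only as an example of a standard-library module that 
is not already loaded at interpreter startup; the planted file stands in 
for something written by another user.

```python
import os
import subprocess
import sys
import tempfile


def run_shadowed_script():
    """Run a harmless script from a directory that also holds a planted module."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "script.py")
        # The script is "safe by itself": it only imports a stdlib module.
        with open(script, "w") as f:
            f.write("import colorsys\n")
        # Stand-in for a file planted by someone with write access to the
        # directory that holds the script.
        with open(os.path.join(tmp, "colorsys.py"), "w") as f:
            f.write("print('planted colorsys.py was executed')\n")
        env = dict(os.environ)
        # Ignore a PYTHONSAFEPATH setting inherited from the caller.
        env.pop("PYTHONSAFEPATH", None)
        result = subprocess.run(
            [sys.executable, script],
            env=env, capture_output=True, text=True, check=True,
        )
        return result.stdout


if __name__ == "__main__":
    print(run_shadowed_script(), end="")
```

Because the script's directory comes first in sys.path, the planted file 
is found before the real standard-library module.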

Prepending realpath(dirname(scriptname))
========================================
Before adding a directory to sys.path as described above, Python 
actually runs os.path.realpath over it. This means that if the path to a 
script given on the commandline is actually a symlink, the directory 
containing the real file is the one that gets prepended, not the 
directory containing the symlink. This behaviour is not really 
documented (the documentation only says "the directory containing that 
file is added to the start of sys.path"), but since the integrity of 
sys.path is so important, it should be, IMHO.
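
A minimal sketch to observe this, assuming a platform where os.symlink is 
available (the directory names "real" and "links" are made up):

```python
import os
import subprocess
import sys
import tempfile


def path_entry_via_symlink():
    """Run a script through a symlink; return (real_dir, link_dir, sys.path[0])."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)
        real_dir = os.path.join(tmp, "real")    # holds the actual file
        link_dir = os.path.join(tmp, "links")   # holds only a symlink to it
        os.mkdir(real_dir)
        os.mkdir(link_dir)
        target = os.path.join(real_dir, "script.py")
        with open(target, "w") as f:
            f.write("import sys\nprint(sys.path[0])\n")
        link = os.path.join(link_dir, "script.py")
        os.symlink(target, link)
        env = dict(os.environ)
        # Ignore a PYTHONSAFEPATH setting inherited from the caller.
        env.pop("PYTHONSAFEPATH", None)
        result = subprocess.run(
            [sys.executable, link],   # the user names the symlink, not the file
            env=env, capture_output=True, text=True, check=True,
        )
        return real_dir, link_dir, result.stdout.strip()


if __name__ == "__main__":
    real_dir, link_dir, entry = path_entry_via_symlink()
    print("invoked via :", link_dir)
    print("sys.path[0] :", entry)
```

The reported entry is the directory of the real file, even though the 
command line only ever mentioned the directory holding the symlink.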

Using realpath instead of the (expected) path specified by the user 
breaks imports of non-pure-python (mixed .py and .so) modules from 
modules executed as scripts on Debian. This is because Debian installs 
architecture-independent python files in /usr/share/pyshared, and 
symlinks those files into /usr/lib/pymodules/pythonX.Y/. The 
architecture-dependent .so and python-version-dependent .pyc files are 
installed in /usr/lib/pymodules/pythonX.Y/. When a script, e.g. 
/usr/lib/pymodules/pythonX.Y/script.py, is executed, the directory 
/usr/share/pyshared is prepended to sys.path. If the script tries to 
import a module which has architecture-dependent parts (e.g. numpy) it 
first sees the incomplete module in /usr/share/pyshared and fails.
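
The failure can be imitated without Debian at all. The sketch below is 
only a simulation under stated assumptions: two temporary directories 
stand in for /usr/share/pyshared and /usr/lib/pymodules/pythonX.Y/, a 
made-up package "pkg" stands in for numpy, an empty pkg/_ext.py stands in 
for the compiled .so, and PYTHONPATH stands in for however the 
per-version directory normally ends up on sys.path.

```python
import os
import subprocess
import sys
import tempfile


def simulate_split_layout():
    """Return (returncode, stderr) of a script run from the symlinked tree."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)
        shared = os.path.join(tmp, "pyshared")     # arch-independent tree
        modules = os.path.join(tmp, "pymodules")   # per-version tree
        for tree in (shared, modules):
            os.makedirs(os.path.join(tree, "pkg"))

        # Pure-python files are installed in the shared tree...
        with open(os.path.join(shared, "pkg", "__init__.py"), "w") as f:
            f.write("import pkg._ext\n")
        with open(os.path.join(shared, "script.py"), "w") as f:
            f.write("import pkg\nprint('import ok')\n")

        # ...and symlinked into the per-version tree.
        os.symlink(os.path.join(shared, "pkg", "__init__.py"),
                   os.path.join(modules, "pkg", "__init__.py"))
        os.symlink(os.path.join(shared, "script.py"),
                   os.path.join(modules, "script.py"))

        # Stand-in for the extension module, present only per-version.
        open(os.path.join(modules, "pkg", "_ext.py"), "w").close()

        env = dict(os.environ, PYTHONPATH=modules)
        # Ignore a PYTHONSAFEPATH setting inherited from the caller.
        env.pop("PYTHONSAFEPATH", None)
        result = subprocess.run(
            [sys.executable, os.path.join(modules, "script.py")],
            env=env, capture_output=True, text=True,
        )
        return result.returncode, result.stderr


if __name__ == "__main__":
    code, err = simulate_split_layout()
    print("exit status:", code)
    print(err, end="")
```

The complete package is importable from the per-version tree, but the 
shared tree is prepended ahead of it, so the incomplete copy wins and the 
import of the extension fails.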

This happens, for example, in Parallel Python 
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=620551), and it came up 
again recently when packaging CellProfiler for Debian.

Again, if this is on purpose, it should be documented.

PEP 395 (Qualified Names for Modules)
=====================================

PEP 395 proposes another sys.path manipulation. When running a script, 
the directory tree will be walked upwards as long as there are 
__init__.py files, and then the first directory without one will be 
added to sys.path.

This is of course a fine idea, but it makes a previously safe scenario 
insecure. More precisely, when a script is executed from a package 
directory whose parent directory is writable by other users, that parent 
directory will be added to sys.path.

So the (safe) operation of downloading an archive with a package, 
unzipping it in /tmp, changing into the created directory, checking that 
the script doesn't do anything bad, and running the script becomes 
insecure if there is an __init__.py in the archive root.
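
A rough sketch of the walk as I described it above (this follows the 
wording of this mail only; the PEP's actual algorithm may differ in 
details, and the function and directory names are made up):

```python
import os
import tempfile


def directory_to_add(script_path):
    """Walk upwards from the script while __init__.py is present.

    Returns the first directory that does not contain __init__.py.
    """
    directory = os.path.dirname(os.path.abspath(script_path))
    while os.path.isfile(os.path.join(directory, "__init__.py")):
        parent = os.path.dirname(directory)
        if parent == directory:   # reached the filesystem root
            break
        directory = parent
    return directory


def demo():
    """Unpack a fake archive whose root has __init__.py; return (added, tmp)."""
    with tempfile.TemporaryDirectory() as tmp:   # plays the role of /tmp
        tmp = os.path.realpath(tmp)
        root = os.path.join(tmp, "unpacked")     # directory created by unzip
        os.mkdir(root)
        open(os.path.join(root, "__init__.py"), "w").close()
        script = os.path.join(root, "script.py")
        open(script, "w").close()
        return directory_to_add(script), tmp


if __name__ == "__main__":
    added, tmp = demo()
    print("would be added to sys.path:", added)
```

In the demo the directory that ends up on sys.path is the one the 
archive was unpacked in, not the directory that came out of the archive.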


I guess that it would be useful to have an option to turn off those 
sys.path manipulations.

Zbyszek


