I would like to know whether a Python security team exists. I sent an email
about an imageop issue and didn't get any answer. Later I learned that a
security ticket was created, but I don't have access to it.
First, I would like to have access to this information: not only for this
issue, but for all security-related issues. I have some knowledge of security
and I can help resolve issues and/or estimate the severity of an issue.
Second, I would like to help fix all Python security issues. It looks like the
Python community isn't very reactive (proactive?) about security. E.g. a DoS
in the smtpd server (included with Python) was reported... 15 months ago. A
patch is available, but it hasn't been applied to the Python trunk.
Third, I'm also looking for a document explaining "how Python is secure" (!).
If a user can run arbitrary Python code, we know that they can do anything
(read/remove any file, create/kill any process, read/write anywhere in
memory, etc.). Brett wrote a paper about CPython sandboxing. PyPy is also
working on sandboxing using two interpreters: one has high privileges and
executes instructions from the second interpreter (after checking the
permissions and arguments). So is there a document somewhere explaining the
current status of Python security?
Victor Stinner aka haypo
Can somebody remind me how to check script compatibility with old Python
versions? I remember the PHP_CompatInfo class for PHP, which parses a script
or directory to find out the minimum version and extensions required to run
it, and I wonder if there is anything like this for Python.
On 12:47 am, victor.stinner(a)haypocalc.com wrote:
This is the most sane contribution I've seen so far :).
>See attached patch: python3_bytes_filename.patch
>Using the patch, you will get:
>- open() supports bytes
>- listdir(unicode) -> only unicode, *skip* invalid filenames
> (as asked by Guido)
Forgive me for being a bit dense, but I couldn't find this hunk in the
patch. Do I understand correctly that listdir(bytes) -> bytes?
If so, this seems basically sane to me, since it provides text behavior
where possible and allows more sophisticated filesystem wrappers (e.g.
Twisted's FilePath, Will McGugan's "FS") to do trickier things, such as
separating filenames for display to the user from filenames for exchange
with the FS.
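In current Python 3, os.listdir() called with a bytes argument does return bytes. A minimal sketch of the display/exchange split described above, assuming UTF-8 as the display encoding; the helper name `entries_for_display` is hypothetical. The raw bytes name is kept for every filesystem call, and a lossy text version is derived only for showing to the user.

```python
import os
import tempfile

def entries_for_display(directory: bytes, encoding: str = "utf-8"):
    """Return sorted (display_name, raw_name) pairs for `directory`.

    raw_name is exactly what the OS returned and is what gets passed back to
    open(), os.lstat(), os.unlink(), ...; display_name is only for showing to
    the user, because undecodable bytes are replaced with U+FFFD.
    """
    pairs = []
    for raw_name in os.listdir(directory):  # bytes argument -> bytes names
        display_name = raw_name.decode(encoding, errors="replace")
        pairs.append((display_name, raw_name))
    return sorted(pairs)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw_dir = os.fsencode(tmp)
        open(os.path.join(raw_dir, b"notes.txt"), "w").close()
        for shown, raw in entries_for_display(raw_dir):
            print(shown)                          # text for the user
            os.lstat(os.path.join(raw_dir, raw))  # bytes for the filesystem
```

The design choice is that the text form is never fed back to the filesystem, so a name that cannot be decoded still stays reachable.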
>- remove os.getcwdu()
>- create os.getcwdb() -> bytes
>- glob.glob() supports bytes
>- fnmatch.filter() supports bytes
>- posixpath.join() and posixpath.split() support bytes
It sounds like maybe there should be some 2to3 fixers in here somewhere,
too? Not necessarily as part of this patch, but somewhere related? I
don't know what they would do, but it does seem quite likely that code
which was previously correct under 2.6 (using bytes) would suddenly be
mixing bytes and unicode with these APIs.
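For illustration, current Python 3 path functions accept either str or bytes but refuse to mix them, which is the kind of call a fixer (or a human) would have to catch. A small check; `posixpath` is used directly so the result does not depend on the host OS.

```python
import posixpath

# Homogeneous arguments work for both types.
print(posixpath.join("dir", "file"))    # dir/file
print(posixpath.join(b"dir", b"file"))  # b'dir/file'

# Code that was fine in 2.x (where every plain string was bytes) can end up
# mixing a text path with a bytes name once ported:
try:
    posixpath.join("dir", b"file")
except TypeError as exc:
    print("TypeError:", exc)
```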
I've been out of town since Friday, but I don't yet see anything in
the 700 billion email messages I'm now catching up on that leads me to
think we need to delay the release. Yay!
I will be on irc later today and will be trolling through the tracker
and buildbots soon. Don't trust email to get an important issue in
front of me today; please use irc or submit a showstopper bug against
2.6 if something /must/ be addressed before today's release.
I'm going to make a test release at around 1600UTC today, just to see
how building the docs and such go. I'm still planning on doing the
final final release at about 2200UTC. If you need to coordinate with
me (e.g. press releases, Windows builds, etc.) please meet me on
#python-dev on irc.freenode.net.
>> On Windows, we might reject bytes filenames for all file operations: open(),
>> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)
> Since I've seen no objections to this yet: please no. If we offer a
> "lower-level" bytes filename API, it should work for all platforms.
Unfortunately, it can't. You cannot represent all possible file names
in a byte string on Windows (just as you can't do so in a Unicode
string on Unix).
So using byte strings on Windows would work for some files, but fail
for others. In particular, listdir might give you a list of file names
which you then can't open/stat/recurse into.
(of course, you could use UTF-8 as the file system encoding on Windows,
but then you would have to rewrite a lot of C code first)
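To make the Windows side concrete: a bytes filename on Windows has to go through the ANSI code page, and a name with characters outside that code page has no bytes form at all. A sketch assuming cp1252 as the ANSI code page; the helper `has_codepage_name` is hypothetical. The Unix case is the mirror image, with decode() failing on bytes that are not valid in the filesystem encoding.

```python
def has_codepage_name(name: str, codepage: str = "cp1252") -> bool:
    """True if a Windows (Unicode) filename can be expressed as bytes in `codepage`."""
    try:
        name.encode(codepage)  # strict: fails on any unmappable character
    except UnicodeEncodeError:
        return False
    return True

print(has_codepage_name("report.txt"))  # True
print(has_codepage_name("café.txt"))    # True: é exists in cp1252
print(has_codepage_name("отчёт.txt"))   # False: no Cyrillic in cp1252
print(has_codepage_name("报告.txt"))     # False: no CJK in cp1252
```

Such a file would show up in a Unicode listdir() but could not be named at all through a bytes API using that code page.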
On Tue, Sep 30, 2008 at 12:42 PM, Terry Reedy <tjreedy(a)udel.edu> wrote:
> Guido van Rossum wrote:
>> On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl <g.brandl(a)gmx.net> wrote:
>>> Victor Stinner schrieb:
>>>> On Windows, we might reject bytes filenames for all file operations:
>>>> open(), unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError)
>>> Since I've seen no objections to this yet: please no. If we offer a
>>> "lower-level" bytes filename API, it should work for all platforms.
>> I'm not sure either way. I've heard it claimed that Windows filesystem
>> APIs use Unicode natively. Does Python 3.0 on Windows currently
>> support filenames expressed as bytes? Are they encoded first before
>> passing to the Unicode APIs? Using what encoding?
> In 3.0rc1, the listdir doc needs updating:
> "Return a list containing the names of the entries in the directory. The list
> is in arbitrary order. It does not include the special entries '.' and '..'
> even if they are present in the directory. Availability: Unix, Windows.
> On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will
> be a list of Unicode objects."
> s/Unicode/bytes/ at least for Windows.
> [b'countries.txt', b'multeetest.py', b't1.py', b't1.pyc', b't2.py', b'tem',
> b'temp.py', b'temp.pyc', b'temp2.py', b'temp3.py', b'temp4.py', b'test.py',
> b'z', b'z.txt']
> The bytes names do not work however:
> Traceback (most recent call last):
>   File "<pyshell#23>", line 1, in <module>
>   File "C:\Programs\Python30\lib\io.py", line 284, in __new__
>     return open(*args, **kwargs)
>   File "C:\Programs\Python30\lib\io.py", line 184, in open
>     raise TypeError("invalid file: %r" % file)
> TypeError: invalid file: b'tem'
> Is this what you were asking?
No, that's because bytes is missing from the explicit list of
allowable types in io.open. Victor has a one-line trivial patch for
this. Could you try this though?
>>> import _fileio
--Guido van Rossum (home page: http://www.python.org/~guido/)
I read that the Python 2.6 release is planned for Wednesday. One bug is still
open and is important to me: Python 2.6/3.0 are unable to use filenames as
byte strings.
On Windows, all filenames are unicode strings (I guess UTF-16-LE), but on UNIX,
for historical reasons, filenames are byte strings. On Linux, you can expect
valid UTF-8 filenames, but sometimes (e.g. when copying from a FAT32 USB key to
an ext3 filesystem) you get an invalid filename: a byte string in a different
charset than your default filesystem encoding (utf8).
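A sketch of how such a name looks, assuming the file was originally named under a Latin-1 charset while the filesystem encoding is UTF-8 (the charset is an assumption for illustration): the same text produces different bytes, and the Latin-1 bytes are not valid UTF-8.

```python
name = "café.txt"

as_latin1 = name.encode("latin-1")  # bytes written under a Latin-1 charset (assumed)
as_utf8 = name.encode("utf-8")      # bytes a UTF-8 system would have written

print(as_latin1)  # b'caf\xe9.txt'
print(as_utf8)    # b'caf\xc3\xa9.txt'

# Reading the Latin-1 bytes back under a UTF-8 filesystem encoding fails:
try:
    as_latin1.decode("utf-8")
except UnicodeDecodeError as exc:
    print("invalid UTF-8 at byte offset", exc.start)  # 3
```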
Python functions using filenames
In Python, you have (incomplete list):
- filename producers: os.listdir(), os.walk(), glob.glob()
- filename manipulation: os.path.*()
- file access: open(), os.unlink(), shutil.rmtree()
If you give unicode to a producer, it returns unicode _or_ byte strings (the
type may change from one filename to the next :-/). Guido proposed to break
this behaviour: raise an exception if the unicode conversion fails. We may
consider an option like "skip invalid".
If you give bytes to a producer, it only returns byte strings. Great.
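A minimal sketch of the proposed producer semantics: bytes in gives bytes out; str in gives only str out, with undecodable names either skipped ("skip invalid") or raising. The names `listdir_strict` and `decode_names` are hypothetical. For comparison, current Python 3 resolved this differently, decoding undecodable bytes with the surrogateescape error handler (PEP 383), which is why this sketch lists with bytes and decodes by hand.

```python
import os
import sys

def decode_names(raw_names, encoding, skip_invalid=True):
    """Strictly decode bytes filenames, so the result is never a str/bytes mix.

    Undecodable names are dropped ("skip invalid") or, if skip_invalid is
    False, the UnicodeDecodeError propagates (the exception Guido proposed).
    """
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))  # strict error handler
        except UnicodeDecodeError:
            if not skip_invalid:
                raise
    return names

def listdir_strict(path, skip_invalid=True):
    """Hypothetical producer: bytes path -> bytes names, str path -> str names."""
    if isinstance(path, bytes):
        return os.listdir(path)  # bytes in, bytes out, nothing is lost
    raw_names = os.listdir(os.fsencode(path))
    return decode_names(raw_names, sys.getfilesystemencoding(), skip_invalid)

print(decode_names([b"a.txt", b"a\xffb"], "utf-8"))  # ['a.txt']
```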
Filename manipulation: in Python 2.6/3.0, os.path.*() is not compatible with
the type "bytes". So you can use neither os.path.join(<your unicode path>,
<bytes filename>) *nor* os.path.join(<your bytes path>, <bytes filename>),
because os.path.join() (e.g. the posix version) uses path.endswith('/').
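The failure comes from comparing against the str literal '/'. A minimal sketch of the idea behind a fix, picking the separator from the argument type; this is a simplified illustration (current posixpath.join follows the same principle), not the patch attached to the issue, and the function name `join` is local to the sketch.

```python
def join(a, *parts):
    """Simplified POSIX join that works on str or on bytes, never a mix."""
    # Pick the separator from the argument type instead of hard-coding '/'.
    # (With a str '/', b"dir".endswith('/') raises TypeError, which is the
    # 2.6/3.0 failure described above.)
    sep = b"/" if isinstance(a, bytes) else "/"
    path = a
    for part in parts:
        if isinstance(part, bytes) != isinstance(a, bytes):
            raise TypeError("can't mix str and bytes path components")
        if part.startswith(sep):
            path = part  # an absolute component discards everything before it
        elif not path or path.endswith(sep):
            path += part
        else:
            path += sep + part
    return path

print(join("dir", "file"))    # dir/file
print(join(b"dir", b"file"))  # b'dir/file'
print(join(b"/a", b"/b"))     # b'/b'
```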
File access: open() rejects the type bytes (it's just a type check; open()
supports bytes if you remove the check). As I remember, unlink() is compatible
with bytes. But rmtree() fails because it uses os.path.join() (even if you pass
a bytes directory, join() fails).
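Once join() accepts bytes, a recursive remove can stay in bytes from start to finish. A minimal sketch under that assumption, using current Python 3 where os.walk() and os.path.join() accept bytes; `rmtree_bytes` is a hypothetical helper, not shutil.rmtree, and it has no error callback.

```python
import os
import tempfile

def rmtree_bytes(top: bytes) -> None:
    """Remove a directory tree using bytes paths only."""
    # topdown=False yields each directory after all of its children.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in filenames:
            os.unlink(os.path.join(dirpath, name))
        for name in dirnames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                os.unlink(full)  # symlink to a directory: remove the link only
            else:
                os.rmdir(full)   # already emptied by the earlier iterations
    os.rmdir(top)

if __name__ == "__main__":
    base = os.fsencode(tempfile.mkdtemp())
    os.makedirs(os.path.join(base, b"sub", b"deeper"))
    open(os.path.join(base, b"sub", b"file.txt"), "w").close()
    rmtree_bytes(base)
    print(os.path.exists(base))  # False
```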
Proposed solution:
- producer: unicode => *only* unicode // bytes => bytes
- manipulation: support both unicode and bytes, but avoid mixing bytes and
  characters (when possible)
- open(): allow bytes
I implemented these solutions as a patch set attached to issue #3187:
* posix_path_bytes.patch: fix posixpath.join() to support bytes
* io_byte_filename.patch: open() allows bytes filenames
* fnmatch_bytes.patch: patch fnmatch.filter() to accept bytes filenames
* glob1_bytes.patch: fix glob.glob() to accept invalid directory names
Mmmh, there is no patch yet to make os.listdir() stop on invalid filenames.
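For comparison, in current Python 3 releases fnmatch.filter() and glob.glob() accept bytes and return bytes, which is the behaviour these patches aimed for. A short check, assuming a POSIX system (the undecodable name b"a\xffb.py" is not a valid Windows filename):

```python
import fnmatch
import glob
import os
import tempfile

# Bytes pattern + bytes names -> bytes results, undecodable names included.
names = [b"setup.py", b"README", b"a\xffb.py"]
print(fnmatch.filter(names, b"*.py"))  # [b'setup.py', b'a\xffb.py']

# glob.glob() with a bytes pattern returns bytes paths.
with tempfile.TemporaryDirectory() as tmp:
    base = os.fsencode(tmp)
    for name in (b"one.txt", b"two.log"):
        open(os.path.join(base, name), "w").close()
    print(glob.glob(os.path.join(base, b"*.txt")))  # a single bytes path
```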
I think that the problem is important because it's a regression from 2.5 to
2.6/3.0. Python 2.5 uses bytes filenames, so it was possible to
open/unlink "invalid" unicode strings (since they are not unicode but bytes).
Well, if it's too late for the final versions, this problem should be at least
Test the problem
Example to create invalid filenames on Linux:
$ mkdir /tmp/test
$ cd /tmp/test
$ touch "$(printf 'a\377b')"
$ mkdir "$(printf 'dir\377name')"
$ touch "$(printf 'dir\377name/file')"
>>> import os, shutil
>>> filename, dirname = sorted(os.listdir('.'))  # the file and the directory created above
>>> open(filename).close()  # open file: ok
>>> os.unlink(filename)  # remove file: ok
>>> shutil.rmtree(dirname)  # remove dir: ok
I proposed an ugly type "InvalidFilename" mixing bytes and characters. As
everybody using unicode knows, it's a bad idea :-) (and it introduces a new
Convert bytes to unicode (replace)
unicode_filename = unicode(bytes_filename, charset, "replace")
Ok, you will get valid unicode strings which can be used in os.path.join() &
friends, but open() or unlink() will fail because this filename doesn't
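The loss is easy to see in Python 3 syntax, assuming UTF-8 is the filesystem encoding: the invalid byte becomes U+FFFD, and encoding the result back yields a different byte string, so it no longer names the file on disk.

```python
raw = b"a\xffb"  # the name actually stored on disk (not valid UTF-8)

# Python 3 spelling of unicode(raw, "utf-8", "replace"):
text = raw.decode("utf-8", "replace")
print(text == "a\ufffdb")  # True: 0xff became U+FFFD REPLACEMENT CHARACTER

# Encoding the text again does not give the original bytes back,
# so open(text) or os.unlink(text) would look up a different name.
back = text.encode("utf-8")
print(back)         # b'a\xef\xbf\xbdb'
print(back == raw)  # False
```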
Victor Stinner aka haypo