[issue8514] Create fsencode() and fsdecode() functions in os.path

STINNER Victor report at bugs.python.org
Mon Apr 26 14:00:17 CEST 2010


STINNER Victor <victor.stinner at haypocalc.com> added the comment:

On Monday 26 April 2010 13:06:48, you wrote:
> I don't see what environment variables have to do with the file
> system.

A POSIX system offers only *one* function to query the encoding: 
nl_langinfo(CODESET). Python3 uses it for filenames, environment 
variables and command line arguments.
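
For illustration, a minimal sketch of how that single locale encoding can be
inspected from Python. It assumes a POSIX build (locale.nl_langinfo() is not
available elsewhere, hence the guard), and the variable names are only for
this example. The two values are printed side by side without assuming they
are equal.

```python
import locale
import sys

# nl_langinfo() only exists on POSIX builds of Python, so guard the call.
if hasattr(locale, "nl_langinfo"):
    try:
        # Adopt the user's locale (LANG, LC_ALL, ...) before querying it.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Unsupported locale in the environment: keep the default "C" locale.
        pass
    codeset = locale.nl_langinfo(locale.CODESET)
else:
    codeset = None

# Encoding Python uses to convert file names between str and bytes.
fs_encoding = sys.getfilesystemencoding()

print("locale codeset:     ", codeset)
print("filesystem encoding:", fs_encoding)
```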

Are you suggesting that Python3 should support a different encoding for 
environment variables than for the file system? How would the user configure it?

About filenames: Python3 chooses the encoding from the locale, and the user 
cannot change it, because sys.setfilesystemencoding() is removed by the site 
module.

> Also note that "mbcs" on Windows is a meta-encoding. The
> implementation of that encoding depends on the locale used by
> the Windows user. It's just a coincidence that this may actually
> work for the environment variables on Windows as well, but there's
> no guarantee.

os.getenv() should raise a TypeError on Windows if key is a byte string.

os.getenv() didn't support byte strings. I patched it to support them 
(issue #8391, r80421), but I don't like my fix because we should reject 
byte strings *on Windows*. I would like to factor out the type check for 
all operations on the file system and environment variables into 
fsencode()/fsdecode().
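
A rough sketch of what such a shared type check could look like. This is not
the patch itself: fsdecode_sketch, its "windows" flag and the default "utf-8"
encoding are hypothetical names and values chosen for this example only.

```python
import sys


def fsdecode_sketch(value, encoding="utf-8", windows=(sys.platform == "win32")):
    """Hypothetical helper: return a file name or variable name as str.

    The `windows` flag and the default encoding are assumptions made so
    that the behaviour can be exercised on any platform.
    """
    if isinstance(value, str):
        # Already text: nothing to do on either platform.
        return value
    if isinstance(value, bytes):
        if windows:
            # The native type on Windows is text, so reject byte strings.
            raise TypeError("byte strings are not accepted on Windows")
        # POSIX: undecodable bytes are kept as lone surrogates.
        return value.decode(encoding, "surrogateescape")
    raise TypeError("expected str or bytes, not %s" % type(value).__name__)
```

With a helper of this kind, a function such as os.getenv() would call it once
on its key instead of repeating the platform test itself.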

> On Unix, you often have the case that the environment variables
> use mixed encodings, e.g. the CGI interface is a good example
> where this happens per definition. The CGI environment can
> include file system paths, data encoded in Latin-1 (or some
> other encoding), etc.

Since Python3 chose to store environment variables as Unicode strings on 
Windows and POSIX, in this specific case you should convert the value back 
to a byte string using fsencode() and then manipulate byte strings. Because 
Python3 uses surrogateescape, you will get the original byte string values 
back.
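
The round trip can be sketched with the codec machinery alone. The example
assumes the locale encoding is UTF-8 and that a CGI-style value arrived as
Latin-1 bytes; both choices, and the variable names, are only for
illustration.

```python
# A value that is valid Latin-1 but not valid UTF-8 ("café").
raw = b"caf\xe9"

# What Python3 does when it exposes such a value as text: the undecodable
# byte 0xE9 becomes the lone surrogate U+DCE9 instead of raising an error.
text = raw.decode("utf-8", "surrogateescape")
assert text == "caf\udce9"

# Encoding with the same codec and error handler restores the exact bytes.
restored = text.encode("utf-8", "surrogateescape")
assert restored == raw

# The application can now decode with the encoding it knows is correct.
latin = restored.decode("latin-1")
print(latin)
```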

My patch should help both cases: people using unicode objects and people using 
the native OS type (bytes on POSIX). As written in my previous message, you 
can still use byte strings if you want. My patch doesn't change that (on POSIX 
systems).

----------
title: Create fs_encode() and fs_decode() functions in os.path -> Create fsencode() and fsdecode() functions in os.path

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8514>
_______________________________________

