Re: [Python-Dev] Patch for an initial support of bytes filename in Python3

30 Sep 2008

Hi,
...
This is the most sane contribution I've seen so far :).
Oh thanks.
...
Do I understand properly that (listdir(bytes) -> bytes)?
Yes, os.listdir(bytes)->bytes. It's already the current behaviour.

But on the Python3 trunk, os.listdir(str) -> str, or bytes if the unicode 
conversion fails.
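
A minimal sketch of the two call styles, assuming a current Python 3 interpreter and a throwaway temporary directory. The file name example.txt is only an illustration, and the sketch covers the decodable case only, not the fallback to bytes mentioned above:

```python
import os
import tempfile

# Create a throwaway directory containing one file with a plain ASCII name.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()

    # A str path gives str entries.
    str_entries = os.listdir(tmp)

    # A bytes path gives bytes entries.
    bytes_entries = os.listdir(os.fsencode(tmp))

print(str_entries, bytes_entries)
```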
...
If so, this seems basically sane to me, since it provides text behavior
where possible and allows more sophisticated filesystem wrappers (i.e.
Twisted's FilePath, Will McGugan's "FS") to do more tricky things,
separating filenames for display to the user and filenames for exchange
with the FS.
That's the goal of my patch: let people do what they want with bytes, e.g. 
rename the file, try to find the best charset to display the filename, etc.
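
As a rough sketch of "try the best charset to display the filename": the helper below is hypothetical (display_name and its list of candidate charsets are my own names, not part of the patch). It keeps the raw bytes for talking to the filesystem and only decodes a copy for display:

```python
def display_name(raw, charsets=("utf-8", "iso-8859-1")):
    """Return a str for display only (hypothetical helper, not in the patch)."""
    for charset in charsets:
        try:
            return raw.decode(charset)
        except UnicodeDecodeError:
            continue
    # No candidate charset worked: show the undecodable bytes as escapes.
    return raw.decode("ascii", errors="backslashreplace")


raw_name = b"caf\xe9.txt"        # ISO-8859-1 bytes, not valid UTF-8
shown = display_name(raw_name)   # text to show to the user
print(shown)
```

The bytes in raw_name stay untouched, so they can still be passed to os.rename() or open() exactly as the filesystem returned them. Note that ISO-8859-1 decodes any byte sequence, so with the default list the final fallback is only reached when a caller passes stricter charsets.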
...
...
- remove os.getcwdu()
- create os.getcwdb() -> bytes
- glob.glob() supports bytes
- fnmatch.filter() supports bytes
- posixpath.join() and posixpath.split() support bytes
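
To illustrate the list above, here is a small sketch of these calls used with bytes. It assumes a current Python 3, where these functions exist and may differ in detail from the patch; the file and directory names are made up for the example:

```python
import fnmatch
import glob
import os
import posixpath
import tempfile

cwd_text = os.getcwd()    # str
cwd_raw = os.getcwdb()    # bytes

# posixpath works on bytes as long as every argument is bytes.
joined = posixpath.join(b"dir", b"file.txt")
head, tail = posixpath.split(joined)

# fnmatch.filter() with bytes names and a bytes pattern.
matches = fnmatch.filter([b"a.txt", b"b.py"], b"*.txt")

# glob.glob() with a bytes pattern returns bytes paths.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.txt"), "w").close()
    pattern = os.path.join(os.fsencode(tmp), b"*.txt")
    globbed = glob.glob(pattern)

print(joined, head, tail, matches, globbed)
```

Mixing str and bytes in one call (for example posixpath.join("dir", b"file.txt")) raises TypeError, so the type has to be chosen once and kept consistent.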
It sounds like maybe there should be some 2to3 fixers in here somewhere,
too?
IMHO a programmer should not use bytes for filenames. Only specific programs, 
such as tools used to fix a broken system (e.g. the convmv program) or backup 
programs, should use bytes. So the "default" type (type and not charset) for 
filenames should be str in Python3.

If my patch is applied, 2to3 will have to replace getcwdu() with getcwd(). 
That's all.
...
Not necessarily as part of this patch, but somewhere related?  I 
don't know what they would do, but it does seem quite likely that code
which was previously correct under 2.6 (using bytes) would suddenly be
mixing bytes and unicode with these APIs.
It looks like 2to3 converts all text literals, '...' or u'...', to unicode 
(str). So converted programs will use str for filenames.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/