[Python-Dev] Patch for an initial support of bytes filename in Python3

Tue Sep 30 15:54:20 CEST 2008

Hi,

> This is the most sane contribution I've seen so far :).

Oh thanks.

> Do I understand properly that (listdir(bytes) -> bytes)?

Yes, os.listdir(bytes) -> bytes. It's already the current behaviour.

But with the current Python 3 trunk, os.listdir(str) -> str ... or bytes for any
filename whose Unicode conversion fails.
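A minimal sketch of the bytes-in/bytes-out rule for os.listdir(), run against a throwaway directory from tempfile (the file name example.txt and the helper list_both are made up for illustration). It reflects released Python 3 versions; note that since Python 3.1 (PEP 383) undecodable names in a str listing are decoded with the surrogateescape error handler instead of being returned as bytes.

```python
import os
import tempfile

def list_both(path: str):
    """List a directory twice: once with a str path, once with the same path as bytes."""
    as_text = os.listdir(path)                # str argument -> list of str
    as_bytes = os.listdir(os.fsencode(path))  # bytes argument -> list of bytes
    return as_text, as_bytes

with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file so the listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()
    names, raw_names = list_both(tmp)

print(names)      # ['example.txt']
print(raw_names)  # [b'example.txt']
```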

> If so, this seems basically sane to me, since it provides text behavior
> where possible and allows more sophisticated filesystem wrappers (i.e.
> Twisted's FilePath, Will McGugan's "FS") to do more tricky things,
> separating filenames for display to the user and filenames for exchange
> with the FS.

That is exactly the goal of my patch: let people do what they want with the
bytes, such as renaming the file or guessing the best charset to display the
filename.
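To illustrate "try the best charset to display the filename", here is a sketch of a hypothetical helper, display_name(), that only decodes a copy of the raw name for display. The candidate list (UTF-8, then cp1252) is an assumption for the example; the original bytes stay untouched and can still be passed to filesystem calls such as os.rename(), which accepts bytes paths.

```python
def display_name(raw: bytes, candidates=("utf-8", "cp1252")) -> str:
    """Return a printable version of a raw filename; the bytes themselves are not modified.

    Each candidate encoding is tried in order; if none fits, undecodable
    bytes are shown as U+FFFD replacement characters.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")

print(display_name(b"caf\xc3\xa9"))  # valid UTF-8 -> café
print(display_name(b"caf\xe9"))      # not valid UTF-8, valid cp1252 -> café
```

Keeping display and filesystem access separate is the point: a wrong guess only affects what the user sees, never which file gets renamed.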

> >- remove os.getcwdu()
> >- create os.getcwdb() -> bytes
> >- glob.glob() supports bytes
> >- fnmatch.filter() supports bytes
> >- posixpath.join() and posixpath.split() support bytes
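For reference, these items match APIs available in released Python 3 versions: os.getcwdu() is gone, os.getcwdb() returns bytes, and glob, fnmatch and posixpath accept bytes arguments. A short sketch exercising them; the paths and file names are made up for the example.

```python
import fnmatch
import glob
import os
import posixpath
import tempfile

# os.getcwdb() is the bytes counterpart of os.getcwd().
cwd_bytes = os.getcwdb()

# posixpath.join()/split() keep the argument type: bytes in, bytes out.
path = posixpath.join(b"/tmp", b"data", b"file.txt")
head, tail = posixpath.split(path)

# fnmatch.filter() matches a bytes pattern against bytes names.
matches = fnmatch.filter([b"a.txt", b"b.py", b"c.txt"], b"*.txt")

# glob.glob() with a bytes pattern returns bytes paths.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.py"):
        open(os.path.join(tmp, name), "w").close()
    globbed = glob.glob(os.path.join(os.fsencode(tmp), b"*.txt"))

print(type(cwd_bytes).__name__)  # bytes
print(head, tail)                # b'/tmp/data' b'file.txt'
print(matches)                   # [b'a.txt', b'c.txt']
```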
>
> It sounds like maybe there should be some 2to3 fixers in here somewhere,
> too?

IMHO, programmers should not use bytes for filenames. Only specific programs
should use bytes: tools that repair a broken system (e.g. convmv), backup
programs, etc. So the "default" type (the type, not the charset) for filenames
should be str in Python 3.
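As an example of the "specific programs" case, a backup-style tool can walk a tree with a bytes root so that every name comes back raw. The helper collect_raw_paths() below is hypothetical, a sketch rather than how convmv or any real backup program works.

```python
import os
import tempfile

def collect_raw_paths(root: bytes) -> list:
    """Return every file path under root as raw bytes, sorted.

    Passing a bytes root makes os.walk() yield bytes names, so no decoding
    happens and names that are invalid in the locale encoding are kept as-is.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for rel in ("top.txt", os.path.join("sub", "inner.txt")):
        open(os.path.join(tmp, rel), "w").close()
    root = os.fsencode(tmp)
    paths = collect_raw_paths(root)
    relative = sorted(os.path.relpath(p, root) for p in paths)

print(relative)
```

Everything else in such a program can stay str-based; only the code that touches the filesystem needs the raw form.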

If my patch is applied, 2to3 will have to replace getcwdu() with getcwd().
That's all.
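The required rewrite is mechanical. lib2to3 did ship a fix_getcwdu fixer that works on the parse tree (lib2to3 has since been removed from the standard library); the sketch below is only a naive, regex-based stand-in, with a made-up function name, to show the transformation.

```python
import re

# Naive, text-based stand-in for a 2to3 fixer: rename os.getcwdu() calls.
# It only handles the literal spelling "os.getcwdu(" and ignores aliases such
# as "from os import getcwdu"; a real fixer works on the parse tree instead.
_GETCWDU_CALL = re.compile(r"\bos\.getcwdu\(")

def fix_getcwdu(source: str) -> str:
    """Return source with every os.getcwdu( call rewritten to os.getcwd(."""
    return _GETCWDU_CALL.sub("os.getcwd(", source)

print(fix_getcwdu("here = os.getcwdu()\n"))  # here = os.getcwd()
```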

> Not necessarily as part of this patch, but somewhere related?  I 
> don't know what they would do, but it does seem quite likely that code
> which was previously correct under 2.6 (using bytes) would suddenly be
> mixing bytes and unicode with these APIs.

It looks like 2to3 converts all string literals, '...' or u'...', to unicode
(str), so converted programs will use str for filenames.
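On the concern about code "mixing bytes and unicode": in Python 3 the path functions refuse to mix the two and raise TypeError rather than guessing. A small sketch with a hypothetical helper, mixes_types(), built on posixpath.join():

```python
import posixpath

def mixes_types(a, b) -> bool:
    """Return True if joining a and b fails because str and bytes were mixed."""
    try:
        posixpath.join(a, b)
    except TypeError:
        return True
    return False

print(mixes_types("data", "file.txt"))   # False: both str, as 2to3 output would be
print(mixes_types("data", b"file.txt"))  # True: str and bytes cannot be mixed
```

So a converted program that accidentally keeps a bytes component fails loudly at the first path operation instead of producing a wrong filename.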

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/