[Python-Dev] PEP 277 (unicode filenames): please review

Martin v. Loewis martin@v.loewis.de
13 Aug 2002 00:34:54 +0200

Jack Jansen <Jack.Jansen@oratrix.com> writes:

> This is the bit I still don't like (at least, if I'm not mistaken I
> commented on it a while ago too). A routine could be doing an
> os.list() expecting strings, but suddenly someone passes it a
> unicode directoryname and the return value would change.

Sure, but within reasonable limitations, "nothing bad" would happen:
those file names most likely use only ASCII, so the default encoding
treats them nicely wherever they appear.

> I would much prefer an optional encoding argument whereby you give the
> encoding in which you want the return value. Default would be the
> local filesystem encoding. If you pass unicode you will get direct
> unicode on XP/2K, and a converted string on other platforms (but
> always unicode).

I would not like that. First of all, it isn't any more portable than
PEP 277: on Unix, to implement that feature, you'll have to know the
encoding of filenames on disk first - which alone is tricky.

Furthermore, it is easy to implement that on top of PEP 277: just
write a wrapper that encodes the result.
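
A minimal sketch of such a wrapper, assuming a current Python 3, where
os.listdir() returns text names for a text path. The name
listdir_encoded and its parameters are hypothetical and not part of
PEP 277:

```python
import os
import sys


def listdir_encoded(path, encoding=None, errors="strict"):
    """Return the entries of `path` as byte strings in `encoding`.

    Hypothetical helper: the listing is obtained as Unicode first,
    then each name is encoded, defaulting to the file system encoding.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    # os.fsdecode() turns a bytes path into str, so listdir returns str names.
    names = os.listdir(os.fsdecode(path))
    return [name.encode(encoding, errors) for name in names]
```

With the default errors="strict", a name that cannot be represented in
the requested encoding raises UnicodeEncodeError, so the caller decides
how to handle it instead of silently receiving a mangled name.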

> Oh yes, the same reasoning would hold for readlink(), getcwd() and
> any other call that returns filenames.

These are more tricky, indeed. Fortunately, they are not in the domain
of PEP 277: readlink is not supported on Windows, and getcwd is not
considered in the PEP. If that is an issue, I'd add a "return_unicode"
flag to getcwd.
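
To illustrate what such a flag could look like, here is a sketch
written against a current Python 3, which already offers os.getcwd()
(text) and os.getcwdb() (bytes). The function name getcwd_compat is
hypothetical; only the "return_unicode" flag name comes from the
suggestion above:

```python
import os


def getcwd_compat(return_unicode=False):
    """Return the current directory as bytes, or as text on request.

    Hypothetical helper: existing callers keep getting a byte string,
    and only callers that pass return_unicode=True get Unicode.
    """
    if return_unicode:
        return os.getcwd()   # text (str) result
    return os.getcwdb()      # byte-string result
```

Because the flag defaults to False, code that expects byte strings
would not see its return type change unless it opts in.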

Allowing the application to specify an encoding at the file system API
is not really helpful, as the encoding at the file system API is
usually mandated by the application.