[Python-Dev] a suggestion ... Re: PEP 383 (again)
Zooko O'Whielacronx
zooko at zooko.com
Tue Apr 28 21:50:55 CEST 2009
On Apr 28, 2009, at 13:01 PM, Thomas Breuel wrote:
> (2) Should the default UTF-8 encoder for file system operations be
> allowed to generate illegal byte sequences?
>
> I think that's a definite no; if I set the encoding for a device to
> UTF-8, I never want Python to try to write illegal UTF-8 strings to
> my device.
...
> If people really want the option of (3c), then I think encoders
> related to the file system should by default reject those strings
> as illegal because the potential problems from writing them are
> just too serious. Printing routines and UI routines could display
> them without error (but some clear indication), of course.
For what it is worth, sometimes we have to write bytes to a POSIX
filesystem even though those bytes are not the encoding of any string
in the filesystem's "alleged encoding". The reason is that it is
common for there to be filenames which are not the encodings of
anything in the filesystem's alleged encoding, and the user expects
my tool (Tahoe-LAFS [1]) to copy that name to a distributed storage
grid and then copy it back unchanged, even though, I reiterate,
that name is *not* a valid encoding of anything in the current encoding.
This doesn't argue that this behavior has to be the *default*
behavior, but it is sometimes necessary.
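
To make the round-trip requirement concrete, here is a minimal sketch. It assumes the "surrogateescape" error handler that PEP 383 eventually added in Python 3.1; the sample filename and the helper names (roundtrip, strict_encode_fails) are made up for illustration and are not taken from Tahoe-LAFS. It shows a name that is not valid UTF-8 surviving a decode/encode cycle byte for byte, while a strict encoder rejects the same string.

```python
# Hypothetical filename: Latin-1 bytes for "café.txt", which are NOT valid UTF-8.
raw_name = b"caf\xe9.txt"


def roundtrip(name_bytes, encoding="utf-8"):
    """Decode and re-encode a filename, preserving undecodable bytes.

    Undecodable bytes are mapped to lone surrogates on decode and mapped
    back to the original bytes on encode (the PEP 383 approach).
    """
    text = name_bytes.decode(encoding, errors="surrogateescape")
    restored = text.encode(encoding, errors="surrogateescape")
    return text, restored


def strict_encode_fails(text, encoding="utf-8"):
    """Return True if a strict (default) encoder rejects the string."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return True
    return False


text, restored = roundtrip(raw_name)
print(restored == raw_name)
print(strict_encode_fails(text))
```

A tool that must copy names back unchanged would opt in to the permissive handler, while a strict encoder remains available as the default for callers who never want invalid UTF-8 written out.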
It's too bad that POSIX is so far behind Mac OS X in this respect.
(Also so far behind Windows, but I use Mac as the example to show how
it is possible to build a better system on top of POSIX.) Hopefully
David Wheeler's proposals to tighten the requirements in Linux
filesystems will catch on: [2].
Regards,
Zooko
[1] http://allmydata.org
[2] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html