[Python-Dev] Filename as byte string in python 2.6 or 3.0?
glyph at divmod.com
Wed Oct 1 03:27:26 CEST 2008
On 30 Sep, 09:37 pm, guido at python.org wrote:
>On Tue, Sep 30, 2008 at 11:42 AM, <glyph at divmod.com> wrote:
>>There are other ways to glean this knowledge; for example, looking at
>>the 'iocharset' or 'nls' mount options supplied to mount various
>>filesystems.
>I know we could do a better job, but absent anyone who knows what
>they're doing we've chosen a fairly conservative approach. I certainly
>hope that someone will contribute some mean encoding-guessing code to
>the stdlib that users can use. I'm not sure if I'll ever endorse doing
>this automatically in io.open(), though I'd be fine with a convention
>like passing encoding="guess".
I think the conservative approach is actually correct, or rather, as
close to correct as it is possible to get in this mess. Inspecting
these fantastically obscure options is only likely to be helpful in a
tool which tries to correct filesystem encoding errors on legacy data.
I wouldn't even know about them if I hadn't written several such tools
(well, just little scripts, really) in the past. I was just verifying
that I wasn't missing some "right way" which would let someone else do
the guesswork for me.
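
For the curious, here is a rough sketch of what inspecting those mount options could look like. It assumes the Linux /proc/mounts layout (device, mount point, filesystem type, comma-separated options), and the function name and sample mount table are made up for illustration. It only reports what the options claim; it cannot tell you whether the names on disk actually match.

```python
def charset_mount_options(mounts_text):
    """Map mount point -> {option: value} for 'iocharset' and 'nls' options.

    mounts_text is expected to be in /proc/mounts format (an assumption).
    """
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3]
        charset = {}
        for opt in options.split(","):
            key, sep, value = opt.partition("=")
            if sep and key in ("iocharset", "nls"):
                charset[key] = value
        if charset:
            found[mount_point] = charset
    return found


# Hypothetical mount table, invented for this example.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /media/usb vfat rw,uid=1000,iocharset=iso8859-1,shortname=mixed 0 0
/dev/sda2 /mnt/win ntfs ro,nls=utf8 0 0
"""

print(charset_mount_options(SAMPLE))
```

On a real system you would pass in the contents of /proc/mounts instead of SAMPLE.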
In reality, you have two options for filesystem encoding on Linux:
* UTF-8
* fall in a well and die
The OS will happily let you create a completely nonsensical environment
where no application can possibly do anything reasonable: set LC_ALL to
a KOI8-R locale, mount your USB keychain as Shift_JIS and your Windows
partition as ISO-8859-8. Of course nobody would actually _do_ this, because they
want things to work, so everything is gradually evolving to a default of
UTF-8 everywhere. In practice, however, there are still problems with
CIFS/SMB shares where other clients have different ideas about encoding.
I've experienced this most commonly when sharing with Macs, which have
very particular and different ideas about normalization, as has already
been discussed in this thread.
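
To illustrate the normalization problem, here is a minimal Python 3 sketch using the standard unicodedata module. The filename is an invented example, and same_name is a hypothetical helper; comparing under NFC is just one possible convention, not something any particular filesystem or client is guaranteed to use.

```python
import unicodedata

# The same visible name in two Unicode normalization forms.
composed = "caf\u00e9.txt"                              # precomposed e-acute
decomposed = unicodedata.normalize("NFD", composed)     # 'e' + combining accent

# Both are valid UTF-8, yet they differ as strings and as bytes.
print(composed == decomposed)
print(composed.encode("utf-8"))
print(decomposed.encode("utf-8"))


def same_name(a, b):
    """Compare two names after normalizing both to NFC (an assumed convention)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_name(composed, decomposed))
```

So even with UTF-8 everywhere, two clients can disagree about whether a given file exists unless names are normalized before comparison.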