[Python-Dev] Filename as byte string in python 2.6 or 3.0?

glyph at divmod.com
Wed Oct 1 03:27:26 CEST 2008

On 30 Sep, 09:37 pm, guido at python.org wrote:
>On Tue, Sep 30, 2008 at 11:42 AM,  <glyph at divmod.com> wrote:
>>There are other ways to glean this knowledge; for example, looking at 
>>'iocharset' or 'nls' mount options supplied to mount various 

>I know we could do a better job, but absent anyone who knows what
>they're doing we've chosen a fairly conservative approach. I certainly
>hope that someone will contribute some mean encoding-guessing code to
>the stdlib that users can use. I'm not sure if I'll ever endorse doing
>this automatically in io.open(), though I'd be fine with a convention
>like passing encoding="guess".

I think the conservative approach is actually correct, or rather, as 
close to correct as it is possible to get in this mess.  Inspecting 
these fantastically obscure options is only likely to be helpful in a 
tool which tries to correct filesystem encoding errors on legacy data. 
I wouldn't even know about them if I hadn't written several such tools 
(well, just little scripts, really) in the past.  I was just verifying 
that I wasn't missing some "right way" which would let someone else do 
the guesswork for me.

In reality, you have two options for filesystem encoding on Linux:

  * UTF-8
  * fall in a well and die
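
As a rough illustration of what those two options mean for a program, here is a minimal Python 3 sketch that treats a raw filename as UTF-8 and, if that fails, hands back the undecoded bytes instead of guessing. The helper name decode_filename and the sample filename are hypothetical, chosen only for this example; nothing here is an existing stdlib convention.

```python
def decode_filename(raw: bytes):
    """Return (text, True) if raw is valid UTF-8, otherwise (raw, False).

    Hypothetical helper: it never guesses another encoding, it only
    reports whether the conservative UTF-8 assumption held.
    """
    try:
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        return raw, False


# Sample name (the Russian word for "file"), written with escapes so the
# example does not depend on the encoding of this message.
sample = "\u0444\u0430\u0439\u043b.txt"

utf8_name = sample.encode("utf-8")    # what a UTF-8 system would store
koi8_name = sample.encode("koi8_r")   # what a KOI8-R system would store

print(decode_filename(utf8_name))  # decodes cleanly
print(decode_filename(koi8_name))  # not valid UTF-8, bytes come back as-is
```

The second call is the "well": the bytes are a perfectly good filename to the OS, but without out-of-band knowledge of the encoding the program cannot turn them into text.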

The OS will happily let you create a completely nonsensical environment 
where no application can possibly do anything reasonable: set LC_ALL to 
KOI8-R, mount your USB keychain as Shift_JIS and your Windows partition 
as ISO-8859-8.  Of course nobody would actually _do_ this, because they 
want things to work, so everything is gradually evolving to a default of 
UTF-8 everywhere.  In practice, however, there are still problems with 
CIFS/SMB shares where other clients have different ideas about encoding. 
I've experienced this most commonly when sharing with Macs, which have 
very particular and different ideas about normalization, as has already 
been discussed in this thread.
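
To make the normalization problem concrete, here is a small Python 3 sketch, assuming one client writes the name in composed form (NFC) and another in decomposed form (NFD). Both are valid UTF-8 and look identical on screen, but the byte strings differ. The filename and the helper same_name are made up for illustration.

```python
import unicodedata

# Hypothetical filename containing one accented character.
composed = unicodedata.normalize("NFC", "caf\u00e9.txt")
decomposed = unicodedata.normalize("NFD", composed)

# Both forms encode to valid UTF-8, but to different byte sequences,
# so a byte-for-byte filename comparison treats them as different files.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")
print(composed_bytes)
print(decomposed_bytes)
print(composed_bytes == decomposed_bytes)


def same_name(a: bytes, b: bytes) -> bool:
    """Compare two UTF-8 filenames after normalizing both to NFC."""
    norm_a = unicodedata.normalize("NFC", a.decode("utf-8"))
    norm_b = unicodedata.normalize("NFC", b.decode("utf-8"))
    return norm_a == norm_b


print(same_name(composed_bytes, decomposed_bytes))
```

Normalizing before comparing only helps once both names are known to be UTF-8; it does nothing for the mixed-encoding case described above.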
