[Python-Dev] Python-3.0, unicode, and os.environ

Adam Olsen rhamph at gmail.com
Fri Dec 12 07:19:27 CET 2008


On Thu, Dec 11, 2008 at 10:41 PM, Toshio Kuratomi <a.badger at gmail.com> wrote:
> Adam Olsen wrote:
>> On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>>> Unfortunately, even programmers experienced in I18N like Martin, and
>>> those with intuition-that-has-the-force-of-law<wink> like Guido,
>>> express deliberate disbelief on this point.  They say that filesystem
>>> names and environment variable values are text, which is true from the
>>> semantic viewpoint but can't be fully supported by any implementation.
>>
>> With all the focus on backup tools and file managers I think we've
>> lost perspective.  They're an important use case, but hardly the
>> dominant one.
>>
>> Please, as a user, if your app is creating new files, do NOT use
>> bytes!  You have no excuse for creating garbage, and garbage doesn't
>> help the user any.  Get the encoding right, use the Unicode APIs,
>> and don't pass the buck on to everything else.
>>
> Uhmmm.... That's good advice but doesn't solve any problems :-(.  No
> matter what I create, the filenames will be bytes when the next person
> reads them in.  If my locale is Shift-JIS and the person I'm sharing the
> file with uses utf-8 things won't work.  Even if my locale is utf-8
> (since I come from a European nation) and their locale is utf-16
> (because they're from an Asian nation) the Unicode API won't work.

So you'll open up the dir and find this collection:

??????.txt
????????.png
???????.html
????????.html
???.png
??????.txt
??????.txt
??????.txt

A half-broken setup is still a broken setup.  Eventually you have to
tell people to stop screwing around and pick one encoding.
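
To make the mismatch concrete, here is a minimal Python sketch of a name
written under a Shift-JIS locale and then read under UTF-8.  The filename
is made up for illustration, and errors="replace" is just one way a tool
might choose to display undecodable bytes; only the standard codecs are used.

```python
# Hypothetical filename chosen for illustration; any non-ASCII name would do.
name = "日本語.txt"

# The bytes a program would hand to the filesystem under a Shift-JIS locale.
raw = name.encode("shift_jis")

# What a reader in a UTF-8 locale sees when decoding those same bytes.
# Invalid sequences are swapped for U+FFFD, the replacement character.
seen = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the writer actually used recovers the name.
recovered = raw.decode("shift_jis")

print(repr(seen))
print(recovered == name)
```

The ASCII extension survives either way, which is why the listing above
still shows .txt and .png after the garbage.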

I doubt that UTF-16 is used very much (other than on Windows).  I
haven't found any statistics on what distros use, but I did find these
for the web itself:
http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html

I can't wait for next year's statistics.

-- 
Adam Olsen, aka Rhamphoryncus

