[Python-Dev] Bytes path support

Nikolaus Rath Nikolaus at rath.org
Wed Aug 27 03:39:35 CEST 2014


Nick Coghlan <ncoghlan at gmail.com> writes:
>>>> As some examples of where bilingual computing breaks down:
>>>>
>>>> * My NFS client and server may have different locale settings
>>>> * My FTP client and server may have different locale settings
>>>> * My SSH client and server may have different locale settings
>>>> * I save a file locally and send it to someone with a different locale
>>>>   setting
>>>> * I attempt to access a Windows share from a Linux client (or
>>>>   vice-versa)
>>>> * I clone my POSIX hosted git or Mercurial repository on a Windows
>>>>   client
>>>> * I have to connect my Linux client to a Windows Active Directory
>>>> domain (or vice-versa)
>>>> * I have to interoperate between native code and JVM code
>>>>
>>>> The entire computing industry is currently struggling with this
>>>> monolingual (ASCII/Extended ASCII/EBCDIC/etc) -> bilingual (locale
>>>> encoding/code pages) -> multilingual (Unicode) transition. It's been
>>>> going on for decades, and it's still going to be quite some time
>>>> before we're done.
>>>>
>>>> The POSIX world is slowly clawing its way towards a multilingual model
>>>> that actually works: UTF-8
>>>> Windows (including the CLR) and the JVM adopted a different
>>>> multilingual model, but still one that actually works: UTF-16-LE
>>
>>
>> Nick, I think the first half of your post is one of the clearest
>> expositions yet of 'why Python 3' (in particular, the str to unicode
>> change).  It is worthy of wider distribution and without much change, it
>> would be a great blog post.
>
> Indeed, I had the same idea - I had been assuming users already understood
> this context, which is almost certainly an invalid assumption.
>
> The blog post version is already mostly written, but I ran out of weekend.
> Will hopefully finish it up and post it some time in the next few days
> :)

In that case, maybe it'd be nice to also explain why you use the term
"bilingual" for codepage-based encodings. At least to me, a
codepage/locale is pretty monolingual, or else it covers a whole
region (e.g. Western Europe). I figure by bilingual you mean ASCII +
something, but that's mostly a guess on my side.
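
To make that guess concrete, here is a minimal sketch of the "ASCII +
something" reading. It is only my interpretation, not necessarily what
Nick means: the helper name decode_under and the sample bytes are made
up for illustration. It decodes the same four bytes under three
single-byte code pages; the ASCII part comes out identical everywhere,
and only the byte above 0x7F changes meaning.

```python
# Sample data (an assumption for illustration): three ASCII bytes
# followed by one byte from the upper half of a single-byte code page.
RAW = b"caf\xe9"


def decode_under(raw, encodings):
    """Decode the same bytes under each named encoding."""
    return {name: raw.decode(name) for name in encodings}


# Three single-byte code pages that all agree on the ASCII range.
views = decode_under(RAW, ["latin-1", "cp1251", "koi8-r"])

for name, text in views.items():
    # The first three characters are always "caf"; the last one differs.
    print(f"{name:8} {text!r}")
```

If that reading is right, each code page is "bilingual" in the sense
that it covers ASCII plus exactly one other repertoire, and the
breakage in the examples above starts as soon as the two sides
disagree about which one.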


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«

