Python usage numbers
steve+comp.lang.python at pearwood.info
Sun Feb 12 10:12:57 CET 2012
On Sun, 12 Feb 2012 01:05:35 -0600, Andrew Berg wrote:
> On 2/12/2012 12:10 AM, Steven D'Aprano wrote:
>> It's not just UTF8 either, but nearly all encodings. You can't even
>> expect to avoid problems if you stick to nothing but Windows, because
>> Windows' default encoding is localised: a file generated in (say)
>> Israel or Japan or Germany will use a different code page (encoding) by
>> default than one generated in (say) the US, Canada or UK.
> Generated by what? Windows will store a locale value for programs to
> use, but programs use Unicode internally by default
Which programs? And we're not talking about what they use internally, but
what they write to files.
> (i.e., API calls are
> Unicode unless they were built for old versions of Windows), and the
> default filesystem (NTFS) uses Unicode for file names.
No. File systems do not use Unicode for file names. Unicode is an
abstract mapping between code points and characters. File systems are
written using bytes.
Suppose you're a fan of Russian punk band Наӥв and you have a directory
of their music. The file system doesn't store the Unicode code points
1053 1072 1253 1074; the name has to be encoded to a sequence of bytes
first.
NTFS by default uses the UTF-16 encoding, which means the actual bytes
written to disk are \x1d\x040\x04\xe5\x042\x04 (possibly with a leading
byte-order mark \xff\xfe).
Windows has two separate APIs, one for "wide" characters, the other for
single bytes. Depending on which one you use, the directory will appear
to be called Наӥв or 0å2.
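
To make that concrete, here is a rough Python 3 sketch of the same
name going from code points to bytes and back the wrong way. Decoding
with cp1252 is my assumption for what a "single byte" view might use;
the actual code page depends on the system locale.

```python
import codecs

# The directory name from the example, written with escapes so the
# snippet does not depend on the encoding of its own source file.
name = "\u041d\u0430\u04e5\u0432"  # Наӥв

# Unicode side: abstract code points.
code_points = [ord(ch) for ch in name]

# Byte side: the same code points encoded as little-endian UTF-16.
utf16 = name.encode("utf-16-le")
with_bom = codecs.BOM_UTF16_LE + utf16

# Misread those bytes one byte at a time, here as cp1252 (assumed).
misread = utf16.decode("cp1252")

# Drop the control characters, which most displays will not show.
visible = "".join(ch for ch in misread if ch.isprintable())

print(code_points)
print(utf16)
print(ascii(visible))
```

Same bytes on disk, two different names, depending only on how you
choose to interpret them.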
But in any case, we're not talking about the file name encoding. We're
talking about the contents of files.
> AFAIK, only the
> terminal has a localized code page by default. Perhaps Notepad will
> write text files with the localized code page by default, but that's an
> application choice...
Exactly. And unless you know what encoding the application chooses, you
will likely get an exception trying to read the file.
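
A small Python 3 sketch of what I mean. The file name, the sample text
and the three encodings are just illustrative assumptions: pretend some
application in Germany wrote the file using cp1252, and the reader has
to guess.

```python
import os
import tempfile

text = "Gr\u00fc\u00dfe"  # Grüße

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "greeting.txt")  # hypothetical file name

    # The "application" writes with a single-byte Windows code page.
    with open(path, "w", encoding="cp1252") as f:
        f.write(text)

    # Guess 1: UTF-8. The bytes are not valid UTF-8, so this raises.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        failed = False
    except UnicodeDecodeError:
        failed = True

    # Guess 2: cp1251 (Cyrillic). No exception, but the text is wrong.
    with open(path, encoding="cp1251") as f:
        garbled = f.read()

    # Only the encoding the writer actually used round-trips.
    with open(path, encoding="cp1252") as f:
        correct = f.read()

print(failed, ascii(garbled), ascii(correct))
```

The exception is actually the good outcome. The second guess is worse:
it succeeds silently and hands you the wrong characters.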