Python usage numbers
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Sun Feb 12 17:30:32 EST 2012
On Sun, 12 Feb 2012 05:11:30 -0600, Andrew Berg wrote:
> On 2/12/2012 3:12 AM, Steven D'Aprano wrote:
>> NTFS by default uses the UTF-16 encoding, which means the actual bytes
>> written to disk are \x1d\x040\x04\xe5\x042\x04 (possibly with a leading
>> byte-order mark \xff\xfe).
>
> That's what I meant. Those bytes will be interpreted consistently across
> all locales.
Right. But that's not Unicode; it is an encoding of Unicode. Terminology
is important -- if we don't call things by the "right" names (or at least
agreed-upon names), how can we communicate?
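
To make the distinction concrete, here is a minimal Python 3 sketch. The string is the abstract Unicode text and the bytes object is one particular encoding of it. The variable names are mine, and the snippet only demonstrates the codec; it does not read anything from an NTFS volume.

```python
import codecs

# The directory name used as the example in this thread (Unicode text).
name = "\u041d\u0430\u04e5\u0432"  # Наӥв

# One encoding of that text: UTF-16, little-endian, no byte-order mark.
encoded = name.encode("utf-16-le")
print(encoded)

# The optional little-endian byte-order mark is the two bytes FF FE.
with_bom = codecs.BOM_UTF16_LE + encoded
print(with_bom[:2])

# Decoding with the generic "utf-16" codec consumes the BOM again.
round_trip = with_bom.decode("utf-16")
print(round_trip == name)
```

The first print should show the same byte sequence quoted above, and the round trip gives back the original text.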
>> Windows has two separate APIs, one for "wide" characters, the other for
>> single bytes. Depending on which one you use, the directory will appear
>> to be called Наӥв or 0å2.
>
> Yes, and AFAIK, the wide API is the default. The other one only exists
> to support programs that don't support the wide API (generally, such
> programs were intended to be used on older platforms that lack that
> API).
I'm not sure that "default" is the right word, since (as far as I know)
the two APIs spell their function names differently and the coder has to
choose whether to call function X or function Y. Perhaps you mean that
Microsoft encourages the wide API and makes the single-byte API available
for legacy reasons?
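
As a rough illustration of how the same bytes can yield two different names, the sketch below decodes them once as UTF-16 and once as single bytes. It is plain Python, not a call into either Windows API, and the choice of Latin-1 as the single-byte interpretation is my assumption; a real system would use whatever code page is configured.

```python
# The bytes quoted earlier in the thread (no byte-order mark).
raw = b"\x1d\x040\x04\xe5\x042\x04"

# Read as 16-bit code units: four characters.
as_wide = raw.decode("utf-16-le")

# Read as one character per byte (Latin-1 assumed here): eight characters,
# several of which are unprintable control characters.
as_single = raw.decode("latin-1")

# Keep only the characters a terminal would actually display.
visible = "".join(ch for ch in as_single if ch.isprintable())

print(len(as_wide), as_wide)
print(len(as_single), visible)
```

Dropping the control characters leaves the three visible characters mentioned above.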
>> But in any case, we're not talking about the file name encoding. We're
>> talking about the contents of files.
>
> Okay then. As I stated, this has nothing to do with the OS since
> programs are free to interpret bytes any way they like.
Yes, but my point was that even if the developer thinks he can avoid the
problem by staying away from "Unicode files" coming from Linux and OS X,
he can't avoid dealing with multiple code pages on Windows.
You are absolutely correct that this is *not* a cross-platform issue to
do with the OS, but some people may think it is.
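
A small sketch of the code page problem, using Python's built-in codecs for two common Windows code pages. The sample bytes are my own example, not something from this thread: the same four bytes are perfectly valid text in both code pages, but they are different text.

```python
# Four bytes as they might appear in a file written on a Windows machine.
raw = b"\xcd\xe0\xe8\xe2"

# Interpreted with the Cyrillic code page (Windows-1251).
cyrillic = raw.decode("cp1251")

# Interpreted with the Western European code page (Windows-1252).
western = raw.decode("cp1252")

print(cyrillic)
print(western)
print(cyrillic == western)
```

Neither decode raises an error, so nothing in the bytes themselves tells a program which reading was intended; that information has to come from somewhere else.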
--
Steven