[docs] [issue20906] Issues in Unicode HOWTO
report at bugs.python.org
Tue Mar 18 22:38:21 CET 2014
Antoine Pitrou added the comment:
> Agreed. How about "In documentation such as the current article..."
It's better, but how about simply "In this article"?
> I concur with reducing unnecessary abstraction. Not sure what you mean
> by "true form". Do you mean show the glyph which the code point
> represents? Or the sequence of bytes? Or display the code point value
> in decimal?
I mean the glyph.
> In the older schemes, "encoding" referred to a single mapping: chars <-->
> numbers in a particular binary format. In Unicode, "encoding" refers only to
> the mapping: code point numbers <--> binary format. It does not refer to
> the chars <--> code point mapping. (At least, I think that's the case.
> Regardless, the two mappings need to be rigorously distinguished.)
This is true, but in this HOWTO's context the term "code system" is a confusing distraction, IMHO. For all intents and purposes, iso-8859-1 and friends *are* encodings (and this is how Python actually names them).
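To make the two mappings in the quoted paragraph concrete, here is a minimal Python 3 sketch. The sample character (U+00E9) is an arbitrary choice for illustration and does not come from the thread. It separates the character-to-code-point step from the code-point-to-bytes step, and checks that Python's codec registry treats "iso-8859-1" and "latin-1" as the same encoding.

```python
import codecs

# Arbitrary sample character (LATIN SMALL LETTER E WITH ACUTE), written as an escape.
ch = "\u00e9"

# Mapping 1: character <-> code point (an abstract integer, no byte layout implied).
code_point = ord(ch)

# Mapping 2: code point <-> bytes, which depends on the encoding chosen.
utf8_bytes = ch.encode("utf-8")
latin1_bytes = ch.encode("iso-8859-1")

print(hex(code_point))   # 0xe9
print(utf8_bytes)        # b'\xc3\xa9'
print(latin1_bytes)      # b'\xe9'

# Python registers iso-8859-1 as an encoding; "latin-1" is an alias of the same codec.
same_codec = codecs.lookup("iso-8859-1").name == codecs.lookup("latin-1").name
print(same_codec)        # True
```

In iso-8859-1 the byte value happens to equal the code point, which is part of why the older schemes could treat both mappings as one.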
> On review, there are many points in the article that muddy this up. For
> example, "Unicode started out using 16-bit characters instead of 8-bit
> characters". Saying "so-and-so-bit characters" about Unicode, in the
> current article, is either wrong, or very confusing.
So it should say "16-bit code points" instead, right?
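As a rough illustration of why any "N-bit" wording needs care today, the sketch below (assuming Python 3.3 or later; the sample code point U+1F600 is an arbitrary choice, not from the thread) shows a code point that does not fit in 16 bits, and how many code units it needs in two encodings.

```python
import sys

# Arbitrary sample code point outside the 16-bit range.
ch = chr(0x1F600)
code_point = ord(ch)

# One code point, but it does not fit in 16 bits.
fits_in_16_bits = code_point <= 0xFFFF

# UTF-16 needs two 16-bit code units (a surrogate pair) for this code point.
utf16_units = len(ch.encode("utf-16-le")) // 2

# UTF-8 needs four bytes for it.
utf8_length = len(ch.encode("utf-8"))

print(len(ch), hex(code_point), fits_in_16_bits)  # 1 0x1f600 False
print(utf16_units, utf8_length)                   # 2 4
print(hex(sys.maxunicode))                        # 0x10ffff
```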
> The subject of one-character-to-one-code mapping is important
> (normalization etc), though perhaps beyond the current article. But I
> think the article should avoid suggesting that many-to-one or one-to-many
> scenarios are common.
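For readers unfamiliar with the normalization issue mentioned in the quote, a minimal sketch using the standard `unicodedata` module may help. The sample strings are an assumption chosen for illustration: the same visible character written once as a single code point and once as a base letter plus a combining accent.

```python
import unicodedata

# One user-visible character, two different code point sequences.
composed = "\u00e9"      # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The raw strings differ, even though they render the same way.
print(composed == decomposed)          # False
print(len(composed), len(decomposed))  # 1 2

# Normalizing both to the same form makes them comparable.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed)    # True
print(nfd == decomposed)  # True
```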
Python tracker <report at bugs.python.org>