[Python-Dev] bytes / unicode
Terry Reedy
tjreedy at udel.edu
Tue Jun 22 22:19:45 CEST 2010
On 6/22/2010 1:22 AM, Glyph Lefkowitz wrote:
> The thing that I have heard in passing from a couple of folks with
> experience in this area is that some older software in asia would
> present characters differently if they were originally encoded in a
> "japanese" encoding versus a "chinese" encoding, even though they were
> really "the same" characters.
As I tried to say in another post, that to me is similar to wanting to
present English text in different fonts depending on whether it was
spoken by an American or a Brit, or by a modern person versus a
Renaissance person.
> I do know that Han Unification is a giant political mess
> (<http://en.wikipedia.org/wiki/Han_unification> makes for some
Thanks, I will take a look.
> interesting reading), but my understanding is that it has handled enough
> of the cases by now that one can write software to display asian
> languages and it will basically work with a modern version of unicode.
> (And of course, there's always the private use area, as Stephen Turnbull
> pointed out.)
>
> Regardless, this is another example where keeping around a string isn't
> really enough. If you need to display a japanese character in a distinct
> way because you are operating in the japanese *script*, you need a tag
> surrounding your data that is a hint to its presentation. The fact that
> these presentation hints were sometimes determined by their encoding is
> an unfortunate historical accident.
Yes. The Asian languages I know anything about seem to natively have
almost none of the symbols English has, many borrowed from math, that
have been pressed into service for text markup.
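
To make the point about presentation hints concrete, here is a minimal
sketch. It assumes the character U+76F4 is available in both the
shift_jis and gb2312 codecs, and TaggedText is a hypothetical wrapper
invented for illustration, not an existing API. Once decoded, the text
is identical whichever legacy encoding it came from, so any language
hint has to travel alongside the string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical wrapper: decoded text plus a language hint for display."""
    text: str
    lang: str  # a language tag such as "ja" or "zh-Hans" (assumed convention)


CHAR = "\u76f4"  # a unified Han character, assumed present in both encodings

# The legacy encodings represent the character with different bytes.
sjis_bytes = CHAR.encode("shift_jis")
gb_bytes = CHAR.encode("gb2312")

# After decoding, nothing records which encoding the text came from.
from_japanese = sjis_bytes.decode("shift_jis")
from_chinese = gb_bytes.decode("gb2312")
same_text = from_japanese == from_chinese

# The presentation hint therefore has to be carried explicitly.
ja = TaggedText(from_japanese, "ja")
zh = TaggedText(from_chinese, "zh-Hans")
```

A renderer could then choose a font from the lang field rather than
guessing from the original encoding.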
--
Terry Jan Reedy