[Python-Dev] bytes / unicode

Tue Jun 22 22:19:45 CEST 2010

On 6/22/2010 1:22 AM, Glyph Lefkowitz wrote:

> The thing that I have heard in passing from a couple of folks with
> experience in this area is that some older software in Asia would
> present characters differently if they were originally encoded in a
> "Japanese" encoding versus a "Chinese" encoding, even though they were
> really "the same" characters.

As I tried to say in another post, that to me is similar to wanting to
present English text in different fonts depending on whether it is
spoken by an American or a Brit, or by a modern person versus a
Renaissance person.

> I do know that Han Unification is a giant political mess
> (<http://en.wikipedia.org/wiki/Han_unification> makes for some

Thanks, I will take a look.

> interesting reading), but my understanding is that it has handled enough
> of the cases by now that one can write software to display Asian
> languages and it will basically work with a modern version of Unicode.
> (And of course, there's always the private use area, as Stephen Turnbull
> pointed out.)
>
> Regardless, this is another example where keeping around a string isn't
> really enough. If you need to display a Japanese character in a distinct
> way because you are operating in the Japanese *script*, you need a tag
> surrounding your data that is a hint to its presentation. The fact that
> these presentation hints were sometimes determined by their encoding is
> an unfortunate historical accident.

Yes. The Asian languages I know anything about seem to natively have
almost none of the symbols English has, many borrowed from math, that
have been pressed into service for text markup.
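
To make the "tag surrounding your data" idea concrete, here is a minimal
Python 3 sketch. The names TaggedText and decode_with_hint, and the choice
of language tags and an HTML lang attribute as the presentation hint, are
my own assumptions for illustration; nothing in this thread prescribes
them.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical wrapper: a str plus a presentation hint."""
    text: str
    lang: str  # assumed to be a language tag such as "ja" or "zh-Hans"

    def to_html(self) -> str:
        # The lang attribute lets a renderer choose language-appropriate glyphs.
        return '<span lang="{}">{}</span>'.format(
            escape(self.lang, quote=True), escape(self.text)
        )


def decode_with_hint(raw: bytes, encoding: str, lang: str) -> TaggedText:
    """Decode legacy bytes and record the hint the encoding used to imply."""
    return TaggedText(raw.decode(encoding), lang)


# U+76F4 is one unified code point that exists in both legacy encodings.
ja = decode_with_hint("直".encode("shift_jis"), "shift_jis", "ja")
zh = decode_with_hint("直".encode("gb2312"), "gb2312", "zh-Hans")

assert ja.text == zh.text   # identical once decoded to str
assert ja != zh             # only the tag tells them apart
print(ja.to_html())
print(zh.to_html())
```

Once decoded, the two strings compare equal, so the hint that the legacy
encoding used to carry implicitly has to travel alongside the text.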

-- 
Terry Jan Reedy