char 128? no... 256

Steven Taschuk staschuk at telusplanet.net
Wed Feb 12 14:40:55 EST 2003


Quoth Afanasiy:
  [...]
> Now, even encoding the 'latin-1', 8 bit, is problematic, because symbols
> which are 8 bit in Windows, such as the TradeMark symbol will not encode
> into 8 bit, as the ordinal value in the Unicode object is 8482.
> 
> This is hex 99 on a plain Windows 2000 install, I presume 'latin-1'.
> (Which is iso-8859-1 afaik) [...]

Latin-1 and ISO-8859-1 are indeed the same, but this character set
has no trade mark sign, at 0x99 or elsewhere.  (At 0xAE it has a
registered trade mark sign, a superscripted circled R, but not the
superscripted TM which U+2122 (= 8482 decimal) represents.)

In Unicode, U+0099 is a control character.

So your machine is using some other character set.
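
The claims above can be checked from Python itself.  This is a
minimal sketch, assuming a current Python 3 interpreter (where str
is Unicode); the helper name latin1_can_encode is made up for
illustration and is not part of any library.

```python
import unicodedata

TM = "\u2122"  # TRADE MARK SIGN, 8482 decimal


def latin1_can_encode(ch):
    """Return True if ch has a position in ISO-8859-1 (hypothetical helper)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Latin-1 has no position for U+2122.
print(latin1_can_encode(TM))

# Byte 0xAE in Latin-1 is the registered sign, not the trade mark sign.
print(unicodedata.name(b"\xae".decode("latin-1")))

# Byte 0x99 in Latin-1 maps to U+0099, whose category "Cc" means control.
print(unicodedata.category(b"\x99".decode("latin-1")))
```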

> [...] This will show up in webpages designated :
> 
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
>
> This will show up in notepad... and in my non-unicode text editors.

Presumably because your software is using not ISO-8859-1, but your
machine's character set, which has the trademark sign at 0x99.

It is worth noting that Windows has, I'm told, historically
assigned characters to unassigned positions in standard character
sets, and called the augmented, nonstandard character set by the
same name as its standard ancestor.  Needless to say, this causes
all manner of interoperability problems; you might be running up
against one side of such a problem.

> So how would I encode this Unicode character, 8482 so that it would
> show up as a TradeMark symbol on Windows 2000 machines. Windows 2000
> can display a TradeMark symbol in non Unicode applications.

Find out what character set your machine is actually using and use
an encoder for that character set.  Consult IANA to find out the
proper MIME charset name for that character set and use that name
in the content type declaration of your web pages.  (This will, of
course, cause interoperability problems if this mysterious
character set is not widely supported.)
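
A rough sketch of that procedure, assuming Python 3.  I can't see
your machine, so the choice of code page 1252 below is only an
assumption (it is a common default on Western-European Windows
installs, and it does put the trade mark sign at 0x99); substitute
whatever your system actually reports.  ASSUMED_CHARSET and
content_type_meta are names invented for this example.

```python
import codecs
import locale

TM = "\u2122"  # TRADE MARK SIGN

# The encoding this machine reports as its default; varies by system.
machine_charset = locale.getpreferredencoding(False)
print("machine charset:", machine_charset)

# Assumption: the character set in question is code page 1252, whose
# IANA-registered name is "windows-1252".
ASSUMED_CHARSET = "windows-1252"

# Encode with a codec for that character set.
tm_bytes = TM.encode(ASSUMED_CHARSET)
print(tm_bytes)  # b'\x99'

# Python's own canonical name for the same codec.
print(codecs.lookup(ASSUMED_CHARSET).name)  # cp1252


def content_type_meta(charset):
    """Build the content type declaration for a page (hypothetical helper)."""
    return ('<META HTTP-EQUIV="Content-Type" '
            'CONTENT="text/html; charset=%s">' % charset)


print(content_type_meta(ASSUMED_CHARSET))
```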

If you only need the character in HTML, use a character reference
such as '&#8482;' instead of the literal character.  This yields
U+2122 TRADE MARK SIGN as desired.  It's also good and interoperable.
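
A numeric character reference can be produced mechanically rather
than typed by hand.  A small sketch, assuming Python 3; the sample
string is invented for the example.  The "xmlcharrefreplace" error
handler replaces anything the target encoding cannot represent with
a reference of the form &#NNNN;, so the resulting page is pure
ASCII and does not depend on the reader's character set.

```python
import html

TM = "\u2122"  # TRADE MARK SIGN
text = "Acme" + TM  # made-up sample text

# Characters outside ASCII become numeric character references.
page_bytes = text.encode("ascii", "xmlcharrefreplace")
print(page_bytes)  # b'Acme&#8482;'

# Decoding the reference again gives back U+2122.
round_trip = html.unescape(page_bytes.decode("ascii"))
print(round_trip == text)  # True
```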

-- 
Steven Taschuk           |    w_w
staschuk at telusplanet.net | ,-= U
                         |  1 1    Moose




