[XML-SIG] utf8 conversion issue
Tue, 04 Jun 2002 14:55:23 -0400
I recently discovered in PyXML 0.5.2 (more than a bit behind, I know)
that xml.unicode.utf8_iso.code_to_utf8() is returning incorrect values.
For example, the name "tørvåld" does not convert properly; it
*should* be "torvald" with a stroke through the 'o' and the 'a' with a ring
above. However, it gets mangled into "t\xc3\xb8rv\xc3\xa5ld". I took a
look at code_to_utf8() and noticed that it in turn calls utf8chr(),
which does a comparison to see if the ordinal passed in is <128.
Shouldn't it be <256??? Has anyone else wondered about or experienced this?
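
For comparison, here is a minimal sketch of what a standard encoder produces for the same name. It assumes a modern Python 3 interpreter and its built-in codecs rather than the PyXML 0.5.2 module discussed here, so it only illustrates the byte values involved, not that module's behaviour.

```python
# Assumption: Python 3 built-in codecs, not xml.unicode.utf8_iso from PyXML.
name = "t\u00f8rv\u00e5ld"  # U+00F8 is the stroked 'o', U+00E5 the ringed 'a'

utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("iso-8859-1")

print(utf8_bytes)    # b't\xc3\xb8rv\xc3\xa5ld'
print(latin1_bytes)  # b't\xf8rv\xe5ld'

# Characters that UTF-8 stores in a single byte: only ordinals below 128.
single_byte = [ch for ch in name if len(ch.encode("utf-8")) == 1]
print(single_byte)   # ['t', 'r', 'v', 'l', 'd']
```

If this is right, the two-byte sequences are what UTF-8 itself uses for ordinals 128 through 255, and the single-byte values \xf8 and \xe5 belong to ISO-8859-1.
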
Also, I noticed that the line doing the actual conversion reads "return
chr(0xc0 | (c>>6)) + chr(0x80 | (c & 0x3f))" where c is the ordinal. I
almost hesitate to ask, but why is it even necessary to bit-or and
-shift? Especially when this seems to yield incorrect results? Am I just
Any input is greatly appreciated.
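
To make the bit-or and shift concrete, here is a rough sketch of the quoted expression rewritten for Python 3. The function name utf8_two_byte is hypothetical and not part of PyXML, and the range check is my own addition. It shows how the two-byte UTF-8 form splits an ordinal into a lead byte carrying the top bits and a continuation byte carrying the low six bits.

```python
def utf8_two_byte(c: int) -> bytes:
    """Encode an ordinal in 0x80..0x7FF as two UTF-8 bytes.

    Hypothetical helper mirroring the expression quoted in the post.
    """
    if not 0x80 <= c <= 0x7FF:
        raise ValueError("the two-byte form only covers 0x80..0x7FF")
    lead = 0xC0 | (c >> 6)            # 110xxxxx: bits above the low six
    continuation = 0x80 | (c & 0x3F)  # 10xxxxxx: the low six bits
    return bytes([lead, continuation])


print(utf8_two_byte(0xF8))  # b'\xc3\xb8'
print(utf8_two_byte(0xE5))  # b'\xc3\xa5'
```

The marker bits (110 and 10) are what let a decoder tell lead bytes from continuation bytes, which is why an ordinal of 128 or above cannot be written out as a single byte in UTF-8.
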