[XML-SIG] Handling of character entity references

Tamito KAJIYAMA kajiyama@grad.sccs.chukyo-u.ac.jp
Tue, 27 May 2003 19:50:27 +0900


Mike Brown <mike@skew.org> writes:
|
| I am trying to say that your application does not have to rely on your 
| 'char' tag hack under Python 2.x because you are now *able* to write it 
| in such a way that it doesn't do something foolish like "print content" 
| when content is a Unicode string and sys.stdout is an ASCII console. :)

I see.  Thank you for the elaboration.

| For example, if you change that print to
| 
| print ''.join([c.encode('ascii', 'ignore') or "&#%d;" % ord(c) for c in 
| content])
| 
| then you will at least be able to see it on your terminal, serialized 
| with all non-ASCII characters represented by NCRs.

Cool.
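
For the archive, here is a minimal sketch of the same idea.  It
assumes a current Python 3 interpreter rather than the Python 2.x
discussed in this thread, and the function name to_ascii_ncr is made
up for illustration.  It relies on the standard 'xmlcharrefreplace'
error handler instead of the per-character join quoted above.

```python
def to_ascii_ncr(content):
    """Return content as an ASCII-only str, with each non-ASCII
    character replaced by a decimal numeric character reference."""
    # 'xmlcharrefreplace' emits &#NNNN; for every character that the
    # target encoding (here ASCII) cannot represent.
    encoded = content.encode('ascii', 'xmlcharrefreplace')
    # The result is pure ASCII bytes; decode so it prints as text.
    return encoded.decode('ascii')


if __name__ == '__main__':
    # U+00E9 (e with acute) and two kanji, written as escapes so the
    # source file itself stays ASCII.
    print(to_ascii_ncr('caf\u00e9 \u65e5\u672c'))
```

Like the quoted one-liner, this does not escape '&' or '<' in the
content, so it is meant for inspection on a terminal and is not a
complete XML serializer.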

| If you were writing to a file rather than sys.stdout, you would want to 
| change the 'ascii' in the line to 'EUC-JP' or whatever.

FYI: In Japan, a terminal emulator called kterm is widely used
on the X Window System.  It fully supports EUC-JP as well as
Shift_JIS and ISO-2022-JP, so I can write Japanese byte strings
directly to sys.stdout from Python scripts and from our lovely
interactive Python interpreter ;-)
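
A rough sketch of what that looks like, again assuming Python 3 (where
sys.stdout is a text stream and the raw bytes go to sys.stdout.buffer).
The helper names encode_for and write_encoded are hypothetical, and
'euc_jp' stands in for whatever encoding the terminal or file expects.

```python
import sys


def encode_for(text, encoding='euc_jp'):
    """Encode text for a terminal or file that expects `encoding`.

    Characters the target encoding lacks are written as numeric
    character references instead of raising UnicodeEncodeError.
    """
    return text.encode(encoding, 'xmlcharrefreplace')


def write_encoded(text, stream=None, encoding='euc_jp'):
    """Write the encoded bytes to a binary stream.

    Defaults to sys.stdout.buffer, the byte-level stream underneath
    sys.stdout.
    """
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(encode_for(text, encoding))
    stream.flush()
```

For a file, open it in binary mode ('wb') and pass the file object as
stream; for a kterm running in EUC-JP, the default stream should do.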

Best regards,

-- 
KAJIYAMA, Tamito <kajiyama@grad.sccs.chukyo-u.ac.jp>