Jeremy was just playing with the xml.sax package, and decided to print the string returned from parsing "©" (the copyright symbol). Sure enough, he got a traceback:

    >>> print u'\251'
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeError: ASCII encoding error: ordinal not in range(128)

and asked me about it. I was a little surprised myself: first, that anyone would use "print" in a SAX handler to start with, and second, that it was so painful.

Now, I can chalk this up to not using a reasonable stdout that understands that Unicode needs to be translated to Latin-1 given my font selection. So I looked at the codecs module to provide a usable output stream. The EncodedFile class provides a nice wrapper around another file object, and supports encoding in both directions. Unfortunately, I can't see what "encoding" I should use if I want to read & write Unicode string objects to it. ;( (Marc-Andre, please tell me I've missed something!)

I also don't think I can use it with "print", extended or otherwise. The PRINT_ITEM opcode calls PyFile_WriteObject() with whatever it gets, so that's fine. PyFile_WriteObject() then converts the object using PyObject_Str() or PyObject_Repr(). For Unicode objects, the tp_str handler attempts conversion to the default encoding ("ascii" in this case), and raises the exception shown in the traceback above.

Perhaps a little extra work is needed in PyFile_WriteObject() to allow Unicode objects to pass through if the file is merely file-like, and let the next layer handle the conversion? This would probably break code, and therefore not be acceptable. On the other hand, it's annoying and unintuitive that I can't create a file object that takes Unicode strings from "print".

  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at beopen.com>
BeOpen PythonLabs Team Member
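
For illustration, here is a minimal sketch of the kind of encoding output stream discussed above. It bypasses "print" and calls write() directly on a codec stream writer. It assumes codecs.getwriter() is available (it is in later Python releases), and it uses an in-memory buffer as a hypothetical stand-in for a Latin-1 terminal; the names raw and latin1_out are made up for the example. This is not a fix for the PyFile_WriteObject() behaviour, only a way to sidestep it.

```python
import codecs
import io

# Encoding the copyright sign with the "ascii" codec fails, which is the
# same conversion that produces the traceback in the message above.
try:
    u"\251".encode("ascii")
    ascii_ok = True
except UnicodeError:
    ascii_ok = False

# In-memory byte stream standing in for a Latin-1 terminal (an assumption
# for illustration; a real program would wrap its actual output stream).
raw = io.BytesIO()

# codecs.getwriter() returns the StreamWriter class for a codec. An
# instance encodes each Unicode string passed to write() and forwards the
# resulting bytes to the wrapped stream.
latin1_out = codecs.getwriter("latin-1")(raw)

# Octal escape 251 is U+00A9, the copyright sign.
latin1_out.write(u"\251\n")

encoded = raw.getvalue()
```

The wrapped stream only ever sees encoded bytes, so the conversion policy (Latin-1 here) lives in one place rather than in every caller.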