[XML-SIG] Re: HTML<->UTF-8 'codec'?

David Primmer dave@primco.org
Fri, 8 Mar 2002 02:29:09 -0800

Thank you! Yes. I didn't know you needed to use codecs.open() and I
figured it had something to do with the serialization. Looks like I have
to use codecs.open() or tack on .decode() everywhere I've used plain
open() calls. 
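
For anyone following along, here is a minimal sketch of the two options as I understand them. It is written so it also runs on a current Python 3; the sample text and the file name are my own assumptions, not anything from the earlier messages.

```python
import codecs
import os
import tempfile

# Sample text (assumed for illustration): "cafe" with an accented e, then a euro sign.
text = u"caf\u00e9 \u20ac"

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Option 1: codecs.open() encodes on write and decodes on read.
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write(text)

with codecs.open(path, "r", encoding="utf-8") as f:
    decoded_by_codecs = f.read()

# Option 2: a plain binary open() returns raw bytes, so decode them by hand.
with open(path, "rb") as f:
    raw = f.read()
decoded_by_hand = raw.decode("utf-8")

# Both routes should give back the original Unicode text.
print(decoded_by_codecs == text, decoded_by_hand == text)
```

Either way the decoding step has to happen somewhere; codecs.open() just moves it into the file object so it isn't repeated at every read.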

I don't remember seeing that function used anywhere. This seems like
pretty obscure stuff. I got the defaultencoding setting from an XML
tutorial that noted it as a solution for toxml() barfing all the time.

I worry that I won't have much support if I build my app around Unicode
data. Are there many examples of its use? Any info on Unicode in Python
other than the standard docs? I could only find this:


> -----Original Message-----
> From: Martin v. Loewis

> You try to write this into a file. This should normally not
> work, but you've changed the default encoding, so it unfortunately
> does: saving the Unicode object as UTF-8. Then you read it back as
> variable a, which is a byte string. This byte string happens to be
> three bytes (as defined in UTF-8).

And why doesn't Python just know that it should read a Unicode file as
Unicode characters and not as a byte string? Is it the "rb"? Lowest
common denominator? This just seems wrong, but maybe it's for backwards
compatibility or something. It seems like Python "supports" Unicode,
maybe better than most, but it's not fully integrated. Correct me if I'm
being too harsh.
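
To make the "three bytes" point above concrete, here is a small sketch. The euro sign is my own assumed example (the actual character isn't named in the quoted text); any character between U+0800 and U+FFFF would behave the same way.

```python
# One Unicode character (the euro sign, assumed here for illustration).
char = u"\u20ac"

# Encoding to UTF-8 produces a byte string, not a Unicode string.
encoded = char.encode("utf-8")

print(len(char))     # one character
print(len(encoded))  # several bytes for that one character

# Reading a file in binary mode gives you something like `encoded`;
# you only get the character back after an explicit decode.
round_trip = encoded.decode("utf-8")
print(round_trip == char)
```

The bytes on disk carry no label saying which encoding produced them, which is presumably why the decode has to be requested explicitly.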

> The reason is that you cannot cut-and-paste UTF-8 bytes using the
> Windows clipboard. You did not describe exactly how you performed the
> "cut'n'paste", but I assume you've used some UTF-8-unaware editor (or
> perhaps even "type" on a console), then copied the resulting
> characters. 

I sure thought the Windows XP clipboard supported Unicode. I was using
plain old Notepad, which has been Unicode-aware for years, and Microsoft
Word 2002, which is also Unicode-aware. How can I paste back and forth
between these two if the clipboard doesn't support it?

Thanks for clearing that up and getting me working again.