[Baypiggies] Handling unwanted Unicode \u2019 characters in XML
spmcinerney at hotmail.com
Wed Jul 2 01:24:19 CEST 2008
> Are you really sure you need this to be ASCII and not UTF-8? If so,
> why do you need it to be true ASCII?
I want it to be ASCII so I can print it and do regex matching,
unless I need to move with the times and start doing Unicode regexes by default.
But I'm using Python 2.5.2, so I'd really prefer to keep everything in ASCII-land.
It's a pain when you're debugging and print keeps throwing exceptions.
And in this case, the apostrophe was not Unicode to start with.
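A minimal sketch of one way to stay in ASCII-land: map the typographic punctuation back to plain ASCII before encoding. The helper name to_ascii and the PUNCT_TO_ASCII table are hypothetical, not part of any library, and the table only covers a handful of characters as an assumption about what the XML contains.

```python
# Hypothetical mapping from Unicode code points to ASCII stand-ins.
# Extend it with whatever characters actually show up in the data.
PUNCT_TO_ASCII = {
    0x2018: u"'",   # left single quotation mark
    0x2019: u"'",   # right single quotation mark (the \u2019 in question)
    0x201C: u'"',   # left double quotation mark
    0x201D: u'"',   # right double quotation mark
    0x2013: u"-",   # en dash
    0x2014: u"-",   # em dash
}

def to_ascii(text):
    """Return an ASCII-encoded byte string for a unicode string.

    Known punctuation is replaced via the table above; anything else
    that is not ASCII becomes '?' instead of raising an exception.
    """
    cleaned = text.translate(PUNCT_TO_ASCII)
    return cleaned.encode("ascii", "replace")

# The curly apostrophe becomes a plain one.
result = to_ascii(u"don\u2019t")
```

Using "replace" rather than the default "strict" error handler is a design choice: it keeps print and regex matching from blowing up during debugging, at the cost of silently turning unmapped characters into '?'.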
> > But the ASCII encoding of \u2019 is not very human-readable or useful:
> > >>> u'\u2019'.encode('utf-8')
> > '\xe2\x80\x99'
> That's UTF-8, not ASCII (there's a big difference), and you're seeing
> the repr() of the encoded string, which is of course an ugly escape sequence.
> If instead you print the encoded string, you get:
> >>> print u'\u2019'.encode('utf-8')
> ’
I don't get that. I get 'â' instead (does it depend on the C locale settings? If so, that's not very satisfactory at all):
>>> print u'\u2019'.encode('utf-8')
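A small sketch of one plausible explanation, assuming the terminal is not interpreting the output as UTF-8. In Python 2, print writes the already-encoded bytes unchanged, so what appears on screen depends on how the terminal decodes them. The variable names below are for illustration only.

```python
# The UTF-8 encoding of U+2019 is three bytes: e2 80 99.
encoded = u"\u2019".encode("utf-8")

# A UTF-8 terminal turns the three bytes back into one curly apostrophe.
as_utf8 = encoded.decode("utf-8")

# A Windows-1252 terminal shows three separate characters:
# a-circumflex, euro sign, trademark sign.
as_cp1252 = encoded.decode("cp1252")

# A Latin-1 terminal shows a-circumflex followed by two C1 control
# characters, which are usually invisible, leaving just the a-circumflex.
as_latin1 = encoded.decode("latin-1")
```

If that is what is happening here, the fix is on the output side: either switch the terminal to UTF-8, or encode to the terminal's actual encoding before printing.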