[Baypiggies] Handling unwanted Unicode \u2019 characters in XML

Stephen McInerney spmcinerney at hotmail.com
Wed Jul 2 02:08:20 CEST 2008


Matt,

> > it seems like I really want a Unicode version of string.maketrans()  
> > and string.translate(), which is deprecated.
> 
> No, neither of those are deprecated.  The only deprecated functions in  
> the string module are the ones listed here:
> http://docs.python.org/lib/node42.html

Check that URL again: string.translate() IS deprecated, but string.maketrans() is not.
unicode.translate() is not deprecated either.
However, unicode.translate() does not accept the optional third argument 'deletechars'
that string.translate() takes.
Some people have asked for it to be added for backwards compatibility.
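
For comparison, here is a rough sketch of how the one-argument form handles deletion, as far as I can tell. Instead of 'deletechars', it takes a dict keyed by code point, and mapping a code point to None drops that character. The call signature still differs from string.translate(), so it is not a drop-in replacement. The table contents and the function name asciify_punctuation are illustrative assumptions. In Python 3 the same dict works with str.translate().

```python
# Translation table keyed by Unicode code point (an integer).
# A value may be a replacement string, or None to delete the character.
# The characters listed here are only an illustrative selection.
PUNCT_TABLE = {
    0x2018: u"'",   # LEFT SINGLE QUOTATION MARK
    0x2019: u"'",   # RIGHT SINGLE QUOTATION MARK
    0x201C: u'"',   # LEFT DOUBLE QUOTATION MARK
    0x201D: u'"',   # RIGHT DOUBLE QUOTATION MARK
    0x00AD: None,   # SOFT HYPHEN: deleted outright
}


def asciify_punctuation(text):
    """Return text with the characters in PUNCT_TABLE replaced or removed."""
    return text.translate(PUNCT_TABLE)
```

For instance, asciify_punctuation(u"don\u2019t") gives u"don't".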

So I can't see where to get the functionality I want.
For now, to get unstuck, I wrote a Unicode regex search-and-replace and I
just iterate that over the entire input XML tree. Crude, but it gets me out of jail.
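
Roughly, the idea looks like the sketch below. This is a reconstruction under assumptions, not the exact code: the REPLACEMENTS table and the helper names clean() and clean_tree() are made up for illustration, and it uses Element.iter(), which is the spelling in current versions of ElementTree (older releases call it getiterator()).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical replacement table: Unicode punctuation -> ASCII equivalent.
REPLACEMENTS = {
    u"\u2018": u"'",
    u"\u2019": u"'",
    u"\u201c": u'"',
    u"\u201d": u'"',
}
# One character class matching every key in the table.
PATTERN = re.compile(u"[" + u"".join(REPLACEMENTS) + u"]")


def clean(text):
    """Replace each matched character; pass None through unchanged."""
    if text is None:
        return None
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text)


def clean_tree(root):
    """Walk every element and clean its text, tail, and attribute values."""
    for elem in root.iter():
        elem.text = clean(elem.text)
        elem.tail = clean(elem.tail)
        for name, value in list(elem.attrib.items()):
            elem.set(name, clean(value))
    return root
```

Usage would be along the lines of clean_tree(ET.parse(path).getroot()), where path is whatever file is being read. Note that tail text and attribute values need cleaning too, not just elem.text.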

By the way, the XML is coming in via ElementTree's parse() method. I see some references
in Unicode tutorials to creating a custom codec in order to get the translate()
functionality, but ET doesn't have any hook for supporting that.

(PS Thanks for your article, but it seemed to be about converting from ASCII apostrophes
to Unicode ones, not the reverse, which is more tricky.)

Regards,
Stephen
