[XML-SIG] unicode problems in elementtree

David Stanek dstanek at dstanek.com
Sat May 27 00:34:29 CEST 2006


On Fri, May 26, 2006 at 09:22:41PM +0100, Bryan Lawrence wrote:
> 
> Does elementtree and/or expat need to know the encoding to get this right? 
> (which may be a problem coz this could be from anyone's document in any 
> encoding ...)
> 

I think you will have to tell ElementTree what encoding your XML is
in; otherwise it has no way to know. I am sure there is a better way,
but I have seen people guess by trying a list of candidate encodings:

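  # untested and from my bad memory :-)
  # s is the raw byte string read from the document
  encodings = ['utf-8', 'utf-16', 'iso-8859-1']
  for encoding in encodings:
      try:
          text = unicode(s, encoding)
      except UnicodeError:
          pass   # wrong guess, try the next encoding
      else:
          break  # decoded cleanly; 'encoding' is our guess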

The encodings list holds the common encodings you expect to see,
strictest first: iso-8859-1 accepts any byte sequence, so it has to
come last. Again, there must be a better way to do this... I would
suggest that you try to agree on a standard encoding for your documents.
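
For anyone on a newer Python, here is a rough sketch of the same
guessing idea, followed by handing the guess to the parser. The helper
names guess_encoding and parse_with_guess and the sample document are
made up for illustration; the only library piece relied on is the
encoding argument of xml.etree.ElementTree.XMLParser, which overrides
whatever the document itself declares. Treat it as a starting point
under those assumptions, not as the right way to do it.

```python
import xml.etree.ElementTree as ET

# Candidate encodings, strictest first. iso-8859-1 accepts any byte
# sequence, so it only makes sense as the final catch-all.
CANDIDATES = ["utf-8", "utf-16", "iso-8859-1"]


def guess_encoding(data, candidates=CANDIDATES):
    """Return the first candidate that decodes data without error.

    (Hypothetical helper, not part of ElementTree.)
    """
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeError:
            continue  # wrong guess, try the next one
        return encoding
    raise ValueError("none of the candidate encodings fit")


def parse_with_guess(data):
    """Guess the encoding of raw XML bytes, then tell the parser to use it."""
    encoding = guess_encoding(data)
    parser = ET.XMLParser(encoding=encoding)
    return encoding, ET.fromstring(data, parser=parser)


# A Latin-1 document with no encoding declaration (made-up sample).
latin1_doc = "<name>Jos\u00e9</name>".encode("iso-8859-1")
encoding, root = parse_with_guess(latin1_doc)
print(encoding, root.text)
```

Keep in mind that a successful decode only means the bytes are valid
in that encoding, not that the guess is right. In particular, utf-16
will happily decode a lot of even-length byte strings, so you may want
to drop it from the list unless you really expect it.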

David Stanek

-- 
http://www.traceback.org

GPG keyID #6272EDAF on http://pgp.mit.edu
Key fingerprint = 8BAA 7E11 8856 E148 6833  655A 92E2 3E00 6272 EDAF

