[XML-SIG] HTML<->UTF-8 'codec'?
M.-A. Lemburg
mal@lemburg.com
Mon, 22 Oct 2001 15:50:47 +0200
Bill Janssen wrote:
>
> Perhaps you'd be kind enough to review my sample code at
> ftp://ftp.parc.xerox.com/transient/janssen/htmlcodec.py, and advise of
> glaring errors or any interesting improvements that occur to you?
>
> Thanks in advance!
Here are some comments:
First of all, you are encoding Unicode to an 8-bit string, right?
If so, you don't need to build the output as a Unicode string;
a plain 8-bit string will do.
    def encode(self,input,errors='strict'):
        output = u''
        i = 0
        input_len = len(input)
        while (i < input_len):
            if ord(input[i]) > 0x7F:
                output = output + u'&#' + unicode(str(ord(input[i]))) + u';'

Wouldn't this be easier: u"&#%i;" % ord(input[i]) ?!

            else:
                output = output + unicode(input[i])
            i = i + 1
        return (str(output), len(output))

This should be return (str(output), i) -- (returnvalue, bytes_consumed).
Same for decode().
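
To make the comments above concrete, here is a minimal sketch of how the
two functions might look with those changes applied. It is written for
Python 3 (str/bytes) rather than the Python 2 of the original code, and
the names html_encode, html_decode and _CHARREF are hypothetical, not
taken from htmlcodec.py. It only handles decimal references of the form
&#NNN; and assumes the encoded form is pure ASCII.

```python
import re

# Hypothetical helper: matches decimal character references such as &#233;
_CHARREF = re.compile(r"&#(\d+);")

def html_encode(input, errors="strict"):
    """Encode text to ASCII bytes, using &#NNN; for anything above 0x7F."""
    parts = []
    for ch in input:
        if ord(ch) > 0x7F:
            # The format-string shortcut suggested in the review.
            parts.append("&#%i;" % ord(ch))
        else:
            parts.append(ch)
    # Second item is the amount of *input* consumed, not the output length.
    return ("".join(parts).encode("ascii"), len(input))

def html_decode(input, errors="strict"):
    """Decode ASCII bytes, turning &#NNN; references back into characters."""
    text = bytes(input).decode("ascii", errors)
    decoded = _CHARREF.sub(lambda m: chr(int(m.group(1))), text)
    # Again, report how much of the input was consumed.
    return (decoded, len(input))
```

With this sketch, html_encode("caf\u00e9") gives (b"caf&#233;", 4): four
input characters consumed, even though the output is nine bytes long.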
A note about the search function: if you give the codec module
a name like 'html_utf_8.py', the search function in
encodings/__init__.py can find it for you.
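
If you would rather not rely on the search function in
encodings/__init__.py, an alternative (not what the note above
describes) is to register your own search function with
codecs.register(). The following is only a sketch in Python 3 syntax;
the codec name "html_utf_8" follows the example above, while the
function names encode, decode and search are made up for illustration.

```python
import codecs
import re

_CHARREF = re.compile(r"&#(\d+);")  # decimal references only (assumption)

def encode(input, errors="strict"):
    # Non-ASCII characters become &#NNN;, everything else passes through.
    output = "".join(
        ch if ord(ch) <= 0x7F else "&#%i;" % ord(ch) for ch in input
    )
    return (output.encode("ascii"), len(input))

def decode(input, errors="strict"):
    text = bytes(input).decode("ascii", errors)
    return (_CHARREF.sub(lambda m: chr(int(m.group(1))), text), len(input))

def search(name):
    # codecs.lookup() passes a normalized (lower-case) name; hyphens are
    # also accepted here so that either spelling resolves to this codec.
    if name.replace("-", "_") == "html_utf_8":
        return codecs.CodecInfo(encode=encode, decode=decode,
                                name="html_utf_8")
    return None  # not ours; let other search functions try

codecs.register(search)
```

After registration, "caf\u00e9".encode("html_utf_8") returns
b"caf&#233;", and b"caf&#233;".decode("html_utf_8") returns the original
string.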
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/