[Python-Dev] Re: [ python-Patches-590682 ] New codecs: html, asciihtml
Fredrik Lundh
fredrik@pythonware.com
Mon, 5 Aug 2002 15:57:10 +0200
Oren Tirosh wrote:
> In its current form I find htmlentitydefs.py pretty useless.
I use it a lot, and find it reasonably useful. sure beats typing in
the HTML character tables myself, or writing a DTD parser.
> Names in the input in arbitrary case will not match the MixedCase
> keys in the entitydefs dictionary
people who use oddball characters may prefer to keep uppercase
letters separate from lowercase letters. if I type "Linköping" using
a named entity, I don't want it to come out as "LinkÖping".
if you don't care, nothing stops you from using the "lower" string
method.
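
To illustrate the case point above, here is a minimal sketch. It assumes Python 3, where the module is named html.entities and exposes a name2codepoint table; the helper name lookup and its ignore_case flag are hypothetical and not part of the module.

```python
# Sketch only: assumes Python 3 (html.entities replaces htmlentitydefs).
from html.entities import name2codepoint


def lookup(name, ignore_case=False):
    """Return the character for an entity name, or None if it is unknown.

    Hypothetical helper: an exact, case-sensitive match is tried first;
    the lowercased name is tried only if the caller opts in.
    """
    codepoint = name2codepoint.get(name)
    if codepoint is None and ignore_case:
        codepoint = name2codepoint.get(name.lower())
    return None if codepoint is None else chr(codepoint)


lower = lookup("ouml")                      # lowercase o with diaeresis
upper = lookup("Ouml")                      # uppercase O with diaeresis
shouted = lookup("OUML", ignore_case=True)  # falls back to "ouml"
```

Keeping the exact match first means "Ouml" and "ouml" stay distinct, and the lowercase fallback only applies to names that would otherwise not match at all.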
> and the decimal character reference isn't really more useful than
> the named entity reference.
really? converting a decimal character reference to a unicode character
is trivial, but how do you convert a named entity reference to a unicode
character? (look it up in the htmlentitydefs?)
here's a trivial piece of code that converts the entitydefs dictionary to
an entity->unicode mapping:
from htmlentitydefs import entitydefs

entitydefs_unicode = {}
for entity, char in entitydefs.items():
    if char[:2] == "&#":
        # decimal character reference, e.g. "&#913;"
        char = unichr(int(char[2:-1]))
    else:
        # single iso-8859-1 byte
        char = unicode(char, "iso-8859-1")
    entitydefs_unicode[entity] = char
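
For readers on Python 3, a rough equivalent might look like the sketch below. It assumes the html.entities module and its name2codepoint table, which stand in for htmlentitydefs here; the function name decode_decimal_reference is hypothetical and only there to show the decimal-reference step on its own.

```python
# Sketch only: assumes Python 3, where str is already unicode.
from html.entities import name2codepoint

# Map each entity name to its one-character string.
entitydefs_unicode = {name: chr(cp) for name, cp in name2codepoint.items()}


def decode_decimal_reference(ref):
    """Convert a decimal character reference such as '&#246;' to a character.

    Hypothetical helper; raises ValueError for anything that is not of the
    form '&#<decimal digits>;'.
    """
    if not (ref.startswith("&#") and ref.endswith(";")):
        raise ValueError("not a decimal character reference: %r" % ref)
    return chr(int(ref[2:-1]))
```

With this, entitydefs_unicode["ouml"] and decode_decimal_reference("&#246;") both give the same one-character string.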
</F>