Character encoding

Wed Nov 8 10:24:54 EST 2006

mp wrote:
> I have html document titles with characters like &gt;, &nbsp;, and
> &#135;. How do I decode a string with these values in Python?
>
> Thanks
>
>   
This is definitely the most frequently asked question here. It comes up about once a week.

The stream-editing way is like this:

>>> import SE
>>> HTM_Decoder = SE.SE ('htm2iso.se')  # include the path to the file

>>> test_string = '''I have html document titles with characters like &gt;, &nbsp;, and
&#135;. How do I decode a string with these values in Python?'''
>>> print HTM_Decoder (test_string)
I have html document titles with characters like >,  , and
‡. How do I decode a string with these values in Python?
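
For comparison, here is a minimal sketch that uses only the standard library instead of SE. It assumes Python 3.4 or later, where html.unescape is available; it is not how SE works internally, and the shortened test string is just for illustration.

```python
import html

# Shortened version of the test string, with the entities written out.
test_string = "titles with characters like &gt;, &nbsp;, and &#135;."

# html.unescape resolves named and numeric character references.
# Following HTML5 parsing rules, &#135; is treated as the Windows-1252
# code point and becomes the double dagger (U+2021).
decoded = html.unescape(test_string)
print(decoded)
```

Note that &nbsp; becomes a non-breaking space (U+00A0), not an ordinary space, so it may need a further replace if plain spaces are wanted.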

An SE object handles files too.

>>> HTM_Decoder ('with_codes.txt', 'translated_codes.txt')  # include the paths as needed
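
The same file-to-file idea can be sketched with the standard library. This assumes Python 3.4 or later and UTF-8 files; decode_file is a hypothetical helper name, not part of SE.

```python
import html

def decode_file(src_path, dst_path, encoding="utf-8"):
    """Read src_path, decode its HTML entities, and write the result to dst_path.

    Hypothetical helper for illustration; the encoding is an assumption.
    """
    with open(src_path, encoding=encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding=encoding) as dst:
        dst.write(html.unescape(text))

# Usage, with the file names from the SE example:
# decode_file('with_codes.txt', 'translated_codes.txt')
```

This reads the whole file into memory, which is fine for documents of ordinary size.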

You can download SE from http://cheeseshop.python.org/pypi/SE/2.3. The translation definitions file "htm2iso.se" is included. If you open it in an editor, you can see how to write your own definition files for other translation tasks.
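
The syntax of SE definition files is not reproduced here. As a rough stand-in for the idea of a table-driven translation, the sketch below builds a translator from a plain dictionary using the re module (Python 3). make_translator and entity_table are hypothetical names, and the table covers only a few entities.

```python
import re

def make_translator(table):
    """Return a function that replaces every key of `table` with its value."""
    # Try longer keys first so overlapping keys prefer the longest match.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # A single pass over the text: replaced output is never rescanned.
    return lambda text: pattern.sub(lambda m: table[m.group(0)], text)

# Hypothetical, deliberately small translation table.
entity_table = {"&gt;": ">", "&lt;": "<", "&amp;": "&", "&nbsp;": "\xa0"}
decode = make_translator(entity_table)

print(decode("1 &lt; 2 &amp;&amp; 3 &gt; 2"))
```

Doing all replacements in one pass matters: chaining str.replace calls could decode "&amp;lt;" twice and yield "<" instead of "&lt;".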

Regards

Frederic