decode Numeric Character References to unicode
duncan.booth at invalid.invalid
Mon Feb 18 13:09:55 CET 2008
7stud <bbxx789_05ss at yahoo.com> wrote:
> On Feb 18, 4:53 am, 7stud <bbxx789_0... at yahoo.com> wrote:
>> On Feb 18, 3:20 am, William Heymann <k... at aesaeion.com> wrote:
>> > How do I decode a string back to useful unicode that has xml
>> > numeric character references in it?
>> > Things like 占 #which is: &_#21344_; (without the underscores)
>> BeautifulSoup can handle two of the three formats for html entities.
>> For instance, an 'o' with umlaut can be represented in three
>> different ways:
> lol. It's hard to even make posts about this stuff because html
> entities get converted by the forum software. Here are the three
> different formats for an 'o with umlaut' with some underscores added
> to keep the forum software from rendering the characters:
FWIW, your original post was fine, it was just the quoted text in your
followup that was wrong.
I guess that is yet another reason to use a real newsreader or the mailing
list rather than Google Groups.
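
For the original question, here is a minimal sketch of one way to decode numeric character references, assuming Python 3.4 or later (where html.unescape is in the standard library). The helper name decode_ncr and the sample strings are made up for illustration and do not come from anyone's post in this thread. The regex version handles only the decimal and hex numeric forms; html.unescape also handles named entities.

```python
import html
import re

# Hypothetical helper: decodes only numeric references,
# decimal (&#NNN;) and hexadecimal (&#xHHH;).
def decode_ncr(text):
    def repl(match):
        body = match.group(1)
        if body[0] in "xX":
            # Hex form: drop the leading x and parse base 16.
            return chr(int(body[1:], 16))
        # Decimal form.
        return chr(int(body))
    return re.sub(r"&#([xX][0-9a-fA-F]+|[0-9]+);", repl, text)

# Illustrative input: the same character in decimal and hex form.
sample = "&#21344; and &#x5360;"
print(decode_ncr(sample))

# The standard library covers numeric forms and named entities alike.
print(html.unescape("&#246; &#xf6; &ouml;"))
```

If named entities matter as well, html.unescape alone is probably enough; the regex helper is only useful when you want to leave named entities untouched.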