Python and decimal character entities over 128.
bignose+hates-spam at benfinney.id.au
Fri Jul 11 01:50:31 CEST 2008
I don't have an answer for why Python might be mis-handling the data,
but wanted to make a factual correction:
bsagert at gmail.com writes:
> Some web feeds use decimal character entities that seem to confuse
> Python (or me). For example, the string "doesn't" may be coded as
> "doesn&#8217;t" which should produce a right leaning apostrophe.
That character isn't a "right leaning apostrophe"; it has nothing to
do with apostrophes. It is the character called "right single
quotation mark" in <URL:http://www.w3.org/TR/html4/sgml/entities.html>
and in Unicode (code point U+2019).
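The identity of the character can be checked from Python itself. The following is only a minimal sketch, assuming Python 3.4 or later (where html.unescape is available in the standard library); the helper name describe_entities is hypothetical and chosen for illustration. It decodes the character references in a string and reports the code point and Unicode name of each non-ASCII character that results.

```python
import html
import unicodedata


def describe_entities(text):
    """Decode character references and describe the non-ASCII results.

    Returns the decoded string plus a list of (code point, Unicode name)
    pairs, one for each decoded character above U+007F.
    """
    decoded = html.unescape(text)
    non_ascii = [
        ("U+{:04X}".format(ord(ch)), unicodedata.name(ch, "<unnamed>"))
        for ch in decoded
        if ord(ch) > 127
    ]
    return decoded, non_ascii


decoded, non_ascii = describe_entities("doesn&#8217;t")
print(non_ascii)                    # [('U+2019', 'RIGHT SINGLE QUOTATION MARK')]
print(unicodedata.name("\u0027"))   # APOSTROPHE
```

Decimal 8217 is hexadecimal 2019, so the reference decodes to U+2019, and the Unicode database names it as a quotation mark rather than an apostrophe.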
It's a typographical error to use a quotation mark as an apostrophe.
Use the apostrophe character (U+0027) where an apostrophe is intended,
and quotation mark characters where those are intended.
This is directed, of course, at the person generating that output.
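For anyone generating such text, or cleaning up a feed that already contains it, one way to apply this is sketched below. It is an illustration under stated assumptions, not a complete solution: the function name use_ascii_apostrophe is hypothetical, and the rule "a U+2019 with a word character on both sides was meant as an apostrophe" is a heuristic I am assuming for the example. It deliberately leaves other cases alone, such as a closing quotation mark or a trailing possessive like "bosses’", because those cannot be told apart without more context.

```python
import re

# Heuristic (an assumption for this sketch): U+2019 sandwiched between
# two word characters is treated as an intended apostrophe.
_INTERNAL_QUOTE = re.compile(r"(?<=\w)\u2019(?=\w)")


def use_ascii_apostrophe(text):
    """Replace word-internal U+2019 with the apostrophe U+0027."""
    return _INTERNAL_QUOTE.sub("\u0027", text)


print(use_ascii_apostrophe("doesn\u2019t"))            # doesn't
print(use_ascii_apostrophe("\u2018quoted\u2019 text")) # unchanged: ‘quoted’ text
```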
\ “If you go to a costume party at your boss's house, wouldn't |
`\ you think a good costume would be to dress up like the boss's |
_o__) wife? Trust me, it's not.” —Jack Handey |