Python and decimal character entities over 128.

Ben Finney bignose+hates-spam at benfinney.id.au
Thu Jul 10 19:50:31 EDT 2008


I don't have an answer for why Python might be mis-handling the data,
but wanted to make a factual correction:

bsagert at gmail.com writes:

> Some web feeds use decimal character entities that seem to confuse
> Python (or me). For example, the string "doesn't" may be coded as
> "doesn&#8217;t" which should produce a right leaning apostrophe.

That character isn't a "right leaning apostrophe"; it has nothing to
do with apostrophes. It is the character called "right single
quotation mark" in <URL:http://www.w3.org/TR/html4/sgml/entities.html>
and in Unicode (code point U+2019).
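
To check this for yourself, here is a minimal sketch. It assumes a current
Python 3, where the standard library provides html.unescape; the variable
names are only illustrative. It resolves the decimal entity 8217 and asks
the Unicode database what the resulting character is called.

```python
import html
import unicodedata

encoded = "doesn&#8217;t"          # decimal entity as it might appear in a feed
decoded = html.unescape(encoded)   # resolve the numeric character reference

mark = decoded[5]                  # the character between "n" and "t"
code_point = "U+%04X" % ord(mark)
name = unicodedata.name(mark)

# Show the decoded character next to the ASCII apostrophe for comparison
print(code_point, name)
print("U+%04X" % ord("'"), unicodedata.name("'"))
```

Decimal 8217 is hexadecimal 2019, so the entity and the code point above
refer to the same character.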

It's a typographical error to use a quotation mark as an apostrophe.
Use the apostrophe character (U+0027) where an apostrophe is intended,
and quotation mark characters where those are intended.
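
If you are stuck consuming such text rather than generating it, one rough
workaround is sketched below. The helper quotes_to_apostrophes is
hypothetical, not part of any library, and it rests on the assumption that a
U+2019 sitting directly between two word characters was meant as an
apostrophe. It leaves every other U+2019 alone, so genuine closing
quotation marks survive, but so do trailing apostrophes as in "the bosses'".

```python
import re

def quotes_to_apostrophes(text):
    """Replace U+2019 with U+0027 only when it is between two word characters."""
    return re.sub(r"(?<=\w)\u2019(?=\w)", "'", text)

# The inner mark is used as an apostrophe; the outer pair are real quotation marks
sample = "\u2018It doesn\u2019t work,\u2019 she said."
fixed = quotes_to_apostrophes(sample)
print(fixed)
```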

This is directed, of course, at the person generating that output.

-- 
 \        “If you go to a costume party at your boss's house, wouldn't |
  `\     you think a good costume would be to dress up like the boss's |
_o__)                          wife? Trust me, it's not.” —Jack Handey |
Ben Finney


