Python and decimal character entities over 128.

Ben Finney bignose+hates-spam at
Thu Jul 10 19:50:31 EDT 2008

I don't have an answer for why Python might be mis-handling the data,
but wanted to make a factual correction:

bsagert at writes:

> Some web feeds use decimal character entities that seem to confuse
> Python (or me). For example, the string "doesn't" may be coded as
> "doesn&#8217;t" which should produce a right leaning apostrophe.

That character isn't a "right leaning apostrophe"; it has nothing to
do with apostrophes. It is the character called "right single
quotation mark" in <URL:>
and in Unicode (code point U+2019).
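
For anyone who wants to check this themselves, here is a minimal sketch,
assuming Python 3.4 or later (where html.unescape is available; the
Python releases current at the time of this thread need a different
approach). The variable names are only for illustration. It decodes the
decimal entity and asks the Unicode database what the resulting
character is called:

```python
import html
import unicodedata

# The feed text as it arrives, with a decimal character entity.
encoded = "doesn&#8217;t"

# html.unescape converts numeric and named character references to text.
decoded = html.unescape(encoded)

# The character after "doesn" is the one the entity produced.
mark = decoded[5]

print(hex(ord(mark)))          # code point of the decoded character
print(unicodedata.name(mark))  # official Unicode name of that character
print(unicodedata.name("'"))   # name of U+0027, for comparison
```

The decoded character reports its name as RIGHT SINGLE QUOTATION MARK,
while U+0027 reports APOSTROPHE.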

It's a typographical error to use a quotation mark as an apostrophe.
Use the apostrophe character (U+0027) where an apostrophe is intended,
and quotation mark characters where those are intended.

This is directed, of course, at the person generating that output.
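
If you are consuming such a feed and cannot get the producer to fix it,
one possible workaround is sketched below, again assuming Python 3. The
function name fix_apostrophes is hypothetical, and the rule it applies
(a U+2019 with a word character on both sides is really an apostrophe)
is only a heuristic: it will miss trailing apostrophes such as a plural
possessive, and it leaves genuine closing quotation marks alone.

```python
import re

# U+2019 with a word character on each side, as in "doesn’t".
_APOSTROPHE_LIKE = re.compile(r"(?<=\w)\u2019(?=\w)")


def fix_apostrophes(text):
    """Replace U+2019 with U+0027 where it sits inside a word."""
    return _APOSTROPHE_LIKE.sub("'", text)


print(fix_apostrophes("doesn\u2019t"))
print(fix_apostrophes("\u2018quoted\u2019 text"))
```

The first call yields the word with a plain apostrophe; the second is
returned unchanged because its U+2019 is followed by a space.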

 \        “If you go to a costume party at your boss's house, wouldn't |
  `\     you think a good costume would be to dress up like the boss's |
_o__)                          wife? Trust me, it's not.” —Jack Handey |
Ben Finney
