Replacing utf-8 characters

Klaus Alexander Seistrup klaus at
Wed Oct 5 22:14:04 CEST 2005

Mike wrote:

> Hi, I am using Python to scrape web pages and I do not have problem 
> unless I run into a site that is utf-8.  It seems & is changed to 
> &amp; when the site is utf-8.
> 	[...]

> Any ideas?

How about using the Universal Feed Parser (the feedparser module) to fetch
and parse the RSS feed from Reuters?  That's what I do, and it works like a
charm:

>>> import feedparser
>>> rss = feedparser.parse('')
>>> for what in ('link', 'title', 'summary'):
...     print rss.entries[0][what]
...     print

Top court seems closely divided on suicide law

During arguments, the justices sharply questioned both sides on whether then-Attorney General John Ashcroft had the power under federal law in 2001 to bar distribution of controlled drugs to assist suicides, regardless of state law.
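If installing feedparser is not an option, the standard library's XML parser does the same entity and UTF-8 decoding when it reads a feed. Below is a minimal Python 3 sketch, assuming the feed is RSS 2.0. SAMPLE_RSS and first_entry are hypothetical names, and the sample feed only stands in for the real Reuters one, whose URL is not given here. The last lines show html.unescape, which undoes the escaping if you keep scraping HTML by hand instead.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document standing in for the downloaded feed text.
# The escaped "&amp;" and the non-ASCII "\u00e9" mimic a UTF-8 feed.
SAMPLE_RSS = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <link>http://example.com/story?id=1&amp;type=news</link>
      <title>Q&amp;A: caf\u00e9 owners react</title>
      <description>Smith &amp; Jones comment.</description>
    </item>
  </channel>
</rss>
"""


def first_entry(rss_bytes):
    """Return link, title and summary of the first <item> in an RSS 2.0 feed.

    The XML parser decodes the UTF-8 bytes and entities such as &amp;,
    so the returned strings hold a plain '&' and real non-ASCII characters.
    """
    root = ET.fromstring(rss_bytes)
    item = root.find("channel/item")
    return {
        "link": item.findtext("link"),
        "title": item.findtext("title"),
        # RSS 2.0 calls the summary <description>; feedparser reports it as 'summary'.
        "summary": item.findtext("description"),
    }


# Feed the parser bytes, as you would get them from the network.
entry = first_entry(SAMPLE_RSS.encode("utf-8"))
for what in ("link", "title", "summary"):
    print(entry[what])
    print()

# When scraping HTML by hand instead, html.unescape turns "&amp;" back into "&".
print(html.unescape("Smith &amp; Jones"))
```

Handing the parser raw bytes rather than an already-decoded string lets it honour the encoding declared in the feed itself.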



Klaus Alexander Seistrup
Magnetic Ink, Copenhagen, Denmark

More information about the Python-list mailing list