Help me optimize my feed script.

Carl Banks pavlovevidence at gmail.com
Thu Jun 26 23:19:46 CEST 2008


On Jun 26, 3:30 pm, bsag... at gmail.com wrote:
> I wrote my own feed reader using feedparser.py but it takes about 14
> seconds to process 7 feeds (on a windows box), which seems slow on my
> DSL line. Does anyone see how I can optimize the script below? Thanks
> in advance, Bill
>
> # UTF-8
> import feedparser
>
> rss = [
> 'http://feeds.feedburner.com/typepad/alleyinsider/silicon_alley_insider',
> 'http://www.techmeme.com/index.xml',
> 'http://feeds.feedburner.com/slate-97504',
> 'http://rss.cnn.com/rss/money_mostpopular.rss',
> 'http://rss.news.yahoo.com/rss/tech',
> 'http://www.aldaily.com/rss/rss.xml',
> 'http://ezralevant.com/atom.xml'
> ]
> s = '<html>\n<head>\n<title>C:/x/test.htm</title>\n'
>
> s += '<style>\n'\
>      'h3{margin:10px 0 0 0;padding:0}\n'\
>      'a.x{color:black}'\
>      'p{margin:5px 0 0 0;padding:0}'\
>      '</style>\n'
>
> s += '</head>\n<body>\n<br />\n'
>
> for url in rss:
>         d = feedparser.parse(url)
>         title = d.feed.title
>         link = d.feed.link
>         s += '\n<h3><a href="'+ link +'" class="x">'+ title +'</a></h3>\n'
>         # aldaily.com has weird feed
>         if link.find('aldaily.com') != -1:
>                 description = d.entries[0].description
>                 s += description + '\n'
>         for x in range(0,3):
>                 if link.find('aldaily.com') != -1:
>                         continue
>                 title = d.entries[x].title
>                 link = d.entries[x].link
>                 s += '<a href="'+ link +'">'+ title +'</a><br />\n'
>
> s += '<br /><br />\n</body>\n</html>'
>
> f = open('c:/scripts/myFeeds.htm', 'w')
> f.write(s)
> f.close()
>
> print
> print 'myFeeds.htm written'

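Using the += operator to build up a string piece by piece is a common
bottleneck in programs: each += may copy the whole string built so far,
so the total work grows roughly with the square of the number of pieces.
The first thing you should try is to get rid of that.  (Recent versions
of CPython optimize += on strings, but the optimization sometimes doesn't
apply, for example when more than one reference to the string is alive.)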
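To see the effect on your own machine, here is a small Python 3 sketch that uses the standard timeit module to time three ways of building the same string. The function names are made up for illustration. The second variant keeps an extra reference to the string alive, which is the case described above where the in-place optimization cannot apply. Timings depend on the machine and the Python version, so no numbers are given here.

```python
import timeit


def build_with_plus(n):
    # Build a string by repeated += concatenation.
    s = ''
    for i in range(n):
        s += 'line %d\n' % i
    return s


def build_with_plus_aliased(n):
    # Same as above, but a second name refers to the string at the
    # moment of each +=, which defeats CPython's in-place optimization.
    s = ''
    for i in range(n):
        alias = s
        s += 'line %d\n' % i
    return s


def build_with_join(n):
    # Collect the pieces in a list and join them once at the end.
    parts = []
    for i in range(n):
        parts.append('line %d\n' % i)
    return ''.join(parts)


if __name__ == '__main__':
    n = 5000
    for func in (build_with_plus, build_with_plus_aliased, build_with_join):
        seconds = timeit.timeit(lambda: func(n), number=3)
        print('%-25s %.3f s' % (func.__name__, seconds))
```

All three functions return the same string; only the way it is assembled differs.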

Instead, create a list like this:

s = []

And append substrings to the list, like this:

s.append('</head>\n<body>\n<br />\n')

Then, when writing the string out (or otherwise using it), join all
the substrings with the str.join method:

f.write(''.join(s))
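Put together, the page-building part of the script might look like the following Python 3 sketch. It is not Bill's code, just one way to apply the list-and-join idea. To keep it runnable without network access, it assumes the feeds have already been parsed into a hypothetical list of dicts (build_page and the dict keys are illustrative names), which you would fill from feedparser's d.feed.title, d.feed.link and d.entries as in the original loop.

```python
def build_page(feeds, max_entries=3):
    """Return the whole HTML page as one string.

    `feeds` is a hypothetical list of dicts with keys 'title', 'link',
    'entries' (a list of (title, link) pairs) and, optionally,
    'description' for feeds handled specially, like aldaily.com.
    """
    parts = []  # collect substrings here instead of using +=
    parts.append('<html>\n<head>\n<title>C:/x/test.htm</title>\n')
    parts.append('<style>\n'
                 'h3{margin:10px 0 0 0;padding:0}\n'
                 'a.x{color:black}'
                 'p{margin:5px 0 0 0;padding:0}'
                 '</style>\n')
    parts.append('</head>\n<body>\n<br />\n')
    for feed in feeds:
        parts.append('\n<h3><a href="%s" class="x">%s</a></h3>\n'
                     % (feed['link'], feed['title']))
        if 'description' in feed:
            # Special-cased feed: show its description instead of entries.
            parts.append(feed['description'] + '\n')
            continue
        # Slicing avoids an IndexError when a feed has fewer entries.
        for title, link in feed['entries'][:max_entries]:
            parts.append('<a href="%s">%s</a><br />\n' % (link, title))
    parts.append('<br /><br />\n</body>\n</html>')
    return ''.join(parts)  # a single join at the end


if __name__ == '__main__':
    sample = [
        {'title': 'Example feed', 'link': 'http://example.com/',
         'entries': [('First', 'http://example.com/1'),
                     ('Second', 'http://example.com/2')]},
    ]
    print(build_page(sample))
```

To write the page out, something like `with open('c:/scripts/myFeeds.htm', 'w') as f: f.write(build_page(feeds))` would do; the with statement also makes sure the file actually gets closed.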


Carl Banks


