Overcoming python performance penalty for multicore CPU
stefan_ml at behnel.de
Mon Feb 8 09:59:42 CET 2010
Paul Rubin, 04.02.2010 02:51:
> John Nagle writes:
>> Analysis of each domain is
>> performed in a separate process, but each process uses multiple
> >> threads to read and process several web pages simultaneously.
>> Some of the threads go compute-bound for a second or two at a time as
>> they parse web pages.
> You're probably better off using separate processes for the different
> pages. If I remember, you were using BeautifulSoup, which while very
> cool, is pretty doggone slow for use on large volumes of pages. I don't
> know if there's much that can be done about that without going off on a
> fairly messy C or C++ coding adventure. Maybe someday someone will do
Well, if multi-core performance is so important here, then there's a pretty
simple thing the OP can do: switch to lxml.
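
To make that concrete, here is a minimal sketch of what the switch might
look like, assuming lxml is installed and the pages are already available as
strings. The names extract_links and parse_all are hypothetical and not taken
from the OP's code. As far as I know, lxml releases the GIL while it parses,
so worker threads in one process can parse pages on several cores; each
thread gets its own parser here so they do not contend for a shared one.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

from lxml import html  # third-party package, not part of the stdlib

# Thread-local storage so that every worker thread owns its own parser.
_local = threading.local()


def _thread_parser():
    """Return the HTML parser belonging to the current thread."""
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = _local.parser = html.HTMLParser()
    return parser


def extract_links(page_source):
    """Parse one HTML page and return (title, list of href values)."""
    doc = html.fromstring(page_source, parser=_thread_parser())
    title = doc.findtext(".//title")
    links = [str(href) for href in doc.xpath("//a/@href")]
    return title, links


def parse_all(pages, workers=4):
    """Parse all pages in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_links, pages))
```

Calling parse_all(list_of_html_strings) returns one (title, links) tuple per
page. Whether this is fast enough for the OP's workload is something only a
measurement on real pages can tell.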