Overcoming python performance penalty for multicore CPU
Paul Rubin
no.email at nospam.invalid
Wed Feb 3 20:51:36 EST 2010
John Nagle <nagle at animats.com> writes:
> Analysis of each domain is
> performed in a separate process, but each process uses multiple
> threads to read and process several web pages simultaneously.
>
> Some of the threads go compute-bound for a second or two at a time as
> they parse web pages.
You're probably better off using separate processes for the different
pages. If I remember correctly, you were using BeautifulSoup, which,
while very cool, is pretty doggone slow for use on large volumes of pages. I don't
know if there's much that can be done about that without going off on a
fairly messy C or C++ coding adventure. Maybe someday someone will do
that.
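A minimal sketch of the per-page-process idea, using only the stdlib
html.parser in place of BeautifulSoup (the inline pages and the
TitleParser helper are just illustrations, not your actual crawler code):

```python
# Parse each page in a separate worker process so compute-bound
# parsing is not serialized by the GIL.
from multiprocessing import Pool
from html.parser import HTMLParser  # stdlib parser standing in for BeautifulSoup


class TitleParser(HTMLParser):
    """Minimal parser that extracts the <title> text from a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def parse_page(html):
    # Runs in a worker process, so CPU-heavy parsing of one page
    # does not block parsing of the others.
    p = TitleParser()
    p.feed(html)
    return p.title


if __name__ == "__main__":
    # In a real crawler these would be fetched over the network first.
    pages = [
        "<html><head><title>Page one</title></head></html>",
        "<html><head><title>Page two</title></head></html>",
    ]
    with Pool() as pool:
        titles = pool.map(parse_page, pages)
    print(titles)
```

Since each worker is a full OS process, the second-or-two of
compute-bound parsing lands on a separate core instead of contending
for the single interpreter lock the way threads do.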