Overcoming python performance penalty for multicore CPU
steve at holdenweb.com
Thu Feb 4 04:50:19 CET 2010
John Nagle wrote:
> Paul Rubin wrote:
>> John Nagle <nagle at animats.com> writes:
>>> Analysis of each domain is
>>> performed in a separate process, but each process uses multiple
>>> threads to read and process several web pages simultaneously.
>>> Some of the threads go compute-bound for a second or two at a time as
>>> they parse web pages.
>> You're probably better off using separate processes for the different
>> pages. If I remember, you were using BeautifulSoup, which while very
>> cool, is pretty doggone slow for use on large volumes of pages. I don't
>> know if there's much that can be done about that without going off on a
>> fairly messy C or C++ coding adventure. Maybe someday someone will do
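Paul's suggestion of moving the per-page work into separate processes can be sketched with the standard-library multiprocessing module. This is a hypothetical illustration, not code from the thread: `parse_page` here is a trivial stand-in for a real BeautifulSoup parse, and the pool size is arbitrary.

```python
# Sketch: fan compute-bound page parsing out to worker processes so the
# GIL never serializes the parsing work.  parse_page is a stand-in for
# a real (slow) BeautifulSoup parse.
from multiprocessing import Pool

def parse_page(html):
    # Toy "parse": count opening angle brackets as a proxy for tags.
    return html.count("<")

if __name__ == "__main__":
    pages = [
        "<html><body><p>one</p></body></html>",
        "<html><body><p>two</p></body></html>",
    ]
    # Each page is parsed in its own worker process.
    with Pool(processes=2) as pool:
        results = pool.map(parse_page, pages)
    print(results)
```

Because the work is split across processes rather than threads, a multicore machine can run the parses truly in parallel.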
> I already use separate processes for different domains. I could
> live with Python's GIL as long as moving to a multicore server
> doesn't make performance worse. That's why I asked about CPU dedication
> for each process, to avoid thrashing at the GIL.
I believe it's already been said that the GIL thrashing is mostly Mac
OS-specific. You might also find something in the affinity module
to ensure that each process in your pool runs on only one processor.
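A minimal sketch of pinning a process to a single CPU follows. It uses `os.sched_setaffinity`, which is Linux-only and arrived in Python 3.3, rather than the third-party affinity module mentioned above; the function name `pin_to_cpu` is my own invention for illustration.

```python
# Sketch (Linux-only): restrict the current process to one CPU so
# per-domain worker processes don't migrate between cores.
import os

def pin_to_cpu(cpu):
    # 0 means "the calling process"; the second argument is the set
    # of CPUs the process is allowed to run on.
    os.sched_setaffinity(0, {cpu})

if __name__ == "__main__":
    pin_to_cpu(0)
    print(sorted(os.sched_getaffinity(0)))
```

Each worker in a pool would call something like `pin_to_cpu` once at startup, giving each process its own core.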
Steve Holden +1 571 484 6266 +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010 http://us.pycon.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/