Re: [Python-Dev] Summing up
Antoine,

This is a pretty good summary that mirrors my thoughts on the GIL matter as well. In the big picture, I do think it's desirable for Python to address the multicore performance issue--namely to not have the performance needlessly thrashed in that environment. The original new GIL addressed this.

The I/O convoy effect problem is more subtle. Personally, I think it's an issue that at least merits further study because trying to overlap I/O with computation is a known programming technique that might be useful for people using Python to do message passing, distributed computation, etc. As an example, the multiprocessing module uses threads as part of its queue implementation. Is it impacted by convoying? I honestly don't know. I agree that getting some more real-world experience would be useful.

Cheers,
Dave
From: Antoine Pitrou
Ok, this is a good opportunity to try to sum up, from my point of view.
The main problem of the old GIL, which was evidenced in Dave's original study (not this year's, but the previous one) *is* fixed unless someone demonstrates otherwise.
It should be noted that witnessing a slight performance degradation on a multi-core machine is not enough to demonstrate such a thing. The degradation could be caused by other factors, such as thread migration, bad OS behaviour, or even locking peculiarities in your own application, which are not related to the GIL. A good test is whether performance improves if you play with sys.setswitchinterval().
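One way to run the check described above is sketched here. It is only an illustration under stated assumptions: the helper names (cpu_work, timed_run, compare_intervals) and the workload are made up for the example, and a real test should use the application's own workload instead of the busy loop. If the timings barely move across switch intervals, the GIL is probably not what is slowing things down.

```python
import sys
import threading
import time


def cpu_work(n):
    """Pure-Python busy loop standing in for the application's CPU load."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(interval, nthreads=2, n=200_000):
    """Run nthreads CPU-bound threads under the given switch interval
    and return the elapsed wall-clock time in seconds."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(interval)
    try:
        threads = [threading.Thread(target=cpu_work, args=(n,))
                   for _ in range(nthreads)]
        start = time.perf_counter()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return time.perf_counter() - start
    finally:
        # Always restore the interpreter's previous setting.
        sys.setswitchinterval(previous)


def compare_intervals(intervals=(0.0005, 0.005, 0.05)):
    """Return {interval: elapsed_seconds} for each candidate interval."""
    return {interval: timed_run(interval) for interval in intervals}
```

Calling compare_intervals() and printing the resulting dictionary gives one timing per interval. The absolute numbers depend heavily on the machine and the OS scheduler, so only the trend is meaningful.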
Dave's newer study concerns another issue, which I must stress is also present in the old GIL algorithm, and therefore must have affected, if it is serious, real-world applications in 2.x. And indeed, the test I recently added to ccbench shows the huge drop in socket I/Os per second when there's a background CPU thread; this test exercises the same situation as Dave's demos, only with a less trivial CPU workload:
== CPython 2.7b2+.0 (trunk:81274M) ==
== x86_64 Linux on 'x86_64' ==

--- I/O bandwidth ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 23034.5 packets/s.
CPU threads=1: 6.4 ( 0 %)
CPU threads=2: 15.7 ( 0 %)
CPU threads=3: 13.9 ( 0 %)
CPU threads=4: 20.8 ( 0 %)
(note: I've just changed my desktop machine, so these figures are different from what I've posted weeks or months ago)
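The kind of measurement shown above can be approximated with a small script. What follows is a rough sketch, not ccbench's actual code: the function names, the UDP echo setup on localhost and the trivial spinning workload are all assumptions made for the illustration. It counts datagram round trips per second while a chosen number of CPU-bound threads run in the background.

```python
import socket
import threading
import time


def echo_server(sock, stop):
    """Echo every datagram back to its sender until `stop` is set."""
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(64)
        except socket.timeout:
            continue
        sock.sendto(data, addr)


def spin(stop):
    """CPU-bound background thread (a stand-in for a real CPU task)."""
    x = 0
    while not stop.is_set():
        x = (x * 31 + 7) % 1000003


def roundtrips_per_second(cpu_threads, duration=0.5):
    """Count UDP round trips on localhost while `cpu_threads` threads spin."""
    stop = threading.Event()
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.settimeout(0.1)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(1.0)
    workers = [threading.Thread(target=echo_server, args=(server, stop))]
    workers += [threading.Thread(target=spin, args=(stop,))
                for _ in range(cpu_threads)]
    for w in workers:
        w.start()
    count = 0
    try:
        deadline = time.perf_counter() + duration
        while time.perf_counter() < deadline:
            client.sendto(b"ping", server.getsockname())
            try:
                client.recvfrom(64)
            except socket.timeout:
                continue  # lost or badly delayed reply; not counted
            count += 1
    finally:
        stop.set()
        for w in workers:
            w.join()
        client.close()
        server.close()
    return count / duration
```

Comparing roundtrips_per_second(0) with roundtrips_per_second(1) on a given interpreter gives a feel for the size of the effect. The figures will differ from the ccbench numbers above, since the workload and the measurement loop are not the same.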
Regardless of the fact that apparently no one reported it in real-world conditions, we *could* decide that the issue needs fixing. If we decide so, Nir's approach is the most rigorous one: it tries to fix the problem thoroughly, rather than graft an additional heuristic. Nir also has tested his patch on a variety of machines, more so than Dave and I did with our own patches; he is obviously willing to go forward.
Right now, there are two problems with Nir's proposal:
- first, what Nick said: the difficulty of having reliable high-precision cross-platform time sources, which are necessary for the BFS algorithm. Ironically, timestamp counters have their own problems on multi-core machines (they can go out of sync between CPUs). gettimeofday() and clock_gettime() may be precise enough on most Unices, though.
- second, the BFS algorithm is not that well-studied, since AFAIK it was refused for inclusion in the Linux kernel; someone in the python-dev community would therefore have to make sense of, and evaluate, its heuristic.
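For the first point, the clocks available on a given platform can be inspected from Python itself. The sketch below assumes a later interpreter than the ones discussed in this thread, since time.get_clock_info() was only added in Python 3.3; the helper names are invented for the example. It reports the advertised resolution of a few clocks and also measures the smallest tick actually observed, which may be coarser than what the OS advertises.

```python
import time


def describe_clocks(names=("time", "monotonic", "perf_counter")):
    """Return {name: (implementation, resolution_seconds, monotonic)}."""
    report = {}
    for name in names:
        info = time.get_clock_info(name)
        report[name] = (info.implementation, info.resolution, info.monotonic)
    return report


def smallest_observed_tick(clock=time.perf_counter, samples=10_000):
    """Smallest non-zero difference seen between consecutive readings.

    Returns float('inf') if the clock never advanced during sampling.
    """
    best = float("inf")
    previous = clock()
    for _ in range(samples):
        current = clock()
        if current != previous:
            best = min(best, abs(current - previous))
            previous = current
    return best
```

On most Unices the "implementation" field typically names clock_gettime() or gettimeofday(), which are the sources mentioned above. Whether their precision is good enough for a scheduler remains a judgment call that this sketch does not settle.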
I also don't consider my own patch a very satisfactory "solution", although it has the reassuring quality of being simple and short (and easy to revert!).
That said, most of us are programmers and we love to invent ways of fixing technical issues. It sometimes leads us to consider some things issues even when they are mostly theoretical. This is why I am lukewarm on this. I think interested people should focus on real-world testing (rather than Dave's and my synthetic tests) of the new GIL, with or without the various patches, and share the results.
Otherwise, Dj Gilcrease's suggestion of waiting for third-party reports is also a very good one.
Regards
Antoine.
On 19/05/10 10:35, David Beazley wrote:
Antoine,
This is a pretty good summary that mirrors my thoughts on the GIL matter as well. In the big picture, I do think it's desirable for Python to address the multicore performance issue--namely to not have the performance needlessly thrashed in that environment. The original new GIL addressed this.
The I/O convoy effect problem is more subtle. Personally, I think it's an issue that at least merits further study because trying to overlap I/O with computation is a known programming technique that might be useful for people using Python to do message passing, distributed computation, etc. As an example, the multiprocessing module uses threads as part of its queue implementation. Is it impacted by convoying? I honestly don't know. I agree that getting some more real-world experience would be useful.
My takeaway from this discussion is that:

A. we should leave the new GIL in 3.2 in its current (relatively) simple form for now, keeping the various patches in issue 7946 in our back pocket if someone finds real world examples of the convoying effect discussed there. The idea here being that we shouldn't complicate the implementation without some solid evidence that doing so is actually necessary for real world workloads.

B. some more thought should be given to incorporating the new GIL into 2.7. However, this requires two things:
- an update to the patch in 7753 to either retain the old GIL for platforms not supported by the new GIL or else to make the new GIL a configure option
- Benjamin accepting that patch (as it would likely mean adding another beta release to the cycle)

In the absence of an updated version of the 7753 patch, backporting the new GIL to 2.7 isn't really a serious option.

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan wrote:
B. some more thought should be given to incorporating the new GIL into 2.7. However, this requires two things:
- an update to the patch in 7753 to either retain the old GIL for platforms not supported by the new GIL or else to make the new GIL a configure option
- Benjamin accepting that patch (as it would likely mean adding another beta release to the cycle)
I think I agree with that, at least for 2.7.0. My fallback plan is to write an extension that does the thread affinity hack, monkeypatching threading.py, only for OS X. People like me who are really annoyed with the continued presence of the multicore bug can just add that to site.py or sitecustomize.py (does that still work?). I'd like to find out first whether you can actually change the thread affinity on OS X after the thread has been started.
In the absence of an updated version of the 7753 patch, backporting the new GIL to 2.7 isn't really a serious option.
Right. At least, not for this revision level.

Bill