[Python-Dev] "Fixing" the new GIL

Tue Mar 16 10:15:44 CET 2010

On 16Mar2010 08:59, "Martin v. Löwis" <martin at v.loewis.de> wrote:
| Cameron Simpson wrote:
| > On 15Mar2010 09:28, Martin v. Löwis <martin at v.loewis.de> wrote:
| > | > As for the argument that an application with cpu intensive work being
| > | > driven by the IO itself will work itself out...  No it won't, it can
| > | > get into beat patterns where it is handling requests quite rapidly up
| > | > until one that causes a long computation to start comes in.  At that
| > | > point it'll stop performing well on other requests for as long (it
| > | > could be a significant amount of time) as the cpu intensive request
| > | > threads are running.  That is not a graceful degradation in serving
| > | > capacity / latency as one would normally expect.  It is a sudden drop
| > | > off.
| > | 
| > | Why do you say that? The other threads continue to be served - and
| > | Python couldn't use more than one CPU, anyway. Can you demonstrate that
| > | in an example?
| > 
| > Real example:
| 
| ... unfortunately without a demonstration. What's the throughput under
| the old GIL? What's the throughput of this application under the new
| GIL? How can I observe the "beat pattern"?

You can't. This isn't an example where I am personally bitten by the GIL.
I may be, but there's plenty of other stuff in terms of tuning or
optimisation in my particular app before I get anywhere near the GIL.

My point is not that it's biting me right now but that earlier you
said the scenario was probably unrealistic, but I have an app which can
probably exhibit exactly the issue under discussion.

| > The idea here is that one has a few threads receiving requests (eg a
| > daemon watching a socket or monitoring a db queue table) which then use
| > the FuncMultiQueue to manage how many actual requests are processed
| > in parallel (yes, a semaphore can cover a lot of this, but not the
| > asynchronous call modes).
| 
| Why do you say processing is in parallel? In Python, processing is
| normally never in parallel, but always sequential (potentially
| interleaving). Are you releasing the GIL for processing?

I mean that I can have multiple actual requests being served, and they
are _not_ handled sequentially - the multi queue is to permit more than
one to be in progress at once so that an expensive request need not
inherently delay a following cheap request.
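
A minimal sketch of that idea, assuming a plain thread pool stands in for
the FuncMultiQueue (whose real interface is not shown here); the names
expensive(), cheap() and completion_order() are hypothetical. The
expensive request sleeps, so it behaves like an I/O-bound call that
releases the GIL; a CPU-bound one may behave differently, which is the
point under discussion.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def expensive():
    # Stand-in for a slow request. time.sleep() releases the GIL,
    # so this models a blocking call, not pure-Python computation.
    time.sleep(0.2)
    return "expensive"


def cheap():
    # Stand-in for a tiny request that should be answered promptly.
    return "cheap"


def completion_order(max_workers):
    """Submit the expensive request first, then the cheap one, and
    return their names in the order they actually finished."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(expensive), pool.submit(cheap)]
        return [f.result() for f in as_completed(futures)]
```

With max_workers=1 the cheap request waits behind the expensive one;
with max_workers=2 it can finish first even though it was queued second.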

And in my it's-a-filesystem mode there will often be a bottleneck in the
pure-Python data scanning part. Thus CPU-bound Python code will be active
while I/O-blocked threads are still accepting requests.

| > So, suppose a couple of CPU-intensive callables get queued which work for a
| > substantial time, and meanwhile a bunch of tiny tiny cheap requests arrive.
| > Their timely response will be impacted by this issue.
| 
| By how much exactly? What operating system?

Can't tell you - I'm not claiming the current GIL behaviour is biting me
right now; I'm saying your claim that the scenario is unrealistic is a
bit unfair; my app will probably be just the kind of app to be affected,
even if only subtly. And it can hardly be alone in the "daemon" world.
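
One way the effect could be observed, as a rough sketch only and not a
measurement of my app: run some pure-Python busy loops in threads and
record how late a thread that merely wants to wake up every few
milliseconds actually wakes. The function names and parameters below
are made up for illustration, and the numbers will depend on the
Python version, the OS and the machine.

```python
import threading
import time


def cpu_bound(stop):
    # Pure-Python busy loop: it only gives up the GIL when the
    # interpreter forces a thread switch.
    n = 0
    while not stop.is_set():
        n += 1


def cheap_request_latencies(n_cpu_threads, n_requests=50, interval=0.01):
    """Return, for each of n_requests short sleeps, how much longer than
    `interval` seconds the sleep took while n_cpu_threads busy loops ran."""
    stop = threading.Event()
    workers = [threading.Thread(target=cpu_bound, args=(stop,))
               for _ in range(n_cpu_threads)]
    for w in workers:
        w.start()
    latencies = []
    try:
        for _ in range(n_requests):
            t0 = time.perf_counter()
            time.sleep(interval)  # stands in for blocking on a socket
            latencies.append(time.perf_counter() - t0 - interval)
    finally:
        # Always stop the busy loops, even if the measurement fails.
        stop.set()
        for w in workers:
            w.join()
    return latencies
```

Comparing max(cheap_request_latencies(0)) against
max(cheap_request_latencies(2)) gives a crude idea of how much the
CPU-bound threads delay the cheap "requests".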

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

There's two kinds of climbers...smart ones, and dead ones.      - Don Whillans