[Python-Dev] "Fixing" the new GIL

Nir Aides nir at winpdb.org
Mon Apr 12 21:19:43 CEST 2010


At some point there was an actual loop; later it remained because I feel it is
more readable than a bunch of nested if-else clauses. I should probably
replace it with do {...} while(0).
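
Roughly this shape, as a sketch of the idiom only (the names below are
placeholders, not the actual identifiers from the patch):

    #include <stdio.h>

    /* Placeholder predicates -- purely illustrative, not the patch's logic. */
    static int should_switch_thread(void) { return 1; }
    static int candidate_ready(void)      { return 1; }

    int main(void)
    {
        /* The do { ... } while(0) idiom: a one-shot "loop" whose only
         * purpose is to let break act as an early exit, replacing a
         * chain of nested if-else clauses. */
        do {
            if (!should_switch_thread())
                break;                      /* nothing to do, skip to the end */
            if (!candidate_ready())
                break;                      /* no runnable candidate, skip */
            printf("switching threads\n");  /* happy path runs straight through */
        } while (0);                        /* body executes exactly once */
        return 0;
    }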

I was conditioned with electrical shocks in the dungeons of a corporation to
always use for loops.

I uploaded the patch to Rietveld for code review comments:
http://codereview.appspot.com/857049

Nir


On Mon, Apr 12, 2010 at 3:48 PM, Peter Portante
<peter.a.portante at gmail.com> wrote:

>  And why the for(;;) loop in bfs_schedule()? I don’t see a code path that
> would loop there. Perhaps I am missing it ...
>
> -peter
>
>
>
> On 4/12/10 8:37 AM, "Peter Portante" <peter.a.portante at gmail.com> wrote:
>
> Hmm, so I see in bfs_yield():
>
> +    if (tstate != NULL && bfs_thread_switch == tstate) {
> +        COND_RESET(tstate->bfs_cond);
> +        COND_WAIT(tstate->bfs_cond, bfs_mutex);
> +    }
>
> So, to me, the above code says, “Wait for the condition that tstate is
> either NULL, or bfs_thread_switch does not equal tstate”. So the predicate
> is: “(tstate != NULL && bfs_thread_switch == tstate)”.
>
> If the predicate is True before you call COND_WAIT() and True after you
> call COND_WAIT(), either you don’t need to call COND_WAIT() at all, or you
> need to loop until the predicate is False. There is no guarantee that a
> condition wait actually did anything at all. Yes, there will be spurious
> wake ups, but you can’t tell if the thread actually blocked and then woke
> up, or never blocked at all. If it never actually blocks, then that code
> path is not helpful.
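>
> In other words, the usual shape is to put the wait inside a loop that
> re-tests the predicate, something like this (a sketch only; I'm assuming
> COND_WAIT is a thin wrapper around pthread_cond_wait() called with
> bfs_mutex held):
>
>     /* Sketch: keep waiting while the predicate still holds, because a
>        condition wait may return spuriously or without ever blocking. */
>     while (tstate != NULL && bfs_thread_switch == tstate) {
>         COND_WAIT(tstate->bfs_cond, bfs_mutex);
>     }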
>
> On Windows, before this loop in bfs_schedule():
>
> +        COND_RESET(tstate->bfs_cond);
> +        while (bfs_thread != tstate) {
> +            _bfs_timed_wait(tstate, timestamp);
> +            timestamp = get_timestamp();
> +        }
>
> You might want to avoid the call to reset the condition variable if the
> predicate is already False.
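>
> That is, roughly (just a sketch of the ordering, reusing the names from
> your patch):
>
>     /* Sketch: only reset and wait when we actually expect to block;
>        if bfs_thread is already tstate, the whole block is skipped. */
>     if (bfs_thread != tstate) {
>         COND_RESET(tstate->bfs_cond);
>         while (bfs_thread != tstate) {
>             _bfs_timed_wait(tstate, timestamp);
>             timestamp = get_timestamp();
>         }
>     }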
>
> -peter
>
>
> On 4/12/10 8:12 AM, "Nir Aides" <nir at winpdb.org> wrote:
>
> Hi Peter,
>
> There is no need for a loop in bfs_yield().
>
>
> On Mon, Apr 12, 2010 at 4:26 AM, Peter Portante <
> peter.a.portante at gmail.com> wrote:
>
> Nir,
>
> Per the POSIX standard, both pthread_cond_wait() and
> pthread_cond_timedwait() need to be performed in a loop.  See the fourth
> paragraph of the description from:
>
>
> http://www.opengroup.org/onlinepubs/000095399/functions/pthread_cond_timedwait.html
>
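> The canonical shape from the standard is roughly this (a sketch with a
> made-up predicate variable, not code from the patch):
>
>     #include <errno.h>
>     #include <pthread.h>
>     #include <time.h>
>
>     static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
>     static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
>     static int ready = 0;            /* made-up predicate variable */
>
>     /* Wait until `ready` becomes true or the absolute deadline passes.
>        The wait sits inside a loop that re-tests the predicate, because
>        pthread_cond_timedwait() may return even though the predicate is
>        still false (spurious wakeups, stolen signals). */
>     static void wait_until_ready(const struct timespec *deadline)
>     {
>         pthread_mutex_lock(&lock);
>         while (!ready) {             /* re-test after every wakeup */
>             if (pthread_cond_timedwait(&cond, &lock, deadline) == ETIMEDOUT)
>                 break;               /* give up at the deadline */
>         }
>         pthread_mutex_unlock(&lock);
>     }
>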
>
>
> For the Windows side, I think you have a similar problem. Condition
> variables are signaling mechanisms, and so they have a separate boolean
> predicate associated with them. If you release the mutex that protects the
> predicate, then after you reacquire the mutex, you have to reevaluate the
> predicate to ensure that the condition has actually been met.
>
> You might want to look at the following for a discussion (not sure how good
> it is, as I just google’d it quickly) of how to implement POSIX semantics on
> Windows:
>
> http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
>
>
> Before you can evaluate the effectiveness of any of the proposed scheduling
> schemes, the fundamental uses of mutexes and condition variables, and their
> implementations, must be sound.
>
> -peter
>
>
>
> On 4/11/10 6:50 PM, "Nir Aides" <nir at winpdb.org> wrote:
>
> Hello all,
>
> I would like to kick this discussion back to life with a simplified
> implementation of the BFS scheduler, designed by the Linux kernel hacker Con
> Kolivas: http://ck.kolivas.org/patches/bfs/sched-BFS.txt
>
> I submitted bfs.patch at http://bugs.python.org/issue7946. It is a work in
> progress but is ready for some opinions.
>
> On my machine BFS gives comparable performance to gilinter, and seems to
> schedule threads more fairly and predictably, with a lower rate of context
> switching. Its basic design is very simple, but it was nevertheless designed
> by an expert in this field, two characteristics which combine to make it
> attractive for this case.
>
> The problem addressed by the GIL has always been *scheduling* threads to
> the interpreter, not just controlling access to it, and therefore the GIL, a
> lock implemented as a simple semaphore, was the wrong solution.
>
> The patches by Antoine and David essentially evolve the GIL into a
> scheduler; however, both cause thread starvation or a high rate of context
> switching in some scenarios:
>
> With Florent's write test (http://bugs.python.org/issue7946#msg101120):
> - 2 bg threads, 2 cores set to performance, karmic, PyCon patch: context
>   switching shoots up to 200K/s.
> - 2 bg threads, 1 core set to on-demand, karmic, idle machine: the gilinter
>   patch starves one of the bg threads.
> - 4 bg threads, 4x1 core Xeon, CentOS 5.3, gilinter patch: all bg threads
>   starved, context switching shoots up to 250K/s.
>
> With the UDP test (http://bugs.python.org/file16316/udp-iotest.py), adding
> zlib.compress(b'GIL') to the workload:
> both gilinter and PyCon patches starve the IO thread.
>
> The BFS patch currently involves more overhead, since it reads the time stamp
> on each yield and schedule operation. In addition, some timestamp-related
> issues still remain to be addressed, such as getting different time stamp
> readings on different cores on some (older) multi-core systems.
>
> Any thoughts?
>
> Nir
>
>
>
> On Sun, Mar 14, 2010 at 12:46 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
>
> Hello,
>
> As some of you may know, Dave Beazley recently exhibited a situation
> where the new GIL shows quite poor behaviour (the old GIL isn't very
> good either, but still a little better). This issue is tracked at
> http://bugs.python.org/issue7946
>
> This situation is when an IO-bound thread wants to process a lot of
> incoming packets, while one (or several) CPU-bound thread is also
> running. Each time the IO-bound thread releases the GIL, the CPU-bound
> thread gets it and keeps holding it for at least 5 milliseconds
> (default setting), which limits the number of individual packets which
> can be recv()'ed and processed per second.
>
> I have proposed two mechanisms, based on the same idea: IO-bound
> threads should be able to steal the GIL very quickly, rather than
> having to wait for the whole "thread switching interval" (again, 5 ms
> by default). They differ in how they detect an "IO-bound thread":
>
> - the first mechanism is actually the same mechanism which was
>   embodied in the original new GIL patch before being removed. In this
>   approach, IO methods (such as socket.read() in socketmodule.c)
>   releasing the GIL must use a separate C macro when trying to get the
>   GIL back again.
>
> - the second mechanism dynamically computes the "interactiveness" of a
>   thread and allows interactive threads to steal the GIL quickly. In
>   this approach, IO methods don't have to be modified at all.
>
> Both approaches show similar benchmark results (for the benchmarks
> that I know of) and basically fix the issue put forward by Dave Beazley.
>
> Any thoughts?
>
> Regards
>
> Antoine.
>

