[Python-Dev] "Fixing" the new GIL

Nir Aides nir at winpdb.org
Tue Apr 13 00:43:27 CEST 2010


There is no need for the "forced" switch to be guaranteed. The motivation
for the wait is to increase throughput and decrease latency when the OS
schedules the next thread to run on the same core as the yielding thread, but
the yielding thread is about to continue running into CPU-intensive library
code. An occasional spurious wakeup resulting in a missed switch will not
affect that. Note the similar logic in the new GIL.

I will add the loop to keep the code clean.

Thanks,
Nir



On Mon, Apr 12, 2010 at 10:49 PM, Peter Portante
<peter.a.portante at gmail.com> wrote:

>  That code will not switch to another thread if the condition variable
> either does not actually block the thread on the call (which is allowed by
> the standard to give implementations some flexibility for making it work
> correctly – read the standard reasoning for more information), or the thread
> is woken up without a predicate change (also allowed by the standard for
> similar reasons). Both of those cases are called “spurious wake-ups”.
>
> You may not be able to readily get your implementation to behave that
> way, but in the wild we need to account for this behavior, because CPython
> will be run on systems where it will happen. :)
>
> -peter
>
>
>
> On 4/12/10 3:36 PM, "Nir Aides" <nir at winpdb.org> wrote:
>
> Please describe clearly a step-by-step scenario in which that code will
> fail.
>
>
> On Mon, Apr 12, 2010 at 10:25 PM, Peter Portante <
> peter.a.portante at gmail.com> wrote:
>
> Yes, but unless you loop until the predicate is False, no forced switch is
> guaranteed to occur. You might as well remove the code. If you want to keep
> the code as is, call me when you need a life guard to help debug mystifying
> behaviors. ;) -peter
>
>
>
> On 4/12/10 3:17 PM, "Nir Aides" <nir at winpdb.org> wrote:
>
> The loop-less wait is similar to the one in the new GIL. It is used to force
> a switch to the next thread in a particular scenario, and the motivation is
> explained in a comment on another if clause a few lines up. Those two if
> clauses could be joined, though.
>
>
> On Mon, Apr 12, 2010 at 3:37 PM, Peter Portante <
> peter.a.portante at gmail.com> wrote:
>
> Hmm, so I see in
> bfs_yield():
>
> +    if (tstate != NULL && bfs_thread_switch == tstate) {
> +        COND_RESET(tstate->bfs_cond);
> +        COND_WAIT(tstate->bfs_cond, bfs_mutex);
> +    }
>
> So, to me, the above code says, “Wait for the condition that tstate is
> either NULL, or bfs_thread_switch does not equal tstate”. So the predicate
> is: “(tstate != NULL && bfs_thread_switch == tstate)”.
>
> If the predicate is True before you call COND_WAIT() and True after you
> call COND_WAIT(), either you don’t need to call COND_WAIT() at all, or you
> need to loop until the predicate is False. There is no guarantee that a
> condition wait actually did anything at all. Yes, there will be spurious
> wake ups, but you can’t tell if the thread actually blocked and then woke
> up, or never blocked at all. If it never actually blocks, then that code
> path is not helpful.
>
> On Windows, before this loop in bfs_schedule():
>
> +        COND_RESET(tstate->bfs_cond);
> +        while (bfs_thread != tstate) {
> +            _bfs_timed_wait(tstate, timestamp);
> +            timestamp = get_timestamp();
> +        }
>
> You might want to avoid the call to reset the condition variable if the
> predicate is already False.
>
> -peter
>
>
>
> On 4/12/10 8:12 AM, "Nir Aides" <nir at winpdb.org> wrote:
>
> Hi Peter,
>
> There is no need for a loop in bfs_yield().
>
>
> On Mon, Apr 12, 2010 at 4:26 AM, Peter Portante <
> peter.a.portante at gmail.com> wrote:
>
> Nir,
>
> Per the POSIX standard, both pthread_cond_wait() and
> pthread_cond_timedwait() need to be performed in a loop.  See the fourth
> paragraph of the description from:
>
>
> http://www.opengroup.org/onlinepubs/000095399/functions/pthread_cond_timedwait.html
>
>
>
> For the Windows side, I think you have a similar problem. Condition
> variables are signaling mechanisms, and so they have a separate boolean
> predicate associated with them. If you release the mutex that protects the
> predicate, then after you reacquire the mutex, you have to reevaluate the
> predicate to ensure that the condition has actually been met.
>
> You might want to look at the following for a discussion (not sure how good
> it is, as I just google’d it quickly) of how to implement POSIX semantics on
> Windows:
>
> http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
>
>
> Before you can evaluate the effectiveness of any of the proposed scheduling
> schemes, the fundamental uses of mutexes and condition variables, and their
> implementations, must be sound.
>
> -peter
>
>
>
> On 4/11/10 6:50 PM, "Nir Aides" <nir at winpdb.org> wrote:
>
> Hello all,
>
> I would like to kick this discussion back to life with a simplified
> implementation of the BFS scheduler, designed by the Linux kernel hacker Con
> Kolivas: http://ck.kolivas.org/patches/bfs/sched-BFS.txt
>
> I submitted bfs.patch at http://bugs.python.org/issue7946 . It is a work in
> progress but is ready for some opinions.
>
>
> On my machine BFS gives comparable performance to gilinter, and seems to
> schedule threads more fairly and predictably, with a lower rate of context
> switching. Its basic design is very simple, but it was nevertheless designed
> by an expert in this field, two characteristics which combine to make it
> attractive for this case.
>
> The problem addressed by the GIL has always been *scheduling* threads to
> the interpreter, not just controlling access to it, and therefore the GIL, a
> lock implemented as a simple semaphore, was the wrong solution.
>
> The patches by Antoine and David essentially evolve the GIL into a
> scheduler; however, both cause thread starvation or a high rate of context
> switching in some scenarios:
>
> With Florent's write test ( http://bugs.python.org/issue7946#msg101120 ):
>
> 2 bg threads, 2 cores set to performance, karmic, PyCon patch, context
> switching shoots up to 200K/s.
> 2 bg threads, 1 core, set to on-demand, karmic, idle machine, gilinter
> patch starves one of the bg threads.
> 4 bg threads, 4x1 core xeon, centos 5.3, gilinter patch, all bg threads
> starved, context switching shoots up to 250K/s.
>
> With the UDP test ( http://bugs.python.org/file16316/udp-iotest.py ), add
> zlib.compress(b'GIL') to the workload:
>
> both gilinter and PyCon patches starve the IO thread.
>
> The BFS patch currently involves more overhead from reading the timestamp
> on each yield and schedule operation. In addition, it still remains to
> address some issues related to timestamps, such as getting different
> timestamp readings on different cores on some (older) multi-core systems.
>
> Any thoughts?
>
> Nir
>
>
>
> On Sun, Mar 14, 2010 at 12:46 AM, Antoine Pitrou <
> solipsis at pitrou.net> wrote:
>
>
> Hello,
>
> As some of you may know, Dave Beazley recently exhibited a situation
> where the new GIL shows quite a poor behaviour (the old GIL isn't very
> good either, but still a little better). This issue is followed in
> http://bugs.python.org/issue7946
>
> This situation arises when an IO-bound thread wants to process a lot of
> incoming packets, while one (or several) CPU-bound threads are also
> running. Each time the IO-bound thread releases the GIL, the CPU-bound
> thread gets it and keeps holding it for at least 5 milliseconds
> (default setting), which limits the number of individual packets which
> can be recv()'ed and processed per second.
>
> I have proposed two mechanisms, based on the same idea: IO-bound
> threads should be able to steal the GIL very quickly, rather than
> having to wait for the whole "thread switching interval" (again, 5 ms
> by default). They differ in how they detect an "IO-bound thread":
>
> - the first mechanism is actually the same mechanism which was
>   embodied in the original new GIL patch before being removed. In this
>   approach, IO methods (such as socket.read() in socketmodule.c)
>   releasing the GIL must use a separate C macro when trying to get the
>   GIL back again.
>
> - the second mechanism dynamically computes the "interactiveness" of a
>   thread and allows interactive threads to steal the GIL quickly. In
>   this approach, IO methods don't have to be modified at all.
>
> Both approaches show similar benchmark results (for the benchmarks
> that I know of) and basically fix the issue put forward by Dave Beazley.
>
> Any thoughts?
>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/nir%40winpdb.org
>
>
>


More information about the Python-Dev mailing list