Threading unfairness

Sat May 18 21:08:55 EDT 2002

[Matt Kimball]
> While writing some multithreaded GUI code with ActivePython 2.2.1
> under Windows XP, on a single processor machine, and I noticed that
> my user-interface thread was very unresponsive when I had a background
> computation thread going.  This annoyed me.  It seemed like the main
> thread where the GUI code was running was being starved of CPU time.

It can be very difficult to sort things like that out.

> Since there doesn't seem to be any way to set thread priorities
> with Python,

There is not.  And it wouldn't help if there were <0.5 wink>.

> I "fixed" the problem with inserting frequent 'time.sleep(0)' calls
> in my compute-intensive thread, and that seemed to give my main GUI
> thread more time to execute when a user-interface event occurred.

That works on Windows because the underlying Win32 API Sleep() call takes an
argument of 0 as *meaning* "yield the timeslice".  On other OSes it may or
may not work.  For example, people playing this trick on Solaris usually use
time.sleep(0.01) instead, because time.sleep(0) is a nop there.

> However, it got me wondering just how bad the situation was, so I
> wrote a simple program to test Python threads, and find the maximum
> time that individual threads go without getting any CPU time.
> Interestingly, the problem seems worse with a low number of threads.
> ...
> The code I used to measure this is at
> http://matt.kimball.net/test_thread.py --
> Maybe someone wants to look it over to make sure I'm not doing
> anything funny, and run a few tests on other operating systems to see
> how they differ.

Sorry, I had too much trouble understanding the code, so wrote a different
driver that seems to me much simpler and more informative.  See attached.
Here's a typical run on Win98SE, after killing most background processes
(note:  that's important -- Python isn't the OS, and has no control over
when the OS decides to boot your program to run some other task):

With span 1.0 and 1 threads:
tid   0 meangap   0.01 maxgap  47.98 ms at iter   2313 of 103305

With span 1.0 and 2 threads:
tid   0 meangap   0.04 maxgap  12.41 ms at iter  22985 of  22985
tid   1 meangap   0.05 maxgap  15.06 ms at iter  18494 of  22079

With span 1.0 and 3 threads:
tid   0 meangap   0.07 maxgap   0.73 ms at iter    517 of  14530
tid   1 meangap   0.07 maxgap   1.02 ms at iter  14018 of  14101
tid   2 meangap   0.07 maxgap   1.02 ms at iter  14013 of  14107

With span 1.0 and 4 threads:
tid   0 meangap   0.09 maxgap   1.06 ms at iter    298 of  10765
tid   1 meangap   0.10 maxgap   1.04 ms at iter  10475 of  10506
tid   2 meangap   0.10 maxgap   1.05 ms at iter  10469 of  10513
tid   3 meangap   0.10 maxgap   0.94 ms at iter  10461 of  10509

With span 1.0 and 5 threads:
tid   0 meangap   0.11 maxgap   1.08 ms at iter   2550 of   8853
tid   1 meangap   0.12 maxgap   1.11 ms at iter   4399 of   8422
tid   2 meangap   0.12 maxgap   1.08 ms at iter   2074 of   8425
tid   3 meangap   0.12 maxgap   1.11 ms at iter   8385 of   8424
tid   4 meangap   0.12 maxgap   1.00 ms at iter   8379 of   8425

With span 1.0 and 6 threads:
tid   0 meangap   0.13 maxgap   1.39 ms at iter   5312 of   7646
tid   1 meangap   0.14 maxgap   1.50 ms at iter   6894 of   6970
tid   2 meangap   0.14 maxgap   1.31 ms at iter    746 of   6972
tid   3 meangap   0.14 maxgap   1.53 ms at iter   6883 of   6972
tid   4 meangap   0.14 maxgap   1.31 ms at iter   6509 of   6971
tid   5 meangap   0.14 maxgap   1.39 ms at iter    729 of   6971

With span 1.0 and 7 threads:
tid   0 meangap   0.16 maxgap   1.61 ms at iter   1396 of   6213
tid   1 meangap   0.17 maxgap   1.56 ms at iter   2792 of   5981
tid   2 meangap   0.17 maxgap   1.60 ms at iter   2786 of   5982
tid   3 meangap   0.17 maxgap   1.52 ms at iter   1136 of   5982
tid   4 meangap   0.17 maxgap   1.66 ms at iter   1130 of   5981
tid   5 meangap   0.17 maxgap   1.57 ms at iter   1126 of   5980
tid   6 meangap   0.17 maxgap   1.56 ms at iter   2765 of   5978

Note the gap times are measured in milliseconds here, not seconds.  The
worst gap was less than 50ms, which isn't noticeable to most people.  Note
that the iterations at which the max gaps occur tend to cluster:  this is
evidence that *all* threads in the program are getting booted, due to the OS
giving some other process a chance to run.

WRT other platforms:

1. This will probably print gaps of 0.  time.clock() has extraordinarily
   fine resolution on Windows (finer than microsecond).  On other
   platforms, change the "now" import to use "time" instead of "clock".

2. Of all the OSes I've futzed with, Windows flavors are actually the
   best-behaved in this sense wrt threads:  Windows is much happier to
   switch threads frequently than other OSes.  Since typical Windows apps
   are thread-happy GUI programs, this probably isn't an accident.

> I think this behavior is probably highly operating system
> dependent.

Extremely, yes.

> The python interpreter is assuming the OS will do something reasonably
> fair when it gives up the global interpreter lock,

Python isn't assuming anything here.  "It's a feature" that Python thread
behavior reflects your native thread behavior; Python isn't trying to be an
operating system.

> ...
> Still, it would be nice if something could be done to make
> threading execution more fair,

As the output above shows, scheduling is very fair on Win98SE, except for
slightly favoring the first thread spawned (tid 0 always got to do more
iterations of its loop than the other tids, while the number of iterations
the other tids got to do are amazingly close to each other).

> but I'm not sure how to do this without creating some sort of
> scheduler in the python interpreter itself.

It wouldn't work -- Python can't tell the OS kernel how to schedule.  You
may be able to *influence* that via piles of platform-specific extension
modules.  BTW, XP may have a slider or swtich buried in one of the Control
Panel applets to influence whether foreground processes get preference in
scheduling (I can't remember, and don't have XP here now).

> ...
> Is the problem as bad under other operating systems?

Probably worse.

> Am I insane to try to do anything in the neighborhood of
> soft-realtime with Python?

Start with a soft-realtime OS.  After spending a few miserable weeks on it,
you'll probably find that the apparent unfairness in your GUI app is a
shallow problem after all <0.9 wink>.

threads-are-a-hoot-ly y'rs  - tim

import threading
from time import clock as now

SPAN = 1.0
MAXTHREADS = 100

maxgap  = [None] * MAXTHREADS
meangap = [None] * MAXTHREADS
maxiter = [None] * MAXTHREADS
totalit = [None] * MAXTHREADS

def busy_thread(index):
    maxgap[index] = -1.0
    current = last = now()
    finish = current + SPAN
    i = 0
    totalgaps = 0.0
    while current <= finish:
        current = now()
        i += 1
        delta = current - last
        totalgaps += delta
        if delta > maxgap[index]:
            maxgap[index] = delta
            maxiter[index] = i
        last = current
    totalit[index] = i
    meangap[index] = totalgaps / i

def drive(nthreads):
    threads = [threading.Thread(target=busy_thread, args=(i,))
               for i in range(nthreads)]

    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print "With span", SPAN, "and", nthreads, "threads:"
    for i in range(nthreads):
        print ("tid %3d meangap %6.2f maxgap %6.2f ms "
               " at iter %6d of %6d" % (i, meangap[i] * 1000,
               maxgap[i] * 1000, maxiter[i], totalit[i]))
    print

for i in range(1, 8):
    drive(i)