[Python-Dev] A few lessons from the tempfile.py rewrite

Guido van Rossum guido@python.org
Sat, 17 Aug 2002 08:14:47 -0400


> 1) Dummy threads module.
> 
> Currently, a library module that wishes to be thread-safe but still
> work on platforms where threads are not implemented, has to jump
> through hoops.  In tempfile.py we have
> 
> | try:
> |     import thread as _thread
> |     _allocate_lock = _thread.allocate_lock
> | except (ImportError, AttributeError):
> |     class _allocate_lock:
> |         def acquire(self):
> |             pass
> |         release = acquire
> 
> It would be nice if the thread and threading modules existed on all
> platforms, providing these sorts of dummy locks on the platforms that
> don't actually implement threads.  I notice that Queue.py uses 'import
> thread' unconditionally -- perhaps this is already the case?  I can't
> find any evidence of it.

The Queue module was never intended for use in an unthreaded program;
it's specifically intended for safe communication between threads.  So
if you import Queue on a threadless platform, the import thread will
fail for a good reason.

The question with a dummy module is, how far do you want it to go?  I
guess one thing we could do would be to always make the thread module
available, and implement a dummy lock.  The lock would have acquire
and release methods that simply set and test a flag; acquire() raises
an exception the flag is already set (to simulate deadlock), and
release() raises an exception if it isn't set.  But it should *not*
provide start_new_thread; this can be used as a flag to indicate the
real presence of threads.  Then the threading module would import
successfully, but calling start() on a Thread object would fail.  Hm,
I'm not sure if I like that; maybe instantiating the Thread class
should fail already.

There's a backwards incompatibility problem.  There is code that
currently tries to import the thread or threading module and if this
succeeds expects it can use threads (not just locks), and has an
alternative implementation if threads do not exist.  Such code would
break if the thread module always existed.

How about a compromise: we can put a module dummy_thread.py in the
standard library, and at the top of tempfile.py, you can write

try:
    import thread as _thread
except ImportError:
    import dummy_thread as _thread
_allocate_lock = _thread.allocate_lock

If you can provide an implementation of dummy_thread.py I'll gladly
check it in.

> 2) pthread_once equivalent.
> 
> pthread_once is a handy function in the C pthreads library which
> can be used to guarantee that some data object is initialized exactly
> once, and no thread sees it in a partially initialized state.  I had
> to implement a fake version in tempfile.py.
> 
> | _once_lock = _allocate_lock()
> | 
> | def _once(var, initializer):
> |     """Wrapper to execute an initialization operation just once,
> |     even if multiple threads reach the same point at the same time.
> | 
> |     var is the name (as a string) of the variable to be entered into
> |     the current global namespace.
> | 
> |     initializer is a callable which will return the appropriate initial
> |     value for variable.  It will be called only if variable is not
> |     present in the global namespace, or its current value is None.
> | 
> |     Do not call _once from inside an initializer routine, it will deadlock.
> |     """
> | 
> |     vars = globals()
> |     # Check first outside the lock.
> |     if vars.get(var) is not None:
> |         return

(Martin Sj. commented at this line that this would overwrite var if it
was defined as None; from the docstring I gather that that's
intentional behavior.)

> |     try:
> |         _once_lock.acquire()

IMO this has a subtle bug: the acquire() should come *before* the try:
call.  If for whatever reason acquire() fails, you'd end up doing a
release() on a lock you didn't acquire.  It's true that

    l.acquire()
    try:

is not atomic since a signal handler could raise an exception between
them, but there's a race condition either way, and I don't know how to
fix them both at the same time (not without adding a construct to
shield signals, which IMO is overkill -- be Pythonic, live
dangerously, accept the risk that a ^C can screw you.  It can
anyway. :-)

> |         # Check again inside the lock.
> |         if vars.get(var) is not None:
> |             return
> |         vars[var] = initializer()
> |     finally:
> |         _once_lock.release()
> 
> I call it fake for three reasons.  First, it should be using
> threading.RLock so that recursive calls do not deadlock.  That's a
> trivial fix (this sort of high level API probably belongs in
> threading.py anyway).

What's the use case for that?  Surely an initialization function can
avoid calling itself.  I'd say YAGNI.

> Second, it uses globals(), which means that all
> symbols it initializes live in the namespace of its own module, when
> what's really wanted is the caller's module.  And most important, I'm
> certain that this interface is Not The Python Way To Do It.
> Unfortunately, I've not been able to figure out what the Python Way To
> Do It is, for this problem.

In the case of tempfile.py, I think the code will improve in clarity
if you simply write it out.  I tried this and it saved 10 lines of
code (mostly the docstring in _once() -- but that's fair game, since
_once embodies more tricks than the expanded code).

In addition, since gettempdir() is called for the default argument
value of mkstemp(), it would be much simpler to initialize tempdir at
module initialization time; the module initialization is guaranteed to
run only once.  If I do this, I save another 8 lines; but I believe
you probably wanted to postpone calling gettempdir() until any of the
functions that have gettempdir() as their argument get *called*, which
means that in fact all those functions have to be changed to have None
for their default and insert

  if dir is None:
      dir = gettempdir()

at the top of their bodies.

> 3) test_support.TestSkipped and unittest.py
> 
> Simple - you can't use TestSkipped in a unittest.py-based test set.
> This is a missing feature of unittest, which has no notion of skipping
> a given test.  Any exception thrown from inside one of its test
> routines is taken to indicate a failure.
> 
> I think the right fix here is to add a skip() method to
> unittest.TestCase which works with both a bare unittest.py-based test
> framework, and Python's own test_support.py.

Maybe you can bring this one up in the PyUnit list?  I don't know
where that is, but we're basically tracking Steve Purcell's code.
Maybe he has a good argument against this feature, or a better way to
do it; or maybe he thinks it is cool.

Personally, I think the thing to do is put tests that can't always run
in a separate test suite class and only add that class to the list of
suites when applicable.  I see you *almost* stumbled upon this idiom
with the dummy_test_TemporaryFile; but that approach seems overkill:
why not simply skip test_TemporaryFile when it's the same as
NamedTemporaryFile?

--Guido van Rossum (home page: http://www.python.org/~guido/)