[New-bugs-announce] [issue13517] readdir() in os.listdir not threadsafe on OSX 10.6.8
Thouis (Ray) Jones
report at bugs.python.org
Thu Dec 1 21:53:52 CET 2011
New submission from Thouis (Ray) Jones <thouis at gmail.com>:
On my system (OSX 10.6.8), using the python.org 32/64-bit build of Python 2.7.2, I see incorrect results from os.listdir() in a threaded program: the list it returns is missing a few files.
First, my use case. I work with large image-based datasets, often containing hundreds of thousands of images. The first step in processing is to locate all of these images and extract some basic information (size, channels, etc.). To do this more efficiently on network filesystems, where listing directories and stat()ing files is often slow, I wrote a multithreaded analog of os.walk(). While validating its results against Unix 'find', I saw discrepancies in the number of files found.
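For readers who want to reproduce the setup, here is a minimal sketch of what a multithreaded analog of os.walk() might look like. It is not the reporter's code: the names `threaded_walk` and `_scan` and the `workers` parameter are hypothetical, and it assumes Python 3.2+ for `concurrent.futures`. Each directory listing runs in a worker thread, so several os.listdir() calls (and therefore several readdir() loops) can be in flight at once, which is the situation described above.

```python
import os
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def _scan(path):
    """List one directory (hypothetical helper), splitting entries into files and subdirectories."""
    files, dirs = [], []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        # Recurse only into real directories; symlinks (even to directories)
        # are reported as entries and not followed.
        if os.path.isdir(full) and not os.path.islink(full):
            dirs.append(full)
        else:
            files.append(full)
    return files, dirs


def threaded_walk(top, workers=8):
    """Return every non-directory path under top, listing directories concurrently."""
    found = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(_scan, top)}
        while pending:
            # Handle whichever listings finish first, then queue their subdirectories.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                files, dirs = fut.result()  # re-raises any OSError from the worker
                found.extend(files)
                pending |= {pool.submit(_scan, d) for d in dirs}
    return found
```

Comparing the result against a serial listing (os.walk() or `find`) is the kind of check that exposed the missing files.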
My guess is that OSX's readdir() is not reentrant when dealing with SMB shares, even when each call uses a different DIR pointer. It is also possible that readdir() is not reentrant with respect to concurrent lstat() calls; some of my tests seemed to indicate this, but I need to run more tests to be sure that is what I was actually seeing.
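One way to probe this hypothesis is a small stress test: take a serial os.listdir() result as the baseline, then list the same directory from many threads at once and count how often a concurrent result differs. Each os.listdir() call opens its own DIR stream, so this exercises the "different DIR pointers" case. The sketch is hypothetical (the function name and the `threads`/`rounds` defaults are arbitrary), assumes Python 3.2+ for `threading.Barrier`, and assumes the directory is not modified while it runs; a directory on an SMB share would be the interesting target.

```python
import os
import threading


def count_listdir_mismatches(path, threads=8, rounds=50):
    """Hypothetical stress test: count concurrent os.listdir() results that differ from a serial baseline."""
    baseline = sorted(os.listdir(path))
    mismatches = []
    lock = threading.Lock()
    start = threading.Barrier(threads)

    def worker():
        start.wait()  # release all threads together to maximize overlap
        for _ in range(rounds):
            if sorted(os.listdir(path)) != baseline:
                with lock:
                    mismatches.append(1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return len(mismatches)
```

A nonzero count on an otherwise static directory would point at the listing itself rather than at the walker's bookkeeping.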
In any case, I think there are three possible ways to fix this:
- Remove the Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS around readdir() in posixmodule.c, so the GIL is held during each readdir() call
- Protect readdir() with a mutex
- Use readdir_r(). I've attached a potential patch for 2.7.2 for this solution.
I would prefer the second or last approach, as they preserve the ability of other threads to do work while a large directory is being listed.
By my reading of the Python 3.0 to 3.4 sources, this problem exists in those versions as well.
components: Library (Lib)
title: readdir() in os.listdir not threadsafe on OSX 10.6.8
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4
Added file: http://bugs.python.org/file23832/py272_readdir_r.patch
Python tracker <report at bugs.python.org>