[New-bugs-announce] [issue11224] 3.2: tarfile.getmembers causes 100% cpu usage on Windows

Sridhar Ratnakumar report at bugs.python.org
Wed Feb 16 19:12:26 CET 2011

New submission from Sridhar Ratnakumar <sridharr at activestate.com>:

tarfile.getmembers has become extremely slow on Windows. The slowdown was introduced in r85916, committed by Lars Gustaebel on Oct 29, 2010, to "add read support for all missing variants of the GNU sparse extensions".

To reproduce, use this "tgz" file:


It contains another tgz file called "data.tar.gz". Run `.getmembers()` on data.tar.gz.


This invokes tarfile._FileInFile.read(...), which seems to be the cause of the slowness (or rather, a hang).
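
For readers without the original attachment, the reproduction steps can be sketched roughly as follows. This is only an illustration: it builds a small stand-in archive in memory instead of using the file from the report, the helper names make_tgz and inner_member_names are hypothetical, and it has not been confirmed to trigger the slowdown on Python 3.2 for Windows. Only the inner file name "data.tar.gz" comes from the report.

```python
import io
import tarfile


def make_tgz(files):
    """Return the bytes of a gzip-compressed tar holding name -> bytes entries."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def inner_member_names(outer_bytes, inner_name="data.tar.gz"):
    """List the members of a tgz that is itself stored inside another tgz."""
    with tarfile.open(fileobj=io.BytesIO(outer_bytes), mode="r:gz") as outer:
        # extractfile() hands back a file object whose reads go through
        # tarfile._FileInFile.read(), the method named in this report.
        inner_file = outer.extractfile(inner_name)
        with tarfile.open(fileobj=inner_file, mode="r:gz") as inner:
            return [member.name for member in inner.getmembers()]


# Stand-in data: an outer tgz containing an inner "data.tar.gz".
inner = make_tgz({"a.txt": b"hello", "b.txt": b"world"})
outer = make_tgz({"data.tar.gz": inner})
names = inner_member_names(outer)
print(names)
```

Timing the inner_member_names call on an affected build and on an unaffected one should show whether the nested read path is where the time goes.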

I had to work around this issue by monkey-patching the above `read` function to revert the change:

+import sys
+if sys.version_info[:2] >= (3, 2):
+    import tarfile
+    class _FileInFileNoSparse(tarfile._FileInFile):
+        # Plain seek-and-read with no sparse-block handling.
+        def read(self, size=None):
+            if size is None:
+                size = self.size - self.position
+            else:
+                size = min(size, self.size - self.position)
+            self.fileobj.seek(self.offset + self.position)
+            self.position += size
+            return self.fileobj.read(size)
+    tarfile._FileInFile = _FileInFileNoSparse
+    # LOG is the application's own logger, defined elsewhere.
+    LOG.info('Monkey patching `tarfile.py` to disable part of r85916 (py3k)')

We caught this bug while testing ActiveState PyPM on Python 3.2.

If you want the easiest way to reproduce this, I can send you (in private) an internal build of ActivePython-3.2 containing PyPM. Running "pypm install numpy" (with breakpoints in tarfile.py) is all that is required to reproduce.

components: Library (Lib), Windows
messages: 128685
nosy: lars.gustaebel, srid
priority: normal
severity: normal
status: open
title: 3.2: tarfile.getmembers causes 100% cpu usage on Windows
type: resource usage
versions: Python 3.2

Python tracker <report at bugs.python.org>
