[Python-checkins] r88528 - in python/branches/py3k: Lib/tarfile.py Lib/test/test_tarfile.py Misc/NEWS

lars.gustaebel python-checkins at python.org
Wed Feb 23 12:42:22 CET 2011


Author: lars.gustaebel
Date: Wed Feb 23 12:42:22 2011
New Revision: 88528

Log:
Issue #11224: The improved sparse file read support (r85916) introduced
a regression in _FileInFile, which is used by the file-like objects
returned by TarFile.extractfile(). _FileInFile.read() fetches a file's
entire data segment even when only a few bytes are requested, which
causes several severe side effects and errors:

  - The data segment of a file member is read completely into memory
    every(!) time a small block is accessed. This is not only slow
    but may also cause unexpected MemoryErrors with very large files.
  - Reading members from compressed tar archives is even slower,
    because the same data segment is read over and over again, and
    each of these backwards seeks forces the compressed file to be
    decompressed again from the beginning.
  - Since backwards seeking is not possible on a TarFile opened in
    stream mode (e.g. mode "r|"), reading from an object returned by
    extractfile() fails with a StreamError.
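
A minimal sketch of the cost described in the first two bullets, assuming a member stored as one contiguous data segment (no sparse holes) right after a 512-byte header. `CountingFile`, `read_whole_segment`, `read_exact` and `read_through` are hypothetical names for illustration, not tarfile internals; the two strategies only mirror the removed and added lines of the tarfile.py hunk in this commit.

```python
import io


class CountingFile(io.BytesIO):
    """BytesIO that counts bytes read and backwards seeks (illustration only)."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0
        self.backward_seeks = 0

    def read(self, size=-1):
        data = super().read(size)
        self.bytes_read += len(data)
        return data

    def seek(self, pos, whence=io.SEEK_SET):
        if whence == io.SEEK_SET and pos < self.tell():
            self.backward_seeks += 1
        return super().seek(pos, whence)


def read_whole_segment(fileobj, offset, size, position, length):
    # Old strategy: read the entire data segment, then slice out the request.
    fileobj.seek(offset)
    block = fileobj.read(size)
    return block[position:position + length]


def read_exact(fileobj, offset, size, position, length):
    # New strategy: seek directly to the requested bytes and read only those.
    fileobj.seek(offset + position)
    return fileobj.read(length)


def read_through(strategy, archive, offset, size, blocksize=512):
    # Read the whole member in small blocks, as a typical consumer would.
    f = CountingFile(archive)
    position, out = 0, b""
    while position < size:
        length = min(blocksize, size - position)
        chunk = strategy(f, offset, size, position, length)
        out += chunk
        position += len(chunk)
    return out, f.bytes_read, f.backward_seeks


member = bytes(range(256)) * 40      # 10240 bytes of member data
archive = b"\0" * 512 + member       # placeholder header block, then the data
old = read_through(read_whole_segment, archive, 512, len(member))
new = read_through(read_exact, archive, 512, len(member))
print(old[1], old[2])  # 204800 19
print(new[1], new[2])  # 10240 0
```

Both strategies return the same bytes, but with 512-byte reads the old one pulls the 10240-byte segment 20 times and seeks backwards before every read after the first; on a stream-mode archive each of those backwards seeks is exactly what cannot be done.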



Modified:
   python/branches/py3k/Lib/tarfile.py
   python/branches/py3k/Lib/test/test_tarfile.py
   python/branches/py3k/Misc/NEWS

Modified: python/branches/py3k/Lib/tarfile.py
==============================================================================
--- python/branches/py3k/Lib/tarfile.py	(original)
+++ python/branches/py3k/Lib/tarfile.py	Wed Feb 23 12:42:22 2011
@@ -760,9 +760,8 @@
                         self.map_index = 0
             length = min(size, stop - self.position)
             if data:
-                self.fileobj.seek(offset)
-                block = self.fileobj.read(stop - start)
-                buf += block[self.position - start:self.position + length]
+                self.fileobj.seek(offset + (self.position - start))
+                buf += self.fileobj.read(length)
             else:
                 buf += NUL * length
             size -= length
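
The corrected loop can be sketched in isolation. This is a simplified, hypothetical stand-in for _FileInFile, not the tarfile source: it assumes a map of `(data, start, stop, offset)` tuples, matching the variable names in the hunk above, and the class name `SparseReader` is illustrative.

```python
import io

NUL = b"\0"


class SparseReader:
    """Hypothetical, simplified stand-in for tarfile's _FileInFile.

    `segments` is a list of (data, start, stop, offset) tuples covering member
    positions [start, stop). Data entries live in `fileobj` starting at
    `offset`; hole entries (data=False) read back as NUL bytes.
    """

    def __init__(self, fileobj, size, segments):
        self.fileobj = fileobj
        self.size = size
        self.map = segments
        self.map_index = 0
        self.position = 0

    def read(self, size=None):
        remaining = self.size - self.position
        size = remaining if size is None else min(size, remaining)
        buf = b""
        while size > 0:
            # Advance (wrapping around) to the map entry containing the position.
            while True:
                data, start, stop, offset = self.map[self.map_index]
                if start <= self.position < stop:
                    break
                self.map_index = (self.map_index + 1) % len(self.map)
            length = min(size, stop - self.position)
            if data:
                # The fix: seek straight to the requested bytes and read only
                # `length` of them instead of the whole segment.
                self.fileobj.seek(offset + (self.position - start))
                buf += self.fileobj.read(length)
            else:
                buf += NUL * length
            size -= length
            self.position += length
        return buf


# Member of 12 bytes: data "ABCD", a 4-byte hole, then data "EFGH".
stored = io.BytesIO(b"ABCDEFGH")
reader = SparseReader(stored, 12, [(True, 0, 4, 0), (False, 4, 8, None), (True, 8, 12, 4)])
first = reader.read(3)   # b"ABC"
second = reader.read(4)  # b"D\x00\x00\x00"
rest = reader.read()     # b"\x00EFGH"
```

Because every read touches only the bytes it returns, a forward read-through never seeks backwards, which is what makes stream mode work again.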

Modified: python/branches/py3k/Lib/test/test_tarfile.py
==============================================================================
--- python/branches/py3k/Lib/test/test_tarfile.py	(original)
+++ python/branches/py3k/Lib/test/test_tarfile.py	Wed Feb 23 12:42:22 2011
@@ -419,6 +419,22 @@
 
     mode="r|"
 
+    def test_read_through(self):
+        # Issue #11224: A poorly designed _FileInFile.read() method
+        # caused seeking errors with stream tar files.
+        for tarinfo in self.tar:
+            if not tarinfo.isreg():
+                continue
+            fobj = self.tar.extractfile(tarinfo)
+            while True:
+                try:
+                    buf = fobj.read(512)
+                except tarfile.StreamError:
+                    self.fail("simple read-through using TarFile.extractfile() failed")
+                if not buf:
+                    break
+            fobj.close()
+
     def test_fileobj_regular_file(self):
         tarinfo = self.tar.next() # get "regtype" (can't use getmember)
         fobj = self.tar.extractfile(tarinfo)
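
The new test_read_through covers a simple read-through in stream mode. The same scenario can be reproduced outside the test suite with only the public tarfile API; this is a sketch, and the helper names `make_archive` and `read_through_stream` as well as the sample member names are illustrative.

```python
import io
import tarfile


def make_archive(members):
    """Build an uncompressed tar archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_through_stream(archive, blocksize=512):
    """Read every regular member of a stream-mode ("r|") archive in small blocks."""
    contents = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r|") as tar:
        for tarinfo in tar:
            if not tarinfo.isreg():
                continue
            fobj = tar.extractfile(tarinfo)
            chunks = []
            while True:
                # On the regressed revisions this read raised tarfile.StreamError.
                block = fobj.read(blocksize)
                if not block:
                    break
                chunks.append(block)
            contents[tarinfo.name] = b"".join(chunks)
    return contents


members = {"a.txt": b"x" * 2000, "b.bin": bytes(range(256)) * 8}
result = read_through_stream(make_archive(members))
```

Each member must be read completely before the loop advances, since a stream-mode TarFile cannot go back to an earlier member.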

Modified: python/branches/py3k/Misc/NEWS
==============================================================================
--- python/branches/py3k/Misc/NEWS	(original)
+++ python/branches/py3k/Misc/NEWS	Wed Feb 23 12:42:22 2011
@@ -27,6 +27,10 @@
 Library
 -------
 
+- Issue #11224: Fixed a regression in tarfile that affected the file-like
+  objects returned by TarFile.extractfile() regarding performance, memory
+  consumption and failures with the stream interface.
+
 - Issue #10924: Adding salt and Modular Crypt Format to crypt library.
   Moved old C wrapper to _crypt, and added a Python wrapper with
   enhanced salt generation and simpler API for password generation.

