[Python-checkins] bpo-41316: Make tarfile follow specs for FNAME (GH-21511)

Artem Bulgakov webhook-mailer at python.org
Mon Sep 7 12:46:41 EDT 2020

commit: 22748a83d927d3da1beaed771be30887c42b2500
branch: master
author: Artem Bulgakov <ArtemSBulgakov at ya.ru>
committer: GitHub <noreply at github.com>
date: 2020-09-07T09:46:33-07:00

bpo-41316: Make tarfile follow specs for FNAME (GH-21511)

tarfile writes the full path to the FNAME field of the GZIP header instead of just the basename if the user specified an absolute path. Some archive viewers may process such a file incorrectly. It also creates a security issue, because anyone who receives the archive can learn the directory structure of the system and the username or other personal information.

RFC 1952 says of FNAME:
"This is the original name of the file being compressed, with any directory components removed."

So tarfile must strip the directory components from FNAME and write only the basename of the file.
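Per RFC 1952, the fixed 10-byte gzip member header (ID1, ID2, CM, FLG, MTIME, XFL, OS) is followed by optional fields in a fixed order. FNAME, a zero-terminated ISO-8859-1 string, is present when bit 0x08 of FLG is set. Below is a minimal sketch for checking what tarfile actually wrote. `read_gzip_fname` and `stream_archive_fname` are hypothetical helpers. Only FEXTRA (the one optional field that can precede FNAME) is handled, and the expected output assumes an interpreter that includes this fix.

```python
import os
import tarfile
import tempfile

# Flag bits of the FLG header byte (RFC 1952, section 2.3.1).
FEXTRA = 0x04
FNAME = 0x08


def read_gzip_fname(path):
    """Return the FNAME field of the first gzip member in *path*, or None.

    Header layout: ID1 ID2 CM FLG MTIME(4 bytes) XFL OS, optionally followed
    by an FEXTRA block, then FNAME as a zero-terminated ISO-8859-1 string.
    """
    with open(path, "rb") as f:
        header = f.read(10)
        if len(header) < 10 or header[:2] != b"\x1f\x8b":
            raise ValueError("not a gzip file")
        flags = header[3]
        if flags & FEXTRA:
            # XLEN is the 2-byte little-endian length of the extra field.
            xlen = int.from_bytes(f.read(2), "little")
            f.read(xlen)
        if not flags & FNAME:
            return None
        name = bytearray()
        while True:
            byte = f.read(1)
            if not byte or byte == b"\x00":
                break
            name += byte
        return name.decode("iso-8859-1")


def stream_archive_fname(filename="archive.tar.gz"):
    """Write an empty "w|gz" archive into a temp directory and return its FNAME."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, filename)
        # Stream mode "w|gz" uses tarfile's own gzip header writer
        # (_Stream._init_write_gz), the code path changed by this commit.
        tarfile.open(path, "w|gz").close()
        return read_gzip_fname(path)


print(stream_archive_fname())  # archive.tar (with this fix applied)
```

Without this change, the same call would return the full temporary path, minus the ".gz" suffix, directory components included. That is the leak described above.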

Automerge-Triggered-By: @jaraco

A Misc/NEWS.d/next/Library/2020-07-28-12-08-58.bpo-41316.bSCbK4.rst
M Lib/tarfile.py
M Lib/test/test_tarfile.py
M Misc/ACKS

diff --git a/Lib/tarfile.py b/Lib/tarfile.py
index 6769066cabd6f..1fae29430feff 100755
--- a/Lib/tarfile.py
+++ b/Lib/tarfile.py
@@ -420,6 +420,8 @@ def _init_write_gz(self):
         self.__write(b"\037\213\010\010" + timestamp + b"\002\377")
         if self.name.endswith(".gz"):
             self.name = self.name[:-3]
+        # Honor "directory components removed" from RFC1952
+        self.name = os.path.basename(self.name)
         # RFC1952 says we must use ISO-8859-1 for the FNAME field.
         self.__write(self.name.encode("iso-8859-1", "replace") + NUL)
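The patched lines amount to a pure function of the name passed to tarfile.open(): drop a trailing ".gz", keep only the basename, then encode as ISO-8859-1 with a NUL terminator. The sketch below mirrors those steps. `gzip_fname_field` is a hypothetical stand-alone helper for illustration; tarfile performs these steps inline in `_Stream._init_write_gz`.

```python
import os

NUL = b"\0"


def gzip_fname_field(name):
    """Build FNAME bytes the way the patched _init_write_gz does (illustrative)."""
    if name.endswith(".gz"):
        name = name[:-3]
    # Honor "directory components removed" from RFC 1952.
    name = os.path.basename(name)
    # RFC 1952 requires ISO-8859-1; the "replace" handler turns
    # unencodable characters into "?".
    return name.encode("iso-8859-1", "replace") + NUL


print(gzip_fname_field("/home/alice/backups/site.tar.gz"))  # b'site.tar\x00'
print(gzip_fname_field("relative/dir/data.tar"))             # b'data.tar\x00'
print(gzip_fname_field("/tmp/архив.tar.gz"))                 # b'?????.tar\x00'
```

Relative paths with directory components are stripped as well, not only absolute ones, since `os.path.basename` discards everything up to the last separator.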
diff --git a/Lib/test/test_tarfile.py b/Lib/test/test_tarfile.py
index 4ef20db097163..7b34d53d21601 100644
--- a/Lib/test/test_tarfile.py
+++ b/Lib/test/test_tarfile.py
@@ -1417,12 +1417,15 @@ def write(self, data):
                                    pax_headers={'non': 'empty'})
 class GzipWriteTest(GzipTest, WriteTest):
 class Bz2WriteTest(Bz2Test, WriteTest):
 class LzmaWriteTest(LzmaTest, WriteTest):
@@ -1465,8 +1468,17 @@ def test_file_mode(self):
 class GzipStreamWriteTest(GzipTest, StreamWriteTest):
-    pass
+    def test_source_directory_not_leaked(self):
+        """
+        Ensure the source directory is not included in the tar header
+        per bpo-41316.
+        """
+        tarfile.open(tmpname, self.mode).close()
+        payload = pathlib.Path(tmpname).read_text(encoding='latin-1')
+        assert os.path.dirname(tmpname) not in payload
 class Bz2StreamWriteTest(Bz2Test, StreamWriteTest):
     decompressor = bz2.BZ2Decompressor if bz2 else None
diff --git a/Misc/ACKS b/Misc/ACKS
index a2cdeb8504059..8b0d7a45da169 100644
--- a/Misc/ACKS
+++ b/Misc/ACKS
@@ -242,6 +242,7 @@ Colm Buckley
 Erik de Bueger
 Jan-Hein Bührman
 Lars Buitinck
+Artem Bulgakov
 Dick Bulterman
 Bill Bumgarner
 Jimmy Burgett
diff --git a/Misc/NEWS.d/next/Library/2020-07-28-12-08-58.bpo-41316.bSCbK4.rst b/Misc/NEWS.d/next/Library/2020-07-28-12-08-58.bpo-41316.bSCbK4.rst
new file mode 100644
index 0000000000000..139a170866ed4
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2020-07-28-12-08-58.bpo-41316.bSCbK4.rst
@@ -0,0 +1 @@
+Fix the :mod:`tarfile` module to write only basename of TAR file to GZIP compression header.
\ No newline at end of file
