[Python-Dev] tarfile and directory traversal vulnerability

Lars Gustäbel lars at gustaebel.de
Sat Aug 25 12:13:12 CEST 2007

On Fri, Aug 24, 2007 at 07:36:41PM +0200, Jan Matejek wrote:
> once upon a time there was a known vulnerability in tar (CVE-2001-1267,
> [1]), and while tar is now long fixed, python's tarfile module is
> affected too.
> The vulnerability goes basically like this: If you tar a file named
> "../../../../../etc/passwd" and then make the admin untar it,
> /etc/passwd gets overwritten.
> Another variety of this bug is a symlink one: if tar contains files like:
> ./aaaa-directory -> /etc
> ./aaaa-directory/passwd
> then the "aaaa-directory" symlink would be created first and /etc/passwd
> will be overwritten once again.

tarfile currently contains no sanity checks at all. The easiest
way to attack /etc/passwd would be to give tarfile an archive
created with `tar -cPf foo.tar /etc/passwd', because -P keeps the
leading slash in the member name.
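
To see which member names an archive carries before extracting
anything, a check along these lines could be used. This is only a
sketch for a current Python 3: the helper names suspicious_names and
build_demo_archive are made up for illustration, and the demo archive
is built in memory instead of with tar -P.

```python
import io
import posixpath
import tarfile


def suspicious_names(fileobj):
    """Return member names that are absolute or contain a '..' component."""
    found = []
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar.getmembers():
            name = member.name
            # Tar member names always use '/' as the separator.
            if posixpath.isabs(name) or ".." in name.split("/"):
                found.append(name)
    return found


def build_demo_archive():
    """Build a small in-memory archive with two hostile names and one normal one."""
    buf = io.BytesIO()
    data = b"demo\n"
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("/etc/passwd", "../../etc/passwd", "docs/readme.txt"):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf
```

Nothing is written to disk here; the archive is only listed.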

> I was wondering how to fix it.
> The symlink problem obviously applies only to the extractall() method
> and is easily fixed by delaying external (or possibly all) symlink
> creation, similar to how directory attributes are delayed now.
> I've attached a draft of the patch; if you like it, I'll polish it.

Suppose we have:
foo -> /etc
foo/passwd

If creation of the foo symlink is delayed, foo/passwd will be
extracted into a directory foo which is created implicitly.
If we then try to create the foo symlink afterwards, it will fail
because foo already exists. The best way would be to completely
ignore members whose names or link targets are absolute or point
outside the archive's scope.
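
A minimal sketch of such a check, assuming a current Python 3 and
that "the archive's scope" means the directory being extracted into.
The function names is_within and is_safe_member are hypothetical and
not part of tarfile.

```python
import os
import tarfile


def is_within(base, target):
    """True if target, once resolved, lies inside (or is) the base directory."""
    base = os.path.realpath(base)
    target = os.path.realpath(target)
    return os.path.commonpath([base, target]) == base


def is_safe_member(member, dest):
    """Reject members whose name or link target is absolute or escapes dest."""
    if os.path.isabs(member.name):
        return False
    if not is_within(dest, os.path.join(dest, member.name)):
        return False
    if member.issym() or member.islnk():
        if os.path.isabs(member.linkname):
            return False
        if member.issym():
            # A symlink target is relative to the directory holding the link.
            link_dir = os.path.dirname(os.path.join(dest, member.name))
            target = os.path.join(link_dir, member.linkname)
        else:
            # A hard link target names another member, relative to the archive root.
            target = os.path.join(dest, member.linkname)
        if not is_within(dest, target):
            return False
    return True
```

With this, the foo -> /etc member is skipped because its link target
is absolute, so foo/passwd ends up in an ordinary directory foo
inside the destination.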

> The traversal problem is harder, and it applies to extract() method as well.
> For extractall() alone, i would use something like:
> if not tarinfo.name.startswith('../'):
>     self.extract(tarinfo, path)
> else:
>     warnings.warn("non-local file skipped: %s" % tarinfo.name,
>                   RuntimeWarning, stacklevel=1)
> For extract(), I am not sure. Maybe it should raise an exception when
> it encounters such a file, and have a special option to extract such
> files anyway. [...]

Yes, I think that is the right way to do it.
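
Roughly, the behaviour could look like the sketch below (current
Python 3 assumed). The wrapper checked_extract, the exception
UnsafeMemberError and the allow_unsafe option are all hypothetical
names chosen for illustration; none of them exists in tarfile. Newer
Python versions may also apply their own checks inside extract().

```python
import os
import tarfile


class UnsafeMemberError(tarfile.TarError):
    """Hypothetical error for members that would land outside the target directory."""


def checked_extract(tar, member, path=".", allow_unsafe=False):
    """Extract one member, raising unless it stays below path or the caller opts in."""
    if isinstance(member, str):
        member = tar.getmember(member)
    dest = os.path.realpath(path)
    target = os.path.realpath(os.path.join(dest, member.name))
    outside = (os.path.isabs(member.name)
               or os.path.commonpath([dest, target]) != dest)
    if outside and not allow_unsafe:
        raise UnsafeMemberError(
            "refusing to extract %r outside %r" % (member.name, dest))
    # Hand over to the regular extract() once the check has passed.
    tar.extract(member, path)
```

Raising by default and requiring an explicit opt-in keeps the unsafe
case visible to the caller instead of silently skipping the file.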

Lars Gustäbel
lars at gustaebel.de

A chicken is an egg's way of producing more eggs.
