[New-bugs-announce] [issue17153] tarfile extract fails when Unicode in pathname

Vinay Sajip report at bugs.python.org
Thu Feb 7 17:43:21 CET 2013

New submission from Vinay Sajip:

The attached file failing.tar.gz contains a path with UTF-8-encoded Unicode. This causes extractall() to fail, but only when the destination path is Unicode. That's because joining the Unicode destination path with the byte-string member name leads to an implicit str->unicode conversion using ASCII.
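
To make the implicit conversion concrete, here is a minimal sketch that performs the same decode explicitly. The member name and destination path are hypothetical stand-ins (the real name inside failing.tar.gz is not shown in this report), and join_like_python2 is an illustrative helper, not part of tarfile or os.path.

```python
# -*- coding: utf-8 -*-
# Hypothetical member name: the UTF-8 bytes of u'caf\xe9/file.txt'.
# The real name stored in failing.tar.gz is not known here.
member_name = u'caf\xe9/file.txt'.encode('utf-8')


def join_like_python2(dest, name_bytes):
    """Spell out what Python 2 does implicitly for unicode + str:
    the byte string is decoded with the ASCII codec before joining."""
    return dest + u'/' + name_bytes.decode('ascii')


try:
    join_like_python2(u'/tmp/work', member_name)
    error = None
except UnicodeDecodeError as exc:
    # Same failure class as in the report: byte 0xc3 is not ASCII.
    error = exc

# Decoding the name as UTF-8 (its actual encoding) joins cleanly.
joined = u'/tmp/work' + u'/' + member_name.decode('utf-8')
```

The sketch runs on Python 2.7 and Python 3 alike because the decode is explicit; on Python 2.7 the same error appears without any visible decode call.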

Test script:

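import shutil, tarfile, tempfile

tf = tarfile.open('failing.tar.gz', 'r:gz')
workdir = tempfile.mkdtemp()
try:
    # N.B. ensure dest path is Unicode to trigger the failure
    dest = unicode(workdir)
    tf.extractall(dest)
finally:
    tf.close()
    shutil.rmtree(workdir)  # always clean up the temporary directory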

$ python untar.py
Traceback (most recent call last):
  File "untar.py", line 8, in <module>
  File "/usr/lib/python2.7/tarfile.py", line 2046, in extractall
    self.extract(tarinfo, path)
  File "/usr/lib/python2.7/tarfile.py", line 2083, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "/usr/lib/python2.7/posixpath.py", line 71, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 44: ordinal not in range(128)
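
Since the failure only occurs with a Unicode destination, one possible workaround on Python 2.7 is to hand extractall() a byte-string path instead. The following is a sketch under that assumption; bytes_dest is a hypothetical helper, not a tarfile API, and the fallback to UTF-8 is a guess for systems where no filesystem encoding is reported.

```python
import sys


def bytes_dest(path, encoding=None):
    """Return `path` as a byte string (hypothetical helper).

    Byte strings pass through unchanged; text is encoded with the given
    encoding, else the filesystem encoding, else UTF-8 (an assumption).
    """
    if isinstance(path, bytes):
        return path
    return path.encode(encoding or sys.getfilesystemencoding() or 'utf-8')
```

On Python 2.7 the call would then look like tf.extractall(bytes_dest(workdir)), so that both operands of the path join are byte strings and no ASCII decode takes place.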

components: Library (Lib), Unicode
messages: 181631
nosy: ezio.melotti, vinay.sajip
priority: normal
severity: normal
status: open
title: tarfile extract fails when Unicode in pathname
type: behavior
versions: Python 2.7

Python tracker <report at bugs.python.org>