[issue19920] TarFile.list() fails on some files

Serhiy Storchaka report at bugs.python.org
Sat Dec 7 18:37:40 CET 2013


New submission from Serhiy Storchaka:

TarFile.list() fails on some files, in particular on Lib/test/testtar.tar.

>>> import tarfile
>>> tarfile.open('Lib/test/testtar.tar').list()
?rw-r--r-- tarfile/tarfile       7011 2003-01-06 01:19:43 ustar/conttype 
?rw-r--r-- tarfile/tarfile       7011 2003-01-06 01:19:43 ustar/regtype 
?rwxr-xr-x tarfile/tarfile          0 2003-01-06 01:19:43 ustar/dirtype/ 
?rwxr-xr-x tarfile/tarfile        255 2003-01-06 01:19:43 ustar/dirtype-with-size/ 
?rw-r--r-- tarfile/tarfile          0 2003-01-06 01:19:43 ustar/lnktype link to ustar/regtype 
?rwxrwxrwx tarfile/tarfile          0 2003-01-06 01:19:43 ustar/symtype -> regtype 
?rw-rw---- tarfile/tarfile        3,0 2003-01-06 01:19:43 ustar/blktype 
?rw-rw-rw- tarfile/tarfile        1,3 2003-01-06 01:19:43 ustar/chrtype 
?rw-r--r-- tarfile/tarfile          0 2003-01-06 01:19:43 ustar/fifotype 
?rw-r--r-- tarfile/tarfile      86016 2003-01-06 01:19:43 ustar/sparse 
?rw-r--r-- tarfile/tarfile       7011 2003-01-06 01:19:43 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/tarfile.py", line 1846, in list
    print(tarinfo.name + ("/" if tarinfo.isdir() else ""), end=' ')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in position 14: surrogates not allowed
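
The error can probably be reproduced without the archive. The following is a minimal sketch, not taken from tarfile.py. It assumes that the member name contains a byte that is not valid UTF-8 and that the name was decoded with the "surrogateescape" error handler, which is the tarfile default in Python 3. The name used here is hypothetical and chosen only so that the bad byte lands at position 14, as in the traceback.

```python
# Hypothetical member name: 14 ASCII bytes followed by 0xC4,
# which is not a valid UTF-8 sequence on its own.
raw_name = b"ustar/umlauts-\xc4"

# With surrogateescape, the undecodable byte 0xC4 becomes the
# lone surrogate '\udcc4' instead of raising an error.
name = raw_name.decode("utf-8", "surrogateescape")

error = None
try:
    # A strict UTF-8 output stream has to perform this encoding
    # when print() writes the name, and it fails.
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    error = exc

# Encoding succeeds only when the same error handler is used again.
restored = name.encode("utf-8", "surrogateescape")
```

Under these assumptions, `error` holds the same UnicodeEncodeError as above, and `restored` is equal to the original bytes.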

The command-line interface of the tarfile module also fails:

$ ./python -m tarfile -v -l Lib/test/testtar.tar
?rw-r--r-- tarfile/tarfile       7011 2003-01-06 01:19:43 ustar/conttype 
?rw-r--r-- tarfile/tarfile       7011 2003-01-06 01:19:43 ustar/regtype 
?rwxr-xr-x tarfile/tarfile          0 2003-01-06 01:19:43 ustar/dirtype/ 
?rwxr-xr-x tarfile/tarfile        255 2003-01-06 01:19:43 ustar/dirtype-with-size/ 
?rw-r--r-- tarfile/tarfile          0 2003-01-06 01:19:43 ustar/lnktype link to ustar/regtype 
?rwxrwxrwx tarfile/tarfile          0 2003-01-06 01:19:43 ustar/symtype -> regtype 
?rw-rw---- tarfile/tarfile        3,0 2003-01-06 01:19:43 ustar/blktype 
?rw-rw-rw- tarfile/tarfile        1,3 2003-01-06 01:19:43 ustar/chrtype 
?rw-r--r-- tarfile/tarfile          0 2003-01-06 01:19:43 ustar/fifotype 
?rw-r--r-- tarfile/tarfile      86016 2003-01-06 01:19:43 ustar/sparse 
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/runpy.py", line 160, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/serhiy/py/cpython/Lib/runpy.py", line 73, in _run_code
    exec(code, run_globals)
  File "/home/serhiy/py/cpython/Lib/tarfile.py", line 2500, in <module>
    main()
  File "/home/serhiy/py/cpython/Lib/tarfile.py", line 2444, in main
    tf.list(verbose=args.verbose)
  File "/home/serhiy/py/cpython/Lib/tarfile.py", line 1846, in list
    print(tarinfo.name + ("/" if tarinfo.isdir() else ""), end=' ')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in position 14: surrogates not allowed
?rw-r--r-- tarfile/tarfile       7011 2003-01-06 01:19:43 serhiy at raxxla:~/py/cpython$
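
One possible direction for a fix is to escape unencodable characters before printing. This is only a sketch under assumptions: `safe_print` is a hypothetical helper, not an existing tarfile function, and it assumes that the output stream exposes an `encoding` attribute. It re-encodes the text with the "backslashreplace" error handler, so a lone surrogate is written as an escape sequence instead of raising.

```python
import sys


def safe_print(text, stream=None):
    """Hypothetical helper: print text, escaping characters that the
    stream's encoding cannot represent (such as lone surrogates)."""
    stream = sys.stdout if stream is None else stream
    encoding = getattr(stream, "encoding", None)
    if encoding is not None:
        # Round-trip through the stream encoding; unencodable characters
        # are replaced with backslash escapes such as \udcc4.
        text = text.encode(encoding, "backslashreplace").decode(encoding)
    print(text, end=" ", file=stream)


# Usage with a name containing a lone surrogate (hypothetical value).
safe_print("ustar/umlauts-\udcc4")
```

The listing would then show an escaped name rather than stopping with a traceback. Whether escaping or some other representation is preferable for list() output is an open question.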

----------
components: IO, Library (Lib), Unicode
messages: 205475
nosy: benjamin.peterson, ezio.melotti, haypo, lars.gustaebel, lemburg, pitrou, serhiy.storchaka
priority: normal
severity: normal
status: open
title: TarFile.list() fails on some files
type: behavior
versions: Python 2.7, Python 3.3, Python 3.4

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19920>
_______________________________________

