[New-bugs-announce] [issue8633] tarfile doesn't support undecodable filename in PAX format

STINNER Victor report at bugs.python.org
Thu May 6 00:14:51 CEST 2010


New submission from STINNER Victor <victor.stinner at haypocalc.com>:

tarfile is unable to open a TAR archive in PAX format that contains invalid filenames (filenames that are not encoded in UTF-8 and are therefore undecodable). The attached file is an example (it contains the file b'z/\xff', which cannot be decoded from UTF-8).

The PAX specification has an "invalid" option with 4 values: bypass (default), rename, UTF-8, write.
http://www.opengroup.org/onlinepubs/009695399/utilities/pax.html

As was done for the other formats in issue #8390, PAX can use Python's surrogateescape error handler to store undecodable bytes as Unicode surrogates.
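
For illustration, here is a minimal sketch of what the surrogateescape handler does with the filename from the attached archive. It only uses the built-in bytes/str codecs, not tarfile itself, and the variable names are my own:

```python
raw = b"z/\xff"  # member name from the attached archive; not valid UTF-8

# Strict decoding fails, which is the situation described in this report.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")

print(strict_ok, ascii(name), restored)
```

The round trip is lossless, so a name read this way could be written back to the filesystem unchanged.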

I think that PAX should be strict by default, but have an option to enable surrogateescape mode.
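
A rough sketch of the behaviour I have in mind, using a hypothetical helper (decode_pax_value is not an existing tarfile function, and the parameter name is only an assumption about how such an option might look):

```python
def decode_pax_value(raw: bytes, errors: str = "strict") -> str:
    """Hypothetical helper: decode a PAX header value stored as UTF-8.

    Strict by default; the caller opts in to surrogateescape explicitly.
    """
    return raw.decode("utf-8", errors)


raw = b"z/\xff"

# Default (strict) mode rejects the undecodable name.
try:
    decode_pax_value(raw)
    rejected = False
except UnicodeDecodeError:
    rejected = True

# Opt-in mode keeps the undecodable byte as a lone surrogate.
lenient = decode_pax_value(raw, errors="surrogateescape")
```

With this shape, existing callers keep the strict behaviour and only code that asks for it receives names containing surrogates.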

----------
components: Library (Lib)
files: z-pax.tar
messages: 105094
nosy: haypo, lars.gustaebel, loewis
priority: normal
severity: normal
status: open
title: tarfile doesn't support undecodable filename in PAX format
versions: Python 3.2
Added file: http://bugs.python.org/file17230/z-pax.tar

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8633>
_______________________________________

