[New-bugs-announce] [issue8390] tarfile: use surrogates for undecode fields
report at bugs.python.org
Wed Apr 14 01:53:15 CEST 2010
New submission from STINNER Victor <victor.stinner at haypocalc.com>:
When reading a tar archive, tarfile decodes fields using the "replace" error handler by default. As a result, we lose information whenever a field contains undecodable bytes.
Since PEP 383, undecodable filenames are stored using surrogates in Python 3. I think it's a good idea to use surrogates for tar as well, because undecodable data in a tar archive is a common problem (see the Unicode section of the tarfile documentation).
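To illustrate the difference between the two error handlers, here is a minimal sketch. It does not use tarfile itself; it only decodes a byte string the way a header field might be decoded. The sample name (raw_name, a Latin-1 encoded file name) and the choice of UTF-8 as the archive encoding are assumptions made for the example.

```python
# Hypothetical header field: "café.txt" encoded as Latin-1, which is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# "replace" substitutes U+FFFD for the bad byte, so the original byte is gone.
replaced = raw_name.decode("utf-8", "replace")

# "surrogateescape" (PEP 383) maps the bad byte 0xE9 to the lone surrogate U+DCE9.
escaped = raw_name.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
roundtrip = escaped.encode("utf-8", "surrogateescape")

print(repr(replaced))         # name with U+FFFD in place of the bad byte
print(repr(escaped))          # name with a lone surrogate in place of the bad byte
print(roundtrip == raw_name)  # whether the original bytes were recovered
```

With "replace" the original byte cannot be recovered from the decoded string, while with "surrogateescape" the decoded name can be encoded back to the same bytes, for example to extract the member under its original file name.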
components: Library (Lib), Unicode
nosy: haypo, loewis
title: tarfile: use surrogates for undecode fields
versions: Python 3.1, Python 3.2
Added file: http://bugs.python.org/file16917/tarfile_surrogates.patch
Python tracker <report at bugs.python.org>