[New-bugs-announce] [issue8784] tarfile/Windows: Don't use mbcs as the default encoding
report at bugs.python.org
Sat May 22 03:22:15 CEST 2010
New submission from STINNER Victor <victor.stinner at haypocalc.com>:
The mbcs encoding replaces non-encodable characters (losing information) and doesn't support the surrogateescape error handler: it ignores the error handler argument (see #850997), while tarfile now uses the surrogateescape error handler by default (#8390). This encoding is just horrible for Unicode support :-)
Since the Windows native API uses Unicode characters (UTF-16), I think that it would be better to use utf-8 as the default encoding on Windows. utf-8 is able to encode and decode the full Unicode charset and supports all error handlers (especially surrogateescape).
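To illustrate why the error handler matters, here is a minimal sketch comparing surrogateescape with lossy replacement under utf-8. The file name bytes are made up for the example, and the "replace" handler is used only as a stand-in for the lossy behaviour described above, since mbcs itself is available only on Windows.

```python
# A file name stored as Latin-1 bytes: 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# surrogateescape maps the undecodable byte to a lone surrogate (U+DCE9),
# so the original bytes can be recovered when encoding again.
name = raw.decode("utf-8", "surrogateescape")
roundtrip = name.encode("utf-8", "surrogateescape")

# "replace" substitutes U+FFFD for the bad byte: the original is lost.
lossy = raw.decode("utf-8", "replace")
lossy_bytes = lossy.encode("utf-8")

print(roundtrip == raw)    # lossless round trip
print(lossy_bytes == raw)  # information was lost
```

An encoding that ignores the error handler argument cannot offer the first behaviour, whatever the caller requests.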
The attached patch sets the default encoding to utf-8 on Windows, and removes the "ENCODING is None" test because sys.getfilesystemencoding() can no longer return None (in 3.2 only, it's a recent change: #8610).
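The selection logic described here could look roughly like the following. This is only a sketch of the idea, not the content of the attached patch: the helper name default_tar_encoding and its os_name parameter are hypothetical, introduced so that the choice can be tried on any platform.

```python
import os
import sys

def default_tar_encoding(os_name=os.name):
    """Pick a default encoding for tar member names (hypothetical helper)."""
    if os_name == "nt":
        # On Windows, avoid mbcs and use utf-8, which supports
        # every error handler including surrogateescape.
        return "utf-8"
    # Elsewhere, follow the file system encoding. No None check is
    # needed, assuming Python 3.2+ where it is never None.
    return sys.getfilesystemencoding()

print(default_tar_encoding("nt"))
print(default_tar_encoding())
```

Callers who need a specific encoding can still pass it explicitly, e.g. tarfile.open(name, encoding="utf-8", errors="surrogateescape").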
components: Library (Lib), Unicode, Windows
nosy: haypo, lars.gustaebel
title: tarfile/Windows: Don't use mbcs as the default encoding
versions: Python 3.2
Added file: http://bugs.python.org/file17435/tarfile_windows_utf8.patch
Python tracker <report at bugs.python.org>