[issue8784] tarfile/Windows: Don't use mbcs as the default encoding
Lars Gustäbel
report at bugs.python.org
Thu Jun 10 20:52:01 CEST 2010
Lars Gustäbel <lars at gustaebel.de> added the comment:
Maybe I'm going out on a limb here, but I think we should reconsider what tarfile users on Windows(!) actually use the module for, and under which circumstances. The following list is probably not exhaustive, but IMHO it covers 90% of the cases:
1. Download tar archives from a webpage (when no zip is supplied) for viewing or extracting.
2. Create backups for personal use.
3. Create source archives from a project for unix users who hate zipfiles.
I am convinced that the tarfile module is not very popular on Windows, for a simple reason: tar archives are not popular there either. Windows users will always prefer zip archives, and hence the zipfile module, because that is what they are familiar with.
The point I am trying to make is twofold. First, we should not choose a default encoding based on what works best with WinRAR, 7-zip and the like, because they all behave very differently, which makes that impossible. Second, we must not overemphasize the encoding issue to the point where portability is in danger, because in almost all real-life cases there are no encoding issues. In my whole career as tarfile maintainer I cannot remember a single tar archive from an external source that contained special characters. In my experience, the only tar archives that contain special characters are backups. But these backups are created and later restored on one and the same system. Again, no encoding issues.
Long story short, I still vote for utf-8, because it enables Windows users to create backups without losing special characters, and because it is ASCII-"compatible" and should be able to read 99% of the files that you get from the internet.
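To illustrate the two claims above, here is a minimal sketch, not code from the tracker. The helper name roundtrip_names is hypothetical. USTAR format is an assumption chosen so that the encoding argument governs how names are stored in the header. The archive is written to memory, so nothing touches the file system.

```python
import io
import tarfile


def roundtrip_names(names, write_encoding="utf-8", read_encoding="utf-8"):
    """Write members with the given names to an in-memory tar archive,
    then read the archive back and return the decoded member names.

    Hypothetical helper for illustration only.
    """
    buf = io.BytesIO()
    # USTAR_FORMAT is an assumption: it stores names directly in the
    # header, encoded with the `encoding` passed to tarfile.open().
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT,
                      encoding=write_encoding) as tar:
        for name in names:
            data = b"x"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r", encoding=read_encoding) as tar:
        return tar.getnames()


# A backup written and restored with utf-8 keeps its special characters.
print(roundtrip_names(["Gustäbel.txt"]))

# Pure-ASCII names survive even if the reader uses another
# ASCII-compatible encoding such as cp1252.
print(roundtrip_names(["README.txt"], "utf-8", "cp1252"))

# Non-ASCII names only come out wrong when writer and reader disagree.
print(roundtrip_names(["Gustäbel.txt"], "utf-8", "cp1252"))
```

The first two calls should return the names unchanged; the third shows the one case where the encodings actually matter, namely a non-ASCII name read with a different encoding than it was written with.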
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8784>
_______________________________________