[docs] [issue19847] Setting the default filesystem-encoding
report at bugs.python.org
Mon Dec 2 13:34:18 CET 2013
STINNER Victor added the comment:
"It is nice that you could fixed the documentation due to this report but this was just a sideeffect - so closing this report and moving it to "Documentation" was maybe wrong."
Oh sorry, I read the issue too quickly and stopped at the first sentence. I am reopening the issue to reply to the other points.
"In my opinion relying on the locale environment is risky since filesystem-encoding != locale. This is especially the case if working on a filesystem from an external media like an external hard disk drive. Operating on multiple media can also result in different filesystem-encodings."
This issue is not specific to Python. If you mount a USB key formatted as VFAT with the wrong encoding on Linux, you will get mojibake in your file explorer. The same happens if you connect to a network share (e.g. NFS) using a different encoding than the server. You can find many other examples (hint: Mac OS X and Unicode normalization).
There is no good compromise here. The only two safe options are:
(A) convert the filenames on your filesystem to the same encoding as your computer (there are tools for that, like convmv)
(B) use raw bytes instead of Unicode; Python 3 should accept bytes anywhere that OS data is expected (filenames, command line arguments, environment variables)
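As a minimal sketch of option (B), assuming a POSIX system: the helper names below (list_raw_names, display_name, demo) are made up for illustration, while os.listdir(), os.fsencode() and os.fsdecode() are the standard library calls. The filename stays as bytes from listing to opening, and decoding is only done for display.

```python
import os
import tempfile


def list_raw_names(directory):
    """Return the entries of `directory` as bytes, with no decoding step."""
    # A bytes path makes os.listdir() return bytes filenames.
    return os.listdir(os.fsencode(directory))


def display_name(raw_name):
    """Decode a raw filename for display only."""
    # os.fsdecode() uses the filesystem encoding; on POSIX, undecodable
    # bytes become lone surrogates (surrogateescape) instead of raising.
    return os.fsdecode(raw_name)


def demo():
    """Create a file, then reopen it through its raw bytes name."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "example.txt"), "w") as f:
            f.write("hello")
        raw_dir = os.fsencode(tmp)
        contents = []
        for raw in list_raw_names(tmp):
            # Join and open with bytes; the name is never re-encoded.
            with open(os.path.join(raw_dir, raw), "rb") as f:
                contents.append((raw, f.read()))
        return contents
```

Because the path is never decoded, this should work even when the name on disk is not valid in the locale encoding. The string returned by display_name() may contain lone surrogates, so it can be passed back to os.fsencode() safely but may fail if printed or encoded as strict UTF-8.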
Most operating systems other than Windows now use UTF-8 by default for the locale encoding, so mojibake in filenames should slowly become very rare.
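To see what a given machine actually uses, something like the following can help. It is only a sketch: encoding_report is a hypothetical helper name, and the values it returns depend on the platform and the environment.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python uses for filenames and for the locale."""
    return {
        # Encoding used to convert between str filenames and OS bytes.
        "filesystem": sys.getfilesystemencoding(),
        # Locale encoding, queried without calling setlocale().
        "locale": locale.getpreferredencoding(False),
    }
```

On a POSIX system with a UTF-8 locale both entries are normally UTF-8, but neither value says anything about the encoding actually used on an external disk or a network share.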
"It would be useful if the user can make his own checks and change the default filesystem-encoding if needed."
This idea was already proposed in issue #8622, but it was a big failure. Please read my following email for more information:
resolution: fixed ->
status: closed -> open
Python tracker <report at bugs.python.org>