[issue9630] Reencode filenames when setting the filesystem encoding

Marc-Andre Lemburg report at bugs.python.org
Wed Sep 29 13:45:15 CEST 2010


Marc-Andre Lemburg <mal at egenix.com> added the comment:

STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner at haypocalc.com> added the comment:
> 
> Forget my previous message, I forgot important points.
> 
>> So the only reason why you have to go through
>> all those hoops is to
>>
>> * allow the complete set of Python supported encoding
>>   names for the PYTHONFSENCODING
>>
>> * make sure that the Py_FilesystemDefaultEncoding is set
>>   to the actual name of the codec as used by the system
> 
> Not only. As I wrote in my first message (msg114191), there are two
> other good reasons to keep the current code but redecode filenames:
> 
>  * Encoding aliases: locale encoding is not always written as the
>    official Python encoding name. E.g. utf8 vs UTF-8, iso8859-1 vs
>    latin_1, etc. We have to be able to load Lib/encodings/aliases.py
>    to get the Python codec.
> 
>  * Codecs implemented in Python: only ascii, latin1, utf8 and mbcs
>    codecs are builtin. All other encodings are implemented in Python. If
>    your filesystem encoding is ShiftJIS, you have to load
>    Lib/encodings/shift_jis.py to load the codec.
> 
> For these two reasons, we have to import Python modules before being
> able to set the filesystem encoding. So we have to redecode filenames
> after setting the filesystem encoding.

No, that's not needed! Please see my earlier message: you can still
do all this at a later time during startup and double-check that
the encoding is indeed valid.
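
As a rough illustration of such a late validity check (a sketch in
Python, not the interpreter's actual startup code; the helper name
canonical_codec_name is made up for this example), the codec registry
can resolve aliases once the encodings package is importable:

```python
import codecs


def canonical_codec_name(name):
    """Return the canonical codec name for `name`, or None if unknown.

    codecs.lookup() goes through the encodings package, so aliases such
    as "utf8" or "latin_1" are resolved to the codec's registered name.
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        # No codec is registered under this name or any of its aliases.
        return None
```

Two spellings of the same encoding then compare equal, and an invalid
name yields None instead of an exception.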

The main point is that you don't need to apply all those checks
before setting the file system encoding in the interpreter.
Early on you just assume that the env vars are set up correctly
and go ahead with starting up the interpreter.

If the decoding fails during startup due to a wrong encoding of
file or path names, the interpreter will signal this. If everything
imports fine, you can still double-check at the point where the file
system encoding is currently set, e.g. to detect cases where the
encoding was set to ascii, but the interpreter was just lucky and
the file system encoding should really be utf-8.
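
A minimal sketch of that "just lucky" check, assuming Python 3 bytes
semantics. The function names and the result layout are hypothetical
and only meant to illustrate the idea, not how the interpreter does it:

```python
import codecs


def undecodable_names(raw_names, encoding):
    """Return the byte filenames that cannot be decoded with `encoding`."""
    bad = []
    for raw in raw_names:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            bad.append(raw)
    return bad


def check_fs_encoding(configured, detected, raw_names):
    """Compare the assumed encoding with the one detected later on.

    `configured` is the encoding assumed early during startup and
    `detected` is the one found once codecs can be imported.
    `raw_names` are byte filenames already seen during startup.
    """
    same_codec = codecs.lookup(configured).name == codecs.lookup(detected).name
    return {
        "same_codec": same_codec,
        "undecodable": undecodable_names(raw_names, configured),
    }
```

If same_codec is false but nothing is undecodable, the startup was
merely lucky: all names seen so far happened to be valid in the
assumed encoding.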

----------
title: Redecode filenames when setting the filesystem encoding -> Reencode filenames when setting the filesystem encoding

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9630>
_______________________________________

