[issue12281] bytes.decode('mbcs', 'ignore') does replace undecodable bytes on Windows Vista or later
STINNER Victor
report at bugs.python.org
Fri Jun 17 02:35:35 CEST 2011
STINNER Victor <victor.stinner at haypocalc.com> added the comment:
> What is the use of these code_page_encode() functions?
I wrote them to be able to write tests.
We could also use them to implement the Python code page codecs through
a custom codec register function: see msg138246. Windows codecs seem to
be less reliable and less portable than the Python builtin codecs: they
behave differently depending on the Windows version. Windows codecs may
be faster, but I would need to write and run a benchmark to check.
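
A rough sketch of the codec register idea, not the patch itself: a
search function registered with codecs.register() can resolve code page
names. The "oscp" prefix and the search_code_page() name are made up for
this example, and the function delegates to the Python builtin cpNNNN
codec as a stand-in for a Windows-backed decoder.

```python
import codecs

PREFIX = "oscp"  # hypothetical codec name prefix, e.g. "oscp1252"


def search_code_page(name):
    """Codec search function: map "oscpNNNN" to a code page codec."""
    number = name[len(PREFIX):]
    if not name.startswith(PREFIX) or not number.isdigit():
        return None  # not one of our names, let other search functions try
    try:
        # Stand-in (assumption): reuse the Python builtin codec for this
        # code page instead of calling a Windows API.
        return codecs.lookup("cp" + number)
    except LookupError:
        return None


codecs.register(search_code_page)

text = b"caf\xe9".decode("oscp1252")
```

A real implementation would return a CodecInfo whose decode function
calls the code page functions instead of the builtin codec.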
My main concern is to fix error handling of the Python mbcs codec.
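
For reference, this is how the error handlers behave with a Python
builtin codec (ASCII is used here only because it works on every
platform); the mbcs codec is expected to follow the same rules:

```python
data = b"abc\xffdef"  # 0xff cannot be decoded from ASCII

# "ignore" drops the undecodable byte.
ignored = data.decode("ascii", "ignore")

# "replace" substitutes U+FFFD (REPLACEMENT CHARACTER).
replaced = data.decode("ascii", "replace")

# "strict" raises UnicodeDecodeError.
try:
    data.decode("ascii", "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```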
--
I am also trying to refactor the code in posixmodule.c: when a function
has two implementations (bytes and Unicode) only for Windows, I would
like to remove the bytes implementation. The idea is to decode filenames
exactly as Windows does and reuse the Unicode implementation. I don't
know yet how Windows decodes bytes filenames (especially how it handles
undecodable bytes); I suppose that it uses MultiByteToWideChar with
cp=CP_ACP and flags=0.
We may patch os.fsdecode() to handle undecodable bytes like Windows
does. codecs.code_page_decode() would help with this specific idea,
except that my current patch doesn't allow specifying the flags
directly. The "replace" and "ignore" error handlers don't behave like
flags=0, or at least not in some cases. codecs.code_page_decode() should
allow specifying an error handler *or* the flags (mutually exclusive
options).
Example:
def fsdecode(filename):
    if isinstance(filename, bytes):
        return codecs.code_page_decode(codecs.CP_ACP, filename, flags=0)
    elif isinstance(filename, str):
        return filename
    else:
        raise TypeError()
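
The "error handler *or* flags" rule could be checked along these lines.
This is only a sketch of the argument validation: resolve_decode_options()
is a hypothetical helper, and the "strict" default is an assumption
borrowed from the other Python codecs.

```python
def resolve_decode_options(errors=None, flags=None):
    """Return ("errors", name) or ("flags", value), never both.

    Hypothetical helper: errors is an error handler name, flags is the
    raw value that would be passed to MultiByteToWideChar.
    """
    if errors is not None and flags is not None:
        raise ValueError("errors and flags are mutually exclusive")
    if flags is not None:
        return ("flags", flags)
    # Assumed default, as for the other Python codecs.
    return ("errors", errors if errors is not None else "strict")
```

With this rule, flags=0 and errors="replace" are both accepted on their
own, but passing them together is rejected.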
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12281>
_______________________________________