[Python-3000] PEP: Python3 and UnicodeDecodeError
victor.stinner at haypocalc.com
Thu Oct 2 14:35:48 CEST 2008
On Thursday 02 October 2008 14:07:50, M.-A. Lemburg wrote:
> On 2008-10-02 13:50, Victor Stinner wrote:
> > This is a PEP (...)
> The PEP doesn't appear to address any potential changes. Wouldn't
> it be better to add such information to the Python3 documentation
> itself ?!
I don't know the right name for this kind of document. Yes, it could move to
Doc/ in the Python3 source tree.
> > Example of an invalid bytes sequence: ::
> > >>> str(b'\xff', 'utf8')
> > UnicodeDecodeError
> > >>> str(b'\xff', 'iso-8859-1')
> > 'ÿ'
> You have left out all the options you have by using a different
> error handling mechanism (using a third parameter to str()), e.g.
> 'replace', 'ignore', etc.
Yes, I can explain why replace and ignore can *not* be used in this case. If
you use ignore or replace, filenames will be valid unicode strings, but they
will no longer match the original bytes, so you will be unable to open / copy /
remove your file.
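A minimal sketch of the problem, assuming a hypothetical filename stored as
ISO-8859-1 bytes (the name raw_name and the helper lossy_roundtrip are only
for illustration, they are not part of the PEP):

```python
# Hypothetical filename bytes: "café.txt" encoded as ISO-8859-1.
# The byte 0xE9 is not valid UTF-8 in this position.
raw_name = b'caf\xe9.txt'

def lossy_roundtrip(raw, errors):
    """Decode as UTF-8 with the given error handler, then encode back."""
    text = str(raw, 'utf-8', errors)
    return text, text.encode('utf-8')

replaced_text, replaced_bytes = lossy_roundtrip(raw_name, 'replace')
ignored_text, ignored_bytes = lossy_roundtrip(raw_name, 'ignore')

# Both strings are valid unicode, but neither encodes back to the bytes
# that the file system actually stores.
print(replaced_bytes == raw_name)
print(ignored_bytes == raw_name)
```

Both comparisons print False: the original byte is lost, so the decoded name
can no longer be used to find the file on disk.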
> > Default encoding
> > ================
> > Python uses "UTF-8" as the default Unicode encoding. You can read the
> > default charset using sys.getdefaultencoding(). The "default encoding" is
> > used by PyUnicode_FromStringAndSize().
> Not only there: the C API makes various assumptions on the default
> encoding as well. We should probably drop the term "default encoding"
> altogether and replace it with "utf-8".
The concept of "default encoding" is unclear in Python. Yes, we might remove
sys.getdefaultencoding() and write that PyUnicode_FromStringAndSize() uses
the UTF-8 charset.
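For reference, a small sketch of what the "default encoding" looks like from
the Python side, assuming a Python 3 interpreter (it only shows the Python
level, not the C API behaviour discussed above):

```python
import sys

# In Python 3 the default encoding is UTF-8.
default = sys.getdefaultencoding()
print(default)

# str.encode() and bytes.decode() use UTF-8 when no encoding is given.
text = '\xff'  # the character U+00FF
encoded_default = text.encode()
encoded_utf8 = text.encode('utf-8')
print(encoded_default == encoded_utf8)
print(encoded_default.decode() == text)
```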
> sys.setdefaultencoding() should probably be dropped altogether from
Victor Stinner aka haypo