[issue8383] pickle is unable to encode unicode surrogates
STINNER Victor
report at bugs.python.org
Tue Apr 13 02:39:57 CEST 2010
New submission from STINNER Victor <victor.stinner at haypocalc.com>:
Python3 uses unicode surrogates to store undecodable filenames. E.g. the filename b"abc\xff.py" is decoded as "abc\udcff.py" if the file system encoding is ASCII. Pickle is unable to store them:
./python -c 'import pickle; pickle.dumps("abc\udcff")'
(...)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 20: surrogates not allowed
This is a limitation of pickle (in the binary mode): a Python string can hold any unicode code point, including lone surrogates, but pickle cannot serialize them.
Using the "surrogatepass" error handler should be enough to fix this issue.
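To illustrate where such a string comes from, here is a minimal sketch that assumes the "surrogateescape" error handler is what produces the surrogate (the report does not name it). The variable names are made up for the example; it only uses bytes.decode and str.encode, not pickle itself.

```python
# Sketch: how an undecodable byte ends up as a lone surrogate in a str,
# and why the strict UTF-8 codec then refuses to encode it.
raw = b"abc\xff.py"

# Assumption: "surrogateescape" maps the undecodable byte 0xFF to U+DCFF.
name = raw.decode("ascii", "surrogateescape")

# The strict UTF-8 codec rejects the lone surrogate.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    strict_error = exc
else:
    strict_error = None

# The same handler maps the surrogate back to the original byte.
roundtrip = name.encode("ascii", "surrogateescape")
```

The final line shows that the original bytes are recoverable, so the surrogate is carrying real data that a serializer would need to preserve.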
Related issue: #3672 (Reject surrogates in utf-8 codec) -> r72208 creates "surrogatepass" error handler.
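As a rough sketch of the suggested fix, the snippet below shows what the "surrogatepass" error handler does at the codec level. It is not pickle's actual code, and the variable names are made up for the example.

```python
# Sketch: "surrogatepass" lets the UTF-8 codec encode and decode
# lone surrogates instead of raising an error.
text = "abc\udcff"

# U+DCFF is written out as the three-byte sequence ED B3 BF.
encoded = text.encode("utf-8", "surrogatepass")

# Decoding needs the same handler, since strict UTF-8 rejects these bytes.
decoded = encoded.decode("utf-8", "surrogatepass")
```

Both sides need the handler: the bytes produced this way are not valid strict UTF-8, so a reader using the default error handler would fail on them.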
----------
components: Library (Lib)
messages: 102996
nosy: haypo, lemburg, loewis
severity: normal
status: open
title: pickle is unable to encode unicode surrogates
versions: Python 3.1, Python 3.2
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8383>
_______________________________________