[Python-Dev] Unpickling py2 str as py3 bytes (and vice versa) - implementation (issue #6784)
Merlijn van Deen
valhallasw at arctus.nl
Tue Mar 13 12:44:58 CET 2012
http://bugs.python.org/issue6784 ("byte/unicode pickle
incompatibilities between python2 and python3")
Currently, pickle unpickles python2 'str' objects as python3 'str'
objects, where the encoding to use is passed to the Unpickler.
However, there are cases where it makes more sense to unpickle a
python2 'str' as python3 'bytes' - for instance when it is actually
binary data, and not text.
Currently, the mapping is as follows, when reading a pickle:
python2 'str' -> python3 'str' (using an encoding supplied to Unpickler)
python2 'unicode' -> python3 'str'
or, when creating a pickle using protocol <= 2:
python3 'str' -> python2 'unicode'
python3 'bytes' -> python2 '__builtins__.bytes object'
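To make the reading side of this mapping concrete, here is a minimal sketch for Python 3. The pickle bytes are hand-written to match what I assume Python 2's pure-Python pickle module emits for the str '\xff\x00abc' at protocol 2; they are an illustration, not output captured from a real Python 2 session.

```python
import pickle

# Assumption: hand-written protocol-2 pickle of the Python 2 str '\xff\x00abc'
# (PROTO 2, SHORT_BINSTRING with length 5, BINPUT 0, STOP).
PY2_STR_PICKLE = b"\x80\x02U\x05\xff\x00abcq\x00."

# With the default encoding (ASCII), binary data cannot be decoded.
try:
    pickle.loads(PY2_STR_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# With an explicit encoding, the Python 2 'str' comes back as a Python 3 'str'.
text = pickle.loads(PY2_STR_PICKLE, encoding="latin-1")

# Recovering the original binary data takes a manual re-encode with the same
# codec; latin-1 maps every byte value 0-255 to one code point.
raw = text.encode("latin-1")
```

The manual re-encode at the end is the step that becomes unnecessary if the unpickler can return 'bytes' directly.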
The issue proposes adding a flag that changes the behaviour as follows:
a) python2 'str' -> python3 'bytes'
b) python3 'bytes' -> python2 'str'
The question is how to pass this flag. To quote Antoine (with
permission), replying to my mail about this issue on core-mentorship:
> I haven't answered because I'm unsure about the approach itself - do we
> want to add yet another argument to pickle methods, especially this late
> in the 3.x development cycle?
Currently, I have implemented it using an extra argument for the
Pickler and Unpickler objects ('bytestr'), which toggles the behaviour:
>>> pickled = Pickler(data, bytestr=True); unpickled = Unpickler(data, bytestr=True)
This is the approach used in pickle_bytestr.patch.
Another option would be to implement a separate Pickler/Unpickler
object, such that
>>> pickled = BytestrPickler(data, bytestr=True); unpickled = BytestrUnpickler(data, bytestr=True)
This is the approach I initially implemented.
Alternatively, there is the option to implement this only in the
Unpickler, leaving the Pickler as it is. This allows
>>> unpickled = Unpickler(data, encoding=bytes)
where the bytes type is used as a special 'flag'.
And, of course, there is the option not to implement this in the stdlib at all.
What are your ideas on this?