[Python-Dev] Unpickling py2 str as py3 bytes (and vice versa) - implementation (issue #6784)

Michael Foord fuzzyman at voidspace.org.uk
Tue Mar 13 20:42:20 CET 2012


On 13 Mar 2012, at 04:44, Merlijn van Deen wrote:

> http://bugs.python.org/issue6784 ("byte/unicode pickle
> incompatibilities between python2 and python3")
> 
> Hello all,
> 
> Currently, pickle unpickles python2 'str' objects as python3 'str'
> objects, where the encoding to use is passed to the Unpickler.
> However, there are cases where it makes more sense to unpickle a
> python2 'str' as python3 'bytes' - for instance when it is actually
> binary data, and not text.
> 
> Currently, the mapping is as follows, when reading a pickle:
> python2 'str' -> python3 'str' (using an encoding supplied to Unpickler)
> python2 'unicode' -> python3 'str'
> 
> or, when creating a pickle using protocol <= 2:
> python3 'str' -> python2 'unicode'
> python3 'bytes' -> python2 '__builtins__.bytes object'
> 
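
To illustrate the read-side mapping above, here is a minimal sketch for a current Python 3 interpreter. PY2_PICKLE is hand-written to resemble what Python 2 would emit for the str '\x00\x01\x02\xff' at protocol 2. It is an assumption made for illustration, not output taken from this thread.

```python
import pickle

# Hand-written protocol 2 pickle of a Python 2 'str' holding binary data
# (assumed for illustration): PROTO 2, SHORT_BINSTRING of length 4,
# BINPUT 0, STOP.
PY2_PICKLE = b"\x80\x02U\x04\x00\x01\x02\xffq\x00."

# With the default encoding (ASCII), the non-ASCII byte cannot be decoded.
try:
    pickle.loads(PY2_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# A common workaround: decode as latin1 (every byte maps to one code point),
# then encode back to recover the original bytes.
as_text = pickle.loads(PY2_PICKLE, encoding="latin1")
recovered = as_text.encode("latin1")
```

The latin1 round trip recovers the data, but the caller has to know which strings were really binary, which is the gap the issue describes.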


It does seem unfortunate that, by default, a developer cannot "do the right thing" when pickling / unpickling here. Unpickling Python 2 binary data as Unicode on Python 3 is presumably a convenience for developers doing the *wrong thing* (and it only works for ASCII anyway).

All the best,

Michael Foord


> This issue suggests adding a flag to change the behaviour as follows:
> a) python2 'str' -> python3 'bytes'
> b) python3 'bytes' -> python2 'str'
> 
> The question on this is how to pass this flag. To quote Antoine (with
> permission) on my mail about this issue on core-mentorship:
> 
>> I haven't answered because I'm unsure about the approach itself - do we
>> want to add yet another argument to pickle methods, especially this late
>> in the 3.x development cycle?
> 
> 
> Currently, I have implemented it using an extra argument for the
> Pickler and Unpickler objects ('bytestr'), which toggles the
> behaviour. I.e.:
> >>> pickled = Pickler(data, bytestr=True); unpickled = Unpickler(data, bytestr=True)
> This is the approach used in pickle_bytestr.patch [1]
> 
> Another option would be to implement a separate Pickler/Unpickler
> object, such that
> >>> pickled = BytestrPickler(data, bytestr=True); unpickled = BytestrUnpickler(data, bytestr=True)
> This is the approach I initially implemented [2].
> 
> Alternatively, there is the option to implement only the Unpickler,
> leaving the Pickler as it is. This allows
> >>> unpickled = Unpickler(data, encoding=bytes)
> where the bytes type is used as a special 'flag'.
> 
> And, of course, there is the option not to implement this in the stdlib at all.
> 
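
The Unpickler-only option above resembles something current Python 3 releases offer: pickle.loads and pickle.Unpickler accept the string "bytes" (rather than the bytes type) as the encoding argument. A minimal sketch follows, assuming a hand-written pickle that resembles what Python 2 would emit for {'k': 'v'} at protocol 2. The helper name load_py2_as_bytes is hypothetical.

```python
import pickle

# Hand-written protocol 2 pickle of a Python 2 dict {'k': 'v'}
# (assumed for illustration): EMPTY_DICT, two SHORT_BINSTRINGs, SETITEM.
PY2_DICT_PICKLE = b"\x80\x02}q\x00U\x01kq\x01U\x01vq\x02s."


def load_py2_as_bytes(data):
    """Hypothetical helper: read every Python 2 'str' as Python 3 'bytes'."""
    return pickle.loads(data, encoding="bytes")


as_bytes = load_py2_as_bytes(PY2_DICT_PICKLE)  # keys and values are bytes
as_str = pickle.loads(PY2_DICT_PICKLE)         # default: decoded as ASCII
```

Note that the setting applies to every Python 2 str in the pickle, so dictionary keys come back as bytes as well, not only the values that hold binary data.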
> 
> What are your ideas on this?
> 
> Best,
> Merlijn
> 
> [0] http://bugs.python.org/issue6784
> [1] http://bugs.python.org/file24719/pickle_bytestr.patch
> [2] https://github.com/valhallasw/py2/blob/master/bytestrpickle.py
> 


--
http://www.voidspace.org.uk/


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing 
http://www.sqlite.org/different.html






