[Python-Dev] Unpickling py2 str as py3 bytes (and vice versa) - implementation (issue #6784)
Michael Foord
fuzzyman at voidspace.org.uk
Tue Mar 13 20:42:20 CET 2012
On 13 Mar 2012, at 04:44, Merlijn van Deen wrote:
> http://bugs.python.org/issue6784 ("byte/unicode pickle
> incompatibilities between python2 and python3")
>
> Hello all,
>
> Currently, pickle unpickles python2 'str' objects as python3 'str'
> objects, where the encoding to use is passed to the Unpickler.
> However, there are cases where it makes more sense to unpickle a
> python2 'str' as python3 'bytes' - for instance when it is actually
> binary data, and not text.
>
> Currently, the mapping is as follows, when reading a pickle:
> python2 'str' -> python3 'str' (using an encoding supplied to Unpickler)
> python2 'unicode' -> python3 'str'
>
> or, when creating a pickle using protocol <= 2:
> python3 'str' -> python2 'unicode'
> python3 'bytes' -> python2 '__builtins__.bytes object'
>
It does seem unfortunate that, by default, a developer cannot "do the right thing" when pickling and unpickling here. Unpickling Python 2 binary data as Unicode on Python 3 is presumably a convenience for developers doing the *wrong thing* (and it only works for ASCII anyway).
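
To make the point concrete, here is a minimal sketch of the reading side. It assumes hand-written protocol 2 byte strings that stand in for pickles Python 2 would produce for a 'str' (they are illustrative constants, not taken from the patch), and it only uses the existing `encoding` argument of `pickle.loads`:

```python
import pickle

# Hand-written protocol 2 pickles of a Python 2 'str' (illustrative stand-ins):
# PROTO 2, SHORT_BINSTRING <length> <payload>, BINPUT 0, STOP.
PY2_TEXT = b"\x80\x02U\x03abcq\x00."
PY2_BINARY = b"\x80\x02U\x02\xff\x00q\x00."

# ASCII-only payloads decode fine with the default encoding ('ASCII').
text = pickle.loads(PY2_TEXT)

# A payload containing non-ASCII bytes cannot be decoded by default.
try:
    pickle.loads(PY2_BINARY)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# Workaround: latin-1 maps each byte to exactly one code point, so decoding
# with it and encoding back recovers the original bytes unchanged.
as_str = pickle.loads(PY2_BINARY, encoding="latin-1")
recovered = as_str.encode("latin-1")
```

The round trip through latin-1 gets the bytes back, but the caller has to know which values were binary data in the first place, which is the gap the proposed flag is meant to close.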
All the best,
Michael Foord
> This issue suggests to add a flag to change the behaviour as follows:
> a) python2 'str' -> python3 'bytes'
> b) python3 'bytes' -> python2 'str'
>
> The question on this is how to pass this flag. To quote Antoine (with
> permission) on my mail about this issue on core-mentorship:
>
>> I haven't answered because I'm unsure about the approach itself - do we
>> want to add yet another argument to pickle methods, especially this late
>> in the 3.x development cycle?
>
>
> Currently, I have implemented it using an extra argument for the
> Pickler and Unpickler objects ('bytestr'), which toggles the
> behaviour. I.e.:
> >>> pickled = Pickler(data, bytestr=True); unpickled = Unpickler(data, bytestr=True)
> This is the approach used in pickle_bytestr.patch [1]
>
> Another option would be to implement a separate Pickler/Unpickler
> object, such that
> >>> pickled = BytestrPickler(data, bytestr=True); unpickled = BytestrUnpickler(data, bytestr=True)
> This is the approach I initially implemented [2].
>
> Alternatively, there is the option only to implement the Unpickler,
> leaving the Pickler as it is. This allows
> >>> unpickled = Unpickler(data, encoding=bytes)
> where the bytes type is used as a special 'flag'.
>
> And, of course, there is the option not to implement this in the stdlib at all.
>
>
> What are your ideas on this?
>
> Best,
> Merlijn
>
> [0] http://bugs.python.org/issue6784
> [1] http://bugs.python.org/file24719/pickle_bytestr.patch
> [2] https://github.com/valhallasw/py2/blob/master/bytestrpickle.py
--
http://www.voidspace.org.uk/
May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing
http://www.sqlite.org/different.html