[issue13505] Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x

Antoine Pitrou report at bugs.python.org
Wed Nov 30 03:21:48 CET 2011


New submission from Antoine Pitrou <pitrou at free.fr>:

In Python 3.2:

>>> import pickle
>>> pickle.dumps(b'xyz', protocol=2)
b'\x80\x02c__builtin__\nbytes\nq\x00]q\x01(KxKyKze\x85q\x02Rq\x03.'

In Python 2.7:

>>> import pickle
>>> pickle.loads(b'\x80\x02c__builtin__\nbytes\nq\x00]q\x01(KxKyKze\x85q\x02Rq\x03.')
'[120, 121, 122]'

The problem is that a bytes object is pickled as a call to bytes() with a list of ints as its argument. Under 2.x, where bytes is an alias of str, that same call returns the repr of the list instead of the original byte string:

>>> import pickle, pickletools
>>> pickletools.dis(pickle.dumps(b'xyz', protocol=2))
    0: \x80 PROTO      2
    2: c    GLOBAL     '__builtin__ bytes'
   21: q    BINPUT     0
   23: ]    EMPTY_LIST
   24: q    BINPUT     1
   26: (    MARK
   27: K        BININT1    120
   29: K        BININT1    121
   31: K        BININT1    122
   33: e        APPENDS    (MARK at 26)
   34: \x85 TUPLE1
   35: q    BINPUT     2
   37: R    REDUCE
   38: q    BINPUT     3
   40: .    STOP
highest protocol among opcodes = 2
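
The REDUCE step above amounts to calling bytes([120, 121, 122]). The following is a minimal sketch, written for Python 3, of why that call is not portable. The variable names are illustrative, and the 2.x behaviour is emulated with str(), on the assumption (stated in the report) that bytes is str under 2.x:

```python
import pickle

ints = [120, 121, 122]

# Python 3: bytes(list_of_ints) builds a byte string from the integer values.
as_py3_bytes = bytes(ints)

# Python 2: `bytes` is an alias of `str`, so bytes(ints) is really str(ints),
# which returns the textual repr of the list. str() on a list of ints gives
# the same text in Python 3, so it is used here to emulate the 2.x result.
as_py2_bytes = str(ints)

# The payload from the report still round-trips correctly when loaded on 3.x.
payload = b'\x80\x02c__builtin__\nbytes\nq\x00]q\x01(KxKyKze\x85q\x02Rq\x03.'
reloaded_on_py3 = pickle.loads(payload)

print(as_py3_bytes)     # b'xyz'
print(as_py2_bytes)     # [120, 121, 122]
print(reloaded_on_py3)  # b'xyz'
```

The same pickle stream therefore yields b'xyz' on 3.x and the string '[120, 121, 122]' on 2.x, which matches the output shown at the top of this report.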

Bytearray objects use a different trick: they pass a (unicode string, encoding) pair, which has the same constructor semantics under 2.x and 3.x. This encoding is also statistically more efficient: a list of 1-byte ints takes 2 bytes per encoded char, while a latin1-to-utf8 transcoded string (BINUNICODE uses utf-8) takes 1.5 bytes per encoded char on average (assuming a 50% probability of higher-than-127 bytes).

>>> pickletools.dis(pickle.dumps(bytearray(b'xyz'), protocol=2))
    0: \x80 PROTO      2
    2: c    GLOBAL     '__builtin__ bytearray'
   25: q    BINPUT     0
   27: X    BINUNICODE 'xyz'
   35: q    BINPUT     1
   37: X    BINUNICODE 'latin-1'
   49: q    BINPUT     2
   51: \x86 TUPLE2
   52: q    BINPUT     3
   54: R    REDUCE
   55: q    BINPUT     4
   57: .    STOP
highest protocol among opcodes = 2
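
The size estimate for the two encodings can be checked with a short sketch. It assumes a sample in which every byte value occurs exactly once (so half the bytes are above 127), and it counts only the per-character payload, ignoring fixed overhead such as the GLOBAL, MARK and length-prefix bytes. The variable names are illustrative:

```python
# Every byte value once: 128 values <= 127 and 128 values > 127.
data = bytes(range(256))

# bytearray-style payload: the bytes are decoded as latin-1 and the resulting
# text is stored by BINUNICODE as UTF-8 (1 byte for ASCII, 2 bytes otherwise).
utf8_payload = data.decode('latin-1').encode('utf-8')
utf8_bytes_per_char = len(utf8_payload) / len(data)

# bytes-style payload: one BININT1 per byte, i.e. the opcode 'K' plus one
# value byte, so two bytes per encoded char.
binint1_bytes_per_char = (2 * len(data)) / len(data)

print(utf8_bytes_per_char)     # 1.5
print(binint1_bytes_per_char)  # 2.0

# The (unicode string, encoding) pair reconstructs the original bytearray.
rebuilt = bytearray('xyz', 'latin-1')
print(rebuilt == bytearray(b'xyz'))  # True
```

For purely ASCII data the UTF-8 form drops to 1 byte per char, and for data made only of bytes above 127 it rises to 2, so it is never larger than the list-of-ints form on a per-character basis.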

----------
components: Library (Lib)
messages: 148635
nosy: alexandre.vassalotti, irmen, pitrou
priority: high
severity: normal
status: open
title: Bytes objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x
type: behavior
versions: Python 3.2, Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13505>
_______________________________________

