why is bytearray treated so inefficiently by pickle?
Irmen de Jong
irmen.NOSPAM at xs4all.nl
Sun Nov 27 09:33:22 EST 2011
Hi,
A bytearray is pickled (using max protocol) as follows:
>>> pickletools.dis(pickle.dumps(bytearray([255]*10),2))
0: \x80 PROTO 2
2: c GLOBAL '__builtin__ bytearray'
25: q BINPUT 0
27: X BINUNICODE u'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
52: q BINPUT 1
54: U SHORT_BINSTRING 'latin-1'
63: q BINPUT 2
65: \x86 TUPLE2
66: q BINPUT 3
68: R REDUCE
69: q BINPUT 4
71: . STOP
>>> bytearray("\xff"*10).__reduce__()
(<type 'bytearray'>, (u'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff', 'latin-1'), None)
Is there a particular reason it is encoded so inefficiently? Most notably, the actual
*bytes* in the bytearray are represented by a UTF-8 string, which must be decoded into
a unicode string and then encoded back into bytes when unpickled. Since the object is
a bytearray, I would expect it to be pickled as just that: a sequence of bytes, which
could then be turned back into a bytearray using the constructor that takes the bytes
directly (via the BINSTRING/BINBYTES pickle opcodes).
The above occurs on both Python 2.x and 3.x.
Any ideas? Candidate for a patch?
Irmen.
More information about the Python-list mailing list