[Python-ideas] Fast pickle

Serhiy Storchaka storchaka at gmail.com
Sat Nov 17 22:18:29 CET 2012


I have an idea that could increase pickle/unpickle performance. It requires a change of format, so we would need a new version of the protocol. Maybe it will be 4 (PEP 3154) or 5.

All items should be aligned to 4 bytes. This allows fast reading and writing of small integers, and memcpy and the UTF-8 codec should be faster on aligned data.

In order not to waste space, the opcode byte should be combined with the data or with the size of the data.

For integers:

<code> <24-bit integer>
<code> <24-bit size> <size-byte integer> <padding>
<code> <56-bit size> <size-byte integer> <padding>
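
As a rough illustration of the first integer form, the sketch below packs an opcode and a 24-bit integer into a single aligned 4-byte word. The opcode value, the little-endian byte order, the signed interpretation and the function names are all assumptions made for the example; the proposal does not fix any of them.

```python
import struct

SMALL_INT = 0x01  # hypothetical opcode value, not part of the proposal


def pack_small_int(value):
    """Pack an opcode and a signed 24-bit integer into one 4-byte word."""
    if not -(1 << 23) <= value < (1 << 23):
        raise ValueError("value does not fit in 24 bits")
    # Assumed layout: low byte is the opcode, upper three bytes are the integer.
    word = ((value & 0xFFFFFF) << 8) | SMALL_INT
    return struct.pack("<I", word)


def unpack_small_int(buf, offset=0):
    """Read one word at offset and split it into (opcode, integer)."""
    (word,) = struct.unpack_from("<I", buf, offset)
    opcode = word & 0xFF
    value = word >> 8
    if value & 0x800000:  # sign-extend the 24-bit field
        value -= 1 << 24
    return opcode, value
```

With this layout a reader can fetch the opcode and the value with a single 32-bit load, which is the kind of saving the alignment is meant to buy.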

For strings:

<code> <8-bit size> <size-byte string> <padding>
<code> <24-bit size> <size-byte string> <padding>
<code> <56-bit size> <size-byte string> <padding>
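
The 24-bit string form could look something like the following sketch: one header word carrying the opcode and the size, then the UTF-8 bytes, then zero padding up to the next 4-byte boundary. The opcode value, byte order, padding byte and helper names are assumptions for illustration only.

```python
import struct

STR24 = 0x02  # hypothetical opcode value, not part of the proposal


def pad4(n):
    """Number of padding bytes needed to round n up to a multiple of 4."""
    return -n % 4


def pack_str24(text):
    """Encode text as <code> <24-bit size> <UTF-8 bytes> <padding>."""
    data = text.encode("utf-8")
    if len(data) >= 1 << 24:
        raise ValueError("string too long for a 24-bit size")
    header = struct.pack("<I", (len(data) << 8) | STR24)
    return header + data + b"\x00" * pad4(len(data))


def unpack_str24(buf, offset=0):
    """Decode one string item; return (text, offset of the next item)."""
    (word,) = struct.unpack_from("<I", buf, offset)
    size = word >> 8
    start = offset + 4
    text = bytes(buf[start:start + size]).decode("utf-8")
    return text, start + size + pad4(size)
```

Because every item ends on a 4-byte boundary, the offset returned by unpack_str24 is always aligned for the next item.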

For collections:

<code> <24-bit size> <item1> <item2> ... <item #size>
<code> <56-bit size> <item1> <item2> ... <item #size>

For references:
 
<code> <24-bit index>
<code> <56-bit index>

For 1- and 2-byte integers, this alignment can be expensive. We can add a special code for grouping them. It will be even shorter than in the old protocols.

<group code> <item code> <16-bit count> <integer1> <integer2> ... <integer #count> <padding>
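
A minimal sketch of the grouping form, assuming a 4-byte header (group code, item code, 16-bit count) followed by tightly packed 1- or 2-byte signed integers and trailing zero padding. The opcode values, byte order and function name are hypothetical.

```python
import struct

GROUP = 0x10  # hypothetical group opcode
INT8 = 0x11   # hypothetical item code for 1-byte integers
INT16 = 0x12  # hypothetical item code for 2-byte integers

_ITEM_FORMAT = {INT8: "b", INT16: "h"}


def pack_int_group(values, item_code=INT16):
    """Encode values as <group> <item code> <16-bit count> <ints> <padding>."""
    if len(values) > 0xFFFF:
        raise ValueError("too many items for a 16-bit count")
    header = struct.pack("<BBH", GROUP, item_code, len(values))
    # struct raises struct.error if a value does not fit the item width.
    body = struct.pack("<%d%s" % (len(values), _ITEM_FORMAT[item_code]), *values)
    return header + body + b"\x00" * (-len(body) % 4)
```

Under these assumptions, 100 one-byte integers take 4 + 100 = 104 bytes, compared with 200 bytes as individual BININT1 items (one opcode byte plus one data byte each) in the existing protocols.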

What do you think about this?




