
I have an idea which can increase pickle/unpickle performance. This requires a change of format, so we need a new version of the protocol. Maybe it will be 4 (PEP 3154) or 5.

All items should be aligned to 4 bytes. This allows fast reading/writing of small integers, and memcpy and the UTF-8 codec should be faster on aligned data. In order not to waste space, the opcode byte should be combined with the data, or with the size of the data, in the same aligned word.

For integers:

<code> <24-bit integer>
<code> <24-bit size> <size-byte integer> <padding>
<code> <56-bit size> <size-byte integer> <padding>

For strings:

<code> <8-bit size> <size-byte string> <padding>
<code> <24-bit size> <size-byte string> <padding>
<code> <56-bit size> <size-byte string> <padding>

For collections:

<code> <24-bit size> <item1> <item2> ... <item #size>
<code> <56-bit size> <item1> <item2> ... <item #size>

For references:

<code> <24-bit index>
<code> <56-bit index>

For 1- and 2-byte integers this can be expensive, because each one takes a full 4-byte word. We can add a special code for grouping them. It will be even shorter than in the old protocols:

<group code> <item code> <16-bit count> <integer1> <integer2> ... <integer #count> <padding>

What do you think about this?
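A rough Python sketch of how a writer might lay out a few of the proposed items, assuming little-endian byte order and made-up opcode values (INT24, STR8, STR24, GROUP and INT8 are hypothetical; the proposal assigns no numbers, and the 56-bit forms, collections and references are left out). Every item starts on a 4-byte boundary, the opcode shares its word with a 24-bit value or size, and padding restores alignment after variable-length data.

```python
import struct

# Hypothetical opcode values: the proposal does not assign any.
INT24 = 0x01   # <code> <24-bit integer>
STR8 = 0x02    # <code> <8-bit size> <string> <padding>
STR24 = 0x03   # <code> <24-bit size> <string> <padding>
GROUP = 0x04   # <group code> <item code> <16-bit count> <items> <padding>
INT8 = 0x05    # item code for 1-byte signed integers inside a group


def pad4(buf: bytearray) -> None:
    """Append zero bytes until len(buf) is a multiple of 4."""
    buf.extend(bytes(-len(buf) % 4))


def write_int24(buf: bytearray, value: int) -> None:
    """Opcode in the low byte, signed 24-bit value in the upper 3 bytes."""
    if not -(1 << 23) <= value < (1 << 23):
        raise OverflowError("value needs the <24-bit size> form")
    buf.extend(struct.pack("<I", INT24 | ((value & 0xFFFFFF) << 8)))


def write_str(buf: bytearray, s: str) -> None:
    """UTF-8 string with the shortest size field that fits, then padding."""
    data = s.encode("utf-8")
    if len(data) < (1 << 8):
        buf.extend((STR8, len(data)))
    elif len(data) < (1 << 24):
        buf.extend(struct.pack("<I", STR24 | (len(data) << 8)))
    else:
        raise OverflowError("the <56-bit size> form is not sketched here")
    buf.extend(data)
    pad4(buf)


def write_int8_group(buf: bytearray, values: list[int]) -> None:
    """Many 1-byte integers behind a single 4-byte group header."""
    if len(values) >= (1 << 16):
        raise OverflowError("too many items for a 16-bit count")
    buf.extend(struct.pack("<BBH", GROUP, INT8, len(values)))
    buf.extend(struct.pack(f"<{len(values)}b", *values))
    pad4(buf)


buf = bytearray()
write_int24(buf, -5)                         # 01 fb ff ff
write_str(buf, "héllo")                      # 02 06 + 6 UTF-8 bytes, no padding needed
write_int8_group(buf, [1, 2, 3, -1, 7])      # 04 05 05 00 + 5 bytes + 3 padding
print(len(buf))  # 24
```

In this layout the five 1-byte integers take 12 bytes as one group, compared with 20 bytes as five separate <code> <24-bit integer> words, which is the saving the grouping code is meant to provide.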

Hello,

On Sat, 17 Nov 2012 23:18:29 +0200
Serhiy Storchaka <storchaka@gmail.com> wrote:
> I have an idea which can increase pickle/unpickle performance. This requires a change of format, so we need a new version of the protocol. Maybe it will be 4 (PEP 3154) or 5.
>
> All items should be aligned to 4 bytes. This allows fast reading/writing of small integers, and memcpy and the UTF-8 codec should be faster on aligned data.

I can see several drawbacks here:

- you will still have to support unaligned data (because of memoryview slices)
- it may not be significantly faster, because the actual creation of objects also contributes to unpickling performance
- there will be less code sharing between different protocol versions, making pickle harder to maintain

If you think this is worthwhile, I think you should first draft a proof of concept to evaluate the performance gain.

Regards

Antoine.
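The first drawback can be illustrated with a small reader sketch. Items can only be aligned relative to the start of the pickle data, but a memoryview slice of a larger buffer may begin at any address, so the reader still has to use unaligned-safe loads. The sketch below assumes the same kind of hypothetical layout as Serhiy's proposal (little-endian, made-up opcode values INT24, STR8, STR24, GROUP, INT8) and reads everything through `struct.unpack_from`, which does not require aligned memory. Pure Python cannot show any speed difference; a real proof of concept would presumably have to be written in C, alongside the `_pickle` accelerator.

```python
import struct

# Hypothetical opcode values (the proposal does not assign any).
INT24, STR8, STR24, GROUP, INT8 = 0x01, 0x02, 0x03, 0x04, 0x05


def align4(n: int) -> int:
    """Round an offset up to the next multiple of 4."""
    return (n + 3) & ~3


def read_items(mv: memoryview) -> list:
    """Decode 4-byte-aligned items from mv.

    Offsets are aligned only relative to the start of mv. If mv is a slice
    of a larger buffer it may begin at any address, so every multi-byte read
    goes through struct.unpack_from, which accepts unaligned data.
    """
    items = []
    pos = 0
    while pos < len(mv):
        code = mv[pos]
        if code == INT24:
            # Signed read, then an arithmetic shift drops the opcode byte.
            (word,) = struct.unpack_from("<i", mv, pos)
            items.append(word >> 8)
            pos += 4
        elif code in (STR8, STR24):
            if code == STR8:
                size, start = mv[pos + 1], pos + 2
            else:
                (word,) = struct.unpack_from("<I", mv, pos)
                size, start = word >> 8, pos + 4
            items.append(mv[start:start + size].tobytes().decode("utf-8"))
            pos = align4(start + size)  # skip the padding
        elif code == GROUP:
            item_code, count = struct.unpack_from("<BH", mv, pos + 1)
            if item_code != INT8:
                raise ValueError(f"unsupported group item code {item_code:#x}")
            items.append(list(struct.unpack_from(f"<{count}b", mv, pos + 4)))
            pos = align4(pos + 4 + count)
        else:
            raise ValueError(f"unknown opcode {code:#x} at offset {pos}")
    return items


payload = (
    bytes([0x01, 0xFB, 0xFF, 0xFF])                                  # INT24: -5
    + bytes([0x02, 0x05]) + b"hello" + bytes(1)                      # STR8 + 1 pad byte
    + bytes([0x04, 0x05, 0x03, 0x00, 0x01, 0x02, 0xFF]) + bytes(1)   # group of 3
)
container = bytearray(b"\xaa") + payload   # payload starts at offset 1
print(read_items(memoryview(container)[1:]))  # [-5, 'hello', [1, 2, -1]]
```

Here the slice starts one byte into the bytearray's storage, so its items are typically not 4-byte aligned in memory even though they are aligned within the stream; a C implementation would face the same situation and could not rely on aligned loads unconditionally.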

participants (2)
-
Antoine Pitrou
-
Serhiy Storchaka