
On Tue, 19 Nov 2013 15:17:06 -0600 Tim Peters <tim.peters@gmail.com> wrote:
Note some drawbacks of frame opcodes:
- the decoder has to sanity check the frame opcodes (what if a frame opcode is encountered when already inside a frame?)
- a pickle-mutating function such as pickletools.optimize() may naively ignore the frame opcodes while rearranging the pickle stream, only to emit a new pickle with invalid frame sizes
I suspect we have very different mental models here. By "has an opcode", I do NOT mean "must be visible to the opcode-decoding loop". I just mean "has a unique byte assigned in the pickle opcode space".
I expect that in the CPython implementation of unpickling, the buffering layer would _consume_ the FRAME opcode, along with the frame size. The opcode-decoding loop would never see it.
But if some _other_ implementation of unpickling didn't give a hoot about framing, having an explicit opcode means that implementation could ignore the whole scheme very easily: just implement the FRAME opcode in *its* opcode-decoding loop to consume the FRAME argument, ignore it, and move on. As-is, all other implementations _have_ to know everything about the buffering scheme because it's all implicit low-level magic.
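To make the "consume the FRAME argument, ignore it, and move on" idea concrete, here is a rough sketch of a toy opcode-decoding loop. It is not how any real unpickler is written: `load_ignoring_frames` is a hypothetical name, only a handful of opcodes are handled, and it assumes the FRAME argument is a fixed 8-byte size. The opcode constants come from the `pickle` module (`pickle.FRAME` is available in Python 3.4 and later).

```python
import io
import pickle
import struct

def load_ignoring_frames(data):
    """Toy decoder for a tiny opcode subset that treats FRAME as a no-op."""
    stream = io.BytesIO(data)
    stack = []
    while True:
        op = stream.read(1)
        if op == pickle.PROTO:
            stream.read(1)   # protocol version byte, unused in this sketch
        elif op == pickle.FRAME:
            stream.read(8)   # consume the 8-byte frame size and ignore it
        elif op == pickle.NONE:
            stack.append(None)
        elif op == pickle.BININT1:
            stack.append(stream.read(1)[0])
        elif op == pickle.STOP:
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode {op!r}")

# Hand-built streams: the same payload with and without a frame header.
payload = pickle.BININT1 + b"\x07" + pickle.STOP
framed = pickle.PROTO + b"\x04" + pickle.FRAME + struct.pack("<Q", len(payload)) + payload
unframed = pickle.PROTO + b"\x04" + payload
```

Both `framed` and `unframed` decode to the same value here, because the loop never needs to know what the frame size means.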
Ahah, ok, I see where you're going. But how many other implementations of unpickling are there?
Initially, all I desperately ;-) want changed here is for the _buffering layer_, on the writing end, to write 9 bytes instead of 8 (1 new one for a FRAME opcode), and on the reading end to consume 9 bytes instead of 8 (extra credit if it checked the first byte to verify it really is a FRAME opcode - there's nothing wrong with sanity checks).
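A minimal sketch of what "9 bytes instead of 8" could look like in a buffering layer. The helper names `write_frame` and `read_frame` are made up for illustration, and the opcode byte and little-endian size encoding are assumptions (they match what protocol 4 ended up using, but nothing in this message fixes them).

```python
import io
import struct

FRAME = b"\x95"  # assumed opcode byte for FRAME

def write_frame(out, payload):
    # Header is 1 opcode byte + 8 size bytes = 9 bytes.
    out.write(FRAME + struct.pack("<Q", len(payload)))
    out.write(payload)

def read_frame(inp):
    header = inp.read(9)
    # Sanity check: the first byte really must be a FRAME opcode.
    if len(header) != 9 or header[:1] != FRAME:
        raise ValueError("expected a FRAME opcode")
    (size,) = struct.unpack("<Q", header[1:])
    payload = inp.read(size)
    if len(payload) != size:
        raise ValueError("truncated frame")
    return payload

buf = io.BytesIO()
write_frame(buf, b"K\x07.")
buf.seek(0)
roundtrip = read_frame(buf)
```

The opcode-decoding loop sitting on top of `read_frame` would only ever see the payload bytes, never the header.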
Then it becomes _possible_ to optimize "small pickles" later (in the sense of not bothering to frame them at all).
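As a sketch of why the explicit opcode makes that possible: a reader can look at the first byte and decide whether a frame header follows. `read_opcodes` is a hypothetical helper, it assumes the stream is positioned where a frame header would start, it handles at most one frame, and it uses the same assumed opcode byte and size encoding as protocol 4.

```python
import io
import struct

FRAME = b"\x95"  # assumed opcode byte for FRAME

def read_opcodes(inp):
    """Return the opcode bytes, whether or not they are wrapped in a frame."""
    first = inp.read(1)
    if first == FRAME:
        # Framed: the next 8 bytes give the payload size.
        (size,) = struct.unpack("<Q", inp.read(8))
        return inp.read(size)
    # Unframed (e.g. a small pickle): the byte just read is already an opcode.
    return first + inp.read()

small = b"K\x07."
framed = FRAME + struct.pack("<Q", len(small)) + small
```

Without a byte reserved for FRAME, the reader could not tell an 8-byte size apart from ordinary opcodes, so skipping the frame for small pickles would be ambiguous.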
So the CPython unpickler must be able to work with and without framing by detecting the FRAME opcode? Otherwise the "later optimization" can't work.

Regards

Antoine.