Accepting PEP 3154 for 3.4?

Hello,

Alexandre Vassalotti (thanks a lot!) has recently finalized his work on the PEP 3154 implementation - pickle protocol 4.

I think it would be good to get the PEP and the implementation accepted for 3.4. As far as I can tell, this has been a low-controversy proposal, and it brings fairly obvious improvements to the table (which table?). I still need some kind of BDFL or BDFL delegate to do that, though -- unless I am allowed to mark my own PEP accepted :-)

(I've asked Tim, specifically, for comments, since he contributed a lot to previous versions of the pickle protocol.)

The PEP is at http://www.python.org/dev/peps/pep-3154/ (should be rebuilt soon by the server, I'd say)

Alexandre's implementation is tracked at http://bugs.python.org/issue17810

Regards
Antoine.

On Sat, Nov 16, 2013 at 11:15 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Hello,
Alexandre Vassalotti (thanks a lot!) has recently finalized his work on the PEP 3154 implementation - pickle protocol 4.
I think it would be good to get the PEP and the implementation accepted for 3.4.
+1

Once the core portion of the PEP 451 implementation is done, I plan on tweaking pickle to take advantage of module.__spec__ (where applicable). It would be nice to have the PEP 3154 implementation in place before I do anything in that space.

-eric

On 17 Nov 2013 04:45, "Eric Snow" <ericsnowcurrently@gmail.com> wrote:
On Sat, Nov 16, 2013 at 11:15 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Hello,
Alexandre Vassalotti (thanks a lot!) has recently finalized his work on the PEP 3154 implementation - pickle protocol 4.
I think it would be good to get the PEP and the implementation accepted for 3.4.
+1
Once the core portion of the PEP 451 implementation is done, I plan on tweaking pickle to take advantage of module.__spec__ (where applicable). It would be nice to have the PEP 3154 implementation in place before I do anything in that space.
+1 from me, too. Cheers, Nick.
-eric

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com

On Sat, Nov 16, 2013 at 10:15 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Alexandre Vassalotti (thanks a lot!) has recently finalized his work on the PEP 3154 implementation - pickle protocol 4.
I think it would be good to get the PEP and the implementation accepted for 3.4. As far as I can tell, this has been a low-controversy proposal, and it brings fairly obvious improvements to the table (which table?). I still need some kind of BDFL or BDFL delegate to do that, though -- unless I am allowed to mark my own PEP accepted :-)
(I've asked Tim, specifically, for comments, since he contributed a lot to previous versions of the pickle protocol.)
The PEP is at http://www.python.org/dev/peps/pep-3154/ (should be rebuilt soon by the server, I'd say)
Alexandre's implementation is tracked at http://bugs.python.org/issue17810
Assuming Tim doesn't object (hi Tim!) I think this PEP is fine to accept -- all the ideas sound good, and I agree with moving to support 8-byte sizes and framing. I haven't looked at the implementation but I trust you as a reviewer and the beta release process. -- --Guido van Rossum (python.org/~guido)

[Antoine Pitrou]
Alexandre Vassalotti (thanks a lot!) has recently finalized his work on the PEP 3154 implementation - pickle protocol 4.
I think it would be good to get the PEP and the implementation accepted for 3.4. As far as I can tell, this has been a low-controversy proposal, and it brings fairly obvious improvements to the table (which table?). I still need some kind of BDFL or BDFL delegate to do that, though -- unless I am allowed to mark my own PEP accepted :-)
Try it and see whether anyone complains ;-)

I like it. I didn't review the code, but the PEP addresses real issues, and the solutions look good on paper ;-)

One thing I wonder about: I don't know that non-seekable pickle streams are important use cases, but am willing to be told that they are. In which case, framing is a great idea. But I wonder why it isn't done with a new framing opcode instead (say, FRAME followed by 8-byte count). I suppose that would be like the "prefetch" idea, except that framing opcodes would be mandatory (instead of optional) in proto 4. Why I initially like that:

- Uniform decoding loop ("the next thing" _always_ starts with an opcode).
- Some easy sanity checking due to the tiny redundancy (if the byte immediately following the current frame is not a FRAME opcode, the pickle is corrupt; and also corrupt if a FRAME opcode is encountered _inside_ the current frame).

When slinging 8-byte counts, _some_ sanity-checking seems like a good idea ;-)
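Tim's mandatory-FRAME variant can be sketched as a toy reader. The FRAME byte value below is made up for illustration (it is not taken from the PEP), and the "no FRAME byte inside a frame" check is only the simple redundancy test described above:

```python
import struct

FRAME = 0x95  # hypothetical framing opcode value, for illustration only
STOP = 0x2E   # '.' -- pickle's STOP opcode

def iter_frames(data):
    """Yield each frame's payload, sanity-checking the frame layout."""
    pos = 0
    while data[pos] != STOP:
        if data[pos] != FRAME:
            raise ValueError("corrupt pickle: expected FRAME opcode")
        (size,) = struct.unpack_from("<Q", data, pos + 1)  # 8-byte count
        payload = data[pos + 9 : pos + 9 + size]
        if FRAME in payload:  # toy check; a real opcode byte could collide
            raise ValueError("corrupt pickle: FRAME inside a frame")
        yield payload
        pos += 9 + size

# Two frames followed by STOP round-trip through the reader:
frames = b"".join(
    bytes([FRAME]) + struct.pack("<Q", len(p)) + p
    for p in (b"abc", b"defg")
) + bytes([STOP])
assert list(iter_frames(frames)) == [b"abc", b"defg"]
```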

18.11.13 07:53, Tim Peters wrote:
- Some easy sanity checking due to the tiny redundancy (if the byte immediately following the current frame is not a FRAME opcode, the pickle is corrupt; and also corrupt if a FRAME opcode is encountered _inside_ the current frame).
For efficient unpickling, a FRAME opcode followed by an 8-byte count should be the *last* thing in a frame (unless it is the last frame).

On Mon, Nov 18, 2013 at 3:28 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
18.11.13 07:53, Tim Peters wrote:
- Some easy sanity checking due to the tiny redundancy (if the byte
immediately following the current frame is not a FRAME opcode, the pickle is corrupt; and also corrupt if a FRAME opcode is encountered _inside_ the current frame).
For efficient unpickling, a FRAME opcode followed by an 8-byte count should be the *last* thing in a frame (unless it is the last frame).
I don't understand that.

Clearly the framing is the weakest point of the PEP (== elicits the most bikeshedding). I am also unsure about the value of framing when pickles are written to strings.

-- --Guido van Rossum (python.org/~guido)

On Mon, 18 Nov 2013 07:48:27 -0800 Guido van Rossum <guido@python.org> wrote:
On Mon, Nov 18, 2013 at 3:28 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
18.11.13 07:53, Tim Peters wrote:
- Some easy sanity checking due to the tiny redundancy (if the byte
immediately following the current frame is not a FRAME opcode, the pickle is corrupt; and also corrupt if a FRAME opcode is encountered _inside_ the current frame).
For efficient unpickling, a FRAME opcode followed by an 8-byte count should be the *last* thing in a frame (unless it is the last frame).
I don't understand that.
Clearly the framing is the weakest point of the PEP (== elicits the most bikeshedding). I am also unsure about the value of framing when pickles are written to strings.
It hasn't much value in that case, but the cost is also small (8 bytes every 64KB, roughly). Regards Antoine.
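Antoine's "8 bytes every 64KB" figure can be checked with some quick arithmetic (the 64 KiB frame target is the number from the discussion; the 15-byte pickle is the small-message size mentioned later in the thread):

```python
FRAME_HEADER = 8          # bytes of length prefix per frame
FRAME_TARGET = 64 * 1024  # assumed frame size from the discussion

# For large pickles the relative overhead is negligible:
overhead = FRAME_HEADER / FRAME_TARGET
assert overhead < 0.00013  # roughly 0.012%

# For a tiny pickle, though, the same 8 bytes are a big fraction:
small_pickle = 15
assert FRAME_HEADER / small_pickle > 0.5  # more than 50% overhead
```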

On Mon, Nov 18, 2013 at 8:10 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Mon, 18 Nov 2013 07:48:27 -0800 Guido van Rossum <guido@python.org> wrote:
On Mon, Nov 18, 2013 at 3:28 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
18.11.13 07:53, Tim Peters wrote:
- Some easy sanity checking due to the tiny redundancy (if the byte
immediately following the current frame is not a FRAME opcode, the pickle is corrupt; and also corrupt if a FRAME opcode is encountered _inside_ the current frame).
For efficient unpickling, a FRAME opcode followed by an 8-byte count should be the *last* thing in a frame (unless it is the last frame).
I don't understand that.
Clearly the framing is the weakest point of the PEP (== elicits the most bikeshedding). I am also unsure about the value of framing when pickles are written to strings.
It hasn't much value in that case, but the cost is also small (8 bytes every 64KB, roughly).
That's small if your pickle is large, but for small pickles it can add up. Still, not enough to reject the PEP. Just get Tim to agree with you, or switch to Tim's proposal. -- --Guido van Rossum (python.org/~guido)

[Guido]
Clearly the framing is the weakest point of the PEP (== elicits the most bikeshedding). I am also unsure about the value of framing when pickles are written to strings.
[Antoine]
It hasn't much value in that case,
It has _no_ value in that case, yes? It doesn't appear to have _much_ value in the case of a seekable stream, either - the implementation has always been free to read ahead then. The real value appears to be in cases of non-seekable streams.
but the cost is also small (8 bytes every 64KB, roughly).
That's small if your pickle is large, but for small pickles it can add up.
Which is annoying. It was already annoying when the PROTO opcode was introduced, and the size of small pickles increased by 2 bytes. That added up too :-(
Still, not enough to reject the PEP. Just get Tim to agree with you, or switch to Tim's proposal.
I just asked a question ;-) If a mandatory proto 4 FRAME opcode were added, it would just increase the bloat for small pickles (up from the currently proposed 8 bytes of additional overhead to 9).

On Mon, 18 Nov 2013 16:02:31 -0600 Tim Peters <tim.peters@gmail.com> wrote:
[Guido]
Clearly the framing is the weakest point of the PEP (== elicits the most bikeshedding). I am also unsure about the value of framing when pickles are written to strings.
[Antoine]
It hasn't much value in that case,
It has _no_ value in that case, yes? It doesn't appear to have _much_ value in the case of a seekable stream, either - the implementation has always been free to read ahead then. The real value appears to be in cases of non-seekable streams.
but the cost is also small (8 bytes every 64KB, roughly).
That's small if your pickle is large, but for small pickles it can add up.
Which is annoying. It was already annoying when the PROTO opcode was introduced, and the size of small pickles increased by 2 bytes. That added up too :-(
Are very small pickles that size-sensitive? I have the impression that if 8 bytes vs. e.g. 15 bytes makes a difference for your application, you'd be better off with a hand-made format. Regards Antoine.

[Tim]
... It was already annoying when the PROTO opcode was introduced, and the size of small pickles increased by 2 bytes. That added up too :-(
[Antoine]
Are very small pickles that size-sensitive? I have the impression that if 8 bytes vs. e.g. 15 bytes makes a difference for your application, you'd be better off with a hand-made format.
The difference between 8 and 15 is, e.g., nearly doubling the amount of network traffic (for apps that use pickles across processes or machines). A general approach has no way to guess how it will be used. For example, `multiprocessing` uses pickles extensively for inter-process communication of Python data. Some users try broadcasting giant arrays across processes, while others broadcast oceans of tiny integers (like indices into giant arrays inherited via fork()). Since pickle intends to be "the" Python serialization format, it really should try to be friendly for all plausible uses.

On Mon, 18 Nov 2013 16:25:07 -0600 Tim Peters <tim.peters@gmail.com> wrote:
[Antoine]
Are very small pickles that size-sensitive? I have the impression that if 8 bytes vs. e.g. 15 bytes makes a difference for your application, you'd be better off with a hand-made format.
The difference between 8 and 15 is, e.g., nearly doubling the amount of network traffic (for apps that use pickles across processes or machines).
A general approach has no way to guess how it will be used. For example, `multiprocessing` uses pickles extensively for inter-process communication of Python data. Some users try broadcasting giant arrays across processes, while others broadcast oceans of tiny integers (like indices into giant arrays inherited via fork()).
Well, sending oceans of tiny integers will also incur many system calls and additional synchronization costs, since sending data on a multiprocessing Queue has to acquire a semaphore. So it generally sounds like a bad idea, IMHO. That said, I agree with:
Since pickle intends to be "the" Python serialization format, it really should try to be friendly for all plausible uses.
I simply don't think adding a fixed 8-byte overhead is actually annoying. It's less than the PyObject overhead in 64-bit mode... Regards Antoine.

[Antoine]
Well, sending oceans of tiny integers will also incur many system calls and additional synchronization costs, since sending data on a multiprocessing Queue has to acquire a semaphore. So it generally sounds like a bad idea, IMHO.
That said, I agree with:
Since pickle intends to be "the" Python serialization format, it really should try to be friendly for all plausible uses.
I simply don't think adding a fixed 8-byte overhead is actually annoying. It's less than the PyObject overhead in 64-bit mode...
A long-running process can legitimately put billions of items on work queues, far more than could ever fit in RAM simultaneously. Comparing this to PyObject overhead makes no sense to me. Neither does the line of argument "there are several kinds of overheads, so making this overhead worse too doesn't matter". When possible, we should strive not to add overheads that don't repay their costs. For small pickles, an 8-byte size field doesn't appear to buy anything. But I appreciate that it costs implementation effort to avoid producing it in these cases.

On 19 Nov 2013 09:33, "Tim Peters" <tim.peters@gmail.com> wrote:
[Antoine]
Well, sending oceans of tiny integers will also incur many system calls and additional synchronization costs, since sending data on a multiprocessing Queue has to acquire a semaphore. So it generally sounds like a bad idea, IMHO.
That said, I agree with:
Since pickle intends to be "the" Python serialization format, it really should try to be friendly for all plausible uses.
I simply don't think adding a fixed 8-byte overhead is actually annoying. It's less than the PyObject overhead in 64-bit mode...
A long-running process can legitimately put billions of items on work queues, far more than could ever fit in RAM simultaneously. Comparing this to PyObject overhead makes no sense to me. Neither does the line of argument "there are several kinds of overheads, so making this overhead worse too doesn't matter".
When possible, we should strive not to add overheads that don't repay their costs. For small pickles, an 8-byte size field doesn't appear to buy anything. But I appreciate that it costs implementation effort to avoid producing it in these cases.
Given that cases where the overhead matters can drop back to proto 3 if absolutely necessary, perhaps it's reasonable to just run with the current proposal? Cheers, Nick.

On 18/11/2013 10:25pm, Tim Peters wrote:
The difference between 8 and 15 is, e.g., nearly doubling the amount of network traffic (for apps that use pickles across processes or machines).
I tried using multiprocessing.Pipe() and send_bytes()/recv_bytes() to send messages between processes:

8 byte messages -- 525,000 msgs/sec
15 byte messages -- 556,000 msgs/sec

So the size of small messages does not seem to make much difference. (This was in a Linux VM with Python 2.7. Python 3.3 is much slower because it is not implemented in C, but gives similarly close results.)

-- Richard
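For reference, the primitives Richard benchmarked can be exercised like this. This only demonstrates the send_bytes()/recv_bytes() API round trip in a single process; actual throughput numbers depend entirely on the machine:

```python
import multiprocessing

def roundtrip(messages):
    """Send byte messages through a Pipe and read them back."""
    a, b = multiprocessing.Pipe()
    for m in messages:
        a.send_bytes(m)     # writes the raw bytes, length-prefixed
    return [b.recv_bytes() for _ in messages]

# 8-byte and 15-byte messages, as in the benchmark above:
msgs = [b"x" * 8, b"y" * 15]
assert roundtrip(msgs) == msgs
```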

[Richard Oudkerk]
I tried using multiprocessing.Pipe() and send_bytes()/recv_bytes() to send messages between processes:
8 byte messages -- 525,000 msgs/sec
15 byte messages -- 556,000 msgs/sec
So the size of small messages does not seem to make much difference.
To the contrary, the lesson is clear: to speed up multiprocessing, the larger the messages the faster it goes ;-)

On 19/11/2013 12:55am, Tim Peters wrote:
[Richard Oudkerk]
I tried using multiprocessing.Pipe() and send_bytes()/recv_bytes() to send messages between processes:
8 byte messages -- 525,000 msgs/sec
15 byte messages -- 556,000 msgs/sec
So the size of small messages does not seem to make much difference.
To the contrary, the lesson is clear: to speed up multiprocessing, the larger the messages the faster it goes ;-)
Ah, yes. It was probably round the other way. -- Richard

On 11/18/2013 07:48 AM, Guido van Rossum wrote:
Clearly the framing is the weakest point of the PEP (== elicits the most bikeshedding).
Indeed--it is still ongoing: http://bugs.python.org/issue19780 //arry/

On Sun, 17 Nov 2013 23:53:09 -0600 Tim Peters <tim.peters@gmail.com> wrote:
But I wonder why it isn't done with a new framing opcode instead (say, FRAME followed by 8-byte count). I suppose that would be like the "prefetch" idea, except that framing opcodes would be mandatory (instead of optional) in proto 4. Why I initially like that:
- Uniform decoding loop ("the next thing" _always_ starts with an opcode).
But it's not actually uniform. A frame isn't a normal opcode, it's a large section of bytes that contains potentially many opcodes. The framing layer is really below the opcode layer, so it also makes sense to implement it like that.

(I also tried to implement Serhiy's PREFETCH idea, but it didn't bring any actual simplification)
When slinging 8-byte counts, _some_ sanity-checking seems like a good idea ;-)
I don't know. It's not much worse (for denial of service opportunities) than a 4-byte count, which already exists in earlier protocols. Regards Antoine.
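Antoine's point that framing sits below the opcode layer can be sketched as a read() wrapper that consumes frame headers transparently, so the opcode loop above it never sees them. The header format here (a bare 8-byte little-endian length) is an assumption for illustration, not the PEP's exact wire format:

```python
import io
import struct

class Unframer:
    """File-like wrapper that strips 8-byte frame headers from a stream."""

    def __init__(self, raw):
        self.raw = raw
        self.remaining = 0  # payload bytes left in the current frame

    def read(self, n):
        out = b""
        while n:
            if self.remaining == 0:
                header = self.raw.read(8)
                if not header:
                    break  # end of stream
                self.remaining = struct.unpack("<Q", header)[0]
            chunk = self.raw.read(min(n, self.remaining))
            self.remaining -= len(chunk)
            n -= len(chunk)
            out += chunk
        return out

# Two frames; the caller reads straight across the frame boundary:
payload = b"opcodes-go-here"
framed = struct.pack("<Q", 6) + payload[:6] + struct.pack("<Q", 9) + payload[6:]
assert Unframer(io.BytesIO(framed)).read(15) == payload
```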

On Mon, 18 Nov 2013 17:14:21 +0100 Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sun, 17 Nov 2013 23:53:09 -0600 Tim Peters <tim.peters@gmail.com> wrote:
But I wonder why it isn't done with a new framing opcode instead (say, FRAME followed by 8-byte count). I suppose that would be like the "prefetch" idea, except that framing opcodes would be mandatory (instead of optional) in proto 4. Why I initially like that:
- Uniform decoding loop ("the next thing" _always_ starts with an opcode).
But it's not actually uniform. A frame isn't a normal opcode, it's a large section of bytes that contains potentially many opcodes.
The framing layer is really below the opcode layer, so it also makes sense to implement it like that.
(I also tried to implement Serhiy's PREFETCH idea, but it didn't bring any actual simplification)
When slinging 8-byte counts, _some_ sanity-checking seems like a good idea ;-)
I don't know. It's not much worse (for denial of service opportunities) than a 4-byte count, which already exists in earlier protocols.
Actually, now that I think of it, it's even better. A 2**63 bytes allocation is guaranteed to fail, since most 64-bit CPUs have a smaller address space than that (for example, x86-64 CPUs seem to have a 48 bits virtual address space). On the other hand, a 2**31 bytes allocation may very well succeed, eat almost all the RAM and/or slow down the machine by swapping out. Regards Antoine.

[Tim]
But I wonder why it isn't done with a new framing opcode instead (say, FRAME followed by 8-byte count). I suppose that would be like the "prefetch" idea, except that framing opcodes would be mandatory (instead of optional) in proto 4. Why I initially like that:
- Uniform decoding loop ("the next thing" _always_ starts with an opcode).
But it's not actually uniform. A frame isn't a normal opcode, it's a large section of bytes that contains potentially many opcodes.
The framing layer is really below the opcode layer, so it also makes sense to implement it like that.
That makes sense to me.
(I also tried to implement Serhiy's PREFETCH idea, but it didn't bring any actual simplification)
But it has a different kind of advantage: PREFETCH was optional. As Guido said, it's annoying to bloat the size of small pickles (which may, although individually small, occur in great numbers) by 8 bytes each. There's really no point to framing small chunks of data, right?

Which leads to another idea: after the PROTO opcode, there is, or is not, an optional PREFETCH opcode with an 8-byte argument. If the PREFETCH opcode exists, then it gives the number of bytes up to and including the pickle's STOP opcode. So there's exactly 0 or 1 PREFETCH opcodes per pickle.

Is there an advantage to spraying multiple 8-byte "frame counts" throughout a pickle stream? 8 bytes is surely enough to specify the size of any single pickle for half a generation ;-) to come.
When slinging 8-byte counts, _some_ sanity-checking seems like a good idea ;-)
I don't know. It's not much worse (for denial of service opportunities) than a 4-byte count, which already exists in earlier protocols.
I'm not thinking of DOS at all, just general sanity as data objects get larger & larger. Pickles have almost no internal checks now. But I've seen my share of corrupted pickles!

About the only thing that catches them early is hitting a byte that isn't a legitimate pickle opcode. That _used_ to be a much stronger check than it is now, because the 8-bit opcode space was sparsely populated at first. But, over time, more and more opcodes get added, so the chance of mistaking a garbage byte for a legit opcode has increased correspondingly.

A PREFETCH opcode with a "bytes until STOP" makes for a decent bad ;-) sanity check too ;-)

On Mon, 18 Nov 2013 16:18:21 -0600 Tim Peters <tim.peters@gmail.com> wrote:
But it has a different kind of advantage: PREFETCH was optional. As Guido said, it's annoying to bloat the size of small pickles (which may, although individually small, occur in great numbers) by 8 bytes each. There's really no point to framing small chunks of data, right?
You can't know how much space the pickle will take until the pickling ends, though, which makes it difficult to decide whether you want to emit a PREFETCH opcode or not.
Which leads to another idea: after the PROTO opcode, there is, or is not, an optional PREFETCH opcode with an 8-byte argument. If the PREFETCH opcode exists, then it gives the number of bytes up to and including the pickle's STOP opcode. So there's exactly 0 or 1 PREFETCH opcodes per pickle.
Is there an advantage to spraying multiple 8-byte "frame counts" throughout a pickle stream?
Well, yes: much better memory usage for large pickles. Some people use pickles to store huge data, which was the motivation to add the 8-byte-size opcodes after all.
I'm not thinking of DOS at all, just general sanity as data objects get larger & larger. Pickles have almost no internal checks now. But I've seen my share of corrupted pickles!
IMO, any validation should use a dedicated CRC-like scheme, rather than relying on the fact that correct pickles are statistically unlikely :-) (or you can implicitly rely on TCP or UDP checksums, or you disk subsystem's own sanity checks) Regards Antoine.

[Tim]
But it has a different kind of advantage: PREFETCH was optional. As Guido said, it's annoying to bloat the size of small pickles (which may, although individually small, occur in great numbers) by 8 bytes each. There's really no point to framing small chunks of data, right?
[Antoine]
You can't know how much space the pickle will take until the pickling ends, though, which makes it difficult to decide whether you want to emit a PREFETCH opcode or not.
Ah, of course. Presumably the outgoing pickle stream is first stored in some memory buffer, right? If pickling completes before the buffer is first flushed, then you know exactly how large the entire pickle is. If "it's small" (say, < 100 bytes), don't write out the PREFETCH part. Else do.
Which leads to another idea: after the PROTO opcode, there is, or is not, an optional PREFETCH opcode with an 8-byte argument. If the PREFETCH opcode exists, then it gives the number of bytes up to and including the pickle's STOP opcode. So there's exactly 0 or 1 PREFETCH opcodes per pickle.
Is there an advantage to spraying multiple 8-byte "frame counts" throughout a pickle stream?
Well, yes: much better memory usage for large pickles. Some people use pickles to store huge data, which was the motivation to add the 8-byte-size opcodes after all.
We'd have the same advantage _if_ it were feasible to know the entire size up front. I understand now that it's not feasible.
... IMO, any validation should use a dedicated CRC-like scheme, rather than relying on the fact that correct pickles are statistically unlikely :-)
OK! A new CRC opcode every two bytes ;-)
(or you can implicitly rely on TCP or UDP checksums, or you disk subsystem's own sanity checks)
Of course we've been doing that all along. Yet I've still seen my share of corrupted pickles anyway. Don't underestimate the determination of users to screw up everything possible in every way possible ;-)

On 19 Nov 2013 08:52, "Tim Peters" <tim.peters@gmail.com> wrote:
[Tim]
But it has a different kind of advantage: PREFETCH was optional. As Guido said, it's annoying to bloat the size of small pickles (which may, although individually small, occur in great numbers) by 8 bytes each. There's really no point to framing small chunks of data, right?
[Antoine]
You can't know how much space the pickle will take until the pickling ends, though, which makes it difficult to decide whether you want to emit a PREFETCH opcode or not.
Ah, of course. Presumably the outgoing pickle stream is first stored in some memory buffer, right? If pickling completes before the buffer is first flushed, then you know exactly how large the entire pickle is. If "it's small" (say, < 100 bytes), don't write out the PREFETCH part. Else do.
Which leads to another idea: after the PROTO opcode, there is, or is not, an optional PREFETCH opcode with an 8-byte argument. If the PREFETCH opcode exists, then it gives the number of bytes up to and including the pickle's STOP opcode. So there's exactly 0 or 1 PREFETCH opcodes per pickle.
Is there an advantage to spraying multiple 8-byte "frame counts" throughout a pickle stream?
Well, yes: much better memory usage for large pickles. Some people use pickles to store huge data, which was the motivation to add the 8-byte-size opcodes after all.
We'd have the same advantage _if_ it were feasible to know the entire size up front. I understand now that it's not feasible.
This may be a dumb suggestion (since I don't know the pickle infrastructure at all), but could there be a short section of unframed data before switching to framing mode? For example, emit up to 256 bytes unframed in all pickles, then start emitting appropriate FRAME opcodes if the pickle continues on? Cheers, Nick.
... IMO, any validation should use a dedicated CRC-like scheme, rather than relying on the fact that correct pickles are statistically unlikely :-)
OK! A new CRC opcode every two bytes ;-)
(or you can implicitly rely on TCP or UDP checksums, or you disk subsystem's own sanity checks)
Of course we've been doing that all along. Yet I've still seen my share of corrupted pickles anyway. Don't underestimate the determination of users to screw up everything possible in every way possible ;-)

Ok, how about merging the two sub-threads :-)

On Mon, 18 Nov 2013 16:44:59 -0600 Tim Peters <tim.peters@gmail.com> wrote:
[Antoine]
You can't know how much space the pickle will take until the pickling ends, though, which makes it difficult to decide whether you want to emit a PREFETCH opcode or not.
Ah, of course. Presumably the outgoing pickle stream is first stored in some memory buffer, right? If pickling completes before the buffer is first flushed, then you know exactly how large the entire pickle is. If "it's small" (say, < 100 bytes), don't write out the PREFETCH part. Else do.
That's true. We could also have a SMALLPREFETCH opcode with a one-byte length to still get the benefits of prefetching.
Well, yes: much better memory usage for large pickles. Some people use pickles to store huge data, which was the motivation to add the 8-byte-size opcodes after all.
We'd have the same advantage _if_ it were feasible to know the entire size up front. I understand now that it's not feasible.
AFAICT, it would only be possible by doing two-pass pickling, which would also slow it down massively.
A long-running process can legitimately put billions of items on work queues, far more than could ever fit in RAM simultaneously. Comparing this to PyObject overhead makes no sense to me. Neither does the line of argument "there are several kinds of overheads, so making this overhead worse too doesn't matter".
Well, it's a question of cost / benefit: does it make sense to optimize something that will be dwarfed by other factors in real world situations?
When possible, we should strive not to add overheads that don't repay their costs. For small pickles, an 8-byte size field doesn't appear to buy anything. But I appreciate that it costs implementation effort to avoid producing it in these cases.
I share the concern, although I still don't think the "ocean of tiny pickles" is a reasonable use case :-)

That said, assuming you think this is important (do you?), we're left with the following constraints:
- it would be nice to have this PEP in 3.4
- 3.4 beta1 and feature freeze is in approximately one week
- switching to the PREFETCH scheme requires some non-trivial work on the current patch, work done by either Alexandre or me (but I already have pathlib (PEP 428) on my plate, so it'll have to be Alexandre) - unless you want to do it, of course?

What do you think?

Regards
Antoine.
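The buffer-then-decide scheme discussed above (Tim's "< 100 bytes" threshold plus Antoine's SMALLPREFETCH with a one-byte length) could look roughly like this. Both opcode byte values and the 256-byte threshold are hypothetical choices for the sketch, not anything specified by the PEP:

```python
import struct

SMALLPREFETCH = 0x96  # hypothetical opcode values, for illustration only
PREFETCH = 0x97

def add_length_prefix(pickle_bytes):
    """Prepend a 1-byte or 8-byte length header to a finished pickle.

    Assumes the pickle has already been fully buffered, so its total
    size is known -- the point Antoine raised about deciding only at
    the end of pickling.
    """
    n = len(pickle_bytes)
    if n < 256:
        return bytes([SMALLPREFETCH, n]) + pickle_bytes
    return bytes([PREFETCH]) + struct.pack("<Q", n) + pickle_bytes

small = add_length_prefix(b"x" * 10)
assert small[0] == SMALLPREFETCH and small[1] == 10 and len(small) == 12

big = add_length_prefix(b"y" * 1000)
assert big[0] == PREFETCH
assert struct.unpack_from("<Q", big, 1)[0] == 1000
```

So a tiny pickle pays only 2 bytes of overhead instead of 9, at the cost of a second header form the unpickler must recognize.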

[Antoine]
... Well, it's a question of cost / benefit: does it make sense to optimize something that will be dwarfed by other factors in real world situations?
For most of my career, a megabyte of RAM was an unthinkable luxury. Now I'm running on an OS that needs a gigabyte of RAM just to boot. So I'm old enough to think from the opposite end: "does it make sense to bloat something that's already working fine?". It's the "death of a thousand cuts" (well, this thing doesn't matter _that_ much - and neither does that, nor those others over there ...) that leads to a GiB-swallowing OS and a once-every-4-years crusade to reduce Python's startup time. For examples ;-)
... That said, assuming you think this is important (do you?),
Honestly, it's more important to me "in theory" to oppose unnecessary bloat (of all kinds). But, over time, it's attitude that shapes results, so I'm not apologetic about that.
we're left with the following constraints:
- it would be nice to have this PEP in 3.4
- 3.4 beta1 and feature freeze is in approximately one week
- switching to the PREFETCH scheme requires some non-trivial work on the current patch, work done by either Alexandre or me (but I already have pathlib (PEP 428) on my plate, so it'll have to be Alexandre) - unless you want to do it, of course?
What do you think?
I wouldn't hold up PEP acceptance for this. It won't be a disaster in any case, just - at worst - another little ratchet in the "needless bloat" direction. And the PEP has more things that are pure wins. If it's possible to squeeze in the variable-length encoding, that would be great. If I were you, though, I'd check the patch in as-is, just in case. I can't tell whether I'll have time to work on it (have other things going now outside my control).

19.11.13 01:10, Antoine Pitrou wrote:
- switching to the PREFETCH scheme requires some non-trivial work on the current patch, work done by either Alexandre or me (but I already have pathlib (PEP 428) on my plate, so it'll have to be Alexandre) - unless you want to do it, of course?
I had implemented optimized PREFETCH scheme months ago (the code was smaller than with the framing layer). I need a day or two to update the code to the current Alexandre's patch.

On Mon, 18 Nov 2013 16:44:59 -0600 Tim Peters <tim.peters@gmail.com> wrote:
[Tim]
But it has a different kind of advantage: PREFETCH was optional. As Guido said, it's annoying to bloat the size of small pickles (which may, although individually small, occur in great numbers) by 8 bytes each. There's really no point to framing small chunks of data, right?
[Antoine]
You can't know how much space the pickle will take until the pickling ends, though, which makes it difficult to decide whether you want to emit a PREFETCH opcode or not.
Ah, of course. Presumably the outgoing pickle stream is first stored in some memory buffer, right? If pickling completes before the buffer is first flushed, then you know exactly how large the entire pickle is. If "it's small" (say, < 100 bytes), don't write out the PREFETCH part. Else do.
Yet another possibility: keep framing but use a variable-length encoding for the frame size:

- first byte: bits 7-5: N (= frame size bytes length - 1)
- first byte: bits 4-0: first 5 bits of frame size
- remaining N bytes: remaining bits of frame size

With this scheme, very small pickles have a one-byte overhead; small ones a two-byte overhead; and the max frame size is 2**61 rather than 2**64, which should still be sufficient. And the frame size is read using either one or two read() calls, which is efficient. Regards Antoine.
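For concreteness, the encoding Antoine describes might look something like this (my sketch, not code from the patch; the byte order of the trailing size bytes is an assumption):

```python
def encode_frame_size(size):
    # Find the smallest N (0..7) such that size fits in 5 + 8*N bits.
    for n in range(8):
        if size < 1 << (5 + 8 * n):
            break
    else:
        raise ValueError("frame size exceeds 2**61 - 1")
    # First byte: bits 7-5 hold N, bits 4-0 hold the top 5 bits of size.
    first = (n << 5) | (size >> (8 * n))
    # Remaining N bytes hold the rest of the size.
    return bytes([first]) + (size & ((1 << (8 * n)) - 1)).to_bytes(n, 'big')

def decode_frame_size(read):
    # One read() call for the first byte, at most one more for the rest.
    first = read(1)[0]
    n = first >> 5                 # number of extra size bytes
    size = first & 0x1f
    if n:
        size = (size << (8 * n)) | int.from_bytes(read(n), 'big')
    return size
```

Sizes below 32 fit in a single byte and sizes below 2**13 in two, matching the one- and two-byte overhead figures above.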

On 19 November 2013 09:57, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Mon, 18 Nov 2013 16:44:59 -0600 Tim Peters <tim.peters@gmail.com> wrote:
[Tim]
But it has a different kind of advantage: PREFETCH was optional. As Guido said, it's annoying to bloat the size of small pickles (which may, although individually small, occur in great numbers) by 8 bytes each. There's really no point to framing small chunks of data, right?
[Antoine]
You can't know how much space the pickle will take until the pickling ends, though, which makes it difficult to decide whether you want to emit a PREFETCH opcode or not.
Ah, of course. Presumably the outgoing pickle stream is first stored in some memory buffer, right? If pickling completes before the buffer is first flushed, then you know exactly how large the entire pickle is. If "it's small" (say, < 100 bytes), don't write out the PREFETCH part. Else do.
Yet another possibility: keep framing but use a variable-length encoding for the frame size:
- first byte: bits 7-5: N (= frame size bytes length - 1) - first byte: bits 4-0: first 5 bits of frame size - remaining N bytes: remaining bits of frame size
With this scheme, very small pickles have a one byte overhead; small ones a two byte overhead; and the max frame size is 2**61 rather than 2**64, which should still be sufficient.
And the frame size is read using either one or two read() calls, which is efficient.
And it's only a minimal change from the current patch. Sounds good to me. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Nov 18, 2013 at 4:30 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 19 November 2013 09:57, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Mon, 18 Nov 2013 16:44:59 -0600 Tim Peters <tim.peters@gmail.com> wrote:
[Tim]
But it has a different kind of advantage: PREFETCH was optional. As Guido said, it's annoying to bloat the size of small pickles (which may, although individually small, occur in great numbers) by 8 bytes each. There's really no point to framing small chunks of data, right?
[Antoine]
You can't know how much space the pickle will take until the pickling ends, though, which makes it difficult to decide whether you want to emit a PREFETCH opcode or not.
Ah, of course. Presumably the outgoing pickle stream is first stored in some memory buffer, right? If pickling completes before the buffer is first flushed, then you know exactly how large the entire pickle is. If "it's small" (say, < 100 bytes), don't write out the PREFETCH part. Else do.
Yet another possibility: keep framing but use a variable-length encoding for the frame size:
- first byte: bits 7-5: N (= frame size bytes length - 1) - first byte: bits 4-0: first 5 bits of frame size - remaining N bytes: remaining bits of frame size
With this scheme, very small pickles have a one byte overhead; small ones a two byte overhead; and the max frame size is 2**61 rather than 2**64, which should still be sufficient.
And the frame size is read using either one or two read() calls, which is efficient.
And it's only a minimal change from the current patch. Sounds good to me.
Food for thought: maybe we should have variable-encoding lengths for all opcodes, rather than the current cumbersome scheme? -- --Guido van Rossum (python.org/~guido)

[Guido]
Food for thought: maybe we should have variable-encoding lengths for all opcodes, rather than the current cumbersome scheme?
Yes, but not for protocol 4 - time's running out fast for that. When we "only" had the XXX1, XXX2, and XXX4 opcodes, it was kinda silly, but after adding XXX8 flavors to all of those it's become unbearable.

On Mon, 18 Nov 2013 16:48:05 -0800 Guido van Rossum <guido@python.org> wrote:
Food for thought: maybe we should have variable-encoding lengths for all opcodes, rather than the current cumbersome scheme?
Well, it's not that cumbersome... If you look at CPU encodings, they also tend to have different opcodes for different immediate lengths. In your case, I'd say it mostly leads to a bit of code duplication. But the opcode space is far from exhausted right now :) Regards Antoine.

So why is framing different?

On Tue, Nov 19, 2013 at 10:51 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Mon, 18 Nov 2013 16:48:05 -0800 Guido van Rossum <guido@python.org> wrote:
Food for thought: maybe we should have variable-encoding lengths for all opcodes, rather than the current cumbersome scheme?
Well, it's not that cumbersome... If you look at CPU encodings, they also tend to have different opcodes for different immediate lengths.
In your case, I'd say it mostly leads to a bit of code duplication. But the opcode space is far from exhausted right now :)
Regards
Antoine.
-- --Guido van Rossum (python.org/~guido)

So using an opcode for framing is out? (Sorry, I've lost track of the back-and-forth.)

On Tue, Nov 19, 2013 at 10:57 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 19 Nov 2013 10:52:58 -0800 Guido van Rossum <guido@python.org> wrote:
So why is framing different?
Because it doesn't use opcodes, so it can't use different opcodes to differentiate between different frame size widths :-)
Regards
Antoine.
-- --Guido van Rossum (python.org/~guido)

On Tue, 19 Nov 2013 11:05:45 -0800 Guido van Rossum <guido@python.org> wrote:
So using an opcode for framing is out? (Sorry, I've lost track of the back-and-forth.)
It doesn't seem to bring anything, and it makes the overhead worse for tiny pickles (since it will be two bytes at least, instead of one byte with the current variable-length encoding proposal). If overhead doesn't matter, I'm fine with keeping a simple 8-byte frame size :-) Regards Antoine.

Well, both fixed 8-byte framing and variable-size framing introduce a new way of representing numbers in the stream, which means that everyone parsing and generating pickles must be able to support both styles. (But fixed is easier, since the XXX8 opcodes use the same format.)

I'm thinking of how you correctly read a pickle from a non-buffering pipe with the minimum number of read() calls, without ever reading beyond the end of a valid pickle. (That's a requirement, right?)

If you know it's protocol 4:

- with fixed framing: read 10 bytes, that's the magic word plus the first frame size; then you can start buffering
- with variable framing: read 3 bytes, then depending on the 3rd byte read some more to find the frame size; then you can start buffering
- with a mandatory frame opcode: pretty much the same
- with an optional frame opcode: pretty much the same (the 3rd byte must be a valid opcode, even if it isn't a frame opcode)

If you don't know the protocol number: read the first byte, then read the second byte (or not, if it's not explicitly versioned); then you know the protocol and can do the rest as above.

On Tue, Nov 19, 2013 at 11:11 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 19 Nov 2013 11:05:45 -0800 Guido van Rossum <guido@python.org> wrote:
So using an opcode for framing is out? (Sorry, I've lost track of the back-and-forth.)
It doesn't seem to bring anything, and it makes the overhead worse for tiny pickles (since it will be two bytes at least, instead of one byte with the current variable length encoding proposal).
If overhead doesn't matter, I'm fine with keeping a simple 8-bytes frame size :-)
Regards
Antoine.
-- --Guido van Rossum (python.org/~guido)
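The fixed-framing case Guido walks through above can be sketched as follows (my illustration only; the 2-byte \x80\x04 header layout and the little-endian size field are assumptions, not settled format):

```python
import io

def start_unpickling(read):
    # One 10-byte read gets the 2-byte protocol magic plus the 8-byte
    # size of the first frame; after that, buffering can proceed frame
    # by frame without ever reading past the end of the pickle.
    header = read(10)
    frame_size = int.from_bytes(header[2:], 'little')
    return header[:2], read(frame_size)

# Usage against an in-memory stand-in for a non-buffering pipe:
pipe = io.BytesIO(b'\x80\x04' + (5).to_bytes(8, 'little') + b'hello')
magic, first_frame = start_unpickling(pipe.read)
```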

[Guido]
So using an opcode for framing is out? (Sorry, I've lost track of the back-and-forth.)
It was never in ;-) I'd *prefer* one, but not enough to try to block the PEP.

As is, framing is done at a "lower level" than opcode decoding. I fear this is brittle, for all the usual "explicit is better than implicit" kinds of reasons. The only way now to know that you're looking at a frame size is to keep a running count of bytes processed and realize you've reached a byte offset where a frame size "is expected".

With an opcode, framing could also be optional (whenever desired), because frame sizes would be _explicitly_ marked in the byte stream. Then the framing overhead for small pickles could drop to 0 bytes (instead of the current 8, or 1 through 9 under various other schemes).

Ideal would be an explicit framing opcode combined with variable-length size encoding. That would never require more bytes than the current scheme, and "almost always" require fewer. But even I don't think it's of much value to chop a few bytes off every 64KB of pickle ;-)

On Tue, 19 Nov 2013 13:22:52 -0600 Tim Peters <tim.peters@gmail.com> wrote:
[Guido]
So using an opcode for framing is out? (Sorry, I've lost track of the back-and-forth.)
It was never in ;-) I'd *prefer* one, but not enough to try to block the PEP. As is, framing is done at a "lower level" than opcode decoding. I fear this is brittle, for all the usual "explicit is better than implicit" kinds of reasons. The only way now to know that you're looking at a frame size is to keep a running count of bytes processed and realize you've reached a byte offset where a frame size "is expected".
That's integrated into the built-in buffering. It's not really an additional constraint: the frame sizes simply dictate how buffering happens in practice. The main point of framing is to *simplify* the buffering logic (of course, the old buffering logic is still there for protocols <= 3, unfortunately).

Note some drawbacks of frame opcodes:

- the decoder has to sanity-check the frame opcodes (what if a frame opcode is encountered when already inside a frame?)
- a pickle-mutating function such as pickletools.optimize() may naively ignore the frame opcodes while rearranging the pickle stream, only to emit a new pickle with invalid frame sizes

Regards Antoine.

Am 19.11.13 20:59, schrieb Antoine Pitrou:
That's integrated to the built-in buffering. It's not really an additional constraint: the frame sizes simply dictate how buffering happens in practice. The main point of framing is to *simplify* the buffering logic (of course, the old buffering logic is still there for protocols <= 3, unfortunately).
I wonder why this needs to be part of the pickle protocol at all, if it really is "below" the opcodes. Anybody desiring framing could just implement a framing version of the io.BufferedReader, which could be used on top of a socket connection (say) to allow fetching larger blocks from the network stack. This would then be transparent to the pickle implementation; the framing reader would, of course, provide the peek() operation to allow the unpickler to continue to use buffering. Such a framing BufferedReader might even be included in the standard library. Regards, Martin
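A minimal sketch of what Martin's framing reader might look like (my illustration, assuming a plain 8-byte little-endian length prefix per frame; this is not his actual patch):

```python
class FramingReader:
    """Wraps a raw byte stream and transparently strips frame headers,
    so the unpickler only ever sees the framed payload."""

    def __init__(self, raw):
        self.raw = raw
        self.buf = b''

    def _fill(self):
        header = self.raw.read(8)      # next frame's size field
        if len(header) < 8:
            return False               # end of stream
        self.buf += self.raw.read(int.from_bytes(header, 'little'))
        return True

    def read(self, n):
        while len(self.buf) < n and self._fill():
            pass
        data, self.buf = self.buf[:n], self.buf[n:]
        return data

    def peek(self, n):
        # peek() lets the unpickler keep its look-ahead-based buffering.
        while len(self.buf) < n and self._fill():
            pass
        return self.buf[:n]
```

Reads transparently cross frame boundaries, which is exactly what makes the framing invisible to the pickle implementation sitting on top.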

On Tue, 19 Nov 2013 21:25:34 +0100 "Martin v. Löwis" <martin@v.loewis.de> wrote:
Am 19.11.13 20:59, schrieb Antoine Pitrou:
That's integrated to the built-in buffering. It's not really an additional constraint: the frame sizes simply dictate how buffering happens in practice. The main point of framing is to *simplify* the buffering logic (of course, the old buffering logic is still there for protocols <= 3, unfortunately).
I wonder why this needs to be part of the pickle protocol at all, if it really is "below" the opcodes. Anybody desiring framing could just implement a framing version of the io.BufferedReader, which could be used on top of a socket connection (say) to allow fetching larger blocks from the network stack. This would then be transparent to the pickle implementation; the framing reader would, of course, provide the peek() operation to allow the unpickler to continue to use buffering.
Such a framing BufferedReader might even be included in the standard library.
Well, unless you propose a patch before Saturday, I will happily ignore your proposal. Regards Antoine.

Am 19.11.13 21:28, schrieb Antoine Pitrou:
Well, unless you propose a patch before Saturday, I will happily ignore your proposal.
See http://bugs.python.org/file32709/framing.diff Regards, Martin

On Tue, 19 Nov 2013 23:05:07 +0100 "Martin v. Löwis" <martin@v.loewis.de> wrote:
Am 19.11.13 21:28, schrieb Antoine Pitrou:
Well, unless you propose a patch before Saturday, I will happily ignore your proposal.
See
Ok, thanks. So now that I look at the patch, I see the following problems with this idea:

- "pickle + framing" becomes a different protocol than "pickle" alone, which means we lose the benefit of protocol autodetection. It's as though pickle.load() required you to give the protocol number, instead of inferring it from the pickle bytestream.
- it is less efficient than framing built inside pickle, since it adds separate buffers and memory copies (while the point of framing is to make buffering more efficient).

Your idea is morally similar to saying "we don't need to optimize the size of pickles, since you can gzip them anyway". However, the fact that the _pickle module currently goes to lengths to try to optimize buffering implies to me that it's reasonable to also improve the pickle protocol so as to optimize buffering.

Regards Antoine.

Am 19.11.13 23:50, schrieb Antoine Pitrou:
Ok, thanks. So now that I look at the patch I see the following problems with this idea:
- "pickle + framing" becomes a different protocol than "pickle" alone, which means we lose the benefit of protocol autodetection. It's as though pickle.load() required you to give the protocol number, instead of inferring it from the pickle bytestream.
Not necessarily. Framing becomes a different protocol, yes. But autodetection would still be possible (it actually is possible in my proposed definition).
- it is less efficient than framing built inside pickle, since it adds separate buffers and memory copies (while the point of framing is to make buffering more efficient).
Correct. However, if the intent is to reduce the number of system calls, then this is still achieved.
Your idea is morally similar to saying "we don't need to optimize the size of pickles, since you can gzip them anyway".
Not really. In the case of gzip, it might be that the size reduction from properly saving bytes in pickle would be even larger. Here, the wire representation and the number of system calls are actually (nearly) identical.
However, the fact that the _pickle module currently goes to lengths to try to optimize buffering, implies to me that it's reasonable to also improve the pickle protocol so as to optimize buffering.
AFAICT, the real driving force is the desire to not read-ahead more than the pickle is long. This is what complicates the code. The easiest (and most space-efficient) solution to that problem would be to prefix the entire pickle with a data size field (possibly in a variable-length representation), i.e. to make a single frame. If that was done, I would guess that Tim's concerns about brittleness would go away (as you couldn't have a length field in the middle of data). IMO, the PEP has nearly the same flaw as the HTTP chunked transfer, which also puts length fields in the middle of the payload (except that HTTP makes it worse by making them optional). Of course, a single length field has other drawbacks, such as having to pickle everything before sending out the first bytes. Regards, Martin

On Wed, 20 Nov 2013 00:56:13 +0100 "Martin v. Löwis" <martin@v.loewis.de> wrote:
AFAICT, the real driving force is the desire to not read-ahead more than the pickle is long. This is what complicates the code. The easiest (and most space-efficient) solution to that problem would be to prefix the entire pickle with a data size field (possibly in a variable-length representation), i.e. to make a single frame.
Pickling then becomes very problematic: you have to keep the entire pickle in memory until the end, when you finally can write the size at the beginning of the pickle.
If that was done, I would guess that Tim's concerns about brittleness would go away (as you couldn't have a length field in the middle of data). IMO, the PEP has nearly the same flaw as the HTTP chunked transfer, which also puts length fields in the middle of the payload (except that HTTP makes it worse by making them optional).
Tim's concern is easily addressed with a FRAME opcode, without changing the overall scheme (as he lately proposed). Regards Antoine.

[Martin v. Löwis]
... AFAICT, the real driving force is the desire to not read-ahead more than the pickle is long. This is what complicates the code. The easiest (and most space-efficient) solution to that problem would be to prefix the entire pickle with a data size field (possibly in a variable-length representation), i.e. to make a single frame.
In a bout of giddy optimism, I suggested that earlier in the thread. It would be sweet :-)
If that was done, I would guess that Tim's concerns about brittleness would go away (as you couldn't have a length field in the middle of data). IMO, the PEP has nearly the same flaw as the HTTP chunked transfer, which also puts length fields in the middle of the payload (except that HTTP makes it worse by making them optional).
Of course, a single length field has other drawbacks, such as having to pickle everything before sending out the first bytes.
And that's the killer. Pickle strings are generally produced incrementally, in smallish pieces. But that may go on for very many iterations, and there's no way to guess the final size in advance. I only see three ways to do it:

1. Hope the whole string fits in RAM.
2. Pickle twice, the first time just to get the final size (& throw the pickle pieces away on the first pass while summing their sizes).
3. Flush the pickle string to disk periodically, then after it's done read it up and copy it to the intended output stream.

All of those really suck :-(

BTW, I'm not a web guy: in what way is HTTP chunked transfer mode viewed as being flawed? Everything I ever read about it seemed to think it was A Good Idea.

Am 20.11.13 06:18, schrieb Tim Peters:
BTW, I'm not a web guy: in what way is HTTP chunked transfer mode viewed as being flawed? Everything I ever read about it seemed to think it was A Good Idea.
It just didn't work for some time, see e.g.

http://bugs.python.org/issue1486335
http://bugs.python.org/issue1966
http://bugs.python.org/issue1312980
http://bugs.python.org/issue3761

It's not that the protocol was underspecified - just that the implementation was "brittle" (if I understand that word correctly). And I believe (and agree with you) that the cause of this "difficult to implement" property is putting the framing "in the middle" of the stack (i.e. not really *below* pickle itself, but into pickle, below the opcodes - just like HTTP chunked transfer is "in" HTTP, but below the content encoding). Regards, Martin

A problem with chunked, IIRC, is that the frame headers are variable-length (a CRLF followed by a hex number followed by some optional gunk followed by CRLF), so you have to drop back into one-byte-at-a-time reads to parse them. (Well, I suppose you could read 5 bytes, which is the minimum: CR, LF, X, CR, LF, and when the second CR isn't among these, you have a lower bound for how much more to read, although at that point you'd better load up on coffee before writing the rest of the code. :-)

Some good things about it:

- Explicit final frame (byte count zero), so no need to rely on the data to know the end.
- The redundancy in the format (start and end with CRLF, hex numbers) makes it more likely that framing errors (e.g. due to incorrect counting or some layer collapsing CRLF into LF) are detected.

On Wed, Nov 20, 2013 at 7:44 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Am 20.11.13 06:18, schrieb Tim Peters:
BTW, I'm not a web guy: in what way is HTTP chunked transfer mode viewed as being flawed? Everything I ever read about it seemed to think it was A Good Idea.
It just didn't work for some time, see e.g.
http://bugs.python.org/issue1486335 http://bugs.python.org/issue1966 http://bugs.python.org/issue1312980 http://bugs.python.org/issue3761
It's not that the protocol was underspecified - just the implementation was "brittle" (if I understand that word correctly). And I believe (and agree with you) that the cause for this "difficult to implement" property is that the framing is in putting framing "in the middle" of the stack (i.e. not really *below* pickle itself, but into pickle but below the opcodes - just like http chunked transfer is "in" http, but below the content encoding).
Regards, Martin
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org
-- --Guido van Rossum (python.org/~guido)
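The one-byte-at-a-time parse Guido describes above might be sketched like this (a rough illustration of HTTP chunked framing, not pickle code):

```python
def read_chunk_size(read):
    # The chunk-size line has unknown length, so read it byte by byte.
    line = b''
    while not line.endswith(b'\r\n'):
        byte = read(1)
        if not byte:
            raise EOFError("truncated chunk header")
        line += byte
    # Hex size, ignoring any optional chunk extensions after ';'.
    return int(line.rstrip(b'\r\n').split(b';')[0], 16)
```

A size of zero marks the explicit final frame, which is one of the good points noted above.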

[Tim]
BTW, I'm not a web guy: in what way is HTTP chunked transfer mode viewed as being flawed? Everything I ever read about it seemed to think it was A Good Idea.
[Martin]
It just didn't work for some time, see e.g.
http://bugs.python.org/issue1486335 http://bugs.python.org/issue1966 http://bugs.python.org/issue1312980 http://bugs.python.org/issue3761
It's not that the protocol was underspecified - just the implementation was "brittle" (if I understand that word correctly).
"Easily broken in catastrophic ways" is close, like a chunk of peanut brittle can shatter into a gazillion pieces if you drop it on the floor. http://en.wikipedia.org/wiki/Brittle_(food) Or like the infinite loops in some of the bug reports, "just because" some server screwed up the protocol a little at EOF. But for pickling there are a lot fewer picklers than HTML transport creators ;-) So I'm not much worried about that. Another of the bug reports amounted just to that urllib, at first, didn't support chunked transfer mode at all.
And I believe (and agree with you) that the cause for this "difficult to implement" property is that the framing is in putting framing "in the middle" of the stack (i.e. not really *below* pickle itself, but into pickle but below the opcodes - just like http chunked transfer is "in" http, but below the content encoding).
It's certainly messy that way. But doable, and I expect the people working on it are more than capable enough to get it right, by at latest the 4th try ;-)

19.11.13 21:59, Antoine Pitrou написав(ла):
Note some drawbacks of frame opcodes: - the decoder has to sanity check the frame opcodes (what if a frame opcode is encountered when already inside a frame?)
This is only one simple check when reading the frame opcode.
- a pickle-mutating function such as pickletools.optimize() may naively ignore the frame opcodes while rearranging the pickle stream, only to emit a new pickle with invalid frame sizes
But with naked frame sizes, without opcodes, it has an even greater chance of producing an invalid pickle.

[Tim]
... better than implicit" kinds of reasons. The only way now to know that you're looking at a frame size is to keep a running count of bytes processed and realize you've reached a byte offset where a frame size "is expected".
[Antoine]
That's integrated to the built-in buffering.
Well, obviously, because it wouldn't work at all unless the built-in buffering knew all about it ;-)
It's not really an additional constraint: the frame sizes simply dictate how buffering happens in practice. The main point of framing is to *simplify* the buffering logic (of course, the old buffering logic is still there for protocols <= 3, unfortunately).
And always will be - there are no pickle simplifications, because everything always sticks around forever. Over time, pickle just gets more complicated. That's in the nature of the beast.
Note some drawbacks of frame opcodes: - the decoder has to sanity check the frame opcodes (what if a frame opcode is encountered when already inside a frame?) - a pickle-mutating function such as pickletools.optimize() may naively ignore the frame opcodes while rearranging the pickle stream, only to emit a new pickle with invalid frame sizes
I suspect we have very different mental models here. By "has an opcode", I do NOT mean "must be visible to the opcode-decoding loop". I just mean "has a unique byte assigned in the pickle opcode space".

I expect that in the CPython implementation of unpickling, the buffering layer would _consume_ the FRAME opcode, along with the frame size. The opcode-decoding loop would never see it.

But if some _other_ implementation of unpickling didn't give a hoot about framing, having an explicit opcode means that implementation could ignore the whole scheme very easily: just implement the FRAME opcode in *its* opcode-decoding loop to consume the FRAME argument, ignore it, and move on. As-is, all other implementations _have_ to know everything about the buffering scheme because it's all implicit low-level magic.

So, then, to the 2 points you raised:

1. If the CPython decoder ever sees a FRAME opcode, I expect it to raise an exception. That's all - it's an invalid pickle (or a bug in the code) if it contains a FRAME the buffering layer didn't consume.

2. pickletools.optimize() in the CPython implementation should never see a FRAME opcode either.

Initially, all I desperately ;-) want changed here is for the _buffering layer_, on the writing end, to write 9 bytes instead of 8 (1 new one for a FRAME opcode), and on the reading end to consume 9 bytes instead of 8 (extra credit if it checked the first byte to verify it really is a FRAME opcode - there's nothing wrong with sanity checks).

Then it becomes _possible_ to optimize "small pickles" later (in the sense of not bothering to frame them at all). So long as frames remain implicit magic, that's impossible without moving to yet another new protocol level.
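Tim's model - a buffering layer that consumes an explicit FRAME opcode before the decoding loop ever sees it - might be sketched like this (the opcode byte value is made up purely for illustration):

```python
FRAME = 0x95  # hypothetical opcode byte, for illustration only

def next_frame_size(read):
    """Buffering-layer sketch: read one byte; if it is the FRAME opcode,
    consume the 8-byte size argument too. Otherwise hand the byte back,
    untouched, for the opcode-decoding loop - which is what would make
    framing optional for small pickles later."""
    first = read(1)
    if first and first[0] == FRAME:
        return int.from_bytes(read(8), 'little'), b''
    return None, first   # not framed: 'first' belongs to the opcode loop
```

An unpickler that doesn't care about framing could instead treat FRAME as a 9-byte no-op in its ordinary opcode loop, which is exactly the portability argument made above.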

On Tue, 19 Nov 2013 15:17:06 -0600 Tim Peters <tim.peters@gmail.com> wrote:
Note some drawbacks of frame opcodes: - the decoder has to sanity check the frame opcodes (what if a frame opcode is encountered when already inside a frame?) - a pickle-mutating function such as pickletools.optimize() may naively ignore the frame opcodes while rearranging the pickle stream, only to emit a new pickle with invalid frame sizes
I suspect we have very different mental models here. By "has an opcode", I do NOT mean "must be visible to the opcode-decoding loop". I just mean "has a unique byte assigned in the pickle opcode space".
I expect that in the CPython implementation of unpickling, the buffering layer would _consume_ the FRAME opcode, along with the frame size. The opcode-decoding loop would never see it.
But if some _other_ implementation of unpickling didn't give a hoot about framing, having an explicit opcode means that implementation could ignore the whole scheme very easily: just implement the FRAME opcode in *its* opcode-decoding loop to consume the FRAME argument, ignore it, and move on. As-is, all other implementations _have_ to know everything about the buffering scheme because it's all implicit low-level magic.
Ahah, ok, I see where you're going. But how many other implementations of unpickling are there?
Initially, all I desperately ;-) want changed here is for the _buffering layer_, on the writing end, to write 9 bytes instead of 8 (1 new one for a FRAME opcode), and on the reading end to consume 9 bytes instead of 8 (extra credit if it checked the first byte to verify it really is a FRAME opcode - there's nothing wrong with sanity checks).
Then it becomes _possible_ to optimize "small pickles" later (in the sense of not bothering to frame them at all).
So the CPython unpickler must be able to work with and without framing by detecting the FRAME opcode? Otherwise the "later optimization" can't work. Regards Antoine.

[Tim]
... But if some _other_ implementation of unpickling didn't give a hoot about framing, having an explicit opcode means that implementation could ignore the whole scheme very easily: just implement the FRAME opcode in *its* opcode-decoding loop to consume the FRAME argument, ignore it, and move on. As-is, all other implementations _have_ to know everything about the buffering scheme because it's all implicit low-level magic.
[Antoine]
Ahah, ok, I see where you're going. But how many other implementations of unpickling are there?
That's something you should have researched when writing the PEP ;-) How many implementations of Python aren't CPython? That's probably the answer. I'm not an expert on that, but there's more than one.
Initially, all I desperately ;-) want changed here is for the _buffering layer_, on the writing end, to write 9 bytes instead of 8 (1 new one for a FRAME opcode), and on the reading end to consume 9 bytes instead of 8 (extra credit if it checked the first byte to verify it really is a FRAME opcode - there's nothing wrong with sanity checks).
Then it becomes _possible_ to optimize "small pickles" later (in the sense of not bothering to frame them at all).
So the CPython unpickler must be able to work with and without framing by detecting the FRAME opcode?
Not at first, no. At first the buffering layer could raise an exception if there's no FRAME opcode when it expected one. Or just read up garbage bytes and assume it's a frame size, which is effectively what it's doing now anyway ;-)
Otherwise the "later optimization" can't work.
Right. _If_ reducing framing overhead to "nothing" for small pickles turns out to be sufficiently desirable, then the buffering layer would need to learn how to turn itself off in the absence of a FRAME opcode immediately following the current frame. Perhaps the opcode decoding loop would also need to learn how to turn the buffering layer back on again too (next time a FRAME opcode showed up). Sounds annoying, but not impossible.

On Tue, 19 Nov 2013 15:41:51 -0600 Tim Peters <tim.peters@gmail.com> wrote:
[Tim]
... But if some _other_ implementation of unpickling didn't give a hoot about framing, having an explicit opcode means that implementation could ignore the whole scheme very easily: just implement the FRAME opcode in *its* opcode-decoding loop to consume the FRAME argument, ignore it, and move on. As-is, all other implementations _have_ to know everything about the buffering scheme because it's all implicit low-level magic.
[Antoine]
Ahah, ok, I see where you're going. But how many other implementations of unpickling are there?
That's something you should have researched when writing the PEP ;-) How many implementations of Python aren't CPython? That's probably the answer. I'm not an expert on that, but there's more than one.
But "how many of them use something else than Lib/pickle.py" is the actual question.
Otherwise the "later optimization" can't work.
Right. _If_ reducing framing overhead to "nothing" for small pickles turns out to be sufficiently desirable, then the buffering layer would need to learn how to turn itself off in the absence of a FRAME opcode immediately following the current frame. Perhaps the opcode decoding loop would also need to learn how to turn the buffering layer back on again too (next time a FRAME opcode showed up). Sounds annoying, but not impossible.
The problem with "let's make the unpickler more lenient in a later version" is that then you have protocol 4 pickles that won't work with all protocol 4-accepting versions of the pickle module. Regards Antoine.

[Antoine]
Ahah, ok, I see where you're going. But how many other implementations of unpickling are there?
[Tim]
That's something you should have researched when writing the PEP ;-) How many implementations of Python aren't CPython? That's probably the answer. I'm not an expert on that, but there's more than one.
[Antoine]
But "how many of them use something else than Lib/pickle.py" is the actual question.
I don't know - and neither do you ;-) I do know that I'd like, e.g., a version of pickletools.dis() in CPython that _did_ show the framing bits, for debugging. That's a bare-bones "unpickler". I don't know how many other "partial unpicklers" exist in the wild either. But their lives would also be much easier if the framing stuff were explicit. "Mandatory optimization" should be an oxymoron ;-)
... The problem with "let's make the unpickler more lenient in a later version" is that then you have protocol 4 pickles that won't work with all protocol 4-accepting versions of the pickle module.
Yup. s/4/5/ would need to be part of a delayed optimization.

On Tue, 19 Nov 2013 16:06:22 -0600 Tim Peters <tim.peters@gmail.com> wrote:
[Antoine]
Ahah, ok, I see where you're going. But how many other implementations of unpickling are there?
[Tim]
That's something you should have researched when writing the PEP ;-) How many implementations of Python aren't CPython? That's probably the answer. I'm not an expert on that, but there's more than one.
[Antoine]
But "how many of them use something else than Lib/pickle.py" is the actual question.
I don't know - and neither do you ;-)
I do know that I'd like, e.g., a version of pickletools.dis() in CPython that _did_ show the framing bits, for debugging. That's a bare-bones "unpickler". I don't know how many other "partial unpicklers" exist in the wild either. But their lives would also be much easier if the framing stuff were explicit. "Mandatory optimization" should be an oxymoron ;-)
Well, I don't think it's a big deal to add a FRAME opcode if it doesn't change the current framing logic. I'd like to defer to Alexandre on this one, anyway. Regards Antoine.
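[Editor's note: Tim's wish was in fact granted -- with the FRAME opcode in place, pickletools.dis() in the stdlib shows the framing bits explicitly. A quick check on a current CPython (3.4 or later, where protocol 4 exists):

```python
import io
import pickle
import pickletools

# Pickle a small object with protocol 4; the payload gets wrapped in a frame.
data = pickle.dumps("spam", protocol=4)

# pickletools.dis() lists the FRAME opcode and its length argument,
# so framing is visible to debugging tools rather than being implicit magic.
out = io.StringIO()
pickletools.dis(data, out=out)
listing = out.getvalue()
print(listing)

assert "FRAME" in listing      # framing is explicit in the opcode stream
assert "MEMOIZE" in listing    # protocol 4's replacement for PUT
```
]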

On Tue, Nov 19, 2013 at 2:09 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Well, I don't think it's a big deal to add a FRAME opcode if it doesn't change the current framing logic. I'd like to defer to Alexandre on this one, anyway.
Looking at the different options available to us:

1A. Mandatory framing
(+) Allows the internal buffering layer of the Unpickler to rely on the presence of framing to simplify its implementation.
(-) Forces all implementations of pickle to include support for framing if they want to use the new protocol.
(-) Cannot be removed from future versions of the Unpickler without breaking protocols that mandate framing.

1B. Optional framing
(+) Could allow optimizations to disable framing if beneficial (e.g., when pickling to and unpickling from a string).

2A. With explicit FRAME opcode
(+) Makes optional framing simpler to implement.
(+) Makes variable-length encoding of the frame size simpler to implement.
(+) Makes framing visible to pickletools.
(-) Adds an extra byte of overhead to each frame.

2B. No opcode

3A. With fixed 8-byte headers
(+) Is simple to implement.
(-) Adds overhead to small pickles.

3B. With variable-length headers
(-) Requires the Pickler implementation to do extra data copies when pickling to strings.

4A. Framing baked into the pickle protocol
(+) Enables faster implementations.

4B. Framing through a specialized I/O buffering layer
(+) Could be reused by other modules.

I may change my mind as I work on the implementation, but at least for now, I think the combination of 1B, 2A, 3A, 4A will be a reasonable compromise here.

[Alexandre Vassalotti]
Looking at the different options available to us:
1A. Mandatory framing
(+) Allows the internal buffering layer of the Unpickler to rely on the presence of framing to simplify its implementation.
(-) Forces all implementations of pickle to include support for framing if they want to use the new protocol.
(-) Cannot be removed from future versions of the Unpickler without breaking protocols that mandate framing.

1B. Optional framing
(+) Could allow optimizations to disable framing if beneficial (e.g., when pickling to and unpickling from a string).
Or to slash the size of small pickles (an 8-byte size field can be more than half the total pickle size).
2A. With explicit FRAME opcode
(+) Makes optional framing simpler to implement.
(+) Makes variable-length encoding of the frame size simpler to implement.
(+) Makes framing visible to pickletools.
(+) Adds (a little) redundancy for sanity checking.
(-) Adds an extra byte of overhead to each frame.

2B. No opcode
3A. With fixed 8-byte headers
(+) Is simple to implement.
(-) Adds overhead to small pickles.

3B. With variable-length headers
(-) Requires the Pickler implementation to do extra data copies when pickling to strings.
4A. Framing baked into the pickle protocol
(+) Enables faster implementations.

4B. Framing through a specialized I/O buffering layer
(+) Could be reused by other modules.
I may change my mind as I work on the implementation, but at least for now, I think the combination of 1B, 2A, 3A, 4A will be a reasonable compromise here.
At this time I'd make the same choices, so don't expect an argument from me ;-) Thank you!

On Tue, 19 Nov 2013 19:51:10 +0100 Antoine Pitrou <solipsis@pitrou.net> wrote:
On Mon, 18 Nov 2013 16:48:05 -0800 Guido van Rossum <guido@python.org> wrote:
Food for thought: maybe we should have variable-length encodings for all opcodes, rather than the current cumbersome scheme?
Well, it's not that cumbersome... If you look at CPU encodings, they also tend to have different opcodes for different immediate lengths.
In your case, I'd say it mostly leads to a bit of code duplication. But the opcode space is far from exhausted right now :)
Oops... Make that "in our case", of course. cheers Antoine.

[Antoine]
Yet another possibility: keep framing but use a variable-length encoding for the frame size:
- first byte: bits 7-5: N (= frame size byte length - 1)
- first byte: bits 4-0: first 5 bits of frame size
- remaining N bytes: remaining bits of frame size
With this scheme, very small pickles have a one byte overhead; small ones a two byte overhead; and the max frame size is 2**61 rather than 2**64, which should still be sufficient.
And the frame size is read using either one or two read() calls, which is efficient.
That would be a happy compromise :-) I'm unclear on how that would work for, e.g., encoding 40 = 0b000101000. That has 6 significant bits. Would you store 0 in the leading byte and 40 in the second byte? That would work. 2**61 = 2,305,843,009,213,693,952 is a lot of bytes, especially for a pickle ;-)
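[Editor's note: this variable-length scheme was never adopted (protocol 4 shipped with a fixed 8-byte little-endian frame size), but it is easy to sketch. The following follows Antoine's bit layout with Tim's big-endian preference; the function names are hypothetical.

```python
def encode_frame_size(size: int) -> bytes:
    """Encode a frame size per the proposed (never adopted) scheme:
    first byte = NNNSSSSS, where N is the count of extra size bytes and
    SSSSS holds the top 5 bits of the size; the remaining N bytes carry
    the rest of the size, big-endian.  Max size is 2**61 - 1.
    """
    if not 0 <= size < 1 << 61:
        raise ValueError("frame size out of range")
    n = 0
    while size >= 1 << (5 + 8 * n):  # how many extra bytes are needed?
        n += 1
    first = (n << 5) | (size >> (8 * n))
    return bytes([first]) + (size & ((1 << (8 * n)) - 1)).to_bytes(n, "big")

def decode_frame_size(data: bytes) -> int:
    """Inverse of encode_frame_size(): accumulate big-endian size bytes."""
    n = data[0] >> 5
    size = data[0] & 0x1F
    for b in data[1 : 1 + n]:
        size = (size << 8) | b
    return size

# Very small pickles pay one byte of overhead; Tim's example of 40 takes
# two bytes: a leading byte with payload bits 0, then 40 in the second byte.
assert encode_frame_size(7) == b"\x07"
assert encode_frame_size(40) == b"\x20\x28"
for size in (0, 31, 32, 40, 65536, 2**61 - 1):
    assert decode_frame_size(encode_frame_size(size)) == size
```
]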

On Mon, 18 Nov 2013 18:50:01 -0600 Tim Peters <tim.peters@gmail.com> wrote:
[Antoine]
Yet another possibility: keep framing but use a variable-length encoding for the frame size:
- first byte: bits 7-5: N (= frame size byte length - 1)
- first byte: bits 4-0: first 5 bits of frame size
- remaining N bytes: remaining bits of frame size
With this scheme, very small pickles have a one byte overhead; small ones a two byte overhead; and the max frame size is 2**61 rather than 2**64, which should still be sufficient.
And the frame size is read using either one or two read() calls, which is efficient.
That would be a happy compromise :-)
I'm unclear on how that would work for, e.g., encoding 40 = 0b000101000. That has 6 significant bits. Would you store 0 in the leading byte and 40 in the second byte? That would work.
Yeah, I haven't decided whether it would be big-endian or little-endian. It doesn't matter much. Big-endian sounds a bit easier to decode and encode (no bit shifts needed), but it's also less consistent with the rest of the pickle opcodes.
2**61 = 2,305,843,009,213,693,952 is a lot of bytes, especially for a pickle ;-)
Let's call it a biggle :-) Regards Antoine.

[Antoine]
- first byte: bits 7-5: N (= frame size byte length - 1)
- first byte: bits 4-0: first 5 bits of frame size
- remaining N bytes: remaining bits of frame size
[Tim]
I'm unclear on how that would work for, e.g., encoding 40 = 0b000101000. That has 6 significant bits. Would you store 0 in the leading byte and 40 in the second byte? That would work.
[Antoine]
Yeah, I haven't decided whether it would be big-endian or little-endian. It doesn't matter much. Big-endian sounds a bit easier to decode and encode (no bit shifts needed), but it's also less consistent with the rest of the pickle opcodes.
As you've told me, the framing layer is beneath the opcode layer, so what opcodes do is irrelevant ;-) Big-endian would be my choice (easier (en)(de)coding, both via software and via eyeball when staring at dumps).

Hello,

I have made two last-minute changes to the PEP:

- addition of the FRAME opcode, as discussed with Tim, and keeping a fixed 8-byte frame size
- addition of the MEMOIZE opcode, courtesy of Alexandre, which replaces PUT opcodes in protocol 4 and helps shrink the size of pickles

If there's no further opposition, I'd like to mark this PEP accepted (or let someone else do it) in 24 hours, so that the implementation can be integrated before Sunday.

Regards

Antoine.

On Sat, 16 Nov 2013 19:15:26 +0100 Antoine Pitrou <solipsis@pitrou.net> wrote:
Hello,
Alexandre Vassalotti (thanks a lot!) has recently finalized his work on the PEP 3154 implementation - pickle protocol 4.
I think it would be good to get the PEP and the implementation accepted for 3.4. As far as I can say, this has been a low-controversy proposal, and it brings fairly obvious improvements to the table (which table?). I still need some kind of BDFL or BDFL delegate to do that, though -- unless I am allowed to mark my own PEP accepted :-)
(I've asked Tim, specifically, for comments, since he contributed a lot to previous versions of the pickle protocol.)
The PEP is at http://www.python.org/dev/peps/pep-3154/ (should be rebuilt soon by the server, I'd say)
Alexandre's implementation is tracked at http://bugs.python.org/issue17810
Regards
Antoine.

[Antoine]
I have made two last-minute changes to the PEP:
- addition of the FRAME opcode, as discussed with Tim, and keeping a fixed 8-byte frame size
Cool!
- addition of the MEMOIZE opcode, courtesy of Alexandre, which replaces PUT opcodes in protocol 4 and helps shrink the size of pickles
Long overdue - clever idea!
If there's no further opposition, I'd like to mark this PEP accepted (or let someone else do it) in 24 hours, so that the implementation can be integrated before Sunday.
I think Guido already spoke on this - but, if he didn't, I will. Accepted :-)

On Wed, 20 Nov 2013 18:45:53 -0600 Tim Peters <tim.peters@gmail.com> wrote:
[Antoine]
I have made two last-minute changes to the PEP:
- addition of the FRAME opcode, as discussed with Tim, and keeping a fixed 8-byte frame size
Cool!
- addition of the MEMOIZE opcode, courtesy of Alexandre, which replaces PUT opcodes in protocol 4 and helps shrink the size of pickles
Long overdue - clever idea!
If there's no further opposition, I'd like to mark this PEP accepted (or let someone else do it) in 24 hours, so that the implementation can be integrated before Sunday.
I think Guido already spoke on this - but, if he didn't, I will. Accepted :-)
Thank you! I have marked it accepted then. (with a final nit - EMPTY_FROZENSET isn't necessary and is gone) Regards Antoine.
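[Editor's note: as accepted, a protocol-4 pickle opens with the PROTO opcode (0x80, argument 4), then FRAME (0x95) followed by a fixed 8-byte little-endian frame length, and uses MEMOIZE (0x94) in place of PUT. A sanity check on a current CPython -- note that later versions skip framing for very small payloads, so an object comfortably past that threshold is used here:

```python
import pickle
import struct

data = pickle.dumps({"spam": 1}, protocol=4)

# PROTO opcode (0x80) with argument 4.
assert data[0] == 0x80 and data[1] == 4

# FRAME opcode (0x95) followed by a fixed 8-byte little-endian length.
assert data[2] == 0x95
(frame_len,) = struct.unpack("<Q", data[3:11])
assert frame_len == len(data) - 11   # the frame covers the rest of the pickle

# MEMOIZE (0x94) replaces PUT in protocol 4.
assert 0x94 in data[11:]
```
]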

Yup. Agreed. Ship it! On Wed, Nov 20, 2013 at 4:54 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Wed, 20 Nov 2013 18:45:53 -0600 Tim Peters <tim.peters@gmail.com> wrote:
[Antoine]
I have made two last-minute changes to the PEP:
- addition of the FRAME opcode, as discussed with Tim, and keeping a fixed 8-byte frame size
Cool!
- addition of the MEMOIZE opcode, courtesy of Alexandre, which replaces PUT opcodes in protocol 4 and helps shrink the size of pickles
Long overdue - clever idea!
If there's no further opposition, I'd like to mark this PEP accepted (or let someone else do it) in 24 hours, so that the implementation can be integrated before Sunday.
I think Guido already spoke on this - but, if he didn't, I will. Accepted :-)
Thank you! I have marked it accepted then. (with a final nit - EMPTY_FROZENSET isn't necessary and is gone)
Regards
Antoine.
-- --Guido van Rossum (python.org/~guido)

Hello, I've now pushed Alexandre's implementation, and the PEP is marked final. Regards Antoine. On Sat, 16 Nov 2013 19:15:26 +0100 Antoine Pitrou <solipsis@pitrou.net> wrote:
Hello,
Alexandre Vassalotti (thanks a lot!) has recently finalized his work on the PEP 3154 implementation - pickle protocol 4.
I think it would be good to get the PEP and the implementation accepted for 3.4. As far as I can say, this has been a low-controversy proposal, and it brings fairly obvious improvements to the table (which table?). I still need some kind of BDFL or BDFL delegate to do that, though -- unless I am allowed to mark my own PEP accepted :-)
(I've asked Tim, specifically, for comments, since he contributed a lot to previous versions of the pickle protocol.)
The PEP is at http://www.python.org/dev/peps/pep-3154/ (should be rebuilt soon by the server, I'd say)
Alexandre's implementation is tracked at http://bugs.python.org/issue17810
Regards
Antoine.
participants (12)
- "Martin v. Löwis"
- Alexandre Vassalotti
- Antoine Pitrou
- Eric Snow
- francis
- Guido van Rossum
- Larry Hastings
- Nick Coghlan
- Richard Oudkerk
- Serhiy Storchaka
- Tim Peters
- xiscu