[Python-Dev] Accepting PEP 3154 for 3.4?

Wed Nov 20 00:56:13 CET 2013

Am 19.11.13 23:50, schrieb Antoine Pitrou:
> Ok, thanks. So now that I look at the patch I see the following problems
> with this idea:
> 
> - "pickle + framing" becomes a different protocol than "pickle" alone,
>   which means we lose the benefit of protocol autodetection. It's as
>   though pickle.load() required you to give the protocol number,
>   instead of inferring it from the pickle bytestream.

Not necessarily. Framing becomes a different protocol, yes. But
autodetection would still be possible (it actually is possible in
my proposed definition).

> - it is less efficient than framing built inside pickle, since it
>   adds separate buffers and memory copies (while the point of framing
>   is to make buffering more efficient).

Correct. However, if the intent is to reduce the number of system
calls, then this is still achieved.

> Your idea is morally similar to saying "we don't need to optimize the
> size of pickles, since you can gzip them anyway". 

Not really. In the case of gzip, it might be that the size reduction
of properly saving bytes in pickle might be even larger. Here,
the wire representation, and the number of system calls is actually
(nearly) identical.

> However, the fact
> that the _pickle module currently goes to lengths to try to optimize
> buffering, implies to me that it's reasonable to also improve the
> pickle protocol so as to optimize buffering.

AFAICT, the real driving force is the desire to not read-ahead
more than the pickle is long. This is what complicates the code.
The easiest (and most space-efficient) solution to that problem
would be to prefix the entire pickle with a data size field
(possibly in a variable-length representation), i.e. to make a
single frame.

If that was done, I would guess that Tim's concerns about brittleness
would go away (as you couldn't have a length field in the middle of
data). IMO, the PEP has nearly the same flaw as the HTTP chunked
transfer, which also puts length fields in the middle of the payload
(except that HTTP makes it worse by making them optional).

Of course, a single length field has other drawbacks, such as having
to pickle everything before sending out the first bytes.

Regards,
Martin