[Python-Dev] Accepting PEP 3154 for 3.4?

Tim Peters tim.peters at gmail.com
Mon Nov 18 23:44:59 CET 2013


[Tim]
>> But it has a different kind of advantage:  PREFETCH was optional.  As
>> Guido said, it's annoying to bloat the size of small pickles (which
>> may, although individually small, occur in great numbers) by 8 bytes
>> each.  There's really no point to framing small chunks of data, right?

[Antoine]
> You can't know how much space the pickle will take until the pickling
> ends, though, which makes it difficult to decide whether you want to
> emit a PREFETCH opcode or not.

Ah, of course.  Presumably the outgoing pickle stream is first stored
in some memory buffer, right?  If pickling completes before the buffer
is first flushed, then you know exactly how large the entire pickle
is.  If "it's small" (say, < 100 bytes), don't write out the PREFETCH
part.  Else do.
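A minimal sketch of the buffer-then-decide idea, assuming a hypothetical one-byte PREFETCH opcode (0xff here, chosen only because no real pickle opcode uses it) placed right after PROTO, followed by an 8-byte little-endian count of the remaining bytes up to and including STOP. The 100-byte threshold is the example from the post; `dumps_with_optional_prefetch` is an illustrative name, not a pickle API. Protocol 3 is used so the standard library's own framing (added in protocol 4) stays out of the way.

```python
import io
import pickle
import struct

# Hypothetical opcode byte for the proposed PREFETCH; not part of any real protocol.
PREFETCH = b"\xff"
SMALL_PICKLE = 100  # "small" threshold from the post, in bytes


def dumps_with_optional_prefetch(obj, threshold=SMALL_PICKLE):
    """Pickle obj into a memory buffer, then decide whether to add PREFETCH."""
    buf = io.BytesIO()
    pickle.dump(obj, buf, protocol=3)  # protocol 3 has no framing of its own
    body = buf.getvalue()
    if len(body) < threshold:
        # Small pickle: a 9-byte header (opcode plus 8-byte count) isn't worth it.
        return body
    # body starts with PROTO (b"\x80" plus one version byte); insert after it.
    proto, rest = body[:2], body[2:]
    # 8-byte little-endian count of the bytes up to and including STOP.
    return proto + PREFETCH + struct.pack("<Q", len(rest)) + rest
```

This sketch always has the whole pickle in memory, so the decision is trivial; a writer with a bounded buffer could only skip PREFETCH when pickling finished before its first flush, as described above.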


>> Which leads to another idea:  after the PROTO opcode, there is, or is
>> not, an optional PREFETCH opcode with an 8-byte argument.  If the
>> PREFETCH opcode exists, then it gives the number of bytes up to and
>> including the pickle's STOP opcode.  So there's exactly 0 or 1
>> PREFETCH opcodes per pickle.
>>
>> Is there an advantage to spraying multiple 8-byte "frame counts"
>> throughout a pickle stream?
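The reading side of the "0 or 1 PREFETCH" format might look like this hedged sketch. The opcode byte, the little-endian count, and the name `load_with_optional_prefetch` are all assumptions for illustration. When PREFETCH is present, the reader fetches the whole pickle with a single `read(count)`; otherwise it rewinds and falls back to `pickle.load`, so the file must be seekable.

```python
import io
import pickle
import struct

PREFETCH = b"\xff"  # hypothetical opcode byte for the proposed PREFETCH


def load_with_optional_prefetch(f):
    """Read one pickle from seekable binary file f, honoring an optional PREFETCH."""
    start = f.tell()
    header = f.read(3)  # PROTO opcode, protocol number, then maybe PREFETCH
    if header[:1] != b"\x80":
        raise ValueError("expected a PROTO opcode")
    if header[2:3] != PREFETCH:
        # Zero PREFETCH opcodes: rewind and let pickle read it the usual way.
        f.seek(start)
        return pickle.load(f)
    (count,) = struct.unpack("<Q", f.read(8))
    # One PREFETCH opcode: a single read fetches everything up to and including STOP.
    body = f.read(count)
    if len(body) != count or body[-1:] != b".":
        raise ValueError("PREFETCH count does not end on a STOP opcode")
    return pickle.loads(header[:2] + body)
```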

> Well, yes: much better memory usage for large pickles.
> Some people use pickles to store huge data, which was the motivation to
> add the 8-byte-size opcodes after all.

We'd have the same advantage _if_ it were feasible to know the entire
size up front.  I understand now that it's not feasible.
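Antoine's memory argument can be illustrated with a deliberately simplified stand-in for multi-frame output: the writer below emits an 8-byte count plus payload whenever `frame_size` bytes have accumulated, so it never needs the total size and buffers at most one frame. `FramingWriter`, `iter_frames`, and the 64 KiB default are hypothetical, and unlike a real protocol design this sketch splits at arbitrary byte offsets rather than at opcode boundaries.

```python
import io
import pickle
import struct

FRAME_SIZE = 64 * 1024  # hypothetical target frame size


class FramingWriter:
    """File-like sink that emits length-prefixed frames as data arrives."""

    def __init__(self, raw, frame_size=FRAME_SIZE):
        self.raw = raw
        self.frame_size = frame_size
        self.pending = bytearray()

    def write(self, data):
        self.pending += data
        # Flush every complete frame; at most one partial frame stays buffered.
        while len(self.pending) >= self.frame_size:
            self._emit(self.pending[:self.frame_size])
            del self.pending[:self.frame_size]
        return len(data)

    def _emit(self, chunk):
        # 8-byte little-endian count, then the frame's bytes.
        self.raw.write(struct.pack("<Q", len(chunk)))
        self.raw.write(chunk)

    def close(self):
        # Flush the final, possibly short, frame.
        if self.pending:
            self._emit(self.pending)
            self.pending.clear()


def iter_frames(raw):
    """Yield frame payloads one at a time, so a reader holds one frame at once."""
    while True:
        header = raw.read(8)
        if not header:
            return
        (n,) = struct.unpack("<Q", header)
        yield raw.read(n)
```

Pass a `FramingWriter` to `pickle.dump` as the file and call its `close()` afterwards to flush the last partial frame; `iter_frames` then yields the payloads in order, and joining them gives back an ordinary pickle.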

> ...
> IMO, any validation should use a dedicated CRC-like scheme, rather than
> relying on the fact that correct pickles are statistically unlikely :-)

OK!  A new CRC opcode every two bytes ;-)

> (or you can implicitly rely on TCP or UDP checksums, or your disk
> subsystem's own sanity checks)

Of course we've been doing that all along.  Yet I've still seen my
share of corrupted pickles anyway.  Don't underestimate the
determination of users to screw up everything possible in every way
possible ;-)
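Antoine's suggestion of a dedicated CRC-like scheme could look like the following sketch, which wraps the pickle rather than adding opcodes: a CRC-32 of the pickled bytes (computed with `zlib.crc32`) is appended as a 4-byte little-endian trailer and checked before unpickling. The trailer layout and the names `dumps_with_crc` and `loads_with_crc` are assumptions. CRC-32 detects every single-bit error and most other accidental corruption, but it offers no protection against deliberate tampering.

```python
import pickle
import struct
import zlib


def dumps_with_crc(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle obj and append a 4-byte CRC-32 trailer (hypothetical container format)."""
    data = pickle.dumps(obj, protocol=protocol)
    return data + struct.pack("<I", zlib.crc32(data))


def loads_with_crc(blob):
    """Verify the CRC-32 trailer before unpickling; raise ValueError on mismatch."""
    if len(blob) < 4:
        raise ValueError("blob too short to hold a CRC-32 trailer")
    data, trailer = blob[:-4], blob[-4:]
    (expected,) = struct.unpack("<I", trailer)
    if zlib.crc32(data) != expected:
        raise ValueError("CRC-32 mismatch: pickle is corrupted")
    return pickle.loads(data)
```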

