[Python-Dev] Accepting PEP 3154 for 3.4?

Nick Coghlan ncoghlan at gmail.com
Tue Nov 19 00:02:50 CET 2013


On 19 Nov 2013 08:52, "Tim Peters" <tim.peters at gmail.com> wrote:
>
> [Tim]
> >> But it has a different kind of advantage:  PREFETCH was optional.  As
> >> Guido said, it's annoying to bloat the size of small pickles (which
> >> may, although individually small, occur in great numbers) by 8 bytes
> >> each.  There's really no point to framing small chunks of data, right?
>
> [Antoine]
> > You can't know how much space the pickle will take until the pickling
> > ends, though, which makes it difficult to decide whether you want to
> > emit a PREFETCH opcode or not.
>
> Ah, of course.  Presumably the outgoing pickle stream is first stored
> in some memory buffer, right?  If pickling completes before the buffer
> is first flushed, then you know exactly how large the entire pickle
> is.  If "it's small" (say, < 100 bytes), don't write out the PREFETCH
> part.  Else do.
>
>
> >> Which leads to another idea:  after the PROTO opcode, there is, or is
> >> not, an optional PREFETCH opcode with an 8-byte argument.  If the
> >> PREFETCH opcode exists, then it gives the number of bytes up to and
> >> including the pickle's STOP opcode.  So there's exactly 0 or 1
> >> PREFETCH opcodes per pickle.
> >>
> >> Is there an advantage to spraying multiple 8-byte "frame counts"
> >> throughout a pickle stream?
>
> > Well, yes: much better memory usage for large pickles.
> > Some people use pickles to store huge data, which was the motivation to
> > add the 8-byte-size opcodes after all.
>
> We'd have the same advantage _if_ it were feasible to know the entire
> size up front.  I understand now that it's not feasible.
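
A rough sketch of the buffer-then-decide idea quoted above, assuming the
whole pickle fits in the memory buffer before the first flush. The PREFETCH
opcode byte, the 100-byte threshold, and the finish_pickle name are
hypothetical and chosen only for illustration; PREFETCH is a proposal from
this thread, not part of any released pickle protocol.

```python
import struct

# Hypothetical values for illustration only: PREFETCH was proposed in this
# thread and is not an opcode in any released pickle protocol.
PREFETCH = b"\xfe"
SMALL_PICKLE_LIMIT = 100  # "say, < 100 bytes"


def finish_pickle(proto_header: bytes, body: bytes) -> bytes:
    """Assemble a fully buffered pickle, adding PREFETCH only when it pays off.

    proto_header is the PROTO opcode plus its argument; body is everything
    after it, up to and including the STOP opcode.
    """
    if len(body) < SMALL_PICKLE_LIMIT:
        # Small pickle: skip the 9 bytes of opcode plus 8-byte length.
        return proto_header + body
    # Large pickle: announce the number of bytes up to and including STOP.
    return proto_header + PREFETCH + struct.pack("<Q", len(body)) + body
```

This only covers the case where pickling completes before the buffer is
first flushed; once data has been written out, the total size is no longer
known up front, which is the objection raised above.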

This may be a dumb suggestion (since I don't know the pickle infrastructure
at all), but could there be a short section of unframed data before
switching to framing mode?

For example, emit up to 256 bytes unframed in all pickles, then start
emitting appropriate FRAME opcodes if the pickle continues on?
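
Something like the following sketch, perhaps. It treats the pickle as an
opaque byte string and uses the FRAME opcode with an 8-byte little-endian
length as described in PEP 3154. The frame_stream name and the 64 KiB frame
size are assumptions made for illustration, and a real pickler would have to
cut frames between opcodes rather than at arbitrary byte offsets.

```python
import pickle
import struct

UNFRAMED_PREFIX = 256    # bytes emitted before framing starts
FRAME_SIZE = 64 * 1024   # assumed frame size, illustration only


def frame_stream(payload: bytes,
                 prefix: int = UNFRAMED_PREFIX,
                 frame_size: int = FRAME_SIZE) -> bytes:
    """Emit the first `prefix` bytes unframed, then frame the remainder.

    Simplification: payload is sliced at fixed byte offsets, ignoring
    opcode boundaries.
    """
    out = [payload[:prefix]]
    for start in range(prefix, len(payload), frame_size):
        chunk = payload[start:start + frame_size]
        # FRAME opcode followed by the chunk length as an 8-byte integer.
        out.append(pickle.FRAME + struct.pack("<Q", len(chunk)) + chunk)
    return b"".join(out)
```

With this scheme a pickle of 256 bytes or fewer pays no framing overhead at
all, and a longer one pays 9 bytes per frame after the unframed prefix.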

Cheers,
Nick.

>
> > ...
> > IMO, any validation should use a dedicated CRC-like scheme, rather than
> > relying on the fact that correct pickles are statistically unlikely :-)
>
> OK!  A new CRC opcode every two bytes ;-)
>
> > (or you can implicitly rely on TCP or UDP checksums, or your disk
> > subsystem's own sanity checks)
>
> Of course we've been doing that all along.  Yet I've still seen my
> share of corrupted pickles anyway.  Don't underestimate the
> determination of users to screw up everything possible in every way
> possible ;-)