[Python-Dev] Accepting PEP 3154 for 3.4?

Tim Peters tim.peters at gmail.com
Mon Nov 18 23:18:21 CET 2013

>> But I wonder why it isn't done with a new framing opcode instead (say,
>> FRAME followed by 8-byte count).  I suppose that would be like the
>> "prefetch" idea, except that framing opcodes would be mandatory
>> (instead of optional) in proto 4.  Why I initially like that:
>> - Uniform decoding loop ("the next thing" _always_ starts with an opcode).

> But it's not actually uniform. A frame isn't a normal opcode, it's a
> large section of bytes that contains potentially many opcodes.
> The framing layer is really below the opcode layer, it makes also sense
> to implement it like that.

That makes sense to me.

> (I also tried to implement Serhiy's PREFETCH idea, but it didn't bring
> any actual simplification)

But it has a different kind of advantage:  PREFETCH was optional.  As
Guido said, it's annoying to bloat the size of small pickles (which
may, although individually small, occur in great numbers) by 8 bytes
each.  There's really no point to framing small chunks of data, right?

Which leads to another idea:  after the PROTO opcode, there is, or is
not, an optional PREFETCH opcode with an 8-byte argument.  If the
PREFETCH opcode exists, then it gives the number of bytes up to and
including the pickle's STOP opcode.  So there's exactly 0 or 1
PREFETCH opcodes per pickle.
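
A minimal sketch of that layout, under stated assumptions: PREFETCH is
not a real pickle opcode, so the byte value 0xFE, the little-endian
count, and the helper names add_prefetch and strip_prefetch are all
hypothetical.  Only PROTO and STOP are taken from the real format.

```python
import pickle
import struct

PROTO = b"\x80"     # real opcode: protocol marker, followed by a 1-byte version
STOP = b"."         # real opcode: end of pickle
PREFETCH = b"\xfe"  # hypothetical opcode byte, not part of any real protocol


def add_prefetch(data: bytes) -> bytes:
    """Insert the hypothetical PREFETCH header right after PROTO."""
    if data[:1] != PROTO:
        raise ValueError("expected a pickle that starts with PROTO")
    head, body = data[:2], data[2:]
    # The count covers every byte up to and including STOP.
    return head + PREFETCH + struct.pack("<Q", len(body)) + body


def strip_prefetch(data: bytes) -> bytes:
    """Return a plain pickle, checking the PREFETCH count if one is present."""
    if data[:1] != PROTO:
        raise ValueError("expected a pickle that starts with PROTO")
    if data[2:3] != PREFETCH:
        return data  # zero PREFETCH opcodes: a small pickle, left as-is
    if len(data) < 11:
        raise ValueError("truncated PREFETCH header")
    (count,) = struct.unpack("<Q", data[3:11])
    body = data[11:]
    if len(body) != count or not body.endswith(STOP):
        raise ValueError("PREFETCH count does not match pickle length")
    return data[:2] + body
```

With this layout a pickle that carries the header grows by 9 bytes (one
opcode byte plus the 8-byte count), and one that omits it grows by
nothing.  pickle.loads(strip_prefetch(add_prefetch(p))) should round-trip
any pickle p written with protocol 2 or later.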

Is there an advantage to spraying multiple 8-byte "frame counts"
throughout a pickle stream?  8 bytes is surely enough to specify the
size of any single pickle for half a generation ;-) to come.

>> When slinging 8-byte counts, _some_ sanity-checking seems like a good idea ;-)

> I don't know. It's not much worse (for denial of service opportunities)
> than a 4-byte count, which already exists in earlier protocols.

I'm not thinking of DOS at all, just general sanity as data objects
get larger & larger.  Pickles have almost no internal checks now.  But
I've seen my share of corrupted pickles!  About the only thing that
catches them early is hitting a byte that isn't a legitimate pickle
opcode.  That _used_ to be a much stronger check than it is now,
because the 8-bit opcode space was sparsely populated at first.  But,
over time, more and more opcodes get added, so the chance of mistaking
a garbage byte for a legit opcode has increased correspondingly.
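
One rough way to put a number on that, as a sketch: the standard
pickletools module lists every opcode along with the protocol that
introduced it, so the fraction of the 256 possible byte values that
decode as an opcode can be computed per protocol.  The helper name
opcode_density is made up for illustration, and the figure assumes a
garbage byte is uniformly random.

```python
import pickletools


def opcode_density(max_proto: int) -> float:
    """Fraction of byte values that are legal opcodes up to max_proto."""
    known = sum(1 for op in pickletools.opcodes if op.proto <= max_proto)
    return known / 256


# Print the density for each protocol from 0 through 4.
for proto in range(5):
    print(proto, round(opcode_density(proto), 3))
```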

A PREFETCH opcode with a "bytes until STOP" count makes for a decent
sanity check too ;-)
