[Datetime-SIG] Are there any "correct" implementations of tzinfo?

Tue Sep 15 03:56:47 CEST 2015

[Tim]
>> Sorry, I'm not arguing about this any more.  Pickle doesn't work at
>> all at the level of "count of bytes followed by a string".

[Random832 <random832 at fastmail.com>]
> The SHORT_BINBYTES opcode consists of the byte b'C', followed by *yes
> indeed* "count of bytes followed by a string".

Yes, some individual opcodes do work that way.

>> If you
>> want to make a pickle argument that makes sense, I'm afraid you'll
>> need to become familiar with how pickle works first.  This is not the
>> place for a pickle tutorial.
>>
>> Start by learning what a datetime pickle actually is.
>> pickletools.dis() will be very helpful.

>     0: \x80 PROTO      3
>     2: c    GLOBAL     'datetime datetime'
>    21: q    BINPUT     0
>    23: C    SHORT_BINBYTES b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00'
>    35: q    BINPUT     1
>    37: \x85 TUPLE1
>    38: q    BINPUT     2
>    40: R    REDUCE
>    41: q    BINPUT     3
>    43: .    STOP
>
> The payload is ten bytes, and the byte immediately before it is in fact
> 0x0a. If I pickle any byte string under 256 bytes long by itself, the
> byte immediately before the data is the length. This is how I initially
> came to the conclusion that "count of bytes followed by a string" was
> valid.

Ditto.
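
For anyone following along, here is a minimal sketch of that observation. It assumes CPython 3 and pins protocol 3 to match the disassembly above; the variable names are just for illustration.

```python
import pickle
from datetime import datetime

# The value whose pickle is disassembled above.
dt = datetime(2015, 9, 14, 21, 6, 42)
payload = b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00'

# Protocol 3 is pinned to match the disassembly; other protocols may
# frame the same bytes with different opcodes.
data = pickle.dumps(dt, protocol=3)

pos = data.find(payload)         # offset of the 10-byte state string
opcode = data[pos - 2:pos - 1]   # the SHORT_BINBYTES opcode byte
count = data[pos - 1]            # the one-byte length that precedes the data

# A bare bytes object gets the same opcode + count + data framing.
bare = pickle.dumps(payload, protocol=3)
bare_pos = bare.find(payload)
bare_count = bare[bare_pos - 1]
```

Both `count` and `bare_count` should equal `len(payload)`, which is the "count of bytes followed by a string" layout for this one opcode.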

> I did, before writing my earlier post, look into the high-level aspects
> of how datetime pickle works - it uses __reduce__ to create up to two
> arguments, one of which is a 10-byte string, and the other is the
> tzinfo. Those arguments are passed into the datetime constructor and
> detected by that constructor - for example, I can call it directly with
> datetime(b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00') and get the same result
> as unpickling.

Good job!  That abuse of the constructor was supposed to remain a secret ;-)
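
A small sketch of that "secret", assuming CPython 3. The bytes-accepting constructor form is undocumented and exists only to support pickle, so treat this as an illustration, not as an API to rely on.

```python
from datetime import datetime

payload = b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00'

# Undocumented: a single 10-byte bytes argument is taken as pickled state.
from_state = datetime(payload)
normal = datetime(2015, 9, 14, 21, 6, 42)

# __reduce__ hands pickle a callable plus the argument tuple to call it with.
# For a naive datetime the tuple holds only the state string; an aware
# datetime would carry its tzinfo as a second item.
func, args = normal.__reduce__()
```

Here `from_state` compares equal to `normal`, and `func(*args)` is exactly the call that the REDUCE opcode performs at unpickling time.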

> At the low level, the part that represents that first argument does
> indeed appear to be "count of bytes followed by a string". I can add to
> the count, add more bytes, and it will call the constructor with the
> longer string. If I use pickletools.dis on my modified value the output
> looks the same except for, as expected, the offsets and the value of the
> argument to the SHORT_BINBYTES opcode.
>
> So, it appears that, as I was saying, "wasted space" would not have been
> an obstacle to having the "payload" accepted by the constructor (and
> produced by __reduce__, ultimately _getstate) consist of "a byte string
> of >= 10 bytes, the first 10 of which are used and the rest of which are
> ignored by python <= 3.5" instead of "a byte string of exactly 10
> bytes", since it would have accepted and produced exactly the same
> pickle values, but been prepared to accept larger arguments pickled from
> future versions.

Yes, if we had done things differently from the start, things would
work differently today.  But what's the point?  We have to live now
with what _was_ done.  A datetime pickle carrying a string payload
with anything other than exactly 10 bytes will almost always blow up
under older Pythons, and would be considered "a bug" if it didn't.
Pickles are not at all intended to be forgiving (they're enough of a
potential security hole without going out of their way to ignore
random mysteries).
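
To make "blow up" concrete, here is a hedged sketch, again assuming CPython 3 and protocol 3. It patches the pickle by hand the way described above (bump the count, append a byte); the helper `outcome` is made up for this example.

```python
import pickle
from datetime import datetime

payload = b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00'
longer = payload + b'\x00'       # 11 bytes: one byte of "future" data

good = pickle.dumps(datetime(2015, 9, 14, 21, 6, 42), protocol=3)
# Patch the SHORT_BINBYTES argument: bump the count and append the byte.
bad = good.replace(b'C\x0a' + payload, b'C\x0b' + longer)

def outcome(blob):
    """Return the unpickled value, or the exception type on failure."""
    try:
        return pickle.loads(blob)
    except Exception as exc:
        return type(exc)

good_result = outcome(good)      # the original datetime
bad_result = outcome(bad)        # an exception type, not a datetime
```

On the CPython versions I know of, the 11-byte string is not recognized as pickled state, so the constructor tries to read it as a year and fails with TypeError.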

It may be nicer if Python had a serialization format more deliberately
designed for evolution of class structure - but it doesn't.  Classes
that need such a thing now typically store their own idea of a
"version" number as part of their pickled state - datetime never did.
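
A minimal sketch of that versioning pattern follows. The class `Stamp`, its fields, and the helper `restore` are all hypothetical, and this is not how datetime works; it only shows the shape of the idea.

```python
class Stamp:
    """Hypothetical class that stores a version in its pickled state."""

    STATE_VERSION = 2            # version 2 added the `fold` field

    def __init__(self, seconds, fold=0):
        self.seconds = seconds
        self.fold = fold

    def __getstate__(self):
        # pickle stores whatever this returns as the object's state.
        return {"version": self.STATE_VERSION,
                "seconds": self.seconds,
                "fold": self.fold}

    def __setstate__(self, state):
        version = state.get("version", 1)
        if version > self.STATE_VERSION:
            # Refuse state from a newer layout instead of guessing.
            raise ValueError("unsupported state version %r" % version)
        self.seconds = state["seconds"]
        self.fold = state.get("fold", 0)   # absent in version 1 state

def restore(state):
    """Rebuild a Stamp from a state dict, as unpickling would."""
    obj = Stamp.__new__(Stamp)
    obj.__setstate__(state)
    return obj

current = restore(Stamp(5, fold=1).__getstate__())
old = restore({"version": 1, "seconds": 5})
```

pickle.dumps() and pickle.loads() call these two hooks for you. The point is that old state still loads, while state from an unknown future version fails loudly rather than being silently misread.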

> ...
> So have I shown you that I know enough about the pickle format to know
> that permitting a longer string (and ignoring the extra bytes) would
> have had zero impact on the pickle representation of values that did not
> contain a longer string?

Yes.  If we had a time machine, it might even have proved useful ;-)

> I'd already figured out half of this before
> writing my earlier post; I just assumed *you* knew enough that I
> wouldn't have to show my work.

It's always best to show your work on a public list.  Thanks for
finally ;-) doing so!