[Python-3000] Heaptypes

Thu Jul 19 05:01:18 CEST 2007

On 7/18/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > You broke backwards compatibility this way; I think that a pickle
> > produced by Python 2.x should be readable by Python 3.0.
>
> It is, is it not?

No; {'a': 1} pickled on 2.x results in an error complaining about an
unhashable object when the pickle is read in 3.0; this is the error
you saw in test_pickle.py.
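For comparison, here is a sketch of how current Python 3 (not the 2007 branch under discussion) reads such a pickle. The byte string below is a hand-written protocol-0 pickle of the kind Python 2's pickle.dumps({'a': 1}) emits. The key is stored with the STRING opcode, so it is an 8-bit string rather than Unicode. By default, Python 3's unpickler decodes STRING arguments as ASCII, and the real `encoding` parameter of pickle.loads controls that choice:

```python
import pickle

# Hand-written protocol-0 pickle of {'a': 1}, in the style Python 2 produces:
# S'a' is the STRING opcode, i.e. an 8-bit string key.
PY2_PICKLE = b"(dp0\nS'a'\np1\nI1\ns."

# Default: STRING arguments are decoded as ASCII, so the key becomes a str.
as_text = pickle.loads(PY2_PICKLE)

# encoding="bytes" keeps STRING arguments as raw bytes instead.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")

print(as_text)   # {'a': 1}
print(as_bytes)  # {b'a': 1}
```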

> > (I haven't decided whether to keep str8 or something like it, or
> > whether to try to get rid of it completely).
>
> I assumed the latter - and if it indeed goes away, it's certainly
> a bug to ever return str8 from pickle, right?

If indeed it goes away, it can't be returned. If it's still around, we
can argue about the desirability of returning one.
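If some byte-string type does survive, the UTF-8-first fallback discussed in this thread could look roughly like the minimal sketch below. `decode_string_arg` is a hypothetical name, and Python's `bytes` stands in for str8:

```python
def decode_string_arg(raw: bytes):
    """Hypothetical fallback for the argument of a pickle STRING opcode.

    Try UTF-8 first. If the data is not valid UTF-8, return it unchanged
    (bytes is used here as a stand-in for the 2007-era str8 type).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw


print(repr(decode_string_arg(b"a")))         # 'a'
print(repr(decode_string_arg(b"\xff\xfe")))  # b'\xff\xfe'
```

The catch with this design is that the result type depends on the data, so code that reads the result (for example, dict keys) has to cope with both types. Current Python 3's Unpickler instead takes explicit `encoding` and `errors` arguments.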

> > One possibility might be to first try to decode the STRING argument as
> > utf-8, and if that fails to convert it to str8 instead. What do you
> > think? I don't understand all of the changes you made in r56438;
> > perhaps you can save most of them.
>
> The question really is what bytes should be pickled as; that needs to
> be decided before fixing the code. Should it be built-in (and if so,
> using what code)? If not, it probably needs to go through __reduce__,
> and if so, what should __reduce__ return for bytes object?

Either a new opcode (which would make such a pickle fail hard when
unpickled with 2.5, but that's probably fine, as it would fail anyway),
or some variation of what I coded before, using __reduce__.
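Current Python 3 ended up doing both, depending on the protocol. The sketch below lists the opcodes pickle emits for the same bytes object. `opcode_names` is a hypothetical helper built on the real pickletools.genops. Protocol 2 has no bytes opcode, so the pickler goes through a reduce call instead (in CPython's pickle.py, this is codecs.encode applied to a Latin-1 string). An older unpickler can follow that call as long as the referenced callable exists there. Protocol 3 added dedicated bytes opcodes, and Python 2 cannot read those at all:

```python
import pickle
import pickletools


def opcode_names(data: bytes) -> list:
    """Return the names of the opcodes in a pickle, in order."""
    return [op.name for op, _arg, _pos in pickletools.genops(data)]


payload = b"\x00\xffraw"

# Protocol 2: no bytes opcode, so the pickle contains a GLOBAL (the
# callable) plus REDUCE, which rebuilds the bytes from a str argument.
p2 = pickle.dumps(payload, protocol=2)

# Protocol 3: a dedicated opcode (SHORT_BINBYTES for short payloads).
p3 = pickle.dumps(payload, protocol=3)

print(opcode_names(p2))
print(opcode_names(p3))

# Both forms round-trip in Python 3.
assert pickle.loads(p2) == payload and pickle.loads(p3) == payload
```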

> __reduce__ currently does (O(s#)) with (ob_type, ob_bytes, ob_size).
> Now, s# creates a Unicode object, and the pickling fails to round-trip
> correctly.

I thought that before your patch a bytes object roundtripped correctly
with all three protocols. Or maybe it got broken when s# was changed?
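On the round-trip question: if __reduce__ hands back a text string, the bytes survive only if the decode/encode pair is lossless for every byte value. Latin-1 has that property, because byte n maps to code point n and back. UTF-8 does not, since it rejects many byte sequences. The minimal sketch below works under that assumption. `ByteBuffer` is a hypothetical class, not the 3.0 bytes type, and its __reduce__ passes the encoding name along so the constructor can reverse the decoding:

```python
import pickle


class ByteBuffer:
    """Hypothetical bytes wrapper, used only to illustrate __reduce__."""

    def __init__(self, data=b"", encoding=None):
        # When rebuilt from a pickle, data arrives as a str plus the name
        # of the encoding needed to turn it back into bytes.
        if encoding is not None:
            data = data.encode(encoding)
        self.data = bytes(data)

    def __reduce__(self):
        # Latin-1 maps every byte 0-255 to the code point with the same
        # value, so this decode is always reversible.
        return (ByteBuffer, (self.data.decode("latin-1"), "latin-1"))

    def __eq__(self, other):
        return isinstance(other, ByteBuffer) and self.data == other.data


original = ByteBuffer(bytes(range(256)))
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    assert pickle.loads(pickle.dumps(original, protocol=proto)) == original

# By contrast, UTF-8 cannot decode arbitrary byte strings at all.
try:
    bytes(range(256)).decode("utf-8")
except UnicodeDecodeError:
    print("UTF-8 cannot represent arbitrary byte strings")
```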

An additional requirement might be that if bytes are introduced in
2.6, a pickle containing bytes written by 3.0 should be readable by
2.6. Ideally, pickles not containing bytes written in 3.0 should
always be readable in 2.6 (assuming the user-defined types it
references exist).
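One way to check that requirement mechanically is to use pickletools. Every opcode it describes records the protocol that introduced it (OpcodeInfo.proto), so the maximum over a pickle's opcodes is the lowest protocol an unpickler must support. `required_protocol` is a hypothetical helper. A result of 2 or less is necessary but not sufficient for 2.x to load the pickle, because the referenced globals must also exist there:

```python
import pickle
import pickletools


def required_protocol(data: bytes) -> int:
    """Highest protocol among the opcodes actually used in a pickle."""
    return max(op.proto for op, _arg, _pos in pickletools.genops(data))


no_bytes = pickle.dumps({'a': 1}, protocol=2)
with_bytes = pickle.dumps(b"x", protocol=3)

print(required_protocol(no_bytes))    # 2
print(required_protocol(with_bytes))  # 3
```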

> If __reduce__ returns a Unicode object, what encoding should be assumed?
> (which then needs to be symmetric with bytes())
>
> If __reduce__ returns a str8 object, you will have to keep str8 (or
> else you cannot pickle bytes).

When __reduce__ returns a string at all, that means it's the name of a
global. I guess that should be encoded using UTF-8, so that as long as
the name is ASCII, 2.x can unpickle it. But I'm not sure if that's
what you were asking.
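A small illustration of the string case, assuming a module-level sentinel (`_Missing` and `MISSING` are hypothetical names). Because __reduce__ returns a str, pickle stores only a GLOBAL reference, with the module and the name on separate text lines. Unpickling looks the object up again, so identity is preserved. An ASCII-only name produces the same bytes whether it is encoded as ASCII or as UTF-8:

```python
import pickle
import pickletools


class _Missing:
    """Hypothetical sentinel type, used only to illustrate a str __reduce__."""

    def __reduce__(self):
        # A str return value means: store a reference to the module-level
        # object with this name, not the object's contents.
        return "MISSING"


MISSING = _Missing()

data = pickle.dumps(MISSING, protocol=0)

# Collect the GLOBAL references; pickletools reports each as "module name".
globals_used = [arg for op, arg, _pos in pickletools.genops(data)
                if op.name == "GLOBAL"]
print(globals_used)

# Unpickling resolves the name again and returns the very same object.
assert pickle.loads(data) is MISSING
```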

Anyway, one reason this is such a mess is clearly that the pickle
protocol has no independent spec -- it's grown organically in code.
Reverse-engineering the intent of the code is a pain.
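Without a separate spec, the closest thing to one is probably the opcode documentation in the docstrings of the standard-library pickletools module. Its dis() function annotates a pickle opcode by opcode. The sketch below reuses a hand-written, Python 2-style protocol-0 pickle of {'a': 1}:

```python
import io
import pickletools

# Hand-written protocol-0 pickle of {'a': 1} in the style Python 2 emits.
PY2_PICKLE = b"(dp0\nS'a'\np1\nI1\ns."

# dis() writes one annotated line per opcode to the given stream.
buf = io.StringIO()
pickletools.dis(PY2_PICKLE, out=buf)
listing = buf.getvalue()
print(listing)
```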

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)