[Python-3000] Heaptypes

Thu Jul 19 01:57:01 CEST 2007

On 7/16/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Guido van Rossum schrieb:
> > That sounds like a good idea to try. It may break some more tests but
> > those are all indications of places that incorrectly still require
> > str8.
> >
> >> I wonder whether the "s" specifier in CallFunction, BuildValue etc
> >> should create Unicode objects, rather than str8 objects.
>
> Done. I fixed a number of test cases that broke because of that.
> In particular, bytes.__reduce__ could not easily return str8 objects
> as its marshalling state anymore (and shouldn't do so, anyway).
> So I made bytes a builtin type of pickle, using the S code.
> As a consequence, a number of other types had to get fixed.
>
> So in total, it adds one new failure: something in test_pickle
> now complains that bytes objects are not hashable.

Now that this is checked in, I understand the problem. You are using
the same opcodes for pickling bytes and str8 -- save_bytes() is a
clone of save_string() (the latter is the callback for str8, not for
str). But you made load_string() always return bytes. The broken tests
fail because they use hardcoded pickles which use the STRING opcode to
save a str8 which is used as a dict key.

You broke backwards compatibility this way; I think that a pickle
produced by Python 2.x should be readable by Python 3.0.

Now, one could argue about whether an 8-bit string pickled in 2.x
should be returned as a Unicode string in 3.0 or as a bytes array.
There is even an argument to be made that it should be a bytes array,
since an 8-bit string in 2.x is just as likely to represent binary
data as text data, and even if it's text, we don't know the encoding.
But I think that there is a counter-argument that's stronger: the dict
{'a': 42} pickled in 2.x must unpickle as a dict with an immutable
object as key. So we should either unpickle 'a' as a (unicode) str
with value 'a', or as (8-bit) str8, as long as the latter type exists
(I haven't decided whether to keep str8 or something like it, or
whether to try to get rid of it completely).
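
To make the hashability constraint concrete, here is a rough sketch. It
assumes a current Python 3 interpreter, where bytearray stands in for the
mutable bytes type discussed above and the immutable bytes type stands in
for str8; the pickle literal is a hand-written protocol 0 pickle of
{'a': 42} using the STRING opcode, and mutable_key_fails() is a name made
up for illustration.

```python
import pickle

# Hand-written protocol 0 pickle of {'a': 42}, using the STRING opcode ('S')
# the way Python 2.x writes an 8-bit string.
PY2_PICKLE = b"(dp0\nS'a'\np1\nI42\ns."

# Default behaviour in current Python 3: the STRING argument is decoded to str.
as_text = pickle.loads(PY2_PICKLE)

# Alternative: keep the STRING argument as an immutable byte string.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")


def mutable_key_fails() -> bool:
    """Return True if a mutable byte array is rejected as a dict key."""
    try:
        {bytearray(b"a"): 42}
    except TypeError:
        # Unhashable key: the same kind of failure test_pickle reported.
        return True
    return False
```

Either immutable result works as a dict key; only the mutable byte array
is rejected.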

One possibility might be to first try to decode the STRING argument as
utf-8, and if that fails to convert it to str8 instead. What do you
think? I don't understand all of the changes you made in r56438;
perhaps you can save most of them.
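
A minimal sketch of that fallback, written against current Python 3 and
not against the actual pickle module source: load_string_arg() is a
hypothetical helper, and the immutable bytes type is used as a stand-in
for str8.

```python
def load_string_arg(raw: bytes):
    """Decode a STRING opcode argument, falling back to an 8-bit string.

    Hypothetical helper: `raw` is the already-unquoted STRING payload.
    """
    try:
        # First choice: treat the payload as UTF-8 text.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: keep the raw bytes in an immutable type
        # (stand-in for str8), so the value can still be a dict key.
        return bytes(raw)


text_key = load_string_arg(b"a")           # valid UTF-8, comes back as str
binary_key = load_string_arg(b"\xff\xfe")  # invalid UTF-8, stays as bytes
table = {text_key: 42, binary_key: 43}     # both results are hashable
```

One caveat with this approach: binary data that happens to be valid UTF-8
would come back as text, so the result type depends on the payload.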

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)