[Python-3000] Heaptypes

Guido van Rossum guido at python.org
Fri Jul 20 00:25:07 CEST 2007


On 7/19/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > But you can do it using bytes('\xff', 'latin-1'). I think that's a
> > reasonable thing for bytes.__reduce__() to return.
>
> That's certainly a choice. Another choice is that bytes defaults to
> latin-1, rather than the system default encoding. This is roughly
> equivalent, and gives a slightly more compact pickle result.

I don't like bytes defaulting to anything at all; that they currently
do is a transitional issue in the branch. Java used to have a default
of Latin-1 for converting bytes <--> string and it was considered a
mistake AFAIK.

I've implemented the explicit latin-1 version for now; we can change this later.
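
For illustration, here is a minimal sketch of what the explicit form looks like. It assumes Python 3 as eventually released, not the 2007 branch discussed here; in the released language, bytes() given a str with no encoding raises TypeError rather than falling back to a default.

```python
# Explicit encoding: Latin-1 maps each code point 0..255 to the byte
# with the same numeric value, so '\xff' becomes the single byte 0xFF.
data = bytes("\xff", "latin-1")

# Assumption: released Python 3 behaviour. With no encoding argument
# the constructor refuses to guess and raises TypeError.
try:
    bytes("\xff")
    guessed = True
except TypeError:
    guessed = False

print(data, guessed)
```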

> > How about the following? It's not perfect but it's the best I can
> > think of that doesn't break any pickles.
> >
> > In 3.0, when an S, T or U pickle code is encountered, the returned
> > value is a Unicode string decoded from the bytes using Latin-1. This
> > means that all S, T or U pickle codes return Unicode objects. In
> > those cases where this was really meant to transfer binary data, the
> > application running under 3.0 can fix this by calling bytes(X,
> > 'latin-1'). If it was meant to be UTF-8-encoded text, the app can call
> > str(Y, 'utf-8') after that.
>
> It would actually have to be Y.encode('latin-1').decode('utf-8')
> (assuming Y is what you get from unpickling):

That's another way of saying it. I meant for Y to be the result of
bytes(X, 'latin-1') but that was non-obvious. Anyway I think we're in
agreement here. :-)
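
The two spellings agreed on above can be sketched as follows. This assumes the pickle module of released Python 3, where Latin-1 decoding of the S/T/U codes is opt-in through the encoding argument of pickle.loads() rather than the default proposed in this thread. The legacy pickle bytes are hand-written for the example, and the names x, y, text and alt are illustrative only.

```python
import pickle

# Hand-written protocol-0 pickle using the S code for an 8-bit string
# whose payload is the UTF-8 encoding of 'ab\u00f6' (assumed example data).
legacy = b"S'ab\\xc3\\xb6'\np0\n."

# Decode the 8-bit string as Latin-1: every byte becomes one code point.
x = pickle.loads(legacy, encoding="latin-1")

# If the payload was really binary data, recover the original bytes.
y = bytes(x, "latin-1")

# If it was UTF-8-encoded text, decode those bytes.
text = str(y, "utf-8")

# The same thing in one expression, starting from the unpickled value.
alt = x.encode("latin-1").decode("utf-8")

print(repr(x), y, text, alt == text)
```

Because Latin-1 is a one-to-one mapping between byte values and the first 256 code points, the intermediate str loses no information, which is what makes the later fix-up possible.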

> py> str('\xc3\xb6', 'utf-8')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: decoding Unicode is not supported
>
> > But 3.0 should only *generate* the S, T or U pickle codes for str8
> > values (as long as that type exists) or for str values containing only
> > 7-bit ASCII bytes; for all else it should use the unicode pickle
> > codes.
>
> Sounds fine to me.
>
> > For bytes, I propose that b"ab\xff".__reduce__() return (bytes,
> > ("ab\xff", "latin-1")).
>
> See above. Unless somebody objects, I'd rather make latin-1 the
> default for bytes when a string is passed (I'm uncertain myself
> of how much explicit is better than implicit here).

See above.
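
A minimal sketch of the proposed reduce value, written as a free function so that it runs on released Python 3. reduce_bytes is a hypothetical helper, not an actual method, and the real bytes.__reduce__() in any given release may return something different.

```python
import pickle


def reduce_bytes(value):
    """Hypothetical helper mirroring the proposed (bytes, (text, 'latin-1'))."""
    # Latin-1 decoding never fails and preserves every byte value.
    return (bytes, (value.decode("latin-1"), "latin-1"))


func, args = reduce_bytes(b"ab\xff")

# Unpickling calls the callable with the stored arguments.
rebuilt = func(*args)

# The real pickle module round-trips bytes as well, whatever
# representation it chooses internally.
roundtrip = pickle.loads(pickle.dumps(b"ab\xff", protocol=2))

print(args, rebuilt, roundtrip)
```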

> I'll look into implementing that strategy.

How about instead you help with fixing pickling of datetime objects?
This broke when I fixed test_pickle. Rolling back your changes to
datetime pickling didn't seem to help.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

