[Python-3000] pickle compatibility between 2.x and 3.0
Guido van Rossum
guido at python.org
Thu Nov 1 15:18:32 CET 2007
It's time to start thinking about pickle compatibility between 2.x and
3.0. The main problem is the 2.x str type -- it doesn't have a true
equivalent in 3.0.
When 3.0 encounters a 'str' object in a pickle written by 2.x, it has
two choices: trying to convert it to a 3.0 (unicode) str object by
applying some encoding, or interpreting it as a 3.0 bytes object. The
latter would be trivial, but likely wrong, as the 2.x program that
wrote the pickle would likely have meant it to be a text string
(although there are certainly cases where binary data gets pickled as
well, in which case bytes is of course the correct translation). Since
3.0 bytes don't interact with text strings the way 2.x str interacts
with unicode, receiving bytes is somewhat inconvenient for the 3.0
program.
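
As a small illustration of the two choices, the sketch below takes the raw
8-bit payload of a hypothetical 2.x str (the value is made up for the
example) and interprets it both ways under 3.0 semantics. The variable
names are illustrative only, and the two encodings tried are just examples
of the guess the unpickler would have to make.

```python
# Raw 8-bit payload of a hypothetical 2.x str as it would sit in the pickle.
# (Assumption for illustration: the 2.x program stored UTF-8 encoded text.)
raw = b"caf\xc3\xa9"

# Choice 1: keep it as bytes. Trivial, and never loses information.
as_bytes = bytes(raw)

# Choice 2: decode it to text. The result depends entirely on the encoding
# guessed, and the pickle itself carries no hint about which one is right.
as_text_utf8 = raw.decode("utf-8")
as_text_latin1 = raw.decode("latin-1")

# In 3.0, bytes and text do not mix implicitly, which is what makes
# receiving bytes inconvenient for a text-oriented program.
try:
    as_bytes + "!"
except TypeError:
    mixing_failed = True
else:
    mixing_failed = False
```

Both decodes succeed here, but only one of them yields the text the 2.x
program meant, which is the "painful choice" in a nutshell.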
OTOH, applying an encoding gives us the painful choice of deciding
what encoding to use -- the input pickle doesn't give us any hints,
and as indicated we're not even sure that text was intended.
I could leave this all up to the 3.0 application, which would have to
"fix up" any bytes in the pickle it receives explicitly if it wants
to. Alternatively, I could add an encoding option to the pickle
loading APIs (and for full flexibility an errors option as well) so
that at least simple text-based applications might have a chance of
reading the data that they themselves wrote before they were ported to
3.0 with minimal changes (only the unpickling calls would have to be
changed).

Do people here think it's worth it? Think of any place where you
currently are using pickles. What would your 3.0 porting strategy
likely be? Would not having automatic decoding of pickled 8-bit
strings be a major burden?
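
For concreteness, here is a rough sketch of how such encoding and errors
options could look at the call site. It assumes keyword arguments named
`encoding` and `errors` on `pickle.loads`, which is the spelling found in
the pickle module of released Python 3 versions rather than something
settled in this message. The pickle bytes are a hand-written protocol 2
pickle of a made-up 2.x str.

```python
import pickle

# Hand-written protocol 2 pickle of the 2.x str 'caf\xc3\xa9':
#   \x80\x02  protocol marker, U\x05  short 8-bit string of length 5,
#   q\x00     memo slot 0,     .      stop.
PICKLED_2X_STR = b"\x80\x02U\x05caf\xc3\xa9q\x00."

# The application states which encoding its old 2.x data used.
as_text = pickle.loads(PICKLED_2X_STR, encoding="utf-8")

# Opting out of decoding: 8-bit strings come back as bytes objects.
as_bytes = pickle.loads(PICKLED_2X_STR, encoding="bytes")

# The errors option controls what happens when the guess is wrong.
lenient = pickle.loads(PICKLED_2X_STR, encoding="ascii", errors="replace")

try:
    pickle.loads(PICKLED_2X_STR, encoding="ascii", errors="strict")
except UnicodeDecodeError:
    strict_failed = True
else:
    strict_failed = False
```

With this shape, a simple text-based application only has to touch its
unpickling calls, while an application that pickled binary data can ask
for bytes and fix things up itself.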
--Guido van Rossum (home page: http://www.python.org/~guido/)