Guido van Rossum wrote:
> On Sat, Mar 7, 2009 at 8:04 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> Typically, the purpose of a database is to persist data across program runs. So typically, your suggestion would only help if there were a way to persist the primed Pickler across runs.
> I haven't followed all this, but isn't it at least possible to conceive of the primed pickler as being recreated from scratch from constant data each run?

If there were a guarantee that pickling the same data would result in the same memo ID -> object mapping, that would also work. But that doesn't seem to be a realistic guarantee to make. AFAIK the memo IDs are integers chosen consecutively in the order that objects are pickled, which doesn't seem so bad. But compound objects are a problem. For example, when pickling a map, the map entries would have to be pickled in an order that remains consistent across runs (and even across Python versions). Even worse, all user-written __getstate__() methods would have to return exactly the same result, even across program runs.
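For illustration, here is a rough sketch of that ordering dependence. It peeks at Pickler.memo, whose id -> (index, object) layout is a CPython implementation detail (the C pickler exposes it through a proxy), so treat it purely as a demonstration:

    import io
    import pickle

    def memo_order(*objs):
        # Pickle objs and report the memo index each string received.
        buf = io.BytesIO()
        p = pickle.Pickler(buf, protocol=2)
        p.dump(objs)
        # In CPython, Pickler.memo maps id(obj) -> (memo_index, obj);
        # the C implementation returns a proxy, hence the copy().
        return {obj: idx for idx, obj in p.memo.copy().values()
                if isinstance(obj, str)}

    first = memo_order("alpha", "omega")
    assert first["alpha"] == 0 and first["omega"] == 1
    # Same strings pickled in the opposite order get the opposite indices.
    second = memo_order("omega", "alpha")
    assert second["omega"] == 0 and second["alpha"] == 1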
(The primed Unpickler is not quite so important because it can be primed by reading a pickle of the primer, which in turn can be stored somewhere in the DB.)
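A minimal sketch of that round trip, using only the public Pickler/Unpickler classes. It assumes, as the whole scheme does, that the memo persists across dump() and load() calls on the same instance; CPython behaves this way, though it is not a documented guarantee for the Unpickler. The "common" objects are hypothetical:

    import io
    import pickle

    COMMON = ("trunk", "branches", "tags")   # hypothetical shared objects

    # Write pass: a single Pickler, so every dump() shares one memo.
    out = io.BytesIO()
    pickler = pickle.Pickler(out, protocol=2)
    pickler.dump(COMMON)                     # the "primer"
    primer_bytes = out.getvalue()
    mark = out.tell()
    pickler.dump({"path": COMMON[0], "kind": COMMON[2]})   # one DB record
    record_bytes = out.getvalue()[mark:]

    # Read pass (possibly a later run): prime a fresh Unpickler by letting
    # it load the stored primer pickle before the record that refers to it.
    unpickler = pickle.Unpickler(io.BytesIO(primer_bytes + record_bytes))
    unpickler.load()                         # fills the Unpickler's memo
    record = unpickler.load()                # memo references now resolve
    assert record == {"path": "trunk", "kind": "tags"}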
In the particular case of cvs2svn, each of our databases is in fact written in a single pass, and then in later passes only read, not written. So I suppose we could do entirely without pickleable Picklers, if they were copyable within a single program run. But that constraint would make the feature even less general.
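To illustrate what "copyable within a single program run" could look like: CPython happens to expose Pickler.memo as a settable attribute (a plain dict in the pure-Python pickler, a proxy with copy() in the C one), so a primed memo can be copied into a fresh Pickler by hand. This is a sketch of the idea relying on that implementation detail, not a supported API:

    import io
    import pickle

    common = ("trunk", "branches", "tags")   # hypothetical shared objects

    # Prime one Pickler by pickling the shared objects once.
    primer_buf = io.BytesIO()
    primer = pickle.Pickler(primer_buf, protocol=2)
    primer.dump(common)

    # "Copy" the primed Pickler: give a fresh Pickler a copy of its memo.
    record_buf = io.BytesIO()
    record_pickler = pickle.Pickler(record_buf, protocol=2)
    record_pickler.memo = primer.memo.copy()

    record_pickler.dump({"path": common[0]})
    # record_buf now refers to "trunk" by memo index, so it can only be
    # read by an Unpickler that was itself primed with primer_buf first.
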
> Being copyable is mostly equivalent to being picklable, but it's probably somewhat weaker because it's easier to define it as a pointer copy for some types that aren't easily picklable.

Indeed. And pickling the memo should not present any fundamental problems, since by construction it can only contain pickleable objects.

Michael