Guido van Rossum wrote:
> On Sat, Mar 7, 2009 at 8:04 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> Typically, the purpose of a database is to persist data across program runs. So typically, your suggestion would only help if there were a way to persist the primed Pickler across runs.
> I haven't followed all this, but isn't it at least possible to conceive of the primed pickler as being recreated from scratch from constant data each run?

If there were a guarantee that pickling the same data would result in the same memo ID -> object mapping, that would also work. But that doesn't seem to be a realistic guarantee to make. AFAIK the memo IDs are integers chosen consecutively in the order that objects are pickled, which doesn't seem so bad. But compound objects are a problem. For example, when pickling a map, the map entries would have to be pickled in an order that remains consistent across runs (and even across Python versions). Even worse, all user-written __getstate__() methods would have to return exactly the same result, even across program runs.
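For illustration, here is a rough sketch of that ordering dependence. It peeks at Pickler.memo, whose id -> (index, object) layout is a CPython implementation detail (the C pickler exposes it through a proxy), so treat it purely as a demonstration:

    import io
    import pickle

    def memo_order(*objs):
        # Pickle objs and report the memo index each string received.
        buf = io.BytesIO()
        p = pickle.Pickler(buf, protocol=2)
        p.dump(objs)
        # In CPython, Pickler.memo maps id(obj) -> (memo_index, obj);
        # the C implementation returns a proxy, hence the copy().
        return {obj: idx for idx, obj in p.memo.copy().values()
                if isinstance(obj, str)}

    first = memo_order("alpha", "omega")
    assert first["alpha"] == 0 and first["omega"] == 1
    # Same strings pickled in the opposite order get the opposite indices.
    second = memo_order("omega", "alpha")
    assert second["omega"] == 0 and second["alpha"] == 1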
(The primed Unpickler is not quite so important because it can be primed by reading a pickle of the primer, which in turn can be stored somewhere in the DB.)
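A minimal sketch of that round trip, using only the public Pickler/Unpickler classes. It assumes, as the whole scheme does, that the memo persists across dump() and load() calls on the same instance; CPython behaves this way, though it is not a documented guarantee for the Unpickler. The "common" objects are hypothetical:

    import io
    import pickle

    COMMON = ("trunk", "branches", "tags")   # hypothetical shared objects

    # Write pass: a single Pickler, so every dump() shares one memo.
    out = io.BytesIO()
    pickler = pickle.Pickler(out, protocol=2)
    pickler.dump(COMMON)                     # the "primer"
    primer_bytes = out.getvalue()
    mark = out.tell()
    pickler.dump({"path": COMMON[0], "kind": COMMON[2]})   # one DB record
    record_bytes = out.getvalue()[mark:]

    # Read pass (possibly a later run): prime a fresh Unpickler by letting
    # it load the stored primer pickle before the record that refers to it.
    unpickler = pickle.Unpickler(io.BytesIO(primer_bytes + record_bytes))
    unpickler.load()                         # fills the Unpickler's memo
    record = unpickler.load()                # memo references now resolve
    assert record == {"path": "trunk", "kind": "tags"}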
In the particular case of cvs2svn, each of our databases is in fact written in a single pass, and then in later passes only read, not written. So I suppose we could do entirely without pickleable Picklers, if they were copyable within a single program run. But that constraint would make the feature even less general.
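To illustrate what "copyable within a single program run" could look like: CPython happens to expose Pickler.memo as a settable attribute (a plain dict in the pure-Python pickler, a proxy with copy() in the C one), so a primed memo can be copied into a fresh Pickler by hand. This is a sketch of the idea relying on that implementation detail, not a supported API:

    import io
    import pickle

    common = ("trunk", "branches", "tags")   # hypothetical shared objects

    # Prime one Pickler by pickling the shared objects once.
    primer_buf = io.BytesIO()
    primer = pickle.Pickler(primer_buf, protocol=2)
    primer.dump(common)

    # "Copy" the primed Pickler: give a fresh Pickler a copy of its memo.
    record_buf = io.BytesIO()
    record_pickler = pickle.Pickler(record_buf, protocol=2)
    record_pickler.memo = primer.memo.copy()

    record_pickler.dump({"path": common[0]})
    # record_buf now refers to "trunk" by memo index, so it can only be
    # read by an Unpickler that was itself primed with primer_buf first.
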
> Being copyable is mostly equivalent to being picklable, but it's probably somewhat weaker because it's easier to define it as a pointer copy for some types that aren't easily picklable.

Indeed. And pickling the memo should not present any fundamental problems, since by construction it can only contain pickleable objects.

Michael