Pickling Robustly

Glyph Lefkowitz glyph at twistedmatrix.com
Wed Apr 19 13:20:00 EDT 2000


Please forgive me if I've missed something obvious, and don't flame
too hard -- I've read the FAQs, and in general tried to do my
homework.  This is my first post on USENET.

(Apologies again if this shows up multiple times; my ISP doesn't provide netnews, and the server I managed to snag doesn't allow posting :-( )

I am currently writing a client-server application in Python.  The
server has significant state, in the form of a directed graph with
many cyclic and shared references.
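
To give a concrete picture of that kind of state, here is a minimal sketch. It is written against a current Python 3 `pickle` module rather than the interpreter I am actually running, and the `Node` class and its fields are made up purely for illustration; they are not taken from the real server.

```python
import pickle


class Node:
    """Hypothetical graph node; the name and fields are illustrative only."""

    def __init__(self, name):
        self.name = name
        self.edges = []  # outgoing edges to other Node objects


# Build a small graph with a cycle (a -> b -> a) and a shared target (c).
a, b, c = Node("a"), Node("b"), Node("c")
a.edges = [b, c]
b.edges = [a, c]

# One pickle for the whole graph: pickle remembers objects it has already
# written, so cycles terminate and shared references stay shared.
a2 = pickle.loads(pickle.dumps(a))
b2 = a2.edges[0]

print(b2.edges[0] is a2)           # does the cycle survive?
print(a2.edges[1] is b2.edges[1])  # is c still one shared object?
```

Both checks should come out true, since the graph is rebuilt with the same shape it had when it was saved.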

I was OVERJOYED to discover Pickle.  The persistence format that I was
using before (in the slower, buggier, more obfuscated Java version of
this software) was rather naive, performed poorly, and imposed several
arbitrary and inconvenient restrictions on the type of data that was
allowable.  (Funny, I worked with Java for 2 years, and Python for
only a few months, but I find myself saying things like "inconvenient
and arbitrary" about Java all the time now ... as well as things like
"slower, buggier, more obfuscated")

Pickle is everything I wanted, and more.  The copy_reg module makes it
even more so.  I am so thrilled I can barely contain myself.

Except...

It appears that Pickle is not terribly robust.  One of the major
necessary features of the software I'm writing is the ability of naive
programmers to write all sorts of arbitrary code: the more naive and
the more arbitrary the better.  Python makes this part so easy it
feels like I'm not even doing anything.  However, it also means I may
have to put all this code into restricted execution mode (and
therefore reduce its random arbitrarity (is that a word?)) to prevent
it from munging the whole of my persistent data.  I am also terrified
that if I try this approach I will miss some special case that will be
pointed out to me when I lose a day's worth of work because someone
assigned a data member to some "bad" type.

Due to the extremely self-referential nature of the data structures
I'm using, it is much easier to keep everything in one pickle (and
that effectively happens anyway, even if I don't try to, thanks to the
dependency resolution that pickle does).  The problem is that pickle
(as far as I can tell, anyway) does not allow one to register error
handlers to keep track of when a problem occurs.  If pickle runs into
an unpicklable datatype, EVERYTHING pickled up to that point is lost,
and no indication is given of where the offending data was.  With a
large data structure like the one I anticipate, this means writing and
loading code on-the-fly to traverse that structure and determine where
the unpicklable object is.
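
The traversal I have in mind could look roughly like the sketch below. It assumes a current Python 3 `pickle`; `find_unpicklable` and the `Room` class are hypothetical names invented for this example, not anything pickle provides. It only descends into dicts, lists, tuples, sets and instance `__dict__`s, and it only blames leaf values, so it is a debugging aid rather than a complete answer.

```python
import pickle
import threading


def find_unpicklable(obj, path="root", seen=None):
    """Return a list of path strings for leaf values pickle rejects."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # already visited: cycle or shared reference
        return []
    seen.add(id(obj))

    # Work out which children to descend into.
    if isinstance(obj, dict):
        children = [("%s[%r]" % (path, k), v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple, set, frozenset)):
        children = [("%s[%d]" % (path, i), v) for i, v in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [("%s.%s" % (path, k), v) for k, v in vars(obj).items()]
    else:
        children = []

    if not children:
        # A leaf: ask pickle directly whether it can handle this value.
        try:
            pickle.dumps(obj)
            return []
        except Exception:
            return [path]

    bad = []
    for child_path, child in children:
        bad.extend(find_unpicklable(child, child_path, seen))
    return bad


class Room:
    """Hypothetical piece of server state, for illustration only."""

    def __init__(self, name):
        self.name = name
        self.exits = {}


hall, cellar = Room("hall"), Room("cellar")
hall.exits["down"] = cellar
cellar.exits["up"] = hall            # cycle back to the hall
cellar.guard = threading.Lock()      # locks cannot be pickled

print(find_unpicklable(hall))
```

The `seen` set is what keeps the walk from looping forever on cyclic data; the price is that a bad object reachable by several paths is reported only under the first path found.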

I am using the following code to make some things more amenable to
pickling (see below) but it's not nearly enough; I can't hope to catch
every datatype this way.  (Or can I?  Is it possible to define some
sort of catch-all type?)  If there is no way to do this with the
existing pickle, would it be possible to hack it in somewhere easily,
or shall I have to write my own persistence algorithm?

Note -- it is perfectly acceptable to me for the solution to this
problem to merely ignore those types which it does not persist.  All I
want to do is be able to explicitly say 'these attributes are
unpicklable', and have them fail to appear when the pickle is
re-loaded.  The more types persist correctly, the better, of course,
but that's not my main concern.
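
For classes whose author cooperates, something close to this can be sketched with pickle's `__getstate__` hook. The example assumes a current Python 3 `pickle`; the `Persistent` base class, its `transient` attribute and the `Player` class are hypothetical names made up for illustration. It does not help with arbitrary objects whose classes I do not control, which is the harder part of the question.

```python
import pickle
import threading


class Persistent:
    """Hypothetical base class; `transient` lists attributes to drop."""

    transient = ()

    def __getstate__(self):
        # Copy the instance dict, leaving out the declared attributes.
        state = self.__dict__.copy()
        for name in self.transient:
            state.pop(name, None)
        return state

    # No __setstate__ is defined, so on load pickle restores the
    # remaining attributes by updating the new instance's __dict__.


class Player(Persistent):
    transient = ("lock",)

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()   # cannot be pickled


restored = pickle.loads(pickle.dumps(Player("glyph")))
print(restored.name)                 # an attribute that was kept
print(hasattr(restored, "lock"))     # the transient one is gone after reload
```

Code that uses a reloaded object has to cope with the dropped attributes being absent, for example by recreating them lazily.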

---cut here: pickleplus.py---


import copy_reg, types, new
from cPickle import *

def method_pickle(method):
	
	return method_unpickle, (method.im_func.__name__,
							 method.im_self,
							 method.im_class)

def method_unpickle(im_name,
					im_self,
					im_class):
	
	unbound=getattr(im_class,im_name)
	if im_self is None:
		return unbound
	bound=new.instancemethod(unbound.im_func,
							 im_self,
							 im_class)
	return bound

def function_pickle(function):
	
	function_name=function.__name__
	module_name=function.func_globals['__name__']
	return function_unpickle, (module_name,
							   function_name)


def function_unpickle(module_name,
					  function_name):

	module=__import__(module_name,
					  # None, None because there is no context for
					  # this import; a non-empty fromlist because we
					  # want the subpackage itself (if this is one)
					  # rather than the top-level package
					  None,None,[function_name])
	return getattr(module, function_name)


copy_reg.pickle(types.MethodType,
				method_pickle,
				method_unpickle)


---cut here---

Thanks very much for any help!

-- 
                  __________________________________________
                 |    ______      __   __  _____  _     _   |
                 |   |  ____ |      \_/   |_____] |_____|   |
                 |   |_____| |_____  |    |       |     |   |
                 |   @ t w i s t e d m a t r i x  . c o m   |
                 |   http://www.twistedmatrix.com/~glyph/   |
                 `__________________________________________'



