[Python-Dev] pickling of large arrays

Ralf W. Grosse-Kunstleve rwgk@yahoo.com
Mon, 29 Jul 2002 06:02:00 -0700 (PDT)


We are using Boost.Python to expose reference-counted C++ container
types (similar to std::vector<>) to Python. E.g.:

from arraytbx import shared
d = shared.double(1000000) # double array with a million elements
c = shared.complex_double(100) # std::complex<double> array
# and many more types, incl. several custom C++ types

We need a way to pickle these arrays. Since they can easily be
converted to tuples, we could just define methods like:

  def __getstate__(self):
    return tuple(self)

However, since the arrays are potentially huge, this could incur
a large overhead (e.g. a tuple of a million Python floats).
Next idea:

  def __getstate__(self):
    return iter(self)

Unfortunately (but not unexpectedly) pickle is telling me:
'can't pickle iterator objects'

Attached is a short Python script (tested with 2.2.1) with a prototype
implementation of a pickle helper ("piece_meal") for large arrays.
piece_meal's __getstate__ converts a block of a given size to a Python
list and returns a tuple with that list and a new piece_meal instance
which knows how to generate the next chunk. I.e. piece_meal instances
are created recursively until the input sequence is exhausted. The
corresponding __setstate__ puts the pieces back together again
(uncomment the print statement to see the pieces).
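
To make the chunking order concrete, here is a minimal sketch of the same
slicing arithmetic outside of pickle. The names chunks_from_end and
reassemble are hypothetical helpers used only for illustration (they are
not part of the script below), and the sketch assumes an interpreter with
generator support:

```python
def chunks_from_end(sequence, block_size):
    """Yield blocks of `sequence`, last block first (hypothetical helper).

    Mirrors the slicing in piece_meal.__getstate__: each step emits
    sequence[next_position:position] and then moves position back.
    """
    position = len(sequence)
    while position > 0:
        next_position = max(position - block_size, 0)
        yield sequence[next_position:position]
        position = next_position


def reassemble(chunks):
    """Rebuild the sequence the way __setstate__ does (hypothetical helper).

    The innermost (earliest) block is restored first, and each later
    block is appended to it.
    """
    result = []
    for chunk in reversed(chunks):
        result = result + chunk
    return result


pieces = list(chunks_from_end(list(range(10)), 4))
restored = reassemble(pieces)
```

With ten elements and a block size of 4, the blocks cover the index ranges
[6:10], [2:6] and [0:2], in that order, and reassembling them gives back
the original list.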

I am wondering if a similar mechanism could be used to enable pickling
of iterators, or maybe special "pickle_iterators", which would
immediately enable pickling of our large arrays or any other object
that can be iterated over (e.g. Numpy arrays which are currently
pickled as potentially huge strings). Has this been discussed already?
Are there better ideas?
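
For comparison, one non-recursive possibility is to write each block as
its own pickle to the same stream and read them back until the stream is
exhausted. This is only a sketch under the assumption that the caller
controls the stream directly, so it does not go through __getstate__ and
does not help when the array is nested inside another pickled object. The
names dump_in_blocks and load_blocks are hypothetical:

```python
import io
import pickle


def dump_in_blocks(iterable, stream, block_size=4):
    """Pickle `iterable` to `stream` as a series of small lists."""
    block = []
    for item in iterable:
        block.append(item)
        if len(block) == block_size:
            pickle.dump(block, stream)
            block = []
    if block:
        # Write the final, possibly shorter, block.
        pickle.dump(block, stream)


def load_blocks(stream):
    """Read pickled blocks until end of stream and join them."""
    result = []
    while True:
        try:
            result.extend(pickle.load(stream))
        except EOFError:
            break
    return result


buf = io.BytesIO()
dump_in_blocks(iter(range(10)), buf)
buf.seek(0)
restored_from_stream = load_blocks(buf)
```

Only one block is held as a Python list at any time while writing, and
the input can be any iterator.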

Ralf


import pickle

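class piece_meal:
  # Pickle helper: emits a sequence in blocks of block_size elements,
  # starting from the end of the sequence.

  block_size = 4

  def __init__(self, sequence, position):
    self.sequence = sequence
    self.position = position # end index of the block this instance emits

  def __getstate__(self):
    next_position = self.position - piece_meal.block_size
    if (next_position <= 0):
      # Last piece: the remaining head of the sequence; 0 ends the chain.
      return (self.sequence[:self.position], 0)
    # One block plus a helper that produces everything before it.
    return (self.sequence[next_position:self.position],
            piece_meal(self.sequence, next_position))

  def __setstate__(self, state):
    #print "piece_meal:", state
    if (state[1] == 0):
      self.sequence = state[0]
    else:
      # state[1] is already restored; append this block to its sequence.
      self.sequence = state[1].sequence + state[0]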

class array:

  def __init__(self, n):
    self.elems = [i for i in xrange(n)]

  def __getstate__(self):
    return piece_meal(self.elems, len(self.elems))

  def __setstate__(self, state):
    self.elems = state.sequence

def exercise():
  for i in xrange(11):
    a = array(i)
    print a.elems
    s = pickle.dumps(a)
    b = pickle.loads(s)
    print b.elems

if (__name__ == "__main__"):
  exercise()

