[Python-Dev] Pickling in XML format

Raymond Hettinger python@rcn.com
Wed, 7 Aug 2002 18:43:46 -0400


Do you guys have any thoughts on the merits of adding dumpXML and loadXML methods to the pickle module?

The only disadvantage that comes to mind is that the file sizes are larger (though they may compress more efficiently).

The advantages center around portability and the use of existing tools:
-- The pickles would be validatable against a DTD or schema
-- Pickles would be more human readable than the current format
-- XSLT could translate pickles to HTML, JavaPickle formats, more compact formats, etc.
-- XPath could be used as a recursive search tool
-- Pickles would be editable and viewable with XML editors
-- No need for stack machine instructions to be included
-- Python object trees could potentially be loaded in other languages
-- The DTD can be used by non-Python sources to create data that is directly loadable into Python objects
-- Pickle security can be improved by using tight DTDs instead of copyreg.
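
As a rough illustration of the tooling points above, here is a minimal sketch that searches a hand-written pickle document and checks its tags. It assumes Python 3 and the standard xml.etree.ElementTree module, neither of which is part of the proposal; the element names are borrowed from the example in the P.S., and the tag whitelist is only a crude stand-in for real DTD validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical pickle document using the element names from the P.S. example.
DOC = """
<objectlist>
<list>
  <dict id="0">
    <item> <str>one</str> <str>uno</str> </item>
    <item> <str>two</str> <str>dos</str> </item>
  </dict>
  <int>42</int>
  <list>
    <float>1.0</float>
    <int>7</int>
  </list>
</list>
</objectlist>
"""

root = ET.fromstring(DOC)

# ".//int" is an XPath-style recursive search: every <int> at any depth.
ints = [int(node.text) for node in root.findall(".//int")]

# Each <item> holds a key element followed by a value element.
pairs = {item[0].text: item[1].text for item in root.findall(".//dict/item")}

# Crude stand-in for DTD validation: report any tag outside a whitelist.
ALLOWED = {"objectlist", "list", "dict", "item", "str", "int", "float"}
unexpected = sorted({node.tag for node in root.iter()} - ALLOWED)

print(ints)
print(pairs)
print(unexpected)
```

A loader built this way would refuse to proceed when the unexpected list is non-empty, which is the kind of restriction a tight DTD would express declaratively.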

I would appreciate your thoughts.


Raymond Hettinger

P.S.  Here's an example of what it would look like:

import math

class Circle:
    def __init__(self, rad):
        self.rad = rad

class Square:
    def __init__(self, side):
        self.side = side
    def __getinitargs__(self):
        # arguments passed back to __init__ when unpickling
        return (self.side,)

class Triangle:
    def __init__(self, side1, side2, side3):
        self.sides = map(math.radians, (side1, side2, side3))
    def __getstate__(self):
        return self.sides
    def __setstate__(self, state):
        self.sides = state

>>> d = {"one":"uno", "two":"dos"}
>>> obj = [d, 42, u"abc", [1.0,2+5j], Circle(5), Square(4), Triangle(3,4,5), d, None, True, False, Circle, len]
>>> pickle.dumpsXML(obj)

<objectlist>
<list>
  <dict id="0">
    <item> <str>one</str> <str>uno</str> </item>
    <item> <str>two</str> <str>dos</str> </item>
  </dict>
  <int>42</int> 
  <unicode>abc</unicode> 
  <list>
    <float>1.0</float> 
    <complex>2+5j</complex> 
  </list>
  <instance module="__main__" name="Circle">
    <dict>
      <item> <str>rad</str> <int>5</int> </item>
    </dict>
  </instance>
  <instance module="__main__" name="Square">
    <tuple>
      <int>4</int>
    </tuple>
  </instance>
  <instance module="__main__" name="Triangle">
    <list>
      <float>0.0523598775598299</float>
      <float>0.0698131700797732</float>
      <float>0.0872664625997165</float>
    </list>
  </instance>
  <memo idref="0"/>
  <none/>
  <true/>
  <false/>
  <global module="__main__" name="Circle"/>
  <global module="__builtin__" name="len"/>
</list>
</objectlist>
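
For a sense of how little code the basic types would need, here is a minimal sketch of a serializer that emits roughly the layout above. It assumes Python 3 and xml.etree.ElementTree; the names dumps_xml and _build are hypothetical, not an existing pickle API. It handles only None, bool, int, float, str, list and dict, and unlike the example it gives every container an id rather than only the shared ones.

```python
import xml.etree.ElementTree as ET

def _build(obj, parent, memo):
    """Append an element describing obj to parent (illustrative subset only)."""
    # Shared containers: emit <memo idref=.../> on the second and later visits.
    if isinstance(obj, (list, dict)):
        if id(obj) in memo:
            ET.SubElement(parent, "memo", idref=memo[id(obj)])
            return
        memo[id(obj)] = str(len(memo))
    if obj is None:
        ET.SubElement(parent, "none")
    elif isinstance(obj, bool):  # test bool before int: bool subclasses int
        ET.SubElement(parent, "true" if obj else "false")
    elif isinstance(obj, (int, float, str)):
        text = obj if isinstance(obj, str) else repr(obj)
        ET.SubElement(parent, type(obj).__name__).text = text
    elif isinstance(obj, list):
        node = ET.SubElement(parent, "list", id=memo[id(obj)])
        for element in obj:
            _build(element, node, memo)
    elif isinstance(obj, dict):
        node = ET.SubElement(parent, "dict", id=memo[id(obj)])
        for key, value in obj.items():
            item = ET.SubElement(node, "item")
            _build(key, item, memo)
            _build(value, item, memo)
    else:
        raise TypeError("unsupported type: %s" % type(obj).__name__)

def dumps_xml(obj):
    """Return an XML string for obj, wrapped in <objectlist> (hypothetical name)."""
    root = ET.Element("objectlist")
    _build(obj, root, {})
    return ET.tostring(root, encoding="unicode")

# The same dict appears twice, so the second occurrence becomes a <memo/> reference.
d = {"one": "uno"}
xml_text = dumps_xml([d, 42, 1.5, None, True, d])
print(xml_text)
```

Instances, globals, tuples and complex numbers are left out on purpose; they would need the module and name attributes shown in the example plus the usual __getstate__ and __getinitargs__ handling.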