Fwd: [XML-SIG] xmlpickle.py ?!

M.-A. Lemburg mal@lemburg.com
Tue, 08 Aug 2000 10:56:09 +0200

Jim Fulton wrote:
> <note>I don't normally have time to follow the xml sig.
>   Someone kindly forwarded Marc-Andre's note to me.
>   I haven't seen the rest of this thread.
> </note>

The thread is just starting... thanks for chiming in.
> "M.-A. Lemburg" <mal@lemburg.com> wrote:
> >
> > I'm currently looking into writing an xmlpickle.py module
> > with the intent to be able to pickle (and unpickle) arbitrary
> > Python objects in a way that makes the objects editable through
> > an XML editor or convertible to some other format using the
> > existing XML tools.
> I wonder whether a tool that generated XML for arbitrary Python
> objects would really be that useful for transfer to
> other applications. I suspect not.

I'm not sure either, but given that XML is becoming an
industry standard and that more and more tools are becoming
available, I have a feeling that xmlpickle is a good
idea in the sense of making Python buzzword forward
compatible ;-)

Of course, a third party tool won't be able to handle arbitrary
Python pickles, but for a quick transfer of object data or
together with a semantic style sheet xmlpickles should make
a good inter-application transport encoding between closely
related software, e.g. Python on one side, C++ on the other.
I wonder how well SOAP would handle pickling arbitrary
Python objects...

> > After looking at the archives of this SIG, I found that the
> > idea was already tossed around a few times, but I couldn't
> > find any downloadable outcome.
> Zope has a facility that I've been meaning to make more
> generally available but haven't had time to. :/
> In my case, I wanted to be able to convert to/from binary
> pickles and xml, so I had an intern write something that
> works from pickles, rather than from objects. It can be used
> to look at existing pickles and can be used, in conjunction with
> pickle or cPickle to convert objects to and from XML.
> If you're interested, let me know and I'll provide more details.

I've had a look at ppml.py in Zope, but didn't really
grok the idea behind it -- it's completely undocumented
and contains some really weird callbacks :-/
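
Working from pickles rather than from objects, as Jim describes, can be
illustrated with the pickletools module from the current standard library.
The sketch below is not ppml.py; it is a minimal illustration that walks a
pickle's opcode stream with pickletools.genops() and emits one flat,
hypothetical <op> element per opcode, without rebuilding the nested
structure a real converter would need.

```python
import pickle
import pickletools
from xml.sax.saxutils import escape, quoteattr


def pickle_to_xml_trace(data: bytes) -> str:
    """List a pickle's opcodes as flat <op> elements (no nesting rebuilt)."""
    lines = ["<pickle-ops>"]
    # genops() yields (opcode_info, argument, byte_position) triples.
    for opcode, arg, pos in pickletools.genops(data):
        name = quoteattr(opcode.name)
        if arg is None:
            lines.append(f'  <op name={name} pos="{pos}"/>')
        else:
            lines.append(f'  <op name={name} pos="{pos}">{escape(repr(arg))}</op>')
    lines.append("</pickle-ops>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(pickle_to_xml_trace(pickle.dumps({"title": "", "n": 10}, protocol=0)))
```

With protocol=0, small integers show up as INT opcodes carrying their
decimal value; binary protocols use opcodes such as BININT1 instead.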

My general idea for xmlpickle is to come up with a format that
is human readable and editable, i.e. literal representations
should be used in favour of binary ones (size is not a problem;
speed can later be added via a C extension).
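
A minimal sketch of that literal style, assuming the element names from
the first proposed layout in this thread (PythonPickle, Dictionary, List,
Tuple, String, Integer, plus a Float element I've added) and handling only
core types with string dictionary keys; anything else raises TypeError.

```python
from xml.sax.saxutils import escape, quoteattr

# Core scalar types and the (assumed) element names for their literal form.
SCALARS = {str: "String", int: "Integer", float: "Float"}


def to_xml(obj, name=None, indent=0):
    """Encode a core Python object as indented, human-editable XML."""
    pad = "    " * indent
    attr = "" if name is None else f" name={quoteattr(name)}"
    kind = type(obj)  # exact type check: bool and other subclasses are rejected
    if kind in SCALARS:
        tag = SCALARS[kind]
        text = obj if kind is str else repr(obj)  # repr() round-trips floats
        return f"{pad}<{tag}{attr}>{escape(text)}</{tag}>"
    if kind is list or kind is tuple:
        tag = "List" if kind is list else "Tuple"
        body = [to_xml(item, indent=indent + 1) for item in obj]
    elif kind is dict:
        tag = "Dictionary"
        body = []
        for key, value in obj.items():
            if type(key) is not str:
                raise TypeError("this sketch only supports string keys")
            body.append(to_xml(value, name=key, indent=indent + 1))
    else:
        raise TypeError(f"unsupported type: {kind.__name__}")
    return "\n".join([f"{pad}<{tag}{attr}>", *body, f"{pad}</{tag}>"])


def xmlpickle_dumps(obj):
    return '<PythonPickle version="1.0">\n' + to_xml(obj) + "\n</PythonPickle>"


if __name__ == "__main__":
    print(xmlpickle_dumps({"aString": "abcdef", "aList": [10, "abc"]}))
```

Integers come out as <Integer>10</Integer> rather than as binary data,
so the result can be edited by hand.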

> > I've looked at pickle.py a bit and realized that the extensible
> > nature of the pickle mechanism would probably cause trouble
> > because the DTD would have to be generated as well (not a good
> > idea).
> Why would a DTD have to be generated?

If you take the first path (see below; one element per picklable
type), then you'd have to regenerate the DTD whenever new types
were registered through copy_reg.
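
Under the first alternative, copy_reg-registered types (the module is
called copyreg in Python 3) would all share one generic element, so the
DTD never changes. A minimal sketch, assuming a hypothetical Point class
and make_point constructor standing in for something like mx.DateTime,
and reusing the Object/Tuple/Integer/String element names from the first
example in this thread:

```python
import copyreg
from xml.sax.saxutils import escape, quoteattr


class Point:
    """Hypothetical extension type standing in for e.g. mx.DateTime."""
    def __init__(self, x, y):
        self.x, self.y = x, y


def make_point(x, y):
    """Hypothetical module-level constructor named in the XML."""
    return Point(x, y)


def reduce_point(p):
    # A copyreg reducer returns (callable, args), just like __reduce__.
    return make_point, (p.x, p.y)


copyreg.pickle(Point, reduce_point)


def scalar_element(value):
    if type(value) is int:
        return f"<Integer>{value}</Integer>"
    if type(value) is str:
        return f"<String>{escape(value)}</String>"
    raise TypeError(f"unsupported argument type: {type(value).__name__}")


def object_element(obj):
    """Emit one generic <Object> element for any copyreg-registered type."""
    reducer = copyreg.dispatch_table.get(type(obj))
    if reducer is None:
        raise TypeError(f"{type(obj).__name__} is not registered with copyreg")
    func, args = reducer(obj)[:2]
    constructor = f"{func.__module__}.{func.__qualname__}"
    items = "".join(scalar_element(arg) for arg in args)
    return f"<Object constructor={quoteattr(constructor)}><Tuple>{items}</Tuple></Object>"


if __name__ == "__main__":
    print(object_element(Point(2000, 8)))
```

Run as a script, this prints
<Object constructor="__main__.make_point"><Tuple><Integer>2000</Integer><Integer>8</Integer></Tuple></Object>;
registering another type adds a reducer, not a new element.
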
> > There are two alternatives to this though:
> >
> > 1. add an element which handles all non-core Python object
> >    types (the ones registered through copy_reg)
> >
> > 2. use an abstract DTD altogether
> >
> > Example for 1:
> >
> > <PythonPickle version="1.0">
> > <Dictionary>
> >         <String name="aString">abcdef</String>
> >         <List name="aList">
> >                 <Integer>10</Integer>
> >                 <String>abc</String>
> >         </List>
> >         <Instance name="aInstance" module="test" classname="test">
> >                 <String name="instvar">value</String>
> >         </Instance>
> >         <Object name="myObject" constructor="mx.DateTime.DateTime">
> >                 <Tuple>
> >                         <Integer>2000</Integer>
> >                         <Integer>8</Integer>
> >                         <Integer>6</Integer>
> >                 </Tuple>
> >         </Object>
> > </Dictionary>
> > </PythonPickle>
> This is the route I took. Here's an example that's
> probably a lot bigger than you want....
>     <pickle>
>       <dictionary id="3046.4">
>         <item>
>             <key> <string id="3046.5" encoding="repr">title</string> </key>
>             <value> <string encoding="repr"></string> </value>
>         </item>
>         <item>
>             <key> <string id="3046.6" encoding="repr">raw</string> </key>
>             <value> <string id="3046.7" encoding="cdata"><![CDATA[
> <dtml-var standard_html_header>\n
> <h2><dtml-var title_or_id> <dtml-var document_title></h2>\n
> <dtml-var "\'\\n\\n\'">\n
> <p>\n
> This is the <dtml-var document_id> Document \n
> in the <dtml-var title_and_id> Folder.\n
> </p>\n
> <dtml-var standard_html_footer>
> ]]></string> </value>
>         </item>
>         <item>
>             <key> <string id="3046.8" encoding="repr">__ac_local_roles__</string> </key>
>             <value>
>               <dictionary id="3046.9">
>                 <item>
>                     <key> <string id="3046.10" encoding="repr">jim</string> </key>
>                     <value>
>                       <list id="3046.11">
>                         <string id="3046.12" encoding="repr">Owner</string>
>                       </list>
>                     </value>
>                 </item>
>               </dictionary>
>             </value>
>         </item>
>         <item>
>             <key> <string id="3046.13" encoding="repr">globals</string> </key>
>             <value>
>               <dictionary id="3046.14"/>
>             </value>
>         </item>
>         <item>
>             <key> <string id="3046.15" encoding="repr">__name__</string> </key>
>             <value> <string id="3046.16" encoding="repr">m2</string> </value>
>         </item>
>         <item>
>             <key> <string id="3046.17" encoding="repr">_vars</string> </key>
>             <value>
>               <dictionary id="3046.18"/>
>             </value>
>         </item>
>       </dictionary>
>     </pickle>
> Note that this is pretty much a straight translation of
> the Python pickle "schema". :)

This looks pretty much like what I had in mind (this is
what ppml.py generates, right?). The only part I don't
like about ppml.py's approach is that it pickles, e.g.,
integers in a binary format.

> Note the id attributes
> and reference tags, which allow cyclical data structures.

Way cool, yes :-)
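
The id/reference idea can be sketched as follows, assuming lowercase
element names loosely modelled on Jim's example and a hypothetical
<reference id="..."/> element (his actual reference tag isn't shown
above). The writer memoizes containers by identity, much like pickle's
memo; the reader registers each container before filling it, which is
what lets a cycle resolve.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def dumps_with_refs(obj):
    memo = {}  # id(container) -> assigned id string
    out = []

    def emit(value):
        kind = type(value)
        if kind is list or kind is dict:
            if id(value) in memo:  # seen before: emit a back-reference
                out.append(f'<reference id="{memo[id(value)]}"/>')
                return
            ident = memo[id(value)] = str(len(memo) + 1)
            if kind is list:
                out.append(f'<list id="{ident}">')
                for item in value:
                    emit(item)
                out.append("</list>")
            else:
                out.append(f'<dictionary id="{ident}">')
                for key, val in value.items():
                    out.append("<item><key>")
                    emit(key)
                    out.append("</key><value>")
                    emit(val)
                    out.append("</value></item>")
                out.append("</dictionary>")
        elif kind is int:
            out.append(f"<int>{value}</int>")
        elif kind is str:
            out.append(f"<string>{escape(value)}</string>")
        else:
            raise TypeError(f"unsupported type: {kind.__name__}")

    emit(obj)
    return "<pickle>" + "".join(out) + "</pickle>"


def loads_with_refs(text):
    objs = {}  # id string -> rebuilt container

    def build(el):
        if el.tag == "int":
            return int(el.text)
        if el.tag == "string":
            return el.text or ""
        if el.tag == "reference":
            return objs[el.get("id")]
        if el.tag == "list":
            result = objs[el.get("id")] = []  # register before filling
            for child in el:
                result.append(build(child))
            return result
        if el.tag == "dictionary":
            result = objs[el.get("id")] = {}
            for item in el:
                result[build(item.find("key")[0])] = build(item.find("value")[0])
            return result
        raise ValueError(f"unknown element: {el.tag}")

    return build(ET.fromstring(text)[0])
```

A self-containing list such as a = [1]; a.append(a) round-trips with
the inner reference pointing back at the same list object.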

> (I recently discovered that there is a problem with my id
> values. Does anyone know what it is? ;)
> One other note. I found the XML spec to be a little
> ambiguous (or maybe I'm just too dense) wrt binary data
> and newlines, so I decided to punt and escape newlines and
> binary data. I encode strings either as "repr", a repr-like
> encoding that escapes things in a way that is just a tad
> more terse than repr, or as base64; I switch to base64 when
> the escaping penalty exceeds 40%.

I don't really care about size... my goal is keeping data editable
and human readable -- this also makes writing backends in
other languages a lot easier.
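
For comparison, a rough sketch of the size-driven choice Jim describes.
The escaping rules and the exact way the 40% penalty is measured are
assumptions, not his code; XML escaping of <, & and > still has to be
applied to the "repr" result afterwards.

```python
import base64


def repr_escape(data: bytes) -> str:
    """Assumed repr-like escaping: printable ASCII kept, the rest escaped."""
    out = []
    for b in data:
        if b == 0x5C:  # backslash
            out.append("\\\\")
        elif b == 0x0A:  # newline
            out.append("\\n")
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02x}")
    return "".join(out)


def encode_string(data: bytes, threshold: float = 0.40):
    """Return (encoding, text), falling back to base64 above the penalty."""
    escaped = repr_escape(data)
    # Penalty = extra characters added by escaping, relative to the raw size.
    if data and (len(escaped) - len(data)) / len(data) > threshold:
        return "base64", base64.b64encode(data).decode("ascii")
    return "repr", escaped
```

Passing threshold=float("inf") always yields the editable "repr" form,
which is what I'd prefer.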

> Since a lot of our pickles
> have marked up text, I automatically use CDATA sections when
> I can and where it would help. See the example above.

How robust is this CDATA wrapping? What if the data itself
is XML and contains a CDATA section?
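
For what it's worth, an embedded <![CDATA[ is harmless, since only the
terminator ]]> ends a section; a common trick is to split any ]]> in the
data across two adjacent sections. A minimal sketch (CDATA still can't
carry characters XML forbids outright, such as most control characters,
which is one reason to escape binary data anyway):

```python
def cdata_wrap(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" so the section can't end early."""
    # "]]>" is the only sequence a CDATA section cannot contain; emitting
    # "]]" + "]]><![CDATA[" + ">" closes one section and reopens another.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


if __name__ == "__main__":
    import xml.etree.ElementTree as ET

    payload = "<doc><![CDATA[x < y]]></doc>"
    element = ET.fromstring("<string>" + cdata_wrap(payload) + "</string>")
    print(element.text == payload)  # True
```

The parser concatenates the adjacent sections, so the original text,
nested CDATA and all, comes back unchanged.
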
> I really need to write down a DTD for this......

You should :-)

Marc-Andre Lemburg
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/