Fwd: [XML-SIG] xmlpickle.py ?!
M.-A. Lemburg
mal@lemburg.com
Tue, 08 Aug 2000 10:56:09 +0200
Jim Fulton wrote:
>
> <note>I don't normally have time to follow the xml sig.
> Someone kindly forwarded Marc-Andre's note to me.
> I haven't seen the rest of this thread.
> </note>
The thread is just starting... thanks for chiming in.
> "M.-A. Lemburg" <mal@lemburg.com> wrote:
> >
> > I'm currently looking into writing an xmlpickle.py module
> > with the intent to be able to pickle (and unpickle) arbitrary
> > Python objects in a way that makes the objects editable through
> > an XML editor or convertible to some other format using the
> > existing XML tools.
>
> I wonder whether a tool that generated XML for arbitrary Python
> objects would really be that useful for transfer to
> other applications. I suspect not.
I'm not sure either, but given that XML is becoming an
industry standard and that more and more tools are becoming
available, I have a feeling that xmlpickle is a good
idea in the sense of making Python buzzword forward-compatible ;-)
Of course, a third-party tool won't be able to handle arbitrary
Python pickles, but for a quick transfer of object data, or
together with a semantic style sheet, xmlpickles should make
a good inter-application transport encoding between closely
related software, e.g. Python on one side, C++ on the other.
I wonder how well SOAP would handle pickling arbitrary
Python objects...
> > After looking at the archives of this SIG, I found that the
> > idea was already tossed around a few times, but I couldn't
> > find any downloadable outcome.
>
> Zope has a facility that I've been meaning to make more
> generally available but haven't had time to. :/
> In my case, I wanted to be able to convert to/from binary
> pickles and xml, so I had an intern write something that
> works from pickles, rather than from objects. It can be used
> to look at existing pickles and can be used, in conjunction with
> pickle or cPickle to convert objects to and from XML.
>
> If you're interested, let me know and I'll provide more details.
I've had a look at ppml.py in Zope, but didn't really
grok the idea behind it -- it's completely undocumented
and contains some really weird callbacks :-/
My general idea for xmlpickle is to come up with a format that
is human readable and editable, i.e. literal representations
should be used in favour of binary ones (size is not a problem;
speed can later be added via a C extension).
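A minimal sketch of what such a literal, editable dump could look like,
assuming only core types (dict with string keys, list, tuple, int, str)
and using today's xml.etree.ElementTree. The element names are taken from
the "Example for 1" quoted further down in this message; the function
names build() and dumps() are hypothetical and not part of any existing
module:

```python
import xml.etree.ElementTree as ET


def build(parent, obj, name=None):
    """Append a literal XML representation of obj to parent."""
    if isinstance(obj, dict):
        elem = ET.SubElement(parent, "Dictionary")
        for key, value in obj.items():
            if not isinstance(key, str):
                raise TypeError("this sketch only supports string keys")
            build(elem, value, name=key)
    elif isinstance(obj, (list, tuple)):
        tag = "List" if isinstance(obj, list) else "Tuple"
        elem = ET.SubElement(parent, tag)
        for item in obj:
            build(elem, item)
    elif isinstance(obj, int) and not isinstance(obj, bool):
        elem = ET.SubElement(parent, "Integer")
        elem.text = str(obj)  # literal text, not a binary encoding
    elif isinstance(obj, str):
        elem = ET.SubElement(parent, "String")
        elem.text = obj
    else:
        raise TypeError("unsupported type in this sketch: %r" % type(obj))
    if name is not None:
        elem.set("name", name)
    return elem


def dumps(obj):
    """Return the XML text for obj, wrapped in a PythonPickle root element."""
    root = ET.Element("PythonPickle", version="1.0")
    build(root, obj)
    return ET.tostring(root, encoding="unicode")


print(dumps({"aString": "abcdef", "aList": [10, "abc"]}))
```

Instances and copy_reg-registered types are deliberately left out; they
are exactly the part that makes the DTD question below interesting.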
> > I've looked at pickle.py a bit and realized that the extensible
> > nature of the pickle mechanism would probably cause trouble
> > because the DTD would have to be generated as well (not a good
> > idea).
>
> Why would a DTD have to be generated?
If you take the first path (see below; one element per pickle'able
type), then you'd have to regenerate the DTD in case new types
were registered through copy_reg.
> > There are two alternatives to this though:
> >
> > 1. add an element which handles all non-core Python object
> > types (the ones registered through copy_reg)
> >
> > 2. use an abstract DTD altogether
> >
> > Example for 1:
> >
> > <PythonPickle version="1.0">
> >   <Dictionary>
> >     <String name="aString">abcdef</String>
> >     <List name="aList">
> >       <Integer>10</Integer>
> >       <String>abc</String>
> >     </List>
> >     <Instance name="aInstance" module="test" classname="test">
> >       <String name="instvar">value</String>
> >     </Instance>
> >     <Object name="myObject" constructor="mx.DateTime.DateTime">
> >       <Tuple>
> >         <Integer>2000</Integer>
> >         <Integer>8</Integer>
> >         <Integer>6</Integer>
> >       </Tuple>
> >     </Object>
> >   </Dictionary>
> > </PythonPickle>
>
> This is the route I took. Here's an example that's
> probably a lot bigger than you want....
>
> <pickle>
>   <dictionary id="3046.4">
>     <item>
>       <key> <string id="3046.5" encoding="repr">title</string> </key>
>       <value> <string encoding="repr"></string> </value>
>     </item>
>     <item>
>       <key> <string id="3046.6" encoding="repr">raw</string> </key>
>       <value> <string id="3046.7" encoding="cdata"><![CDATA[
> 
> <dtml-var standard_html_header>\n
> <h2><dtml-var title_or_id> <dtml-var document_title></h2>\n
> <dtml-var "\'\\n\\n\'">\n
> <p>\n
> This is the <dtml-var document_id> Document \n
> in the <dtml-var title_and_id> Folder.\n
> </p>\n
> <dtml-var standard_html_footer>
> 
> ]]></string> </value>
>     </item>
>     <item>
>       <key> <string id="3046.8" encoding="repr">__ac_local_roles__</string> </key>
>       <value>
>         <dictionary id="3046.9">
>           <item>
>             <key> <string id="3046.10" encoding="repr">jim</string> </key>
>             <value>
>               <list id="3046.11">
>                 <string id="3046.12" encoding="repr">Owner</string>
>               </list>
>             </value>
>           </item>
>         </dictionary>
>       </value>
>     </item>
>     <item>
>       <key> <string id="3046.13" encoding="repr">globals</string> </key>
>       <value>
>         <dictionary id="3046.14"/>
>       </value>
>     </item>
>     <item>
>       <key> <string id="3046.15" encoding="repr">__name__</string> </key>
>       <value> <string id="3046.16" encoding="repr">m2</string> </value>
>     </item>
>     <item>
>       <key> <string id="3046.17" encoding="repr">_vars</string> </key>
>       <value>
>         <dictionary id="3046.18"/>
>       </value>
>     </item>
>   </dictionary>
> </pickle>
>
> Note that this is pretty much a straight translation of
> the Python pickle "schema". :)
This looks pretty much like what I had in mind (this is
what ppml.py generates, right?). The only part I don't
like about ppml.py's approach is that it pickles e.g.
integers to a binary format.
> Note the id attributes
> and reference tags, which allow cyclical data structures.
Way cool, yes :-)
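For readers who have not seen the trick, here is a rough sketch of how id
attributes plus reference tags can serialize a cyclic structure. It is not
ppml.py's code: the <reference id="..."/> spelling, the simple counter
used for ids, and the build()/dumps() names are all assumptions made for
illustration, and only lists and strings are handled:

```python
import xml.etree.ElementTree as ET


def build(parent, obj, memo):
    """Append obj to parent; emit a reference for lists already seen."""
    if isinstance(obj, list):
        key = id(obj)
        if key in memo:
            # Already serialized (or still in progress): point back at it
            # instead of recursing forever.
            ET.SubElement(parent, "reference", id=memo[key])
            return
        memo[key] = str(len(memo) + 1)  # hypothetical id scheme
        elem = ET.SubElement(parent, "list", id=memo[key])
        for item in obj:
            build(elem, item, memo)
    elif isinstance(obj, str):
        ET.SubElement(parent, "string").text = obj
    else:
        raise TypeError("unsupported type in this sketch: %r" % type(obj))


def dumps(obj):
    """Return XML text for obj, with shared and cyclic lists as references."""
    root = ET.Element("pickle")
    build(root, obj, {})
    return ET.tostring(root, encoding="unicode")


cycle = ["Owner"]
cycle.append(cycle)  # the list now contains itself
print(dumps(cycle))
```

The memo is keyed on id(obj), which is safe here because every object
stays alive for the duration of the dump.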
> (I recently discovered that there is a problem with my id
> values. Does anyone know what it is? ;)
>
> One other note. I found the XML spec to be a little
> ambiguous (or maybe I'm just too dense) wrt binary data
> and newlines, so I decided to punt and escape newlines and
> binary data. I encode strings either as "repr", a
> repr-like encoding that escapes things in a way that is
> just a tad more terse than repr, or as base64. I switch to
> base64 when the escaping penalty exceeds 40%.
I don't really care about size... my goal is keeping data editable
and human readable -- this also makes writing backends in
other languages a lot easier.
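To make the trade-off concrete, the switch Jim describes could be
sketched roughly as follows. The 40% threshold is from his note; the
escaping rules in escape_repr() and the way the penalty is measured
(growth relative to the raw length) are guesses for illustration, not
what ppml.py actually does:

```python
import base64


def escape_repr(data):
    """Escape backslashes, newlines and non-printable bytes, repr-style."""
    out = []
    for byte in data:
        ch = chr(byte)
        if ch == "\\":
            out.append("\\\\")
        elif ch == "\n":
            out.append("\\n")
        elif 32 <= byte < 127:
            out.append(ch)
        else:
            out.append("\\x%02x" % byte)
    return "".join(out)


def encode_string(data, threshold=0.40):
    """Return (encoding, text), using base64 when escaping costs too much."""
    escaped = escape_repr(data)
    if data and (len(escaped) - len(data)) / len(data) > threshold:
        return "base64", base64.b64encode(data).decode("ascii")
    return "repr", escaped


print(encode_string(b"title"))
print(encode_string(b"\x00\x01\x02\x03"))
```

Markup characters such as "<" and "&" are left alone here on the
assumption that the XML writer escapes them separately.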
> Since a lot of our pickles
> have marked up text, I automatically use CDATA sections when
> I can and where it would help. See the example above.
How robust is this CDATA wrapping? What if the data itself
is XML and contains a CDATA section?
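The concern is that a CDATA section ends at the first "]]>", so nested
sections cannot simply be wrapped as-is. One common workaround, shown
here as a sketch and not as a description of what ppml.py does, is to
split every "]]>" in the payload across two adjacent CDATA sections. The
wrap_cdata() name is hypothetical:

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text):
    """Wrap text in CDATA, splitting any embedded "]]>" across sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Payload that is itself XML and already contains a CDATA section.
payload = "<doc><![CDATA[inner]]></doc>"
wrapped = "<string>" + wrap_cdata(payload) + "</string>"

# A conforming parser joins the adjacent sections back into the original.
assert ET.fromstring(wrapped).text == payload
print(wrapped)
```

An encoder could equally fall back to the "repr" or base64 encoding
whenever the data contains "]]>".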
> I really need to write down a DTD for this......
You should :-)
Thanks,
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/