Fwd: [XML-SIG] xmlpickle.py ?!

Jim Fulton jim@digicool.com
Tue, 08 Aug 2000 07:52:08 -0400


"M.-A. Lemburg" wrote:
> 
> Jim Fulton wrote:
> >
> > <note>I don't normally have time to follow the xml sig.
> >   Someone kindly forwarded Marc-Andre's note to me.
> >   I haven't seen the rest of this thread.
> > </note>
> 
> The thread is just starting... thanks for chiming in.
> 
> > "M.-A. Lemburg" <mal@lemburg.com> wrote:
> > >
> > > I'm currently looking into writing an xmlpickle.py module
> > > with the intent to be able to pickle (and unpickle) arbitrary
> > > Python objects in a way that makes the objects editable through
> > > a XML editor or convertible to some other format using the
> > > existing XML tools.
> >
> > I wonder whether a tool that generated XML for arbitrary Python
> > objects would really be that useful for transfer to
> > other applications. I suspect not.
> 
> I'm not sure either, but given that XML is becoming an
> industry standard and that more and more tools are becoming
> available, I have a feeling that xmlpickle is a good
> idea in the sense of making Python buzz word forward
> compatible ;-)

I'm not saying that an XML pickle variant isn't useful, 
but that it's not going to be very useful for interoperability.
I think that for interoperability, application-specific 
XML formats that don't need to be as complete as pickle
are more useful.

> Of course, a third party tool won't be able to handle arbitrary
> Python pickles, but for a quick transfer of object data or
> together with a semantic style sheet xmlpickles should make
> a good inter-application transport encoding between closely
> related software, e.g. Python on one side, C++ on the other.
> 
> I wonder how well SOAP would handle pickling arbitrary
> Python objects...

In particular, I wonder if it tries to be complete. I haven't 
really looked at SOAP lately. In my experience, RPC mechanisms 
don't really need or try to handle arbitrarily complex objects.
OTOH, lots of applications don't need complete transfer.
 
> > > After looking at the archives of this SIG, I found that the
> > > idea was already tossed around a few times, but I couldn't
> > > find any downloadable outcome.
> >
> > Zope has a facility that I've been meaning to make more
> > generally available but haven't had time to. :/
> > In my case, I wanted to be able to convert to/from binary
> > pickles and xml, so I had an intern write something that
> > works from pickles, rather than from objects. It can be used
> > to look at existing pickles and can be used, in conjunction with
> > pickle or cPickle to convert objects to and from XML.
> >
> > If you're interested, let me know and I'll provide more details.
> 
> I've had a look at ppml.py in Zope, but didn't really
> grok the idea behind it -- it's completely undocumented
> and contains some really weird callbacks :-/

Yes, well, if you're interested in pursuing it, I'll provide more
info.
 
> My general idea for xmlpickle is to come up with a format that
> is human readable and editable, i.e. literal representations
> should be used in favour of binary ones (size is not a problem;
> speed can later be added via a C extension).

OK. Obviously, a gif image needs to be encoded.  We could certainly
modify the algorithm that decides between repr and base64 to give
more preference to repr.
 
> > > I've looked at pickle.py a bit and realized that the extensible
> > > nature of the pickle mechanism would probably cause trouble
> > > because the DTD would have to be generated as well (not a good
> > > idea).
> >
> > Why would a DTD have to be generated?
> 
> If you take the first path (see below; one element per pickle'able
> type), then you'd have to regenerate the DTD in case new types
> were registered through copy_reg.

Yes, but why do you need a DTD? Lots of people don't
seem to use DTDs, and DTDs don't work very well with namespaces.
 
(snip)
> >
> > Note that this is pretty much a straight translation of
> > the Python pickle "schema". :)
> 
> This looks pretty much like what I had in mind (this is
> what ppml.py generates, right ?

Right.

>). The only part I don't
> like about ppml.py's approach is that it pickles e.g.
> integers to a binary format.
 
Nah:

<pickle> <int>123</int> </pickle>
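
To make the literal encoding concrete, here is a minimal sketch in present-day Python. It is not ppml.py's code. Only the "pickle" and "int" element names, the id attributes and the reference tags come from this thread; the "string" and "list" element names, the id attribute on the reference tag, and the helper names to_xml and xml_pickle are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def to_xml(obj, parent, seen):
    """Append an element describing obj under parent.

    seen maps id(obj) to the label already written for that list, so a
    list that contains itself becomes a <reference> instead of recursing.
    """
    if isinstance(obj, int):
        # Integers are written as literal text, not as binary data.
        ET.SubElement(parent, "int").text = str(obj)
    elif isinstance(obj, str):
        ET.SubElement(parent, "string").text = obj
    elif isinstance(obj, list):
        if id(obj) in seen:
            ET.SubElement(parent, "reference", id=seen[id(obj)])
            return
        label = str(len(seen) + 1)
        seen[id(obj)] = label
        node = ET.SubElement(parent, "list", id=label)
        for item in obj:
            to_xml(item, node, seen)
    else:
        raise TypeError("type not covered by this sketch: %r" % type(obj))

def xml_pickle(obj):
    """Return the XML text for obj, wrapped in a <pickle> element."""
    root = ET.Element("pickle")
    to_xml(obj, root, {})
    return ET.tostring(root, encoding="unicode")

cycle = [1]
cycle.append(cycle)          # a list that contains itself
print(xml_pickle(123))
print(xml_pickle(cycle))
```

The second call terminates because the inner occurrence of the list is emitted as a reference to the id already assigned to the outer one.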

> > Note the id attributes
> > and reference tags, which allow cyclical data structures.
> 
> Way cool, yes :-)
> 
> > (I recently discovered that there is a problem with my id
> > values. Does anyone know what it is? ;)
> >
> > One other note. I found the XML spec to be a little
> > ambiguous (or maybe I'm just too dense) wrt binary data
> > and newlines, so I decided to punt and escape newlines and
> > binary data.  I encode strings as either "repr" which is a
> > repr like encoding that escapes things in a way that is
> > just a tad more terse than repr. I switch to base64 when
> > the escaping penalty exceeds 40%.
> 
> I don't really care about size... my goal is keeping data editable
> and human readable -- this also makes writing backends in
> other languages a lot easier.

So we could add some tuning to this. Note that the goal is not to 
reduce size, but to detect "binary" data. Python doesn't make
a distinction between binary and text, but base64 is probably
a much better way to encode truly binary data.
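
The choice between the two encodings might be sketched as follows, in present-day Python with bytes standing in for the 8-bit strings of the time. Only the 40% threshold comes from this thread. How the penalty is measured, and the names escape_penalty and encode_string, are assumptions; this is not the ppml.py algorithm.

```python
import base64

def escape_penalty(data: bytes) -> float:
    """Fractional growth in size when data is written with repr-style escapes."""
    if not data:
        return 0.0
    escaped = repr(data)[2:-1]      # drop the leading b and the quotes
    return (len(escaped) - len(data)) / len(data)

def encode_string(data: bytes, threshold: float = 0.40):
    """Return (encoding_name, text), switching to base64 for binary-looking data.

    A lower threshold treats more data as binary; a higher one gives more
    preference to the readable repr form.
    """
    if escape_penalty(data) > threshold:
        return "base64", base64.b64encode(data).decode("ascii")
    return "repr", repr(data)[2:-1]

print(encode_string(b"hello world"))
print(encode_string(bytes(range(32))))
```

Raising the threshold argument is one way to express "more preference to repr" without giving up base64 for data such as images.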
 
> > Since a lot of our pickles
> > have marked up text, I automatically use CDATA sections when
> > I can and where it would help. See the example above.
> 
> How robust is this CDATA wrapping ? What if the data itself
> is XML and contains a CDATA section ?

Then it's not used. We will only use CDATA if we can.
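
As a rough sketch of that rule in present-day Python: a CDATA section cannot contain the sequence "]]>", so text containing it falls back to ordinary escaping. The test for "where it would help" (the text contains "<" or "&") and the name wrap_text are assumptions, not the Zope code.

```python
from xml.sax.saxutils import escape

def wrap_text(text: str) -> str:
    """Return text ready to be placed inside an XML element.

    CDATA is used only when it helps (the text contains markup characters)
    and only when it is possible (the text does not contain "]]>").
    """
    has_markup = "<" in text or "&" in text
    if has_markup and "]]>" not in text:
        return "<![CDATA[" + text + "]]>"
    # Fall back to entity escaping of &, < and >.
    return escape(text)

print(wrap_text("<b>bold</b> text"))
print(wrap_text("outer <![CDATA[inner]]> data"))
print(wrap_text("plain text"))
```

The second call is the case asked about: the data already contains a CDATA section, so it is escaped rather than wrapped.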
 
Jim