Fwd: [XML-SIG] xmlpickle.py ?!
Jim Fulton
jim@digicool.com
Tue, 08 Aug 2000 07:52:08 -0400
"M.-A. Lemburg" wrote:
>
> Jim Fulton wrote:
> >
> > <note>I don't normally have time to follow the xml sig.
> > Someone kindly forwarded Marc-Andre's note to me.
> > I haven't seen the rest of this thread.
> > </note>
>
> The thread is just starting... thanks for chiming in.
>
> > "M.-A. Lemburg" <mal@lemburg.com> wrote:
> > >
> > > I'm currently looking into writing an xmlpickle.py module
> > > with the intent to be able to pickle (and unpickle) arbitrary
> > > Python objects in a way that makes the objects editable through
> > > an XML editor or convertible to some other format using the
> > > existing XML tools.
> >
> > I wonder whether a tool that generated XML for arbitrary Python
> > objects would really be that useful for transfer to
> > other applications. I suspect not.
>
> I'm not sure either, but given that XML is becoming an
> industry standard and that more and more tools are becoming
> available, I have a feeling that xmlpickle is a good
> idea in the sense of making Python buzz word forward
> compatible ;-)
I'm not saying that an XML pickle variant isn't useful,
but that it's not going to be very useful for interoperability.
I think that for interoperability, application-specific
XML formats that don't need to be as complete as pickle
are more useful.
> Of course, a third party tool won't be able to handle arbitrary
> Python pickles, but for a quick transfer of object data or
> together with a semantic style sheet xmlpickles should make
> a good inter-application transport encoding between closely
> related software, e.g. Python on one side, C++ on the other.
>
> I wonder how well SOAP would handle pickling arbitrary
> Python objects...
In particular, I wonder if it tries to be complete. I haven't
really looked at SOAP lately. In my experience, RPC mechanisms
don't really need or try to handle arbitrarily complex objects.
OTOH, lots of applications don't need complete transfer.
> > > After looking at the archives of this SIG, I found that the
> > > idea was already tossed around a few times, but I couldn't
> > > find any downloadable outcome.
> >
> > Zope has a facility that I've been meaning to make more
> > generally available but haven't had time to. :/
> > In my case, I wanted to be able to convert to/from binary
> > pickles and xml, so I had an intern write something that
> > works from pickles, rather than from objects. It can be used
> > to look at existing pickles and can be used, in conjunction with
> > pickle or cPickle to convert objects to and from XML.
> >
> > If you're interested, let me know and I'll provide more details.
>
> I've had a look at ppml.py in Zope, but didn't really
> grok the idea behind it -- it's completely undocumented
> and contains some really weird callbacks :-/
Yes, well, if you're interested in pursuing it, I'll provide more
info.
> My general idea for xmlpickle is to come up with a format that
> is human readable and editable, i.e. literal representations
> should be used in favour of binary ones (size is not a problem;
> speed can later be added via a C extension).
OK. Obviously, a gif image needs to be encoded. We could certainly
modify the algorithm that decides between repr and base64 to give
more preference to repr.
> > > I've looked at pickle.py a bit and realized that the extensible
> > > nature of the pickle mechanism would probably cause trouble
> > > because the DTD would have to be generated as well (not a good
> > > idea).
> >
> > Why would a DTD have to be generated?
>
> If you take the first path (see below; one element per pickle'able
> type), then you'd have to regenerate the DTD in case new types
> were registered through copy_reg.
Yes, but why do you need a DTD? Lots of people don't
seem to use DTDs, and DTDs don't work very well with namespaces.
(snip)
> >
> > Note that this is pretty much a straight translation of
> > the Python pickle "schema". :)
>
> This looks pretty much like what I had in mind (this is
> what ppml.py generates, right ?
Right.
>). The only part I don't
> like about ppml.py's approach is that it pickles e.g.
> integers to a binary format.
Nah:
<pickle> <int>123</int> </pickle>
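To make the literal form concrete, here is a rough sketch (not the ppml.py code) of a serializer that writes integers as plain text and gives containers id attributes, with reference tags for repeats, as described just below. Only <pickle> and <int> come from the example above; the <string>, <list> and <reference> element names, the id numbering, and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def to_element(obj, memo):
    """Build an element for obj; memo maps id(container) -> assigned id string."""
    # bool is a subclass of int, so exclude it to keep <int> strictly numeric.
    if isinstance(obj, int) and not isinstance(obj, bool):
        el = ET.Element("int")
        el.text = str(obj)  # literal, human-editable representation
        return el
    if isinstance(obj, str):
        el = ET.Element("string")  # hypothetical element name
        el.text = obj
        return el
    if isinstance(obj, list):
        key = id(obj)
        if key in memo:
            # Already emitted: point back at it instead of recursing forever.
            return ET.Element("reference", id=memo[key])  # hypothetical name
        memo[key] = str(len(memo) + 1)
        el = ET.Element("list", id=memo[key])  # hypothetical name
        for item in obj:
            el.append(to_element(item, memo))
        return el
    raise TypeError("unsupported type in this sketch: %r" % type(obj))


def xml_pickle(obj):
    """Return the XML text for obj wrapped in a <pickle> root element."""
    root = ET.Element("pickle")
    root.append(to_element(obj, {}))
    return ET.tostring(root, encoding="unicode")


print(xml_pickle(123))

cyclic = [1]
cyclic.append(cyclic)  # the list now contains itself
print(xml_pickle(cyclic))
```

The memo dictionary is what lets the self-referencing list terminate: the second visit emits a reference element rather than descending again.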
> > Note the id attributes
> > and reference tags, which allow cyclical data structures.
>
> Way cool, yes :-)
>
> > (I recently discovered that there is a problem with my id
> > values. Does anyone know what it is? ;)
> >
> > One other note. I found the XML spec to be a little
> > ambiguous (or maybe I'm just too dense) wrt binary data
> > and newlines, so I decided to punt and escape newlines and
> > binary data. I encode strings as either "repr" which is a
> > repr like encoding that escapes things in a way that is
> > just a tad more terse than repr. I switch to base64 when
> > the escaping penalty exceeds 40%.
>
> I don't really care about size... my goal is keeping data editable
> and human readable -- this also makes writing backends in
> other languages a lot easier.
So we could add some tuning to this. Note that the goal is not to
reduce size, but to detect "binary" data. Python doesn't make
a distinction between binary and text, but base64 is probably
a much better way to encode truly binary data.
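A minimal sketch of that kind of tunable choice, assuming the "escaping penalty" means the fractional growth in length once the data is escaped. Python's own repr stands in for the terser repr-like encoding mentioned above, and the function names and the threshold parameter are hypothetical.

```python
import base64


def escaping_penalty(data: bytes) -> float:
    """Fractional size growth when data is written with repr-style escapes."""
    if not data:
        return 0.0
    escaped = repr(data)[2:-1]  # strip the b'...' wrapper that repr adds
    return (len(escaped) - len(data)) / len(data)


def encode_string(data: bytes, threshold: float = 0.40):
    """Return (encoding_name, text); prefer repr unless escaping costs too much.

    A high penalty is treated as a sign that the data is "binary",
    not as a size optimization.
    """
    if escaping_penalty(data) > threshold:
        return "base64", base64.b64encode(data).decode("ascii")
    return "repr", repr(data)[2:-1]


print(encode_string(b"hello world"))
print(encode_string(bytes(range(32))))
```

Raising the threshold gives more preference to repr, which keeps more strings readable and editable at the cost of longer escaped output.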
> > Since a lot of our pickles
> > have marked up text, I automatically use CDATA sections when
> > I can and where it would help. See the example above.
>
> How robust is this CDATA wrapping ? What if the data itself
> is XML and contains a CDATA section ?
Then it's not used. We will only use CDATA if we can.
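Roughly, the decision looks like the sketch below. It assumes "can" means the text has no "]]>" terminator in it and "would help" means the text contains markup characters that would otherwise need escaping. The function name is hypothetical, and characters that are not legal in XML at all are ignored here.

```python
from xml.sax.saxutils import escape


def text_to_xml(text: str) -> str:
    """Wrap text in a CDATA section when that is both possible and useful."""
    needs_escaping = any(ch in text for ch in "<&")
    # "]]>" would end the CDATA section early, so such text cannot be wrapped.
    if needs_escaping and "]]>" not in text:
        return "<![CDATA[" + text + "]]>"
    # Otherwise fall back to ordinary entity escaping of &, < and >.
    return escape(text)


print(text_to_xml("<b>marked up</b> text"))
print(text_to_xml("plain text"))
print(text_to_xml("<![CDATA[already wrapped]]>"))
```

Data that already contains a CDATA section takes the fallback path, so it is still stored correctly, just with escaped entities.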
Jim