[XML-SIG] Anything else to go in?
Ken MacLeod
ken@bitsko.slc.ut.us
Thu, 15 Oct 1998 13:40:56 -0500 (CDT)
> * A module to marshal simple Python data types into XML.
> There's still no obvious DTD to choose for this, though; I'm starting
> to think that I should drop xml/marshal.py, and wait until version 1.1
> of the package to add this; perhaps by then one DTD will have emerged
> as the standard for doing this.
I've just completed a draft DTD for Casbah/LDO XML serialization (below).
This DTD can be targeted, so you can be as specific about Python types
as pickle is, or as interoperable as Casbah/LDO.
Note specifically that any object can be used as a key in a dictionary,
as Python (and Smalltalk, for example) allows. Most other DTDs I've
seen only support string keys.
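
To make the point about non-string keys concrete, here is a minimal sketch
of an encoder that targets the element names in the DTD below. The function
name `to_element' and the choice to store the Python type name in the `type'
attribute are assumptions for illustration only; they are not part of the
DTD or of LDO.

```python
import xml.etree.ElementTree as ET


def to_element(obj):
    """Map a Python object onto a `dictionary', `list', or `value' element."""
    if isinstance(obj, dict):
        elem = ET.Element("dictionary")
        for key, val in obj.items():
            # A pair is simply two consecutive items: the key, then its value.
            # The key is encoded like any other item, so it need not be a string.
            elem.append(to_element(key))
            elem.append(to_element(val))
        return elem
    if isinstance(obj, (list, tuple)):
        elem = ET.Element("list")
        # Assumption: the `type' attribute carries the Python type name.
        elem.set("type", type(obj).__name__)
        for item in obj:
            elem.append(to_element(item))
        return elem
    elem = ET.Element("value")
    if not isinstance(obj, str):
        # Untyped values are plain strings; anything else gets a `type'.
        elem.set("type", type(obj).__name__)
    elem.text = str(obj)
    return elem


# A dictionary with one tuple key and one string key.
table = {(1998, 10): "October", "day": 15}
xml_text = ET.tostring(to_element(table), encoding="unicode")
print(xml_text)
```

The tuple key comes out as a `list' element sitting in the key position of
the pair, which is exactly what a string-keys-only format cannot express.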
As an implementation note, LDO's Python binary serialization uses pickle's
`dump' and `load' methods. It can also act as a stream-head, so it supports
`flush' as well. The source is in CVS at:
CVSROOT=:pserver:anonymous@ntlug.org:/home/cbsrc/cvsroot
password: anonymous
module: LDO
or viewable at <http://www.ntlug.org/cgi-bin/cvsweb/>.
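
For readers who have not used pickle, the `dump'/`load'/`flush' interface
looks roughly like the sketch below. This is not the LDO code: it applies the
same method names to the XML format from the DTD below rather than to LDO's
binary format, and the class name `XMLStream' and the one-document-per-line
framing are assumptions made only to keep the example short. It handles
untyped strings, lists, and dictionaries.

```python
import io
import xml.etree.ElementTree as ET


class XMLStream:
    """Hypothetical pickle-style wrapper around a text file object."""

    def __init__(self, fileobj):
        self.fileobj = fileobj

    def dump(self, obj):
        # Assumption: one document per line, so values must not contain newlines.
        text = ET.tostring(self._encode(obj), encoding="unicode")
        self.fileobj.write(text + "\n")

    def load(self):
        line = self.fileobj.readline()
        if not line:
            raise EOFError("no more objects on the stream")
        return self._decode(ET.fromstring(line))

    def flush(self):
        self.fileobj.flush()

    def _encode(self, obj):
        if isinstance(obj, dict):
            elem = ET.Element("dictionary")
            for key, val in obj.items():
                elem.append(self._encode(key))
                elem.append(self._encode(val))
        elif isinstance(obj, list):
            elem = ET.Element("list")
            for item in obj:
                elem.append(self._encode(item))
        else:
            # Untyped values are written as plain strings.
            elem = ET.Element("value")
            elem.text = str(obj)
        return elem

    def _decode(self, elem):
        children = list(elem)
        if elem.tag == "dictionary":
            # Children alternate key, value, key, value, ...
            return {self._decode(k): self._decode(v)
                    for k, v in zip(children[0::2], children[1::2])}
        if elem.tag == "list":
            return [self._decode(child) for child in children]
        return elem.text or ""


buffer = io.StringIO()
stream = XMLStream(buffer)
stream.dump([{"month": "April", "day": "5", "year": "1997"}, "a day in the life"])
stream.flush()

buffer.seek(0)
restored = XMLStream(buffer).load()
print(restored)
```

Because no `type' attributes are written here, every value comes back as a
string; a targeted application would add types to recover integers and so on.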
This is meant to be an open spec, so please feel free to comment on it
and make suggestions, either here, on the Casbah list, or
to me.
Thanks,
-- Ken
-------- cut here --------
<!-- ...................................................................... -->
<!-- Lightweight Distributed Objects XML Serialization DTD V0.1 ........... -->
<!-- File ldo-xml.dtd ..................................................... -->
<!-- $Id: ldo-xml.dtd,v 1.1 1998/10/15 18:46:51 kmacleod Exp $ -->
<!-- Copyright 1998 The Casbah Project
<http://www.ntlug.org/casbah/>
Please direct all questions, bug reports, or suggestions for
changes to the casbah@ntlug.org mailing list or to the maintainer:
o Ken MacLeod
<ken@bitsko.slc.ut.us>
-->
<!-- ...................................................................... -->
<!-- This DTD defines an object serialization format for use by
messaging, remote procedure call, and distributed object system
protocols, and for language or application data marshaling.
This DTD is intended to be minimal, flexible, reusable, and
targetable. Most applications will want to further specify how
internal objects are represented as types, and languages will want
to specify how language-specific features are encoded.
One application that may be used to further specify
representation is the Lightweight Distributed Objects (LDO)
Request Encoding as Objects specification available at the Casbah
LDO web page:
<http://bitsko.slc.ut.us/~ken/casbah/ldo/> (for now)
<http://www.ntlug.org/casbah/ldo/> (soon)
XML Serialization provides four elements for encoding objects:
`dictionary', `list', `value', and `ref'.
`dictionary', `list', and `value' elements support reusing
content through an `id' attribute that can be referred to using the
`ref' attribute of the `ref' element. `dictionary', `list', and
`value' elements support a `type' attribute to declare the type
or class of the object. `value' elements have an `encoding'
attribute to declare their encoding (currently either `base64' or
unspecified). A `class' attribute on a `dictionary' element
marks the dictionary as an object, with the keys as field or
property names.
Untyped dictionaries are unordered and may be keyed by any item
and contain any item as values.
Untyped lists are ordered sequences of any items.
Untyped values are 8-bit strings. Strings that contain
characters that are not valid XML characters should be encoded
using MIME BASE64 and the `encoding' attribute should be set to
`base64'.
Note also that Tim Bray is promoting the use of an
`xml:packed="base64"' attribute for generic use.
TBD: Extended attributes are allowed, with or without XML
namespaces. XML namespaces would naturally avoid name space
conflict though :-).
TBD: Element-level extension hasn't been evaluated yet, but we
would like to support it.
TBD: `dictionary', `list', and `value' elements support a
`length' attribute that gives the number of pairs in a
dictionary, the number of elements in a list, or the parsed or
stored length of the data in a `value'.
TBD: I'm not sure `value' is the perfect name to convey what it
contains. Alternatives are `data', `datum', `scalar', or
`primitive'.
TBD: In some cases, it will be desirable to use references
(`ref') simply for compression (reuse of serialized data, such as
dictionary keys) as well as for marshaling objects that are
multiply-referenced. The distinction is not clarified here. One
solution is that applications will use `ref' for simple data
reuse and use an application defined object (via a dictionary) to
store multiple references to an object.
An example serialization of the following value:
record = ( month: 'April', day: 5, year: 1997 )
encode(record, "a day in the life")
would be:
<?xml version="1.0"?>
<!DOCTYPE list
PUBLIC "-//The Casbah Project//DTD LDO XML Serialization V1.0//EN"
"ldo-xml.dtd">
<list>
<dictionary>
<value>month</value><value>April</value>
<value>day</value><value>5</value>
<value>year</value><value>1997</value>
</dictionary>
<value>a day in the life</value>
</list>
-->
<!-- ...................................................................... -->
<!ENTITY % item "(dictionary | list | value | ref)">
<!ELEMENT dictionary (%item;, %item;)* >
<!ATTLIST dictionary
type CDATA #IMPLIED
length CDATA #IMPLIED
id ID #IMPLIED
>
<!ELEMENT list (%item;)* >
<!ATTLIST list
type CDATA #IMPLIED
length CDATA #IMPLIED
id ID #IMPLIED
>
<!ELEMENT value (#PCDATA) >
<!ATTLIST value
type CDATA #IMPLIED
length CDATA #IMPLIED
id ID #IMPLIED
encoding CDATA #IMPLIED
>
<!ELEMENT ref EMPTY >
<!ATTLIST ref
ref IDREF #REQUIRED
>