[XML-SIG] Mixed encodings and XML

M.-A. Lemburg mal@lemburg.com
Thu, 14 Dec 2000 12:10:08 +0100


uche.ogbuji@fourthought.com wrote:
> 
> > This is not really related to text encodings, but somewhat similar:
> >
> > Is there a standard way of including binary data in XML files ?
> 
> No.

Rich Salz pointed out in private mail that I could use base64 
as the encoding (can '<' and '>' appear in base64 ?). Alas, I would
lose the search capability for anything encoded that way...
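
On the '<' and '>' question, a minimal sketch using Python's standard
base64 module: the standard alphabet consists of letters, digits, '+',
'/' and '=' padding, so the markup characters should never show up in
the encoded text. The helper name encode_for_xml is made up for this
illustration.

```python
import base64
import string

# Standard base64 alphabet: A-Z, a-z, 0-9, '+', '/', plus '=' for padding.
BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def encode_for_xml(data: bytes) -> str:
    """Hypothetical helper: encode bytes as base64 text for use as element content."""
    return base64.b64encode(data).decode("ascii")


# Encode every possible byte value once and inspect the result.
sample = bytes(range(256))
encoded = encode_for_xml(sample)

uses_only_alphabet = set(encoded) <= BASE64_ALPHABET
has_markup_chars = any(ch in encoded for ch in "<>&")
roundtrip_ok = base64.b64decode(encoded) == sample
```

The encoded text is opaque to a text search, though, which is the
drawback mentioned above.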

> > I would like to put a complete web-site into a (large) XML file.
> > The XML file should ideally contain not only the structure
> > information, attributes, etc. but also the HTML files, the images
> > and maybe even sound files or flash apps.
> 
> Ah.  This is similar to what the ebXML folks and the SOAP folks were at odds
> over.  Note, this is a well-known deficiency in XML.  The most common
> suggestion is: put it all into one file, separate them with form-feeds, and
> have the application process each bit separately.  Clearly this doesn't suit
> your needs, but there's not much more to go on right now.

Now that's about as un-XML-like as it could get: form-feeds
to separate file parts... ;-)
 
> > Is something like this possible or will I have to use some
> > other storage method for the binary parts and reference these
> > from within the XML file (I would prefer not to, so that I can
> > include e.g. the HTML file content in XML searches) ?
> 
> Could you expand on this last bit about the searches?  It hints at what might
> be a work-around if that's your main concern.

I would like to be able to use XML searching machinery to scan
over web site structures. This includes limiting searches to
certain attributes, e.g. keywords or meta-descriptions of the content,
but should also cover full-text search of the content itself.
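
To make the two kinds of search concrete, here is a rough sketch using
xml.etree.ElementTree. The <site>/<page> layout, the attribute names
and the two search functions are assumptions invented for the example,
not an existing format or tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical site description: metadata in attributes, page text as content.
SITE_XML = """\
<site>
  <page path="/index.html" keywords="python xml" description="Start page">
    Welcome to the XML-SIG demo site.
  </page>
  <page path="/catalog.html" keywords="products" description="Product catalog">
    Our catalog lists every product we sell.
  </page>
</site>
"""


def search_attribute(root, name, term):
    """Return paths of pages whose attribute `name` contains `term`."""
    term = term.lower()
    return [page.get("path") for page in root.iter("page")
            if term in page.get(name, "").lower()]


def search_text(root, term):
    """Return paths of pages whose text content contains `term`."""
    term = term.lower()
    return [page.get("path") for page in root.iter("page")
            if term in "".join(page.itertext()).lower()]


root = ET.fromstring(SITE_XML)
keyword_hits = search_attribute(root, "keywords", "xml")
text_hits = search_text(root, "catalog")
```

The attribute search only looks at the named attribute, while the text
search only looks at element content, so the two can be combined or
kept apart as needed.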

Even better would be a possible recursive application of this
scheme to embedded XML files, e.g. take a product catalog which
is stored as XML and made available on the site using special
site tools which only show the relevant parts of that file.

I think I would have to provide a special tag

	<content encoding="base64|hex|plain|..." mimetype="...">
	...
	</content>

to enable this.
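
A minimal sketch of how such a tag could be written and read back with
xml.etree.ElementTree. The function names make_content and read_content
are hypothetical, and only the three encodings named above are handled;
"plain" is assumed to mean UTF-8 text that is legal in XML.

```python
import base64
import binascii
import xml.etree.ElementTree as ET


def make_content(data: bytes, mimetype: str, encoding: str = "base64") -> ET.Element:
    """Wrap raw bytes in a <content> element using the given encoding."""
    elem = ET.Element("content", {"encoding": encoding, "mimetype": mimetype})
    if encoding == "base64":
        elem.text = base64.b64encode(data).decode("ascii")
    elif encoding == "hex":
        elem.text = binascii.hexlify(data).decode("ascii")
    elif encoding == "plain":
        # Stays searchable; the serializer escapes '<', '>' and '&'.
        elem.text = data.decode("utf-8")
    else:
        raise ValueError("unsupported encoding: %r" % encoding)
    return elem


def read_content(elem: ET.Element) -> bytes:
    """Recover the original bytes from a <content> element."""
    encoding = elem.get("encoding", "plain")
    text = elem.text or ""
    if encoding == "base64":
        return base64.b64decode(text)
    if encoding == "hex":
        return binascii.unhexlify(text.strip())
    if encoding == "plain":
        return text.encode("utf-8")
    raise ValueError("unsupported encoding: %r" % encoding)


html = b"<html><body>Hello & welcome</body></html>"
gif = b"GIF89a\x00\x01\xff"

site = ET.Element("site")
site.append(make_content(html, "text/html", "plain"))
site.append(make_content(gif, "image/gif", "base64"))

# Serialize and parse again to check that both parts survive the round trip.
parsed = ET.fromstring(ET.tostring(site, encoding="unicode"))
html_back, gif_back = [read_content(c) for c in parsed.iter("content")]
```

Text parts stored as "plain" remain visible to full-text searches,
while binary parts stored as base64 or hex can only be found through
their attributes.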

Thanks,
-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/