[Expat-discuss] binary xml

Janez Zemva janezz55 at gmail.com
Mon Jul 12 07:51:59 CEST 2010


Good idea... Perhaps a better one would be to simply override the
length parameter expat delivers to the character handler. Now the big
question is... Does expat load all the data between the CDATA
delimiters and only _then_ call the character data handler, even if
the length is less than what it loaded? For example:

<BinaryData length="1000" fixes="0 1 2"><![CDATA[ ..random binary
data...]]></BinaryData>

This way expat's parsing would not matter, as the length would already
be provided. Any "]]>" sequences in the binary data are "fixed" into
something else before saving, and the offsets of the fixes are then
stored in the fixes attribute.
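
A minimal sketch of the "fixes" idea, in Python for brevity. The helper
names and the choice of replacing the ">" of each "]]>" with a space are
assumptions made for illustration; nothing here is provided by expat.

```python
# Sketch only: helper names and the replacement byte are hypothetical.
TERMINATOR = b"]]>"
REPLACEMENT = b" "  # assumed stand-in for the '>' that closes "]]>"


def apply_fixes(data: bytes):
    """Replace the '>' of every ']]>' and return (fixed_data, offsets)."""
    fixed = bytearray(data)
    offsets = []
    start = 0
    while True:
        pos = fixed.find(TERMINATOR, start)
        if pos < 0:
            break
        gt = pos + 2                    # index of the '>' byte
        fixed[gt:gt + 1] = REPLACEMENT  # same length, so offsets stay valid
        offsets.append(gt)
        start = pos + 1
    return bytes(fixed), offsets


def undo_fixes(fixed: bytes, offsets):
    """Put the original '>' bytes back at the recorded offsets."""
    restored = bytearray(fixed)
    for off in offsets:
        restored[off] = ord(">")
    return bytes(restored)


def fixes_attribute(offsets):
    """Format offsets the way the fixes="0 1 2" attribute above suggests."""
    return " ".join(str(off) for off in offsets)
```

The writer would call apply_fixes() before saving and store
fixes_attribute(offsets) in the element; the reader would split that
attribute on spaces and pass the integers to undo_fixes(). This only
deals with the "]]>" terminator, not with any other bytes an XML parser
may refuse.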

I suppose, if the parse buffer were long enough to accommodate the
entire file, the approach would work. But what if the parse buffer is
not the same size as the file?

Say there is a NUL character as the first character of the CDATA. expat
would presumably then report a length of 0, but will it always load the
entire CDATA contents before calling the character data handler? Has
any of you played with this? I'll have to check myself sooner or later.
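
One way to probe the buffering question is to feed the document to the
parser in small pieces and count the character data callbacks. The
sketch below uses Python's xml.parsers.expat binding rather than the C
API; the function name, the chunk sizes and the plain-ASCII payload are
assumptions chosen for illustration. It collects every piece the
handler receives and compares the total with the declared length
attribute.

```python
import xml.parsers.expat


def parse_in_chunks(document: bytes, chunk_size: int):
    """Feed `document` to expat `chunk_size` bytes at a time.

    Returns (declared_length, payload, number_of_handler_calls).
    """
    state = {"declared": None, "inside": False, "pieces": []}

    def start(name, attrs):
        if name == "BinaryData":
            state["declared"] = int(attrs["length"])
            state["inside"] = True

    def end(name):
        if name == "BinaryData":
            state["inside"] = False

    def chars(text):
        # May be called several times for one CDATA section.
        if state["inside"]:
            state["pieces"].append(text)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars

    for i in range(0, len(document), chunk_size):
        parser.Parse(document[i:i + chunk_size], False)
    parser.Parse(b"", True)  # signal end of input

    pieces = state["pieces"]
    return state["declared"], "".join(pieces), len(pieces)


if __name__ == "__main__":
    doc = (b'<BinaryData length="26"><![CDATA['
           b"abcdefghijklmnopqrstuvwxyz]]></BinaryData>")
    for size in (1, 7, len(doc)):
        declared, payload, calls = parse_in_chunks(doc, size)
        print(size, declared, len(payload), calls)
```

The expat documentation states that a single block of contiguous text
may still be delivered as a sequence of handler calls, so a handler
should accumulate what it is given rather than rely on one call holding
the whole CDATA section. How many calls occur for a given buffer size
probably depends on the expat version. The payload here is deliberately
plain ASCII: XML 1.0 does not allow a NUL character, so a raw NUL in the
input would most likely be rejected as not well-formed rather than
reported with a length of 0.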

2010/7/12 Nick MacDonald <nickmacd at gmail.com>:
> Basically I'm just suggesting some semi-intelligent front end filter:
>
> I'm assuming you might have something like this as input:
>
> <MyXMLFormat>
> <TagContainingBinaryData>
> <![CDATA[
> ..random binary data...
> ..random binary data...
> ]]>
> </TagContainingBinaryData>
> </MyXMLFormat>
>
>
> I'm thinking more like this pseudo-code:
>
> while (more data to process)
> ..read data file into buffer
> ..if (looks like binary data or expecting binary data)
> ....do something useful with binary data that doesn't involve XML parsing
> ..if (looks like XML or expecting it to be XML)
> ....pass data to eXpat for processing
>
> In theory, in the above scenario, your buffer could be as few as *one*
> character...  although honestly I have never tested eXpat out in such
> a scenario, I suspect it's highly likely it would work...
>
> By passing known good XML into eXpat and watching its callbacks you
> can detect where in your input the data is now binary, and then switch
> accordingly.  The only trick is to know when to switch back to normal
> XML, but I think that's doable...
>
> Nick
>
>
> On Sun, Jul 11, 2010 at 11:07 PM, Janez Zemva <janezz55 at gmail.com> wrote:
>>> Since *YOU* control the supply of data to eXpat... is there any way
>>> you can recognize your scenario and have the binary data shunted to a
>>> different buffer instead of supplying it to eXpat?
>>
>> Yes, I can provide a default handler maybe, for an unrecognized tag?
>> You've had that in mind? I was thinking more along the line of a
>> "binary" character encoding, not utf-8 or utf-16, or anything else.
>>
>
>
>
> --
> Nick MacDonald
> NickMacD at gmail.com
>

