[lxml-dev] Embedding image data in XML
I am restarting this thread here because, in the meantime, one of the SMTP servers of free.fr got onto a blacklist, which means I can no longer post to XML-SIG.

Alexandre Fayolle wrote:
On Tue, Dec 20, 2005 at 07:33:29PM +0100, Werner F. Bruhin wrote:
I would like to embed an image in an XML file.
I was thinking of using PIL.tostring() and putting the result into the imageasstring element.

<imagedata>
  <imagemode>RGB</imagemode>
  <imagesize>(540, 982)</imagesize>
  <imageasstring>HERE GOES IMAGE DATA</imageasstring>
</imagedata>
Currently I use lxml.etree and get encoding errors when I write the image into the element.
I would like the file to be XML Schema compliant, so what "xs:" type should one use for the imageasstring element?
You'll need to encode the imagestring data to base64, and use the appropriate type.
Yep, that works. However, the resulting file is huge (e.g. an 88 kB JPEG results in an XML file of over 2 MB). I brought this down by calling PIL's Image.thumbnail, but are there other ways to reduce the XML file size?

I also noticed that "</imageasstring>" is "overlapped" by the last line of the image data. This happens if I use (using lxml 0.8):

    def addImageElement(doc, name, value):
        new = et.SubElement(doc, name)
        new.text = value
        return new

If I do this:

    def addImageElement(doc, name, value):
        new = et.SubElement(doc, name)
        new.text = value + '\n'
        return new

then it shows fine in Altova XMLSpy, but I wonder if this is the right thing to do.

Another question: with the above solution (PIL and base64 encoding), is the data within "imageasstring" usable on other development platforms, e.g. Delphi or Java? Could they read this data and create an image file from it and, vice versa, put data into the XML with whatever tools they have, so that I could read it and create an image file?

Thanks
Werner

P.S. Is there a rough road map for future releases?
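A rough size check may explain the 2 MB figure. The sketch below assumes that PIL's tostring() returns the uncompressed pixel data (3 bytes per pixel in RGB mode) rather than the bytes of the JPEG file, and that the image is the 540 x 982 one from the example. The helper name base64_length is made up for illustration.

```python
import base64

def base64_length(n_bytes):
    """Number of base64 characters needed for n_bytes of input (with padding)."""
    return 4 * ((n_bytes + 2) // 3)

width, height, bytes_per_pixel = 540, 982, 3      # RGB, size from the example

raw_pixels = width * height * bytes_per_pixel     # uncompressed pixel data
jpeg_file = 88 * 1024                             # the 88 kB JPEG file itself

print(raw_pixels)                    # 1590840 bytes of raw pixels
print(base64_length(raw_pixels))     # 2121120 characters, i.e. "over 2 MB"
print(base64_length(jpeg_file))      # 120152 characters if the file bytes are embedded

# Cross-check the formula against the real encoder on a small input.
assert len(base64.b64encode(b"x" * 1000)) == base64_length(1000)
```

If these assumptions hold, encoding the bytes of the JPEG file itself (read in binary mode) instead of the decoded pixels would keep the element at roughly 4/3 of the file size.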
Hi, On Wed, 2005-12-21 at 11:54 +0100, Werner F. Bruhin wrote: [...]
You'll need to encode the imagestring data to base64, and use the appropriate type.
Yeap, that works. Also the resulting file is huge (e.g. a 88kb JPEG results in a XML of over 2MB). I brought this down by doing a PIL.image.thumbnail, but are there some other ways to reduce the XML file size?
Yes, zip it either using Libxml2's gzip-compression support or as an additional layer on your side.
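The "additional layer" variant could look roughly like the following sketch. It uses Python's standard gzip module and xml.etree.ElementTree, which is an assumption made to keep the example self-contained; the thread itself uses lxml.etree, whose ElementTree-style calls are analogous. The payload is dummy data.

```python
import base64
import gzip
import xml.etree.ElementTree as ET

# Dummy stand-in for raw pixel data: highly repetitive, so it compresses well.
pixels = bytes([200, 120, 40]) * 50_000

root = ET.Element("imagedata")
ET.SubElement(root, "imagemode").text = "RGB"
ET.SubElement(root, "imageasstring").text = base64.b64encode(pixels).decode("ascii")

xml_bytes = ET.tostring(root, encoding="utf-8")
compressed = gzip.compress(xml_bytes)           # what would be written to disk

# Reading back: decompress first, then parse as usual.
restored = ET.fromstring(gzip.decompress(compressed))
restored_pixels = base64.b64decode(restored.find("imageasstring").text)

print(len(xml_bytes), len(compressed))
print(restored_pixels == pixels)
```

How much this saves depends on the payload: gzip can win back most of the base64 overhead, but already-compressed data such as JPEG file bytes will not shrink much further.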
I also noticed that "</imageasstring>" is "overlapped" by the last line of the image data.
With 'overlapped', you mean something like "b64b64b64gestring>" ?
This happens if I use (using lxml 0.8):

    def addImageElement(doc, name, value):
        new = et.SubElement(doc, name)
        new.text = value
        return new

If I do this:

    def addImageElement(doc, name, value):
        new = et.SubElement(doc, name)
        new.text = value + '\n'
        return new
Then it shows fine in Altova XMLSpy, but I wonder if this is the right thing to do.
Adding a '\n' is OK from the viewpoint of W3C XML Schema datatypes, since the whitespace of xs:base64Binary is collapsed. This means: "all leading and trailing whitespace will be stripped, and all internal whitespace collapsed to single space characters".

http://www.w3.org/TR/xmlschema-2/#base64Binary

A base64 en/decoder will ignore whitespace, so there is no problem here either.
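A quick way to check the decoder side in Python: the sketch below uses the standard base64 module with dummy bytes. By default, b64decode discards characters outside the base64 alphabet, so line breaks inside or after the data do not change the result.

```python
import base64

payload = bytes(range(256)) * 2            # dummy binary data

compact = base64.b64encode(payload)        # one long line, no whitespace
wrapped = base64.encodebytes(payload)      # newline after every 76 characters

# Trailing newline, as in the second addImageElement variant.
with_newline = compact + b"\n"

print(base64.b64decode(compact) == payload)
print(base64.b64decode(wrapped) == payload)
print(base64.b64decode(with_newline) == payload)
```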
Another question: with the above solution (PIL and base64 encoding), is the data within "imageasstring" usable on other development platforms, e.g. Delphi or Java? Could they read this data and create an image file from it and, vice versa, put data into the XML with whatever tools they have, so that I could read it and create an image file?
As long as you have the components to process the image's binary format, then yes; base64 encoding/decoding should be available with e.g. Delphi (e.g. Indy components) and Java.

Note that putting the binary data into an XML file is a rather uncommon approach, as one normally tends to attach binary resources via references, i.e. something like <image-ref>mypic.jpg</image-ref>.

Regards,
Kasimier
Hi Kasimier,

BTW, is there any chance of configuring the list mail so that the list is the reply-to address? Then a plain reply would go to the list and not to the person who posted (or is this a Mozilla setting that I am overlooking?).

Kasimier Buchcik wrote:
Hi,
On Wed, 2005-12-21 at 11:54 +0100, Werner F. Bruhin wrote:
[...]
You'll need to encode the imagestring data to base64, and use the appropriate type.
Yeap, that works. Also the resulting file is huge (e.g. a 88kb JPEG results in a XML of over 2MB). I brought this down by doing a PIL.image.thumbnail, but are there some other ways to reduce the XML file size?
Yes, zip it either using Libxml2's gzip-compression support or as an additional layer on your side.
I am not sure I understand, but I will look it up in the libxml2 docs. I think I prefer going with your solution of the href, but see below.
I also noticed that "</imageasstring>" is "overlapped" by the last line of the image data.
With 'overlapped', you mean something like "b64b64b64gestring>" ?
I attached an image.
This happens if I use (using lxml 0.8):

    def addImageElement(doc, name, value):
        new = et.SubElement(doc, name)
        new.text = value
        return new

If I do this:

    def addImageElement(doc, name, value):
        new = et.SubElement(doc, name)
        new.text = value + '\n'
        return new
Then it shows fine in Altova XMLSpy, but I wonder if this is the right thing to do.
Adding an '\n' from the viewpoint of W3C XML Schema datatypes is OK, since the whitespace of xs:base64Binary is collapsed. This means: "all leading and trailing whitespace will be stripped, and all internal whitespace collapsed to single space characters". http://www.w3.org/TR/xmlschema-2/#base64Binary A base64 en/decoder will ignore whitespace, so no problem here as well.
Great.
Another question: with the above solution (PIL and base64 encoding), is the data within "imageasstring" usable on other development platforms, e.g. Delphi or Java? Could they read this data and create an image file from it and, vice versa, put data into the XML with whatever tools they have, so that I could read it and create an image file?
As long as you have the components to process the image's binary format, then yes; base64 encoding/decoding should be available with e.g. Delphi (e.g. Indy components) and Java.
Note that your approach of putting the binary data into an XML file is a rather uncommon approach, as one normally tends to attach the binary resources via references; i.e. something like <image-ref>mypic.jpg</image-ref>.
I don't like the base64 approach, but I saw it in some files; they also used the CDATA element (which I think is DTD stuff?).

You say attach? Does this happen by XML magic, or would I have to ensure that "mypic.jpg" is in the same folder and/or URI location?

In the XML Schema I have a choice of either base64 or the URL (I wouldn't mind getting rid of the base64 stuff), which is defined as:

<xs:element name="imageurl" type="xs:anyURI" minOccurs="0">
</xs:element>

Is that how you would have defined image-ref?

Thanks again for your time
Werner
Hi, On Wed, 2005-12-21 at 15:15 +0100, Werner F. Bruhin wrote: [...]
Yeap, that works. Also the resulting file is huge (e.g. a 88kb JPEG results in a XML of over 2MB). I brought this down by doing a PIL.image.thumbnail, but are there some other ways to reduce the XML file size?
Yes, zip it either using Libxml2's gzip-compression support or as an additional layer on your side.
I am not sure I understand, but will look up in the libxml2 doc. But I think I prefer going with your solution of the href, but see below.
There's e.g. xmlSetDocCompressMode(xmlDocPtr doc, int mode) to specify the compression mode for saving: http://www.xmlsoft.org/html/libxml-tree.html#xmlSetDocCompressMode
I also noticed that "</imageasstring>" is "overlapped" by the last line of the image data.
With 'overlapped', you mean something like "b64b64b64gestring>" ?
I attached an image.
I assume the image shows only a snippet of the XML, right? If yes, then it looks normal to me. Have you tried viewing it with a simple text editor? [...]
Note that your approach of putting the binary data into an XML file is a rather uncommon approach, as one normally tends to attach the binary resources via references; i.e. something like <image-ref>mypic.jpg</image-ref>.
I don't like the base64 approach, but I saw this in some files also they used the CDATA element (which I think is DTD stuff?)
http://www.w3.org/TR/2004/REC-xml-20040204/#sec-cdata-sect

CDATA sections exist merely to ease direct editing of the XML file. Putting base64 into CDATA has zero benefit.
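To illustrate: a parser reports the same character data whether or not it was wrapped in a CDATA section, and the base64 alphabet contains no "<" or "&" that would need protecting. A small sketch with Python's standard xml.etree.ElementTree (an assumption for self-containedness; the element name is the one used in this thread):

```python
import base64
import xml.etree.ElementTree as ET

plain = ET.fromstring("<imageasstring>aGVsbG8=</imageasstring>")
cdata = ET.fromstring("<imageasstring><![CDATA[aGVsbG8=]]></imageasstring>")

# Both elements carry identical text; the CDATA markers are gone after parsing.
print(plain.text == cdata.text)
print(base64.b64decode(cdata.text))
```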
You say attach? Is this happening by XML magic or would I have to ensure that "mypic.jpg" is in the same folder and/or uri location?
The latter. On your side, you would read the URI from the XML and load the image file. If you don't need to send the data anywhere, there's no real need to squeeze the image data into a single XML file. The XML file holds the metadata; the image file holds the binary data.
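On the Python side this could be sketched as follows. The convention of resolving the reference relative to the XML file's own directory, the function name load_referenced_image, and the file names are all assumptions for illustration; the standard xml.etree.ElementTree stands in for lxml.etree.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def load_referenced_image(xml_path):
    """Return the bytes of the file named in <image-ref>, resolved
    relative to the directory of the XML file (a convention assumed here)."""
    xml_path = Path(xml_path)
    ref = ET.parse(str(xml_path)).getroot().findtext("image-ref")
    return (xml_path.parent / ref.strip()).read_bytes()

# Demo with throwaway files standing in for real data.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "mypic.jpg").write_bytes(b"\xff\xd8 fake jpeg bytes")
    (folder / "meta.xml").write_text(
        "<image><mode>RGB</mode><image-ref>mypic.jpg</image-ref></image>"
    )
    data = load_referenced_image(folder / "meta.xml")
    print(len(data))
```

The image bytes could then be handed to whatever imaging library is in use.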
In the XML Schema I have as a choice either base64 or the URL (wouldn't mind to get rid of the base64 stuff), which is defined as:
<xs:element name="imageurl" type="xs:anyURI" minOccurs="0"> </xs:element>
Is that how you would have defined image-ref?
Rather:

<xs:choice>
  <xs:element name="data" type="xs:base64Binary"/>
  <xs:element name="uri" type="xs:anyURI"/>
</xs:choice>

Some people would prefer not to use the 'image' prefix for the tag names:

<image>
  <mode>RGB</mode>
  <size>(540, 982)</size>
  <uri>file:///data/images/pic.jpg</uri>
</image>

This is a matter of preference, of course.

I recommend asking questions about XML and its related technologies at xml-dev@lists.xml.org.

Regards,
Kasimier
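A consumer of such a choice has to handle both branches. The following is a minimal sketch, assuming the element names data and uri from the schema fragment above; the function name image_bytes is hypothetical, and the standard xml.etree.ElementTree is used instead of lxml.etree to keep the example self-contained.

```python
import base64
import xml.etree.ElementTree as ET
from urllib.request import urlopen

def image_bytes(image_elem):
    """Return image bytes from an <image> element holding <data> or <uri>."""
    data = image_elem.find("data")
    if data is not None:
        return base64.b64decode(data.text)      # inline branch of the choice
    uri = image_elem.findtext("uri")
    if uri is None:
        raise ValueError("expected a <data> or <uri> child")
    with urlopen(uri.strip()) as response:      # also handles file:// URIs
        return response.read()

inline = ET.fromstring("<image><mode>RGB</mode><data>aGVsbG8=</data></image>")
print(image_bytes(inline))
```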
Hi Kasimier,

Kasimier Buchcik wrote:
Hi,
On Wed, 2005-12-21 at 15:15 +0100, Werner F. Bruhin wrote:
[...]
Yeap, that works. Also the resulting file is huge (e.g. a 88kb JPEG results in a XML of over 2MB). I brought this down by doing a PIL.image.thumbnail, but are there some other ways to reduce the XML file size?
Yes, zip it either using Libxml2's gzip-compression support or as an additional layer on your side.
I am not sure I understand, but will look up in the libxml2 doc. But I think I prefer going with your solution of the href, but see below.
There's e.g. xmlSetDocCompressMode(xmlDocPtr doc, int mode) to specify the compression mode for saving: http://www.xmlsoft.org/html/libxml-tree.html#xmlSetDocCompressMode
I also noticed that "</imageasstring>" is "overlapped" by the last line of the image data.
With 'overlapped', you mean something like "b64b64b64gestring>" ?
I attached an image.
I assume the image shows only a snippet of the XML, right? If yes, then it looks normal to me. Have you tried viewing it with a simple text editor?
No, but as adding the "\n" has no bad side effects, that is what I will do, unless I get rid of the embedded image.
[...]
Note that your approach of putting the binary data into an XML file is a rather uncommon approach, as one normally tends to attach the binary resources via references; i.e. something like <image-ref>mypic.jpg</image-ref>.
I don't like the base64 approach, but I saw this in some files also they used the CDATA element (which I think is DTD stuff?)
http://www.w3.org/TR/2004/REC-xml-20040204/#sec-cdata-sect CDATA is merely out there to ease direct editing of the XML file. Putting base64 into CDATA has zero benefit.
You say attach? Is this happening by XML magic or would I have to ensure that "mypic.jpg" is in the same folder and/or uri location?
The latter. On your side, you would read the URI from the XML and load the image file. If you don't need to send the data anywhere, there's no real need to squeeze the image data into a single XML file. The XML file holds the metadata; the image file holds the binary data.
Data would be up/downloaded by users, so I'll have to think about this.
In the XML Schema I have as a choice either base64 or the URL (wouldn't mind to get rid of the base64 stuff), which is defined as:
<xs:element name="imageurl" type="xs:anyURI" minOccurs="0"> </xs:element>
Is that how you would have defined image-ref?
Rather:

<xs:choice>
  <xs:element name="data" type="xs:base64Binary"/>
  <xs:element name="uri" type="xs:anyURI"/>
</xs:choice>
Yes, that is what I have; I just didn't show the choice in the above.
Some people would prefer not to use the 'image' prefix for the tag names:

<image>
  <mode>RGB</mode>
  <size>(540, 982)</size>
  <uri>file:///data/images/pic.jpg</uri>
</image>
This is a matter of preference of course.
I recommend asking XML and its peripherals related questions at xml-dev@lists.xml.org.
Regards,
Kasimier
Thanks again for your help
Werner
participants (2)
- Kasimier Buchcik
- Werner F. Bruhin