[Tutor] pickle in unicode format

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Tue Apr 5 21:02:01 CEST 2005


> Are you trying to send it off to someone else as a part of an XML
> document?  If you are including some byte string into an XML document,
> you can encode those bytes as base64:
>
> ######
> >>> bytes = 'Fran\xe7ois'
> >>> encodedBytes = bytes.encode('base64')
> >>> encodedBytes
> 'RnJhbudvaXM=\n'
> ######

[note: this is an example of exploratory programming with Python.]


As a followup: base64 does appear to be a standard technique for
encoding binary data in XML.  Apple does this in their property list
implementation.

For example, in Apple's reference documentation on plists:

    http://developer.apple.com/documentation/CoreFoundation/Conceptual/CFPropertyLists/index.html


they use an example where they encode the following bytes:

/******/
   // Fake data to stand in for a picture of John Doe.
   const unsigned char pic[kNumBytesInPic] = {0x3c, 0x42, 0x81,
            0xa5, 0x81, 0xa5, 0x99, 0x81, 0x42, 0x3c};
/******/


into an ASCII string.  That string looks like this:

######
    <data>
        PEKBpYGlmYFCPA==
    </data>
######


and although they don't explicitly say it out loud, we can infer that
this is a pass through a base64 encoding, because when we decode that
string through base64:

######
>>> mysteryText = "        PEKBpYGlmYFCPA=="
>>> mysteryText.decode("base64")
'<B\x81\xa5\x81\xa5\x99\x81B<'
>>>
>>>
>>> [hex(ord(byte)) for byte in mysteryText.decode('base64')]
['0x3c', '0x42', '0x81', '0xa5', '0x81', '0xa5', '0x99', '0x81', '0x42',
'0x3c']
######

we get the same bytes back.
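
For anyone following along in Python 3, the same check can be sketched
with the base64 module, since str objects there have no decode() method.
The variable names are just for illustration; the expected bytes are
copied from the C array above.

```python
import base64

# Text copied from the <data> element, leading whitespace included.
mystery_text = "        PEKBpYGlmYFCPA=="

# The ten bytes from the C array in Apple's example.
expected = bytes([0x3c, 0x42, 0x81, 0xa5, 0x81,
                  0xa5, 0x99, 0x81, 0x42, 0x3c])

# Strip the indentation before decoding.
decoded = base64.b64decode(mystery_text.strip())

# Iterating over a bytes object yields integers in Python 3,
# so there is no need for ord() here.
print([hex(byte) for byte in decoded])
print(decoded == expected)
```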


(Actually, Apple's documentation does briefly mention that they do use
base64 by default, in:

http://developer.apple.com/documentation/WebObjects/Reference/API5.2.2/com/webobjects/foundation/xml/NSXMLObjectOutput.html#setUseBase64ForBinaryData(boolean)

but that's really obscure.  *grin*)
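
As a side check, Python's own plistlib module seems to follow the same
convention when it writes XML property lists.  This is a rough sketch
assuming Python 3.4 or later (for dumps() and loads()); the key name
"pic" is made up, and the exact indentation of the output may differ
from Apple's example.

```python
import plistlib

# The same ten bytes as in Apple's example.
pic = bytes([0x3c, 0x42, 0x81, 0xa5, 0x81,
             0xa5, 0x99, 0x81, 0x42, 0x3c])

# dumps() writes the XML plist format by default; bytes values
# end up inside a <data> element as base64 text.
xml_bytes = plistlib.dumps({"pic": pic})
print(xml_bytes.decode("utf-8"))

# Reading the plist back should return the original bytes.
round_tripped = plistlib.loads(xml_bytes)
print(round_tripped["pic"] == pic)
```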



Anyway, hope that was interesting to folks!


