Sending binary pickled data through TCP
Steve Holden
steve at holdenweb.com
Sat Oct 14 04:05:34 EDT 2006
David Hirschfield wrote:
> Thanks for the great response.
>
> Yeah, by "safe" I mean that it's all happening on an intranet with no
> chance of malicious individuals getting access to the stream of data.
>
> The chunks are arbitrary collections of python objects. I'm wrapping
> them up a little, but I don't know much about the actual formal makeup
> of the data, other than it pickles successfully.
>
> Are there any existing python modules that do the equivalent of pickling
> on arbitrary python data, but do it a lot faster? I wasn't aware of any
> that are as easy to use as pickle, or don't require implementing them
> myself, which is not something I have time for.
>
The marshal module may achieve what you want, but it supports a more
limited range of datatypes than pickle.
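
As a rough illustration of that trade-off, the sketch below round-trips
plain built-in types through marshal and then tries a type that pickle
handles but marshal does not. The sample values are made up for the
example. Note also that the marshal format is not guaranteed to stay the
same between Python versions, so both ends would need to run the same
interpreter version.

```python
import datetime
import marshal
import pickle

# Sample data made up for this illustration: built-in types only.
builtin_data = [1, 2.5, "text", {"key": (1, 2)}]

# marshal round-trips plain built-in types.
restored = marshal.loads(marshal.dumps(builtin_data))

# pickle copes with richer objects such as datetime.date...
when = datetime.date(2006, 10, 14)
when_copy = pickle.loads(pickle.dumps(when, pickle.HIGHEST_PROTOCOL))

# ...whereas marshal refuses them with ValueError.
try:
    marshal.dumps(when)
    marshal_accepts_date = True
except ValueError:
    marshal_accepts_date = False
```

Whether marshal is actually faster for a given workload is something to
measure rather than assume.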
regards
Steve
> Thanks again,
> -Dave
>
> Steve Holden wrote:
>
>>David Hirschfield wrote:
>>
>>
>>>I have a pair of programs which trade python data back and forth by
>>>pickling up lists of objects on one side (using
>>>pickle.HIGHEST_PROTOCOL), and sending that data over a TCP socket
>>>connection to the receiver, who unpickles the data and uses it.
>>>
>>>So far this has been working fine, but I now need a way of separating
>>>multiple chunks of pickled binary data in the stream being sent back and
>>>forth.
>>>
>>>Questions:
>>>
>>>Is it safe to do what I'm doing? I didn't think there was anything
>>>fundamentally wrong with sending binary pickled data, especially in the
>>>closed, safe environment these programs operate under...but maybe I'm
>>>making a poor assumption?
>>>
>>>
>>>
>>If there's no chance of malevolent attackers modifying the data stream
>>then you can safely ignore the otherwise dire consequences of unpickling
>>arbitrary chunks of data.
>>
>>
>>
>>>I was going to separate the chunks of pickled data with some well-formed
>>>string, but couldn't that string potentially randomly appear in the
>>>pickled data? Do I just pick an extremely
>>>unlikely-to-be-randomly-generated string as the separator? Is there some
>>>string that will definitely NEVER show up in pickled binary data?
>>>
>>>
>>>
>>I presumed each chunk was of a known structure. Couldn't you just lead off
>>with a pickled integer saying how many chunks follow?
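
A minimal sketch of that idea, assuming both ends read and write through
a file-like object: each pickle marks its own end, so pickle.load reads
exactly one object per call and no separator string is needed. The
helper names write_chunks and read_chunks are hypothetical, and an
in-memory io.BytesIO stands in for the connection; with a real socket
something like sock.makefile("rb") could play the same role.

```python
import io
import pickle

def write_chunks(stream, chunks):
    """Write a pickled count, then each chunk as its own pickle."""
    pickle.dump(len(chunks), stream, pickle.HIGHEST_PROTOCOL)
    for chunk in chunks:
        pickle.dump(chunk, stream, pickle.HIGHEST_PROTOCOL)

def read_chunks(stream):
    """Read the count first, then exactly that many pickled chunks."""
    count = pickle.load(stream)
    return [pickle.load(stream) for _ in range(count)]

# io.BytesIO stands in for the TCP connection in this illustration.
stream = io.BytesIO()
write_chunks(stream, [[1, 2, 3], {"name": "a"}, ("x", 4.5)])
stream.seek(0)
received = read_chunks(stream)
```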
>>
>>
>>
>>>I thought about base64 encoding the data, and then decoding on the
>>>opposite side (like what xmlrpclib does), but that turns out to be a
>>>very expensive operation, which I want to avoid; speed is of the essence
>>>in this situation.
>>>
>>>
>>>
>>Yes, base64 stuffs three bytes into four (six bits per byte) giving you
>>a 33% overhead. Having said that, pickle isn't all that efficient a
>>representation because it's designed to be portable. If you are using
>>machines of the same type there are almost certainly faster binary
>>encodings.
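
The size overhead is easy to check. The sketch below encodes an
arbitrary 768-byte payload (the payload is made up for the example) and
compares lengths; it measures size only, not the CPU cost of encoding.

```python
import base64

# Arbitrary binary payload whose length is a multiple of three,
# so no padding characters are added.
raw = bytes(range(256)) * 3          # 768 bytes
encoded = base64.b64encode(raw)      # four output bytes per three input bytes

overhead = len(encoded) / len(raw) - 1
```

When the input length is not a multiple of three, padding makes the
overhead slightly larger.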
>>
>>
>>
>>>Is there a reliable way to determine the byte count of some pickled
>>>binary data? Can I rely on len(<pickled data>) == bytes?
>>>
>>>
>>>
>>Yes, since pickle returns a string of bytes, not a Unicode object.
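
Because len() gives the exact byte count, one common way to separate
chunks is to send that count ahead of each pickle as a fixed-size
header. The sketch below is one possible framing, not something taken
from the original programs: the 4-byte network-order header and the
helper names frame and unframe_all are assumptions made for
illustration.

```python
import pickle
import struct

# Assumed header format: 4-byte unsigned length in network byte order.
HEADER = struct.Struct("!I")

def frame(obj):
    """Pickle obj and prefix it with the byte length of the pickle."""
    payload = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    return HEADER.pack(len(payload)) + payload

def unframe_all(buffer):
    """Split a byte string of concatenated frames back into objects."""
    objects = []
    offset = 0
    while offset < len(buffer):
        (length,) = HEADER.unpack_from(buffer, offset)
        offset += HEADER.size
        objects.append(pickle.loads(buffer[offset:offset + length]))
        offset += length
    return objects

# Two frames back to back, as they might arrive on the wire.
wire = frame(["a", 1]) + frame({"b": 2})
decoded = unframe_all(wire)
```

On a real socket, recv() may return fewer bytes than requested, so the
receiver would need to keep reading until the whole header and then the
whole payload have arrived.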
>>
>>If bandwidth really is becoming a limitation you might want to consider
>>using the struct module to represent things more compactly (but this
>>may be too difficult if the objects being exchanged are at all complex).
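
To give a feel for the difference, the sketch below packs a
fixed-layout record with struct and compares its size to the pickle of
the same values. The record layout (an unsigned 32-bit integer followed
by a double) is an assumption made for this example; struct only helps
when both sides agree on such a layout in advance.

```python
import pickle
import struct

# Assumed record layout: unsigned 32-bit int, then a 64-bit float,
# both in network byte order with no padding.
RECORD = struct.Struct("!Id")

packed = RECORD.pack(7, 3.5)
pickled = pickle.dumps((7, 3.5), pickle.HIGHEST_PROTOCOL)

# The receiver must know the layout to decode the bytes.
record_id, value = RECORD.unpack(packed)

sizes = (len(packed), len(pickled))
```

The packed record is exactly RECORD.size bytes, while the pickle also
carries opcodes and type information.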
>>
>>regards
>> Steve
>>
>>
>
> --
> Presenting:
> mediocre nebula.
>
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden