Sending binary pickled data through TCP

David Hirschfield davidh at ilm.com
Fri Oct 13 13:58:09 EDT 2006


Thanks for the great response.

Yeah, by "safe" I mean that it's all happening on an intranet with no 
chance of malicious individuals getting access to the stream of data.

The chunks are arbitrary collections of Python objects. I'm wrapping 
them up a little, but I don't know much about the actual formal makeup 
of the data, other than that it pickles successfully.

Are there any existing Python modules that do the equivalent of pickling 
arbitrary Python data, but a lot faster? I'm not aware of any that are 
as easy to use as pickle and wouldn't require me to implement the 
serialization myself, which is not something I have time for.
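
For reference, a minimal sketch of the usual first answer to this question. In Python 2 the standard library ships cPickle, a C implementation with the same interface as pickle; in Python 3 the plain pickle module uses its C accelerator automatically. The sample payload below is made up for illustration and is not the actual data being exchanged.

```python
# Prefer the separate C implementation where it exists (Python 2);
# on Python 3 the plain pickle module already uses its C accelerator.
try:
    import cPickle as pickle
except ImportError:
    import pickle

# Hypothetical sample payload, standing in for the "arbitrary" objects.
chunk = [{"id": 1, "name": "shot_010"}, (1.5, 2.5), u"text"]

# Same calls as with pickle, so switching is a one-line import change.
data = pickle.dumps(chunk, pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(data)
```

Whether this is fast enough depends on the objects involved, so it is worth timing on real data before looking further.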

Thanks again,
-Dave

Steve Holden wrote:
> David Hirschfield wrote:
>   
>> I have a pair of programs which trade python data back and forth by 
>> pickling up lists of objects on one side (using 
>> pickle.HIGHEST_PROTOCOL), and sending that data over a TCP socket 
>> connection to the receiver, who unpickles the data and uses it.
>>
>> So far this has been working fine, but I now need a way of separating 
>> multiple chunks of pickled binary data in the stream being sent back and 
>> forth.
>>
>> Questions:
>>
>> Is it safe to do what I'm doing? I didn't think there was anything 
>> fundamentally wrong with sending binary pickled data, especially in the 
>> closed, safe environment these programs operate under...but maybe I'm 
>> making a poor assumption?
>>
>>     
> If there's no chance of malevolent attackers modifying the data stream 
> then you can safely ignore the otherwise dire consequences of unpickling 
> arbitrary chunks of data.
>
>   
>> I was going to separate the chunks of pickled data with some well-formed 
>> string, but couldn't that string potentially randomly appear in the 
>> pickled data? Do I just pick an extremely 
>> unlikely-to-be-randomly-generated string as the separator? Is there some 
>> string that will definitely NEVER show up in pickled binary data?
>>
>>     
> I presumed each chunk was of a known structure. Couldn't you just lead 
> off with a pickled integer saying how many chunks follow?
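
A rough sketch of the leading-count idea, assuming the socket is wrapped in a file-like object (for example via sock.makefile()). It relies on the fact that each pickle.load() call consumes exactly one pickle from the stream, so no separator string is needed. The function names write_chunks and read_chunks are hypothetical, and io.BytesIO stands in for the socket's file object.

```python
import io
import pickle

def write_chunks(stream, chunks):
    # Lead off with a pickled integer saying how many chunks follow.
    pickle.dump(len(chunks), stream, pickle.HIGHEST_PROTOCOL)
    for chunk in chunks:
        pickle.dump(chunk, stream, pickle.HIGHEST_PROTOCOL)

def read_chunks(stream):
    # Each load() reads one complete pickle and stops at its end marker.
    count = pickle.load(stream)
    return [pickle.load(stream) for _ in range(count)]

# BytesIO stands in for the file object returned by sock.makefile().
buf = io.BytesIO()
write_chunks(buf, [[1, 2, 3], {"a": b"\x00\xff"}, "third"])
buf.seek(0)
received = read_chunks(buf)
```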
>
>   
>> I thought about base64 encoding the data, and then decoding on the 
>> opposite side (like what xmlrpclib does), but that turns out to be a 
>> very expensive operation, which I want to avoid; speed is of the 
>> essence in this situation.
>>
>>     
> Yes, base64 stuffs three bytes into four characters (six bits per 
> character), giving you a 33% overhead. Having said that, pickle isn't 
> all that efficient a representation because it's designed to be 
> portable. If you are using machines of the same type there are almost 
> certainly faster binary encodings.
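
The 33% figure can be checked directly. This is only an illustration of the size overhead, using a random payload whose length is a multiple of three so that no padding is involved; it says nothing about encoding speed.

```python
import base64
import os

raw = os.urandom(3000)            # arbitrary binary payload, length divisible by 3
encoded = base64.b64encode(raw)   # 4 output characters for every 3 input bytes

# Fractional size increase caused by the encoding.
overhead = (len(encoded) - len(raw)) / float(len(raw))
```

Here len(encoded) is 4000, so overhead comes out at one third.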
>
>   
>> Is there a reliable way to determine the byte count of some pickled 
>> binary data? Can I rely on len(<pickled data>) == bytes?
>>
>>     
> Yes, since pickle returns a string of bytes, not a Unicode object.
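
Since len() gives the byte count, one common way to separate chunks is to send that length ahead of each pickle. The sketch below is one possible framing, not something prescribed in this thread: it assumes a 4-byte unsigned big-endian length header, and the names send_chunk, recv_exact and recv_chunk are hypothetical. A local socket pair stands in for the real TCP connection.

```python
import pickle
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order

def send_chunk(sock, obj):
    payload = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    # The length prefix tells the receiver exactly how many bytes to read.
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock, n):
    # recv() may return fewer bytes than asked for, so loop until n arrive.
    parts = []
    while n > 0:
        part = sock.recv(n)
        if not part:
            raise EOFError("connection closed mid-message")
        parts.append(part)
        n -= len(part)
    return b"".join(parts)

def recv_chunk(sock):
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return pickle.loads(recv_exact(sock, length))

# Demo on a local connected socket pair instead of a real TCP link.
a, b = socket.socketpair()
send_chunk(a, ["first", 1])
send_chunk(a, {"second": b"\x00\x01"})
first, second = recv_chunk(b), recv_chunk(b)
a.close()
b.close()
```

With this framing the payload can contain any byte sequence, because the receiver never scans the data for a separator.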
>
> If bandwidth really is becoming a limitation you might want to consider 
> using the struct module to represent things more compactly (but this 
> may be too difficult if the objects being exchanged are at all complex).
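
To give a feel for the struct approach, here is a small sketch that assumes a fixed record layout: an unsigned integer id followed by three doubles. That layout is purely hypothetical; it only works when both sides agree on the format in advance, which is exactly the limitation mentioned above.

```python
import pickle
import struct

# Hypothetical fixed record: an integer id followed by three coordinates.
RECORD = struct.Struct("!I3d")   # 4 + 3*8 = 28 bytes, network byte order

point = (42, 1.0, 2.5, -3.75)
packed = RECORD.pack(*point)       # fixed-size binary record
unpacked = RECORD.unpack(packed)   # back to a tuple of Python values

# The same tuple pickled, for a size comparison.
pickled = pickle.dumps(point, pickle.HIGHEST_PROTOCOL)
```

The packed record is always 28 bytes, while the pickle of the same tuple is larger because it also carries type and structure information.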
>
> regards
>   Steve
>   

-- 
Presenting:
mediocre nebula.
