Checking for binary data in a string
davea at ieee.org
Fri Jun 19 23:35:16 CEST 2009
Mitko Haralanov wrote:
> I have a question about finding out whether a string contains binary data.
> In my application, I am reading from a file which could contain
> binary data. After I have read the data, I transfer it using xmlrpclib.
> However, xmlrpclib has trouble unpacking XML which contains binary data
> and my application throws an exception. The solution to this problem is
> to use base64 encoding of the data but I don't know how to check
> whether the encoding will be needed?
> If I read in a string containing some binary data from the file, the
> type of that string is <type 'str'> which is not different from any
> other string, so I can't use that as a check.
> The only other check that I can think of is to check every character in
> the read-in string against string.printable but that will take a long time.
> Can anyone suggest a better way to handle the check? Thank you in advance.
All the data is binary. But perhaps you mean anything that isn't ASCII
(7 bits), or anything outside the range 0x20-0x7f, or something like that.
The way I'd tackle it is to build a translation table for your
definition of "binary." Then simply do something like:
    import base64
    if data != data.translate(table):
        data = base64.b64encode(data)   # or whatever encoding you prefer
The translation table would be defined such that table[ch] == ch for
all ch that are "nonbinary" and table[ch] != ch for all ch that are
"binary." And naturally you only build the table once, and reuse it
on each buffer.
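
To make that concrete, here is a minimal sketch of such a table. It uses Python 3 syntax, where the data is a bytes object and the table is a 256-byte bytes object. The definition of "text" (printable ASCII plus tab, LF and CR) and the names TEXT_BYTES, TABLE, needs_base64 and pack are assumptions chosen for illustration, so adjust them to your own definition of "binary."

```python
import base64

# Assumed definition of "nonbinary": printable ASCII plus tab, LF and CR.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}

# Built once: identity for text bytes, "?" for everything else.
# "?" is itself a text byte, so TABLE[ch] != ch holds for every binary ch.
TABLE = bytes(b if b in TEXT_BYTES else ord("?") for b in range(256))


def needs_base64(data: bytes) -> bool:
    """True if at least one byte falls outside TEXT_BYTES."""
    return data != data.translate(TABLE)


def pack(data: bytes):
    """Return (kind, payload); kind tells the receiver whether to decode."""
    if needs_base64(data):
        return "base64", base64.b64encode(data)
    return "plain", data
```

The comparison only says that something changed, not where, which is all the check needs.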
This should be quicker than any for loop you could write, though there
may be other built-in functions that are even quicker. It's a start.
Note that you will probably also be escaping the xml special
characters, such as &, <, and >. So you might get clever about letting
a single translate pass tell you whether the data can be stored
unmodified, then do a second translate to decide which way to modify
it. Whether this is worthwhile depends in part on how often the buffer
fits into which category.
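
One way the two-pass idea might look, again as a sketch in Python 3 syntax rather than a tuned implementation. It relies on bytes.translate(None, delete), which removes the listed bytes. The names SAFE, XML_SPECIAL and encode_for_xml, and the choice of what counts as text, are assumptions for illustration.

```python
import base64

XML_SPECIAL = b"&<>"
# Assumed "text" bytes: printable ASCII plus tab, LF and CR.
TEXT = bytes(range(0x20, 0x7F)) + b"\t\n\r"
# Bytes that can be stored without any modification at all.
SAFE = bytes(b for b in TEXT if b not in XML_SPECIAL)


def encode_for_xml(data: bytes):
    """Classify a buffer and return (kind, payload)."""
    # Pass 1: delete every safe byte; an empty result means store as-is.
    leftover = data.translate(None, SAFE)
    if not leftover:
        return "plain", data
    # Pass 2, over the leftover only: if nothing but XML specials
    # remains, escaping is enough.
    if not leftover.translate(None, XML_SPECIAL):
        escaped = (data.replace(b"&", b"&amp;")
                       .replace(b"<", b"&lt;")
                       .replace(b">", b"&gt;"))
        return "escaped", escaped
    # Otherwise the buffer holds "binary" bytes.
    return "base64", base64.b64encode(data)
```

The second pass only looks at what the first one left behind, so for buffers that are mostly plain text it costs very little.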