Detecting Binary content in files
davea at ieee.org
Tue Mar 31 21:32:51 CEST 2009
All files are binary, but by "binary" you probably mean non-text.
There are lots of ways to decide whether a file is non-text, but I don't
know of any "standard" way. You can detect a file as non-ASCII by simply
searching for any byte greater than 0x7f. But that misclassifies a UTF-8
file, which is an 8-bit text encoding of Unicode and uses byte values
above 0x7f for every non-ASCII character.
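As a rough sketch of the byte check described above, here is how it might
look in Python 3, where iterating over a bytes object yields integers. The
function names has_non_ascii and is_valid_utf8 are made up for this
example, and the UTF-8 decode test is one possible way to get around the
limitation, not a standard recipe.

```python
def has_non_ascii(data: bytes) -> bool:
    """Return True if any byte in data is greater than 0x7f."""
    return any(b > 0x7F for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


sample = "caf\u00e9\n".encode("utf-8")
# The UTF-8 text trips the plain 0x7f test even though it is text.
print(has_non_ascii(sample), is_valid_utf8(sample))
```

A file that fails the 0x7f test but still decodes as UTF-8 is most likely
text, so the two checks are more useful together than either one alone.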
The approach I've seen used many times is to check for regularly
occurring end-of-line characters and the absence of nulls. Most "binary"
files will have more nulls than linefeeds, and any null could be
considered a marker for a non-text file.
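The null/linefeed heuristic might be sketched like this in Python 3. The
names looks_binary and file_looks_binary, the 8192-byte sample size, and
the 1000-byte limit on line length are all assumptions chosen for
illustration; tune them to the files you actually see.

```python
def looks_binary(chunk: bytes, max_line_length: int = 1000) -> bool:
    """Guess whether a chunk of bytes comes from a non-text file."""
    if not chunk:
        return False  # assumption: treat an empty file as text
    if b"\x00" in chunk:
        return True  # any null is taken as a marker for non-text
    # "Regular" linefeeds: no run between linefeeds exceeds the limit.
    longest = max(len(line) for line in chunk.split(b"\n"))
    return longest > max_line_length


def file_looks_binary(path, sample_size: int = 8192) -> bool:
    """Apply the heuristic to the first sample_size bytes of a file."""
    with open(path, "rb") as f:
        return looks_binary(f.read(sample_size))
```

Note that UTF-16 text contains nulls for ASCII characters, so this sketch
would report such a file as binary.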
If you're happy with your particular Perl script, it could probably be
readily translated to Python.
> I'm wondering if Python has a utility to detect binary content in
> files? Or if anyone has any ideas on how that can be accomplished? I
> haven't been able to find any useful information to accomplish this
> (my other option is to fire off a perl script from within my python
> script that will tell me whether the file is binary), so any pointers
> will be appreciated.