Detecting Binary content in files

Dave Angel davea at ieee.org
Tue Mar 31 21:32:51 CEST 2009


All files are binary, but probably by binary you mean non-text.

There are lots of ways to decide if a file is non-text, but I don't know 
of any "standard" way.  You can detect a file as not-ASCII by simply 
searching for any byte greater than 0x7f.  But that doesn't handle 
a UTF-8 file, which is an 8-bit text file representing Unicode, so its 
non-ASCII characters are encoded as bytes above 0x7f.
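
As a rough sketch of that check in Python 3 (the function names here are 
my own, not from any library): the first function tests for bytes above 
0x7f, and the second one assumes that "text" may also mean valid UTF-8 and 
simply tries to decode the data.

```python
def is_ascii(data: bytes) -> bool:
    """Return True if no byte in data is greater than 0x7f."""
    return all(b <= 0x7F for b in data)


def is_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8.

    Assumption: data is the complete file content. If you only pass a
    leading chunk, a multi-byte character cut off at the end of the chunk
    will make this report False.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Pure ASCII is also valid UTF-8, so is_utf8() accepts everything that 
is_ascii() accepts.  Neither test proves a file is text; they only rule 
out some encodings.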

The way I've seen it done many times is to look for regular occurrences 
of the end-of-line character and the absence of nulls.  Most "binary" 
files will have more nulls than linefeeds, and any null could be 
considered a marker for a non-text file.
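
A minimal sketch of that heuristic in Python 3 might look like the 
following.  The function names, the 8192-byte sample size and the 
1000-byte average line length are arbitrary assumptions of mine, not 
established values, so tune them for your own files.

```python
def looks_like_text(data: bytes, max_avg_line: int = 1000) -> bool:
    """Guess whether data is text: no nulls, and linefeeds occur regularly.

    max_avg_line is a hypothetical threshold for the average number of
    bytes per linefeed; it is not a standard value.
    """
    if b"\x00" in data:
        # Any null byte is taken as a marker for a non-text file.
        return False
    if len(data) <= max_avg_line:
        # Too short to judge line regularity (includes empty data).
        return True
    newlines = data.count(b"\n")
    return newlines > 0 and len(data) / newlines <= max_avg_line


def file_looks_like_text(path: str, sample_size: int = 8192) -> bool:
    """Apply the heuristic to the first sample_size bytes of a file."""
    with open(path, "rb") as f:  # binary mode, so no decoding happens
        return looks_like_text(f.read(sample_size))
```

Note that this will call UTF-16 text "binary", since that encoding puts 
a null in most characters of ordinary English text, and it treats an 
empty file as text.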

If you're happy with your particular Perl script, it could probably be 
readily translated to Python.

ritu wrote:
> Hi,
>
> I'm wondering if Python has a utility to detect binary content in
> files? Or if anyone has any ideas on how that can be accomplished? I
> haven't been able to find any useful information to accomplish this
> (my other option is to fire off a perl script from within my python
> script that will tell me whether the file is binary), so any pointers
> will be appreciated.
>
> Thanks,
> Ritu
>
>   


