On Mon, Aug 12, 2013 at 3:32 PM, Mathias Panzenböck <grosser.meister.morti@gmx.net> wrote:
On 08/12/2013 03:52 PM, Masklinn wrote:
On 2013-08-12, at 15:42 , Philipp A. wrote:
Well, the only remotely valid thing to do is to test whether the input data is decodable with any of the encodings Python knows.
Most ISO-8859 parts can decode any byte (and thus any byte sequence).
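To make that point concrete, here is a minimal Python 3 sketch that checks whether a codec accepts every possible byte value. The helper name `decodes_everything` is made up for illustration, and the list of codecs tried is just a sample, not an exhaustive survey.

```python
# Sketch only: does a given codec decode every possible byte value?
ALL_BYTES = bytes(range(256))  # 0x00 .. 0xFF, including NUL


def decodes_everything(encoding):
    """Return True if `encoding` decodes all 256 byte values without error.

    Hypothetical helper for illustration, not part of the standard library.
    """
    try:
        ALL_BYTES.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Try a few codecs; the selection here is arbitrary.
for name in ("iso-8859-1", "iso-8859-15", "utf-8", "ascii"):
    print(name, decodes_everything(name))
```

If a codec such as iso-8859-1 accepts everything, then "is it decodable with any known encoding?" is true for essentially any input, so the test cannot separate text from binary data.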
Are you sure about the null byte ('\0')? But yes, just checking whether there is a '\0' in the file isn't a good heuristic either.
I've often used the presence of a NUL in the data as a simple heuristic for "binary file", though only in places where it won't matter (for instance, showing file size in bytes rather than line count - if a binary file happens to have no \0 and its number of \n gets counted, big deal). Otherwise, not worth the hassle of finding out.

ChrisA
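The NUL heuristic described above could be sketched roughly as follows in Python 3. The function names `looks_binary` and `describe` are invented for this example, and the code assumes the whole file content is already in memory as `bytes`.

```python
def looks_binary(data):
    """Heuristic only: treat the data as binary if it contains a NUL byte."""
    return b"\0" in data


def describe(data):
    """Report a size for display: bytes for 'binary' data, lines for 'text'.

    A wrong guess only changes which number is shown, so it is harmless here.
    """
    if looks_binary(data):
        return "%d bytes" % len(data)
    return "%d lines" % data.count(b"\n")


print(describe(b"spam\neggs\n"))
print(describe(b"\x00\x01\x02\n"))
```

This is deliberately crude: binary data without a NUL gets a line count, and text in an encoding such as UTF-16 usually contains NUL bytes and would be reported as binary. That is acceptable only where a wrong guess costs nothing.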