Python version of perl's "if (-T ..)" and "if (-B ...)"?
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Fri Feb 12 12:20:06 EST 2010
On Fri, 12 Feb 2010 15:14:07 +0100, Christian Heimes wrote:
> Lloyd Zusman wrote:
>> The -T and -B switches work as follows. The first block or so
>> of the file is examined for odd characters such as strange control
>> codes or characters with the high bit set. If too many strange
>> characters (>30%) are found, it's a -B file; otherwise it's a -T
>> file. Also, any file containing null in the first block is
>> considered a binary file. [ ... ]
>
> That's a butt ugly heuristic that will lead to lots of false positives
> if your text happens to be UTF-16 encoded, or is non-English text
> encoded as UTF-8.
And a hell of a lot of false negatives if the file is binary.
The way I've always seen it, a file is binary if it contains a single
binary character *anywhere* in the file.
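Here is a minimal Python sketch of the two rules discussed in this thread. The first function approximates the quoted Perl -B heuristic. The second applies the stricter "one binary character anywhere" rule. Several details are assumptions and do not come from Perl itself:

* the 512-byte block size;
* treating backspace, tab, newline, form feed and carriage return as ordinary text;
* for the strict rule, counting only control bytes as "binary", so that UTF-8 text is not rejected outright.

The names BLOCK_SIZE, TEXT_CONTROLS, looks_binary_perl_style, looks_binary_strict and file_looks_binary are hypothetical.

```python
# Assumption: Perl's docs only say "the first block or so";
# 512 bytes is a hypothetical choice, not Perl's documented value.
BLOCK_SIZE = 512

# Assumption: control bytes treated as ordinary text
# (backspace, tab, newline, form feed, carriage return).
TEXT_CONTROLS = frozenset(b"\b\t\n\f\r")


def is_odd(byte: int) -> bool:
    """True for control codes outside TEXT_CONTROLS, DEL, or high-bit bytes."""
    if byte in TEXT_CONTROLS:
        return False
    return byte < 32 or byte == 127 or byte >= 128


def looks_binary_perl_style(data: bytes, threshold: float = 0.30) -> bool:
    """Approximate the quoted -B heuristic: inspect only the first block."""
    block = data[:BLOCK_SIZE]
    if not block:
        return False  # sketch choice: treat empty input as text
    if b"\x00" in block:
        return True  # any NUL in the first block means binary
    odd = sum(1 for b in block if is_odd(b))
    return odd / len(block) > threshold


def looks_binary_strict(data: bytes) -> bool:
    """Binary if a single control byte (outside TEXT_CONTROLS) appears anywhere.

    High-bit bytes are deliberately not counted, so UTF-8 text passes;
    this definition of a "binary character" is an assumption.
    """
    return any((b < 32 and b not in TEXT_CONTROLS) or b == 127 for b in data)


def file_looks_binary(path: str, strict: bool = False) -> bool:
    """Apply one of the two rules to a file on disk."""
    with open(path, "rb") as f:
        if not strict:
            return looks_binary_perl_style(f.read(BLOCK_SIZE))
        # The strict rule has to scan the whole file, so read it in chunks.
        for chunk in iter(lambda: f.read(65536), b""):
            if looks_binary_strict(chunk):
                return True
    return False
```

With these definitions, UTF-16 text and Cyrillic text encoded as UTF-8 both trip the Perl-style check, which shows Christian's false positives. By contrast, 1000 bytes of b"A" followed by a single NUL passes the Perl-style check but fails the strict one, which shows the false negatives described above.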
--
Steven