[Python-ideas] os.path.isbinary

Terry Reedy tjreedy at udel.edu
Wed Jul 31 23:23:42 CEST 2013


On 7/31/2013 3:03 PM, Ryan wrote:
> 1.The link I provided wasn't how I wanted it to be.

And there is no 'one way' that will satisfy everyone, or even most 
people, as they will have different use cases for 'istext'.

> I was using it as an example to show it wasn't impossible.

It is obviously possible to apply any arbitrary predicate to any object 
within its input domain. No one has claimed otherwise, as far as I know.

> 2.You yourself stated it doesn't work on UTF-8 files. If you wanted one
> that worked on all text files, it wouldn't work right.

The real problem is that the question is ill-defined. Every file is (or 
can be viewed as) a sequence of binary bytes. Every file can also be 
interpreted as a text file encoded with any of the encodings that give 
a character meaning to every byte (at least some latin-1 encodings do, 
as does the IBM PC Graphics encoding). So, to be strict, every file is 
both binary and text. Python allows us to open any file as either 
binary or text (with some encoding, latin-1 being one of the possible 
choices).
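A minimal sketch of this point, assuming a throwaway temporary file 
filled with every possible byte value (the file and the variable names 
are made up for illustration). The same file opens without error both 
as binary and as latin-1 text, while a stricter encoding such as UTF-8 
rejects it:

```python
import os
import tempfile

# Every possible byte value; this is not valid UTF-8.
data = bytes(range(256))

# Write the bytes to a temporary file (hypothetical example file).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data)
    path = f.name

try:
    # Binary view: the raw bytes.
    with open(path, "rb") as f:
        as_bytes = f.read()
    # Text view: latin-1 assigns a character to every byte.
    # newline="" turns off newline translation so nothing is altered.
    with open(path, "r", encoding="latin-1", newline="") as f:
        as_text = f.read()
finally:
    os.remove(path)

# The text view loses nothing: it encodes back to the same bytes.
round_trips = as_text.encode("latin-1") == as_bytes

# A stricter encoding refuses the same bytes.
try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

So 'opens as text without error' cannot by itself be the test; it 
depends entirely on which encoding is chosen.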

The pragmatic question is: 'Is this file likely *intended* to be 
interpreted as text, given that the creator is a member of our *local 
culture*?' For the function you referenced, the 'local culture' is 
'closed Western European'. For 'closed American', the threshold of 
allowed non-ASCII and control characters should be more like 0 or 1%. 
For many cultures, the referenced function is nonsensical.
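To illustrate how much such a threshold depends on the local culture, 
here is a rough sketch of a byte-counting heuristic. It is not the 
referenced function; the name looks_like_text, the PLAIN byte set, and 
the sample strings are all assumptions made up for this example:

```python
# Printable ASCII plus tab, newline and carriage return.
PLAIN = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def looks_like_text(data, threshold):
    """Guess 'text' if the share of non-plain bytes is <= threshold.

    Hypothetical helper: 'threshold' is the allowed fraction of
    non-ASCII and control bytes, e.g. 0.30 or 0.01.
    """
    if not data:
        return True  # arbitrary choice: treat empty input as text
    odd = sum(1 for b in data if b not in PLAIN)
    return odd / len(data) <= threshold


# French in latin-1: 3 of 12 bytes are non-ASCII.
french = "café déjà vu".encode("latin-1")
# Russian in KOI8-R: every byte except the space is non-ASCII.
russian = "привет мир".encode("koi8-r")

french_loose = looks_like_text(french, 0.30)    # accepted
french_strict = looks_like_text(french, 0.01)   # rejected
russian_loose = looks_like_text(russian, 0.30)  # rejected
```

The French sample passes a generous threshold and fails a strict one, 
and the Russian sample fails even the generous one, although all three 
inputs are ordinary text to their authors.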

For an open global context, istext would have to try all standard text 
encodings and for those that worked, apply the grammar rules of the 
languages that normally are encoded with that encoding.
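The first half of that procedure can be sketched as below, under the 
assumption of a small, arbitrary candidate list (CANDIDATES and 
decodable_as are hypothetical names, and a real list would be far 
longer). The second half, checking the decoded text against the 
grammar of plausible languages, is not attempted here:

```python
# A small, arbitrary sample of encodings; not a complete list.
CANDIDATES = ("ascii", "utf-8", "shift_jis", "koi8-r", "latin-1")


def decodable_as(data, encodings=CANDIDATES):
    """Return the candidate encodings that decode data without error."""
    ok = []
    for name in encodings:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding rejects the bytes
        ok.append(name)
    return ok


sample = "naïve".encode("utf-8")
survivors = decodable_as(sample)
```

Here ascii is ruled out, but utf-8 and latin-1 (among others) both 
accept the same bytes, which is why decoding alone cannot settle the 
question and the language-level check would still be needed.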

-- 
Terry Jan Reedy


