On 7/31/2013 3:03 PM, Ryan wrote:
1.The link I provided wasn't how I wanted it to be.
And there is no 'one way' that will satisfy everyone, or every most people, as they will have different use cases for 'istext'.
I was using it as an example to show it wasn't impossible.
It is obviously possible to apply any arbitrary predicate to any object within its input domain. No one has claimed otherwise that I know of.
2.You yourself stated it doesn't work on UTF-8 files. If you wanted one that worked on all text files, it wouldn't work right.
The problem is that the problem is ill-defined. Every file is (or can be viewed as) a sequence of binary bytes. Every file can be interpreted as a text file encoded with any of the encodings (like at least some latin-1 encodings, and the IBM PC Graphics encoding) that give a character meaning to every byte. So, to be strict, every file is both binary and text. Python allows us to open any file as either binary or text (with some encoding, with latin-1 one of the possible choices). The pragmatic question is 'Is this file 'likely' *intended* to be interpreted as text, given that the creator is a member of our *local culture*. For the function you referenced, the 'local culture' is 'closed Western European'. For 'closed American', the threshold of allowed non-ascii text and control chars should be more like 0 or 1%. For many cultures, the referenced function is nonsensical. For an open global context, istext would have to try all standard text encodings and for those that worked, apply the grammar rules of the languages that normally are encoded with that encoding. -- Terry Jan Reedy