[Python-ideas] os.path.isbinary
Terry Reedy
tjreedy at udel.edu
Wed Jul 31 23:23:42 CEST 2013
On 7/31/2013 3:03 PM, Ryan wrote:
> 1.The link I provided wasn't how I wanted it to be.
And there is no 'one way' that will satisfy everyone, or even most
people, as they will have different use cases for 'istext'.
> I was using it as an example to show it wasn't impossible.
It is obviously possible to apply any arbitrary predicate to any object
within its input domain. No one has claimed otherwise that I know of.
> 2.You yourself stated it doesn't work on UTF-8 files. If you wanted one
> that worked on all text files, it wouldn't work right.
The problem is that the problem is ill-defined. Every file is (or can be
viewed as) a sequence of binary bytes. Every file can be interpreted as
a text file encoded with any of the encodings (like at least some
latin-1 encodings, and the IBM PC Graphics encoding) that give a
character meaning to every byte. So, to be strict, every file is both
binary and text. Python allows us to open any file as either binary or
text (with some encoding, with latin-1 one of the possible choices).
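A minimal sketch of that point, using only the standard codecs: latin-1 gives a character to each of the 256 byte values, so decoding with it never fails, whereas a stricter encoding such as UTF-8 rejects some byte sequences. The variable names are just for illustration.

```python
# Every possible byte value, standing in for arbitrary "binary" data.
data = bytes(range(256))

# latin-1 maps byte n to code point n, so this decode cannot raise.
text = data.decode('latin-1')
roundtrip_ok = text.encode('latin-1') == data

# UTF-8 is stricter: some bytes (0x80 on its own, for example) are invalid.
try:
    data.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

So "can it be decoded as text?" has a yes answer for any file whatsoever, which is why it cannot serve as the test.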
The pragmatic question is "Is this file 'likely' *intended* to be
interpreted as text, given that the creator is a member of our *local
culture*?" For the function you referenced, the 'local culture' is
'closed Western European'. For 'closed American', the threshold of
allowed non-ascii text and control chars should be more like 0 or 1%.
For many cultures, the referenced function is nonsensical.
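To make the threshold dependence concrete, here is a rough sketch of this kind of heuristic. It is not the function from the link; the name looks_like_text, the choice of which bytes count as suspicious, and the 30% figure are all assumptions made up for the example.

```python
# Hypothetical helper: the name, byte classes and thresholds are illustrative.
TEXT_CONTROLS = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return

def looks_like_text(data: bytes, threshold: float) -> bool:
    """True if the share of non-ASCII and control bytes is <= threshold."""
    if not data:
        return True  # arbitrary choice for empty input
    suspicious = sum(
        1 for b in data
        if b >= 0x80 or b == 0x7F or (b < 0x20 and b not in TEXT_CONTROLS)
    )
    return suspicious / len(data) <= threshold

french = "café au lait\n".encode("latin-1")  # 1 non-ASCII byte out of 13
russian = "привет".encode("utf-8")           # every byte is non-ASCII

strict_french = looks_like_text(french, 0.01)   # 'closed American' setting
loose_french = looks_like_text(french, 0.30)    # a looser, made-up setting
loose_russian = looks_like_text(russian, 0.30)
```

The same French bytes are 'binary' at 1% and 'text' at 30%, and perfectly ordinary Russian text is 'binary' under both, so the answer depends on whose files you expect to see.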
For an open global context, istext would have to try all standard text
encodings and for those that worked, apply the grammar rules of the
languages that normally are encoded with that encoding.
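The first half of that could be sketched as below, assuming a short hand-picked candidate list (a real one would be far longer). The function name decodable_as is made up for the example, and the second half, checking the decoded text against the grammar of plausible languages, is the hard part and is not attempted here.

```python
# Illustrative candidate list; all are codec names known to Python.
CANDIDATES = ("ascii", "utf-8", "utf-16", "shift_jis", "koi8_r", "latin-1")

def decodable_as(data: bytes, encodings=CANDIDATES) -> list:
    """Return the candidate encodings under which data decodes cleanly."""
    ok = []
    for name in encodings:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding rejects the bytes
        ok.append(name)
    return ok

candidates_for_russian = decodable_as("привет".encode("utf-8"))
candidates_for_any_bytes = decodable_as(bytes(range(256)))
```

Because latin-1 accepts everything, it shows up in every result, so surviving the decode step only narrows the field; it does not settle anything by itself.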
--
Terry Jan Reedy