On Jul 31, 2013 8:26 PM, "Ryan" <rymg19@gmail.com> wrote:
>
> I just realized I misexpressed myself...again. I meant ASCII or binary, not text or binary. Kind of like the old FTP programs. The implementation would determine if it was ASCII or binary.
Even so, that raises the question: why ASCII? Why not Unicode, or any of the hundreds of other text formats out there?
If this is to be included in the standard library, a collection used by people from all around the world, some forethought about the backgrounds of its users is needed.
>
> And, the '/nothingness/is/eternal' is a quote from Xemnas in Kingdom Hearts. I was hoping someone would pick it up.
>
>
> Terry Reedy <tjreedy@udel.edu> wrote:
>>
>> On 7/31/2013 3:03 PM, Ryan wrote:
>>>
>>> 1.The link I provided wasn't how I wanted it to be.
>>
>>
>> And there is no 'one way' that will satisfy everyone, or even most
>> people, as they will have different use cases for 'istext'.
>>
>>> I was using it as an example to show it wasn't impossible.
>>
>>
>> It is obviously possible to apply any arbitrary predicate to any object
>> within its input domain. No one has claimed otherwise that I know of.
>>
>>> 2.You yourself stated it doesn't work on UTF-8 files.
>>> If you wanted one
>>> that worked on all text files, it wouldn't work right.
>>
>>
>> The problem is that the problem is ill-defined. Every file is (or can be
>> viewed as) a sequence of binary bytes. Every file can be interpreted as
>> a text file encoded with any of the encodings (like at least some
>> latin-1 encodings, and the IBM PC Graphics encoding) that give a
>> character meaning to every byte. So, to be strict, every file is both
>> binary and text. Python allows us to open any file as either binary or
>> text (with some encoding, with latin-1 one of the possible choices).
>>
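To make the "every file is both binary and text" point concrete, here is a minimal sketch. The helper name decodes_as is made up for illustration; it only relies on bytes.decode raising UnicodeDecodeError on invalid input.

```python
def decodes_as(raw: bytes, encoding: str) -> bool:
    """Return True if raw can be decoded with encoding without error."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


every_byte = bytes(range(256))  # one of each possible byte value

# latin-1 assigns a character to all 256 byte values, so decoding never
# fails and the round trip is lossless.
assert every_byte.decode("latin-1").encode("latin-1") == every_byte

print(decodes_as(every_byte, "latin-1"))  # True
print(decodes_as(every_byte, "ascii"))    # False: bytes >= 0x80 are rejected
print(decodes_as(b"\xff", "utf-8"))       # False: 0xff never occurs in UTF-8
```

So "does it decode?" only tells you something for encodings that can actually reject input; for latin-1 the answer is always yes.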
>> The pragmatic question is: "Is this file 'likely' *intended* to be
>> interpreted as text, given that the creator is a member of our *local
>> culture*?" For the function you referenced, the 'local culture' is
>> 'closed Western European'. For 'closed American', the threshold of
>> allowed non-ASCII text and control chars should be more like 0 or 1%.
>> For many cultures, the referenced function is nonsensical.
>>
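A rough sketch of the kind of threshold heuristic being discussed, to show how much the answer depends on the assumed culture. This is not the referenced function; looks_like_text, its 30% default, and the set of allowed control characters are all assumptions picked for illustration.

```python
def looks_like_text(raw: bytes, max_suspicious: float = 0.30) -> bool:
    """Hypothetical heuristic: True if few enough bytes look 'non-text'."""
    if not raw:
        return True   # arbitrary choice: call an empty file text
    if b"\x00" in raw:
        return False  # NUL bytes are treated as a binary marker
    allowed_controls = b"\t\n\r\f\b"
    suspicious = sum(
        1
        for b in raw
        if b >= 0x80 or b == 0x7F or (b < 0x20 and b not in allowed_controls)
    )
    return suspicious / len(raw) <= max_suspicious


western = "The café is naïve".encode("utf-8")  # 4 of 19 bytes are non-ASCII
russian = "Привет".encode("utf-8")             # every byte is non-ASCII

print(looks_like_text(western, 0.30))  # True
print(looks_like_text(western, 0.01))  # False
print(looks_like_text(russian, 0.30))  # False
```

The same bytes flip between "text" and "binary" purely on the threshold, and perfectly ordinary Russian text fails even the lenient setting.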
>> For an open global context, istext would have to try all standard text
>> encodings and for those that worked, apply the grammar rules of the
>> languages that normally are encoded with that encoding.
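The first half of that ("try all standard text encodings") can be sketched as below. The candidate list is a short, arbitrary sample rather than "all standard encodings", plausible_encodings is a made-up name, and the language-grammar step is left out entirely since it is the hard part.

```python
CANDIDATES = ("ascii", "utf-8", "utf-16", "shift_jis", "latin-1")


def plausible_encodings(raw: bytes, candidates=CANDIDATES) -> list:
    """Return the candidate encodings that decode raw without error."""
    ok = []
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding rejects the bytes
        ok.append(enc)
    return ok


sample = "日本語".encode("utf-8")
result = plausible_encodings(sample)
print(result)  # includes 'utf-8' and 'latin-1', but not 'ascii'
```

Note that latin-1 always survives this filter, so decoding success alone cannot settle the question; that is where the per-language checks would have to come in.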
>
>