On Jul 31, 2013 12:22 PM, "Eli Bendersky" <eliben@gmail.com> wrote:
>
> On Wed, Jul 31, 2013 at 8:40 AM, Ryan <rymg19@gmail.com> wrote:
>>
>> Here's something more interesting than my shlex idea.
>>
>> os.path is, pretty much, the Python FS toolbox, along with shutil. But there's one feature missing: checking whether a file is binary. It isn't hard (see http://code.activestate.com/recipes/173220/), but writing 50 lines of code for such a common task isn't very Pythonic.
>>
>> So...
>>
>> What if os.path had a binary checker that works just like isfile:
>> os.path.isbinary('/nothingness/is/eternal') # Returns boolean
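For concreteness, here is a minimal sketch of what such a helper might look like. It assumes the simplest possible rule: a NUL byte in the first block means "binary". `isbinary` is hypothetical because os.path has no such function, and the 512-byte block size is an arbitrary assumption. Like os.path.isfile, it returns False for anything that isn't a readable regular file.

```python
import os
import tempfile


def isbinary(path, blocksize=512):
    """Hypothetical os.path.isbinary-style helper (os.path has no such function).

    Mirrors os.path.isfile in returning False for paths that are not regular
    files. Otherwise, reports True if the first *blocksize* bytes contain a NUL.
    """
    if not os.path.isfile(path):
        return False
    try:
        with open(path, "rb") as f:
            return b"\x00" in f.read(blocksize)
    except OSError:
        # Unreadable (permissions, file removed meanwhile): treat as "not binary".
        return False


# Usage: a plain-text file versus a file starting with an ELF header.
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "notes.txt")
    elf_path = os.path.join(tmp, "prog")
    with open(text_path, "wb") as f:
        f.write(b"just some text\n")
    with open(elf_path, "wb") as f:
        f.write(b"\x7fELF\x02\x01\x01\x00")
    print(isbinary(text_path), isbinary(elf_path))  # False True
    print(isbinary("/nothingness/is/eternal"))      # False (no such file)
```

A NUL byte past the first block is never examined, so a file that starts with plain text is reported as text no matter what follows it.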
Besides the high chance of false positives, what makes this method (and the problem it tries to solve) so difficult is that binary files may contain large amounts of what would be considered text, and text files may contain pieces of binary data.
For example, consider a Windows executable file: much of the data in such a file is binary, but there are defined sections where strings and text resources are stored. Any heuristic algorithm like the one mentioned will be insufficient in such cases.
Although I can't think offhand of a situation where the opposite is true (binary data embedded in what is considered to be a text file), I'm pretty sure such a situation exists.
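Both failure modes are easy to reproduce even with the simplest rule. The sketch below applies a hypothetical `has_nul` check (a NUL byte means binary) to two synthetic byte strings. The first is three packed 32-bit integers whose bytes happen to spell printable ASCII. The second is a log line carrying a couple of raw bytes.

```python
import struct


def has_nul(sample: bytes) -> bool:
    """Simplest possible rule (hypothetical): any NUL byte means 'binary'."""
    return b"\x00" in sample


# Binary data that reads as text: three little-endian 32-bit integers
# whose bytes all happen to fall in the printable ASCII range.
packed = struct.pack("<3I", 0x6C6C6548, 0x6F77206F, 0x0A646C72)
print(packed)            # b'Hello world\n'
print(has_nul(packed))   # False: reported as text

# Text carrying a piece of binary data: a log line with raw bytes in it.
log_line = b"2013-07-31 12:22 payload=\x00\x01 status=ok\n"
print(has_nul(log_line))  # True: reported as binary
```

Whether either answer is "wrong" depends on what the caller meant by binary, which is exactly the ambiguity described above.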
>
> Some time ago I put on a gas mask and dove into the Perl source code to figure out how its "is binary" and "is text" operators work: http://eli.thegreenplace.net/2011/10/19/perls-guess-if-file-is-text-or-binary-implemented-in-python/
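For illustration, here is a rough Python sketch of that style of heuristic. It loosely follows the rules perlfunc documents for -T/-B: any NUL byte in the examined block means binary, and otherwise the block is binary if more than roughly 30% of its bytes are "odd" (control codes or high-bit bytes). It is not a port of Perl's C code. The function name, the set of tolerated control characters, and the 0.30 threshold are assumptions, and newer Perl documentation also describes a UTF-8 check that this sketch omits.

```python
def perlish_is_binary(sample: bytes, odd_threshold: float = 0.30) -> bool:
    """Guess whether a leading block of a file is binary, in the spirit of Perl's -B.

    Hypothetical helper, not Perl's actual algorithm:
      * an empty sample is reported as text (Perl treats an empty file as both -T and -B)
      * any NUL byte means binary
      * otherwise binary if more than odd_threshold of the bytes are "odd":
        control codes other than common whitespace/escape, DEL, or high-bit bytes
    """
    if not sample:
        return False
    if b"\x00" in sample:
        return True
    # Control bytes tolerated in text: \b \t \n \f \r and ESC (an assumption).
    tolerated = {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}
    odd = sum(1 for b in sample if (b < 0x20 and b not in tolerated) or b >= 0x7F)
    return odd > len(sample) * odd_threshold


print(perlish_is_binary(b"plain ASCII text\n"))          # False
print(perlish_is_binary(b"MZ\x90\x00\x03"))               # True (NUL byte)
print(perlish_is_binary(bytes(range(0x80, 0x100))))      # True (all high-bit)
print(perlish_is_binary("naïve café résumé".encode()))   # True: 8 of 21 bytes are high-bit
```

The last case shows how simplistic this is. Ordinary UTF-8 text with a few accented characters crosses the threshold, because each of those characters contributes two high-bit bytes.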
>
> I would recommend against including such a simplistic heuristic in the Python stdlib.
>
> Eli
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas@python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>