[Python-ideas] os.path.isbinary

Masklinn masklinn at masklinn.net
Wed Jul 31 18:23:47 CEST 2013


On 31 Jul 2013, at 18:02, Oleg Broytman <phd at phdru.name> wrote:
> On Wed, Jul 31, 2013 at 10:40:03AM -0500, Ryan <rymg19 at gmail.com> wrote:
>> Here's something more interesting than my shlex idea.
>> 
>> os.path is, pretty much, the Python FS toolbox, along with shutil. But, there's one feature missing: check if a file is binary. It isn't hard, see http://code.activestate.com/recipes/173220/. But, writing 50 lines of code for a more common task isn't really Python-ish.
>> 
>> So...
>> 
>> What if os.path had a binary checker that works just like isfile:
>> os.path.isbinary('/nothingness/is/eternal') # Returns boolean
> 
>   What is a binary file? Would Russian text in koi8-r encoding be
> considered binary? What about utf-16? UTF16-encoded files have many
> zero characters. UTF32-encoded have even more.

And the linked recipe is worse than that: even with no nul byte, if more than 30% of the file's bytes aren't ASCII it considers the file binary.

Files in ISO-8859 parts 5 to 8 (Cyrillic, Arabic, Greek and Hebrew) are pretty much guaranteed to be inferred as binary, as is part 11 (Thai). UTF-8 text in any non-Latin script will also be considered binary, since the high bit is set on every byte used to encode a code point outside the ASCII range.
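
To put rough numbers on this, here is a small sketch that measures what share of the encoded bytes have the high bit set. The helper name high_bit_fraction and the sample string are made up for illustration:

```python
def high_bit_fraction(text: str, encoding: str) -> float:
    """Fraction of encoded bytes that fall outside the 7-bit ASCII range."""
    data = text.encode(encoding)
    return sum(b >= 0x80 for b in data) / len(data)

sample = "Привет, мир"  # Russian for "Hello, world"; any Cyrillic text would do

for enc in ("utf-8", "iso8859-5", "koi8-r"):
    print(enc, round(high_bit_fraction(sample, enc), 2))
```

For this sample every encoding lands far above a 30% cutoff, since only the comma and the space are ASCII bytes.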

