On 31 Jul 2013, at 18:02, Oleg Broytman <phd@phdru.name> wrote:
On Wed, Jul 31, 2013 at 10:40:03AM -0500, Ryan <rymg19@gmail.com> wrote:
Here's something more interesting than my shlex idea.
os.path is, pretty much, the Python FS toolbox, along with shutil. But there's one feature missing: checking whether a file is binary. It isn't hard, see http://code.activestate.com/recipes/173220/. But writing 50 lines of code for such a common task isn't really Pythonic.
So...
What if os.path had a binary checker that works just like isfile:

    os.path.isbinary('/nothingness/is/eternal') # Returns boolean
What is a binary file? Would Russian text in koi8-r encoding be considered binary? What about utf-16? UTF-16-encoded files contain many zero bytes. UTF-32-encoded files contain even more.
And the recipe linked is worse than that: even with no nul byte, if more than 30% of the file's bytes aren't ASCII it considers the file binary. Files in ISO-8859 parts 5 to 8 (Cyrillic, Arabic, Greek and Hebrew) are pretty much guaranteed to be inferred as binary. Part 11 (Thai) as well. UTF-8 for any non-Latin script will also be considered binary, as the high bit is set on every byte used to encode code points outside the ASCII range.
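
To make the objection concrete, here is a minimal sketch of the kind of heuristic described above: a nul byte means binary, otherwise the file is binary if more than 30% of its bytes fall outside a "text" set. This is not the recipe's code. The name looks_binary, the exact set of bytes treated as text, and the sample strings are assumptions chosen for illustration.

```python
# Bytes treated as "text": printable ASCII plus a few control characters.
# This set is an assumption for the sketch, not copied from the recipe.
TEXT_BYTES = bytes(range(32, 127)) + b"\n\r\t\b"


def looks_binary(data: bytes, threshold: float = 0.30) -> bool:
    """Guess whether data is binary using a nul-byte check and a ratio test."""
    if not data:
        return False  # treat an empty file as text
    if b"\0" in data:
        return True  # any nul byte is taken as proof of binary content
    # Delete every "text" byte; what remains is the non-text residue.
    non_text = data.translate(None, TEXT_BYTES)
    return len(non_text) / len(data) > threshold


samples = {
    "ascii": "hello world".encode("ascii"),
    "koi8-r": "Привет, мир".encode("koi8-r"),
    "utf-8": "Привет, мир".encode("utf-8"),
    "iso-8859-7": "Καλημέρα".encode("iso-8859-7"),
    "utf-16": "hello world".encode("utf-16"),
}

for name, payload in samples.items():
    print(name, looks_binary(payload))
```

Only the plain ASCII sample comes back as text. The Cyrillic and Greek samples trip the 30% threshold, and the UTF-16 sample trips the nul-byte check, even though every one of them is ordinary text.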