Determine file type (binary or text)
news at yebu.de
Wed Aug 13 14:39:57 CEST 2003
Michael Peuser wrote:
> yes there is more than just Unix in the world ;-)
> Windows directories have no means to specify their contents type in any way.
That's even more true on Linux/Unix, since there is no need for
things like line-terminator conversion.
> The approved method is using three-letter extensions, though this rule is
> not strictly followed (lot of files without extension nowadays!)
> When I had a similar problem I read 1000 characters, counted the number of
> <32 and >255 characters and classified it as "binary" when this quota exceeded
> 20%. I have no idea whether it will work well with Chinese Unicode files or
> some funny depositories or project files that store uncompressed texts....
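The heuristic described above could be sketched roughly as follows. This is only an illustration, not the original poster's code: the function name looks_binary and its parameters are made up here, the file is read as raw bytes (so nothing can exceed 255 and only values below 32 are counted), and common whitespace control characters are deliberately not counted, which is an assumption the original description does not state.

```python
def looks_binary(path, sample_size=1000, threshold=0.20):
    """Guess whether a file is binary by sampling its first bytes.

    Hypothetical helper: name, defaults and the whitespace exemption
    are assumptions made for this sketch.
    """
    with open(path, 'rb') as f:
        sample = f.read(sample_size)
    if not sample:
        return False  # assumption: treat an empty file as text

    # Control bytes that are normal in text files: backspace, tab,
    # line feed, form feed, carriage return.
    text_controls = b'\b\t\n\f\r'

    # Count bytes below 32 that are not ordinary text whitespace.
    suspicious = sum(1 for b in sample
                     if b < 32 and b not in text_controls)

    # "Binary" when more than 20% of the sample looks suspicious.
    return suspicious / len(sample) > threshold
```

As the quoted text suspects, this is fragile for some encodings: UTF-16 text that is mostly ASCII carries one NUL byte per character, so about half of the sample is below 32 and the file would be classified as binary.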
Based on the idea from Mr. "bromden", why not use mimetypes.MimeTypes()
and guess_type('file://...') and analyze the returned type string?
This should work on Windows / Linux / Unix / whatever.
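A minimal sketch of that suggestion, assuming a helper name (guess_is_text) invented for this example. Note that guess_type() returns a (type, encoding) tuple and works from the file name alone; it never opens the file, so files without a known extension come back as None and need some other check.

```python
import mimetypes

def guess_is_text(name):
    """Return True/False from the guessed MIME type, or None if unknown.

    Hypothetical helper; only the file name (extension) is examined,
    the file contents are never read.
    """
    mime_type, _encoding = mimetypes.MimeTypes().guess_type(name)
    if mime_type is None:
        return None  # no extension, or one the module does not know
    # Types such as 'text/plain' or 'text/html' count as text here.
    return mime_type.startswith('text/')
```

Usage could look like guess_is_text('notes.txt'). Some textual formats are registered under other top-level types (for example application/...), so a plain 'text/' prefix test will report those as non-text.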
> Michael P
> "Sami Viitanen" <none at none.net> wrote in message
> news:v7p_a.1558$k4.32814 at news2.nokia.com...
>>Works well in Unix but I'm making a script that works on both
>>Unix and Windows.
>>Win doesn't have that 'file -bi' command.
>>"bromden" <bromden at gazeta.pl.no.spam> wrote in message
>>news:bhd559$ku9$1 at absinth.dialog.net.pl...
>>>>How can I check if a file is binary or text?
>>> >>> import os
>>> >>> f = os.popen('file -bi test.py', 'r')
>>> >>> f.read().startswith('text')
>>>(btw, f.read() returns 'text/x-java; charset=us-ascii\n')