Determine file type (binary or text)
brian_l at yahoo.com
Thu Aug 14 08:32:34 CEST 2003
Peter Hansen <peter at engcorp.com> wrote in message news:<3F3A8275.8B6EE8C4 at engcorp.com>...
> "Contains only printable characters" is probably a more useful definition
> of text in many cases. I can't say off the top of my head exactly when
> either definition might be a problem.... wait, how about this one: in
> CVS, if you don't have a file that is effectively line-oriented, human
> readable information, you probably don't want to let it be treated as
> "text" and stored as diffs. In that situation, "contains primarily
> printable characters organized in lines" is probably a more thorough,
> though less deterministic, definition.
We check for binary files in our CVS commitprep script like this:
  - look for the -kb arg
  - open the file in binary mode, read 4k from the file and...

    non_text = 0
    for i in range(len(buff)):
        a = ord(buff[i])
        if (a < 8) or (a > 13 and a < 32) or (a > 126):
            non_text = non_text + 1
If 10 percent of the characters are found to be non-text, we reject
the file if it was not committed with the -kb flag, or print a warning
if the file appears to be text but is being checked in as a binary.
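
For anyone who wants to try the same idea outside our script, here is a
minimal sketch in Python 3 of the check described above. It is not our
actual commitprep code: the names looks_binary and check_commit, the
has_kb_flag parameter, and the "reject"/"warn"/"ok" return values are
made up for illustration, and treating "10 percent" as "10 percent or
more" is an assumption.

```python
def looks_binary(data, threshold=0.10):
    """Return True if at least `threshold` of the bytes look non-text.

    `data` is a bytes object; iterating over bytes in Python 3 yields ints,
    so no ord() call is needed. The byte ranges match the test in the post.
    """
    if not data:
        return False  # assumption: an empty sample is treated as text
    non_text = sum(1 for a in data if a < 8 or 13 < a < 32 or a > 126)
    return non_text / len(data) >= threshold


def check_commit(path, has_kb_flag, sample_size=4096):
    """Hypothetical wrapper: compare the heuristic against the -kb flag."""
    # Read only the first 4k, in binary mode, as described in the post.
    with open(path, "rb") as f:
        buff = f.read(sample_size)
    binary = looks_binary(buff)
    if binary and not has_kb_flag:
        return "reject"  # looks binary but was not committed with -kb
    if not binary and has_kb_flag:
        return "warn"    # looks like text but is being checked in as binary
    return "ok"
```

How the -kb flag is actually detected depends on how the script is
hooked into CVS, so that part is left to the caller here.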
We don't bother checking for charsets other than ascii, because
localized files have to be checked in as binaries or bad things