Need to know if a file has only ASCII characters

pdpi pdpinheiro at gmail.com
Tue Jun 16 10:17:27 EDT 2009


On Jun 16, 2:17 pm, Dave Angel <da... at ieee.org> wrote:
> Jorge wrote:
> > Hi there,
> > I'm making an application that reads third-party generated ASCII files,
> > but sometimes the files are totally or partially corrupted, and I need
> > to know if a file is an ASCII file with *nix line terminators.
> > On Linux I can run the file command, but the application should run on
> > Windows.
>
> > Any help will be great.
>
> > Thank you in advance.
>
> So, which is the assignment:
>    1) determine if a file has non-ASCII characters
>    2) determine whether the line-endings are crlf or just lf
>
> In the former case, look at translating the file contents to Unicode,
> specifying ASCII as source.  If it fails, you have non-ASCII.
> In the latter case, investigate the 'U' attribute of the mode parameter
> in the open() function.
>
> You also need to ask yourself whether you're doing a validation of the
> file, or doing a "best guess" like the file command.

Given your requirements, you're already assuming something that _should_
be ASCII, so it's easiest to check for ASCIIness at the binary level:

Open the file as binary
Loop over the bytes
  exit with error upon reading a byte outside the printable range
  (32-126 decimal), unless it is one of a few lower-range exceptions
  (\n, \t -- not \r, since you want UNIX-style linefeeds)
exit with success if the loop ended cleanly

This supposes you're dealing strictly with ASCII, and not a full 8 bit
codepage, of course.
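
The steps above could be sketched in Python roughly as follows. This is
only a minimal sketch, written for Python 3 (where iterating over a bytes
object yields integers). The function name first_bad_byte, the ALLOWED
set, and the chunk_size parameter are names made up for illustration, and
the choice of tab and LF as the only permitted control characters is an
assumption taken from the outline above.

```python
# Allowed byte values: printable ASCII (32-126) plus tab (9) and LF (10).
# CR (13) is deliberately left out, so CRLF line endings are rejected.
ALLOWED = frozenset(range(32, 127)) | {0x09, 0x0A}


def first_bad_byte(path, chunk_size=64 * 1024):
    """Return (offset, byte_value) of the first disallowed byte, or None.

    Hypothetical helper: reads the file in binary mode, chunk by chunk,
    so large files are not loaded into memory at once.
    """
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached end of file without finding a bad byte.
                return None
            for i, b in enumerate(chunk):
                if b not in ALLOWED:
                    return offset + i, b
            offset += len(chunk)
```

A caller would treat a return value of None as success, and otherwise
report the offset and byte value. Returning the position rather than a
bare True/False makes it easier to see where a file got corrupted.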


