Need to know if a file has only ASCII characters

Lie Ryan lie.1296 at gmail.com
Tue Jun 16 23:30:29 EDT 2009


Scott David Daniels wrote:
> norseman wrote:
>> Scott David Daniels wrote:
>>> Dave Angel wrote:
>>>> Jorge wrote: ...
>>>>> I'm making an application that reads third-party generated ASCII files,
>>>>> but sometimes the files are totally or partially corrupted and I
>>>>> need to know if it's an ASCII file with *nix line terminators.
>>>>> On Linux I can run the file command, but the application should run on
>>>>> Windows.
>> You are looking for a \x0D (the carriage return) \x0A (the line feed)
>> combination. If present, you have Microsoft compatibility. If not, you
>> don't.  If you think high bits might be part of the corruption, filter
>> each byte with byte & \x7F  (byte AND'ed with hex 7F, or 127 base 10)
>> then check for the \x0D \x0A combination.
> 
> Well, ASCII defines \x0D as the carriage return and \x0A as the line feed.
> It is Unix that is wrong, not Microsoft (don't get me wrong, I know
> Microsoft has often invalidly redefined whatever it likes).

The \r\n was originally a hack because teletype machines could only do one
thing at a time (i.e. do a line feed NAND carriage return), and trying to
do both at the same time or in the wrong order would trigger a bug that
sends an HCF instruction on many ancient teletypes.

Unix decided that in a virtual terminal, \r\n is unnecessary and redundant
since VTs can do both in a single instruction.

We can argue that Microsoft is showing "foolish consistency" here, or that
Unix changed the standard just to save a few bytes, but objectively neither
side is right or wrong, since the problem was a hack in the first place.

If anyone is wrong, it is Mac, which decided to use \r when Unix had
already decided which character to abandon (or maybe it's their usual
reason: "different just because we want to be different")

<ducks avoiding rotten tomato from Mac fanboys>
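
Back to the original question: here is a minimal sketch of one way to check
this from Python on any platform. It assumes "ASCII" means every byte is below
0x80 and "*nix line terminators" means no \r appears at all; the function
names are made up for illustration, and the file has to be opened in binary
mode ("rb") so that Windows does not translate \r\n behind your back.

```python
def classify_ascii_file(data):
    """Summarise the bytes: 7-bit cleanliness and line-ending counts."""
    crlf = data.count(b"\r\n")
    return {
        # True when no byte has the high bit set (assumed meaning of "ASCII").
        "ascii_only": all(byte < 0x80 for byte in data),
        "crlf": crlf,                           # DOS/Windows endings
        "bare_lf": data.count(b"\n") - crlf,    # Unix endings
        "bare_cr": data.count(b"\r") - crlf,    # old Mac endings
    }


def is_unix_ascii(data):
    """True if data is pure ASCII and contains no \\r of any kind."""
    info = classify_ascii_file(data)
    return info["ascii_only"] and info["crlf"] == 0 and info["bare_cr"] == 0


def check_path(path):
    """Hypothetical helper: read the whole file as raw bytes and check it."""
    with open(path, "rb") as handle:
        return is_unix_ascii(handle.read())
```

Reading the whole file at once is fine for small inputs; for large files the
same counts could be accumulated chunk by chunk, taking care of a \r\n pair
split across two chunks.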


