Parsing ascii file

Peter Otten __peter__ at web.de
Thu Jun 17 09:42:54 CEST 2004


diablo wrote:

> Hello ,
> 
> I have a file that contains the following data (example) and does NOT have
> any line feeds:
> 
> 11    22    33    44    55    66    77    88    99    00    aa    bb    cc
> dd  ....to 128th byte     11    22    33    44    55    66    77    88   
> 99
> 00    aa    bb    cc    dd .... and so on
> 
> Record 1 starts at 0 and finishes at 128, record 2 starts at 129 and
> finishes at 256, and so on. There can be as many as 5000 records per file. I
> would like to parse the file, retrieve the value of the field at bytes 64-65
> and perform an arithmetic operation on it (sum them all up).
> 
> Can I do this with python?
> 
> if I was to use awk it would look something like this :
> 
> cat <filename> | fold -w 128 | awk ' { SUM=SUM + substr($0,64,2) } END
> {print SUM}'

Is it an ascii or a binary file? I'm not entirely sure from your description.
In the following I assume binary data (a signed 16-bit integer in native byte
order), but it should be easy to modify the value() function if those two
bytes are ascii digits.
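To make the ascii/binary distinction concrete, here is a small Python 3 sketch (not part of the original reply) that reads the same two bytes both ways. It assumes 128-byte records, the field at 1-based bytes 64-65 as in the awk command, and little-endian order for the binary case; `field_as_binary`, `field_as_ascii`, `RECORD_SIZE` and `START` are hypothetical names chosen for illustration.

```python
import struct

RECORD_SIZE = 128
START = 63  # awk's substr($0, 64, 2) is 1-based; Python slices are 0-based


def field_as_binary(record: bytes, start: int = START) -> int:
    """Read the two bytes as a little-endian signed 16-bit int (assumed layout)."""
    return struct.unpack("<h", record[start:start + 2])[0]


def field_as_ascii(record: bytes, start: int = START) -> int:
    """Read the two bytes as ascii digits, e.g. b"42" -> 42 (what awk does)."""
    return int(record[start:start + 2])


# One fake record each: filler, the 2-byte field, filler up to 128 bytes.
pad_after = RECORD_SIZE - START - 2
ascii_record = b"x" * START + b"42" + b"x" * pad_after
binary_record = b"x" * START + struct.pack("<h", 42) + b"x" * pad_after

print(field_as_ascii(ascii_record))    # 42
print(field_as_binary(binary_record))  # 42
print(field_as_binary(ascii_record))   # 12852: ascii digits misread as binary
```

The last line shows why the format matters: the ascii characters "4" and "2" (0x34, 0x32) decoded as a little-endian short give 0x3234 = 12852, not 42.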

import struct, sys
from itertools import imap

def fold(instream, width=80):
    # like "fold -w width": yield fixed-size chunks of a stream
    # that has no line feeds
    while 1:
        line = instream.read(width)
        if not line: break
        yield line

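def value(line, start=63): # awk's substr() is 1-based: byte 64 is index 63
    # for ascii digits use: return int(line[start:start+2])
    # "h" = signed 16-bit int in native byte order; "<h"/">h" force one
    return struct.unpack("h", line[start:start+2])[0]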

if __name__ == "__main__":
    try:
        filename = sys.argv[1]
    except IndexError:
        instream = sys.stdin
    else:
        instream = open(filename, "rb") # binary mode: records are raw bytes

    print sum(imap(value, fold(instream, 128)))
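The script above is Python 2 (print statement, itertools.imap, file()). A rough Python 3 equivalent follows as a sketch, not as Peter's code. It assumes the same 128-byte records but makes the byte order explicit with "<h" (little-endian, an assumption about the data). It also shows a swapped-in alternative, struct.Struct.iter_unpack, which describes a whole record in one format string. The names `sum_field`, `sum_field_whole`, `RECORD`, `RECORD_SIZE` and `START` are hypothetical.

```python
import io
import struct

RECORD_SIZE = 128
START = 63  # awk's substr($0, 64, 2) is 1-based, so byte 64 is index 63


def fold(instream, width=RECORD_SIZE):
    """Yield fixed-width records from a binary stream, like `fold -w width`."""
    while True:
        record = instream.read(width)
        if not record:
            break
        yield record


def value(record, start=START):
    # "<h": little-endian signed 16-bit integer (assumed byte order)
    return struct.unpack("<h", record[start:start + 2])[0]


def sum_field(instream):
    # the built-in map() is already lazy in Python 3, so no itertools.imap
    return sum(map(value, fold(instream, RECORD_SIZE)))


# Alternative: one struct format for a whole record: START pad bytes,
# the field, then pad bytes up to RECORD_SIZE. "<" also disables the
# alignment padding that native mode would insert before the "h".
RECORD = struct.Struct("<%dxh%dx" % (START, RECORD_SIZE - START - 2))


def sum_field_whole(data):
    # iter_unpack requires len(data) to be a multiple of RECORD.size (128)
    return sum(n for (n,) in RECORD.iter_unpack(data))


# Demo: three fake 128-byte records whose field holds 1, -2 and 300.
records = b"".join(
    b"\x00" * START + struct.pack("<h", n) + b"\x00" * (RECORD_SIZE - START - 2)
    for n in (1, -2, 300)
)
print(sum_field(io.BytesIO(records)))  # 299
print(sum_field_whole(records))        # 299
```

To reproduce the command-line behaviour, print `sum_field(sys.stdin.buffer)` (stdin as bytes) or pass a file opened with `open(path, "rb")`. `sum_field_whole()` raises `struct.error` if the data does not end on a record boundary; `sum_field()` raises it only when a trailing partial record is too short to contain the field.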

Peter




More information about the Python-list mailing list