Scanning a file

Fri Oct 28 11:59:24 EDT 2005

<pinkfloydhomer at gmail.com> wrote in message 
news:1130497567.764104.125110 at g44g2000cwa.googlegroups.com...
>I want to scan a file byte for byte for occurrences of the four-byte
> pattern 0x00000100. I've tried with this:
>
> # start
> import sys
>
> numChars = 0
> startCode = 0
> count = 0
>
> inputFile = sys.stdin
>
> while True:
>    ch = inputFile.read(1)
>    numChars += 1
>
>    if len(ch) < 1: break
>
>    startCode = ((startCode << 8) & 0xffffffff) | (ord(ch))
>    if numChars < 4: continue
>
>    if startCode == 0x00000100:
>        count = count + 1
>
> print count
> # end
>
> But it is very slow. What is the fastest way to do this? Using some
> native call? Using a buffer? Using whatever?
>
> /David

How about something like:

#!/usr/bin/env python

import sys

fn = 't.dat'
ss = b'\x00\x00\x01\x00'  # 0x00000100 as four bytes, most significant first

be = len(ss) - 1        # length of overlap to check
blocksize = 4 * 1024    # need to ensure that blocksize > overlap

fp = open(fn, 'rb')
b = fp.read(blocksize)
found = 0
while len(b) > be:
    if b.find(ss) != -1:
        found = 1
        break
    # keep the last be bytes so a match straddling two blocks is still seen
    b = b[-be:] + fp.read(blocksize)
fp.close()
sys.exit(found)         # exit status is 1 if the pattern was found, else 0
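
The script above only reports whether the pattern occurs at all, while the
original question asked for a count. A minimal sketch of the same block-wise
idea extended to counting might look like the following. It assumes Python 3
and a stream opened in binary mode; the function name count_pattern and the
64 KiB default block size are hypothetical choices for illustration, not
something from the posts above. Overlapping matches are counted, as in the
original byte-by-byte loop.

```python
import io


def count_pattern(stream, pattern=b"\x00\x00\x01\x00", blocksize=64 * 1024):
    """Count occurrences of pattern in a binary stream, read block by block.

    Overlapping occurrences are counted, matching the byte-by-byte loop
    in the original question.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1   # bytes carried over between blocks
    count = 0
    tail = b""
    while True:
        block = stream.read(blocksize)
        if not block:            # end of stream
            break
        buf = tail + block
        pos = buf.find(pattern)
        while pos != -1:
            count += 1
            # step forward one byte so overlapping matches are found too
            pos = buf.find(pattern, pos + 1)
        # The tail is shorter than the pattern, so it can never hold a
        # complete match on its own and nothing is counted twice.
        tail = buf[-overlap:] if overlap else b""
    return count


# Small in-memory demo: two occurrences that share one byte,
# read in deliberately tiny blocks.
demo = io.BytesIO(b"\x00\x00\x01\x00\x00\x01\x00")
print(count_pattern(demo, blocksize=3))  # prints 2
```

To use it on a real file, something like count_pattern(open('t.dat', 'rb'))
should work; for standard input, sys.stdin.buffer provides the binary stream.
Most of the speedup over the original loop comes from reading large blocks
and letting find() do the scanning instead of calling read(1) per byte.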