Scanning a file
pwatson at redlinepy.com
Sat Oct 29 01:09:18 CEST 2005
<pinkfloydhomer at gmail.com> wrote in message
news:1130497567.764104.125110 at g44g2000cwa.googlegroups.com...
> I want to scan a file byte for byte for occurrences of the four-byte
> pattern 0x00000100. I've tried with this:
> # start
> import sys
> numChars = 0
> startCode = 0
> count = 0
> inputFile = sys.stdin
> while True:
>     ch = inputFile.read(1)
>     numChars += 1
>     if len(ch) < 1: break
>     startCode = ((startCode << 8) & 0xffffffff) | (ord(ch))
>     if numChars < 4: continue
>     if startCode == 0x00000100:
>         count = count + 1
> print count
> # end
> But it is very slow. What is the fastest way to do this? Using some
> native call? Using a buffer? Using whatever?
Here is an attempt at counting the matches using the mmap facility. There
appear to be some serious backward compatibility issues: I tried Python 2.1
on Windows and AIX and had some odd results. If you are on 2.4.1 or higher,
that should not be a problem.
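import mmap
import os

fn = 't.dat'
ss = '\x00\x00\x01\x00'    # the four-byte pattern 0x00000100

fp = open(fn, 'rb')
# Map the whole file. The argument after st_size was cut off in the
# archived post; read-only access is assumed here.
b = mmap.mmap(fp.fileno(), os.stat(fp.name).st_size,
              access=mmap.ACCESS_READ)

count = 0
foundpoint = b.find(ss, 0)
while foundpoint != -1 and (foundpoint + 1) < b.size():
    count = count + 1
    # Resume one byte past the last hit so overlapping matches count too.
    foundpoint = b.find(ss, foundpoint + 1)

b.close()
fp.close()
print count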
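
For anyone on Python 3, the same mmap idea might be sketched roughly as
below. This is only an illustration under a few assumptions: the names
count_pattern and PATTERN are made up for the example, the pattern has to be
a bytes object rather than a str, and an empty file is treated as zero
matches because mmap cannot map one. Like the loop above, it restarts the
search one byte after each hit, so overlapping matches are counted (the
pattern ends and begins with 0x00, so two hits can share a byte).

```python
import mmap
import os

# Hypothetical constant for this sketch: the four bytes 00 00 01 00.
PATTERN = b"\x00\x00\x01\x00"


def count_pattern(path, pattern=PATTERN):
    """Count occurrences of pattern in the file, overlapping ones included."""
    if os.path.getsize(path) == 0:
        return 0  # mmap cannot map an empty file
    count = 0
    with open(path, "rb") as fp:
        # Length 0 maps the whole file; ACCESS_READ keeps the map read-only.
        with mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            pos = buf.find(pattern)
            while pos != -1:
                count += 1
                # Resume one byte after the hit, not after the whole match.
                pos = buf.find(pattern, pos + 1)
    return count
```

Calling count_pattern('t.dat') would return the number of matches in that
file. Note that bytes.count() is not a drop-in replacement here, since it
counts only non-overlapping occurrences.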