[Numpy-discussion] searching binary data

Robert Kern robert.kern at gmail.com
Wed Sep 22 10:29:20 EDT 2010


On Wed, Sep 22, 2010 at 09:10, Neal Becker <ndbecker2 at gmail.com> wrote:
> A colleague of mine posed the following problem.  He wants to search large
> files of binary data for sequences.
>
> I thought of using mmap (to avoid reading all data into memory at once) and
> then turning this into a numpy array (using buffer=).
>
> But, how to then efficiently find a sequence?

mmap objects support the string-style find() and rfind() methods as well as
slicing, so you can search the mapped file directly:


[~]
|1> import mmap

[~]
|2> f = open('./scratch/foo.py', 'r+b')

[~]
|4> m = mmap.mmap(f.fileno(), 0)

[~]
|6> m.find('import')
11

[~]
|7> m[11:17]
'import'
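
For binary data the same idea carries over with a byte pattern instead of
text. The following is only a minimal sketch, assuming Python 3 (where find()
takes bytes); the helper name find_all, the demo file, and the byte pattern
are made up for illustration and are not from the session above. It collects
the offset of every occurrence without reading the whole file into memory:

```python
import mmap
import os
import tempfile


def find_all(path, pattern):
    """Return the byte offsets of every occurrence of pattern in the file."""
    offsets = []
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            pos = m.find(pattern)
            while pos != -1:
                offsets.append(pos)
                # Resume one byte later so overlapping matches are found too.
                pos = m.find(pattern, pos + 1)
    return offsets


# Hypothetical demo data: a small binary file containing the pattern twice.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"\x00\x01\xde\xad\xbe\xef\x02\xde\xad\xbe\xef")

demo_offsets = find_all(demo_path, b"\xde\xad\xbe\xef")
os.remove(demo_path)
print(demo_offsets)  # [2, 7]
```

Since the operating system pages the mapping in on demand, this should work
for files much larger than available RAM, as long as the address space is big
enough to map them.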


-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


