[Numpy-discussion] searching binary data

David Cournapeau cournape at gmail.com
Wed Sep 22 10:40:24 EDT 2010


On Wed, Sep 22, 2010 at 11:25 PM, Neal Becker <ndbecker2 at gmail.com> wrote:
> David Cournapeau wrote:
>
>> On Wed, Sep 22, 2010 at 11:10 PM, Neal Becker <ndbecker2 at gmail.com> wrote:
>>> A colleague of mine posed the following problem.  He wants to search
>>> large files of binary data for sequences.
>>>
>>
>> Is there a reason why you cannot use one of the classic string search
>> algorithms applied to the bytestream?
>>
>
> What would you suggest?  Keep in mind the file is too big to fit into memory
> all at once.

Do you care about speed? String search and even regular expressions
are supposed to work on mmap'd data, but I have never used them on large
datasets, so I don't know how they would perform.
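
For illustration, here is a minimal sketch of the mmap approach using only
the standard library. The function name find_all is made up for this
example, and I have not measured it on large files, so treat it as a
starting point rather than a recommendation:

```python
import mmap

def find_all(path, pattern):
    """Return the byte offsets of every occurrence of `pattern` in the file.

    `find_all` is a hypothetical helper for this sketch. The file is
    memory-mapped read-only, so the OS pages data in on demand instead of
    the whole file being read into memory. Note that mmap raises
    ValueError for an empty file.
    """
    offsets = []
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(pattern)
            while pos != -1:
                offsets.append(pos)
                # Resume one byte later so overlapping matches are found too.
                pos = mm.find(pattern, pos + 1)
    return offsets
```

On a 32-bit process the mapping is limited by the address space, so a very
large file may have to be mapped in windows (the offset argument of
mmap.mmap) that overlap by len(pattern) - 1 bytes.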

Otherwise, depending on the data and whether you can afford
pre-computing, algorithms like Knuth-Morris-Pratt can speed things up.
But I would assume you would have to do it in C to hope for any speed
gain compared to Python's built-in string search.
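
To make the idea concrete, here is a rough pure-Python sketch of
Knuth-Morris-Pratt over a stream of chunks. The names failure_table,
kmp_search_stream and read_chunks are invented for this example. It is
meant to show the structure of the algorithm (the pre-computed table, and
the fact that the text is never re-read, which suits a file that does not
fit in memory), not to be fast:

```python
def failure_table(pattern):
    """For each prefix of `pattern`, length of its longest proper prefix
    that is also a suffix. This is the pre-computation step of KMP."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table

def kmp_search_stream(chunks, pattern):
    """Yield the start offset of each match of `pattern` (bytes) in the
    concatenation of `chunks` (an iterable of bytes objects)."""
    if not pattern:
        return
    table = failure_table(pattern)
    k = 0        # number of pattern bytes currently matched
    offset = 0   # absolute position of the current byte in the stream
    for chunk in chunks:
        for byte in chunk:
            while k and byte != pattern[k]:
                k = table[k - 1]
            if byte == pattern[k]:
                k += 1
            if k == len(pattern):
                yield offset - len(pattern) + 1
                k = table[k - 1]  # keep going, allowing overlapping matches
            offset += 1

def read_chunks(path, size=1 << 20):
    """Read a file in fixed-size blocks (1 MiB by default)."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(size)
            if not chunk:
                break
            yield chunk
```

Because the match state k carries over from one chunk to the next,
matches that straddle a chunk boundary are still found, e.g.
list(kmp_search_stream(read_chunks(path), pattern)).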

cheers,

David


