finding/replacing a long binary pattern in a .bin file

yaipa yaipa at yahoo.com
Wed Jan 19 16:08:57 EST 2005


John,

Thanks for reminding me of the mmap module.  The following worked as
expected.
#--------------------------------------------------------
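import mmap
import shutil

with open("search_data.bin", "rb") as f:
    search_data = f.read()
with open("replace_data.bin", "rb") as f:
    replace_data = f.read()

# copy source_file.bin to modified.bin so the original stays untouched
shutil.copyfile("source_file.bin", "modified.bin")

# open in binary read/write mode; a length of 0 maps the whole file
with open("modified.bin", "r+b") as fp:
    mm = mmap.mmap(fp.fileno(), 0)

    # find() returns -1 when the pattern is absent
    start_addr = mm.find(search_data)
    if start_addr == -1:
        raise ValueError("search pattern not found in modified.bin")

    # the replacement must be the same length as the pattern it overwrites
    end_addr = start_addr + len(replace_data)
    mm[start_addr:end_addr] = replace_data

    mm.close()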
#--------------------------------------------------------
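
The snippet above patches only the first match. A minimal sketch of how
the same mmap idea could be extended to every occurrence, assuming
non-overlapping matches and patterns of equal length (patch_all is a
hypothetical helper name, not something from the build tool):

```python
import mmap


def patch_all(path, search_data, replace_data):
    """Overwrite every occurrence of search_data in the file; return the count."""
    if not search_data or len(search_data) != len(replace_data):
        raise ValueError("patterns must be non-empty and of equal length")

    count = 0
    # "r+b" gives binary read/write access; length 0 maps the whole file
    # (mmap refuses to map an empty file, so the file must not be empty).
    with open(path, "r+b") as fp, mmap.mmap(fp.fileno(), 0) as mm:
        pos = mm.find(search_data)
        while pos != -1:
            mm[pos:pos + len(replace_data)] = replace_data
            count += 1
            # resume the search just past the bytes that were overwritten
            pos = mm.find(search_data, pos + len(search_data))
    return count
```

Returning the count lets the caller check that the expected number of
patterns was actually patched.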

In the build tool, though, I chose to implement the string method
approach, because there are two occurrences of *Pattern* in the .bin
file to be updated and the string method did both in one shot.
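
For reference, a rough sketch of what the string method approach can
look like, assuming the whole file fits in memory (it is only about
32 kbytes here). The function name replace_pattern and its expected
argument are made up for illustration:

```python
def replace_pattern(data, search_data, replace_data, expected=None):
    """Return a copy of data with every search_data replaced by replace_data."""
    if len(search_data) != len(replace_data):
        raise ValueError("patterns must be the same length to keep offsets intact")

    # count() reports non-overlapping matches, the same ones replace() will touch
    found = data.count(search_data)
    if expected is not None and found != expected:
        raise ValueError("expected %d occurrence(s), found %d" % (expected, found))

    return data.replace(search_data, replace_data)
```

Typical use would be to read source_file.bin in 'rb' mode, call
replace_pattern(data, search_data, replace_data, expected=2), and write
the result to modified.bin in 'wb' mode.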

Cheers,

--Alan


John Lenton wrote:
> On Wed, Jan 12, 2005 at 10:36:54PM -0800, yaipa wrote:
> > What would be the common sense way of finding a binary pattern in a
> > .bin file, say some 200 bytes, and replacing it with an updated
> > pattern of the same length at the same offset?
> >
> > Also, the pattern can occur on any byte boundary in the file, so
> > chunking through the code at 16 bytes a frame may be a problem.  The
> > file itself isn't so large, maybe 32 kbytes is all and the need for
> > speed is not so great, but the need for accuracy in the
> > search/replacement is very important.
>
> ok, after having read the answers, I feel I must, once again, bring
> mmap into the discussion. It's not that I'm any kind of mmap expert,
> that I twirl mmaps for a living; in fact I barely have cause to use
> it in my work, but give me a break!  This is the kind of thing mmap
> *shines* at!
>
> Let's say m is your mmap handle, a is the pattern you want to find,
> b is the pattern you want to replace, and n is the size of both a and
> b.
>
> You do this:
>
>   p = m.find(a)
>   m[p:p+n] = b
>
> and that is *it*. Ok, so getting m to be a mmap handle takes more
> work than open() (*). A *lot* more work, in fact, so maybe you're justified
> in not using it; some people can't afford the extra
>
>   s = os.stat(fn).st_size
>   m = mmap.mmap(f.fileno(), s)
>
> and now I'm all out of single-letter variables.
>
> *) why isn't mmap easier to use? I've never used it with something
> other than the file size as its second argument, and with its access
> argument in sync with open()'s second arg.
>
> --
> John Lenton (john at grulic.org.ar) -- Random fortune:
> If the aborigine drafted an IQ test, all of Western civilization
> would presumably flunk it.
> 		-- Stanley Garn



