Looking for lots of words in lots of files

Francis Girard francis.girardpython at gmail.com
Wed Jun 18 17:10:15 CEST 2008


Use a suffix tree. First build a suffix tree from your thousand files
and then use it to look up each word.
This is a classic problem for that kind of structure.

Just search for "suffix tree" or "suffix tree python" on Google to find a
definition and an implementation.
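
To give a rough idea of the approach, here is a minimal sketch. Note that it
uses a suffix array (a sorted list of suffix offsets), which is a simpler
relative of the suffix tree and not a real suffix tree. The function names
(build_suffix_array, contains, find_words) and the limit parameter are made up
for illustration, and the naive construction is far too slow for large files;
a proper implementation would use a linear-time algorithm.

```python
def build_suffix_array(text):
    """Return the start offsets of all suffixes of text, in sorted order."""
    # Naive construction: sorts by full suffix strings, fine for a sketch only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def contains(text, suffix_array, word):
    """Binary-search the sorted suffixes for one that starts with word."""
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        # Compare only the first len(word) characters of this suffix.
        if text[start:start + len(word)] < word:
            lo = mid + 1
        else:
            hi = mid
    if lo == len(suffix_array):
        return False
    start = suffix_array[lo]
    return text[start:start + len(word)] == word


def find_words(text, words, limit=10):
    """Return the words found in text, stopping once limit words are found."""
    suffix_array = build_suffix_array(text)
    found = []
    for word in words:
        if contains(text, suffix_array, word):
            found.append(word)
            if len(found) >= limit:
                break
    return found


if __name__ == "__main__":
    sample = "the quick brown fox"
    print(find_words(sample, ["quick", "slow", "fox", "cat"]))
```

Each lookup costs a binary search over the suffixes instead of a scan of the
whole file, so the index pays off when many words are checked against the
same text. Note that this matches substrings, so "ox" would be found inside
"fox"; matching whole words only would need an extra boundary check.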

(Also, Jon Bentley's "Programming Pearls" is a great book to read.)


Francis Girard

2008/6/18 brad <byte8bits at gmail.com>:

> Just wondering if anyone has ever solved this efficiently... not looking
> for specific solutions tho... just ideas.
> I have one thousand words and one thousand files. I need to read the files
> to see if some of the words are in the files. I can stop reading a file once
> I find 10 of the words in it. It's easy for me to do this with a few dozen
> words, but a thousand words is too large for an RE and too inefficient to
> loop, etc. Any suggestions?
> Thanks