Advice for a little search engine

Max Haas Max.Haas at unibas.ch
Sun Apr 15 08:17:12 EDT 2001


Hi all,

I tried to implement a little search engine. Python is not a language I
know thoroughly, and I've never written a search engine in any language.
That's why I would like to ask (after having read as many Python documents as I
could) whether the following sketch is reasonable or whether it contains basic
mistakes. (The program runs, but it should be as fast as possible! Hm. Not
exactly an easy job for an outsider.)

Problem: 738 files of Latin text. Each file represents a single source.
Find every match of a term, including its inflected forms. (For example, I
need structura, structuram, structurae, and so on.)
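All of these forms share the stem structur-, so a single regex that anchors on the stem and accepts any ending can catch them in one pass. A minimal sketch, assuming the stem is the right anchor for the query and using a made-up sample sentence rather than text from the corpus:

```python
import re

# Assumption: every wanted form starts with the stem "structur".
# \b requires the match to begin at a word boundary; \w* accepts any ending.
pattern = re.compile(r"\bstructur\w*", re.IGNORECASE)

# A made-up sample sentence, not taken from the 738 files.
text = "Structura mundi et structuram corporis; structurae partes."
print(pattern.findall(text))  # ['Structura', 'structuram', 'structurae']
```

A shorter stem matches more words, some of them possibly unwanted, so the stem is worth choosing with care.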

To do: Enter a query in the form of a regex and look for all occurrences. If
there is a match:

(a) if it's the first match in this file, show the contents of the file
(in the canonical form used by those who prepared the files, these are
always lines 10-15), and then
(aa) show every match with (user-defined) x words before it, the matched
word, and y words after it;

(b) if there has already been a match in this source, do only (aa).
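Rules (a), (aa) and (b) amount to: show lines 10-15 once per file, just before its first match, then show every match in context. A minimal sketch, assuming the query matches within a single word and that "words" are simply whitespace-separated tokens; the helper name `report` is hypothetical:

```python
import re

def report(lines, pattern, x, y):
    """Hypothetical helper: the output for one file as a list of strings.

    lines:   the file as returned by readlines()
    pattern: the compiled query
    x, y:    number of words to show before and after each match
    """
    words = " ".join(lines).split()  # whitespace-separated tokens
    out = []
    contents_shown = False
    for i, word in enumerate(words):
        if not pattern.search(word):
            continue
        if not contents_shown:
            # (a) first match in this source: show lines 10-15 once
            # (list indices 9 to 14, since Python counts from 0)
            out.extend(line.rstrip("\n") for line in lines[9:15])
            contents_shown = True
        # (aa) x words before, the matched word in brackets, y words after
        before = words[max(0, i - x):i]
        after = words[i + 1:i + 1 + y]
        out.append(" ".join(before + ["[" + word + "]"] + after))
    return out

# Example call with x = 2 and y = 1 (path is a placeholder):
# for line in report(open(path).readlines(), re.compile(r"structur\w*"), 2, 1):
#     print(line)
```

Splitting into words first makes the x/y window a plain list slice; the price is that a query spanning two words will not match.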

The program:

1. Enter the query and x and y. The query is then compiled into the
pattern object p.
2. Read in every file (with something like fp.readlines()).
3. Turn the file into a single string (string.join(list_of_file)).
4. m = p.findall(string). If m is not empty, then:
a. Show the file contents (lines 10-15).
b. Look up every word in m and note its position (with
string.find).
c. Show x words before the matched word, the matched word, and then y words
after it.
...
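Steps 1-4 could look roughly like the sketch below. `fp.read()` already returns the whole file as one string, so the readlines/join pair can be dropped, and since `findall` returns a list (empty when nothing matches) and never `None`, the check is simply `if m:`. The directory layout (a folder holding only the text files) and the latin-1 encoding are assumptions; `search_dir` and `find_matches` are hypothetical names:

```python
import os
import re

def find_matches(text, pattern):
    """Step 4: all matches in one file's text; [] (never None) if none."""
    return pattern.findall(text)

def search_dir(directory, query):
    """Steps 1-4 over every file in `directory` (hypothetical layout)."""
    pattern = re.compile(query)  # step 1: compile the query once
    hits = {}
    # Assumes the directory holds only the text files, no subdirectories.
    for name in sorted(os.listdir(directory)):
        # Steps 2-3: read() gives the whole file as one string.
        # latin-1 is an assumption about the files' encoding.
        with open(os.path.join(directory, name), encoding="latin-1") as fp:
            m = find_matches(fp.read(), pattern)
        if m:  # an empty list is false, so this skips files without matches
            hits[name] = m
    return hits
```

Compiling once and reusing the pattern object for all 738 files avoids recompiling the query per file.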

The main question for me is: do I correctly understand how p.findall(string)
works in combination with string.find?
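On the question itself: `findall` returns the matched strings but not where they occur, and `string.find(s, sub)` (today `s.find(sub)`) returns only the lowest index of `sub`. So when a form occurs twice, looking up each item of `m` with find reports the first position both times; find can also land inside a longer word, e.g. `structura` inside `structurae`. `p.finditer` avoids both problems by yielding one match object per occurrence, each with its own `start()` and `end()`. A minimal sketch with a made-up sentence; `context` is a hypothetical helper:

```python
import re

# A made-up sample in which the same form occurs twice.
text = "structura prima, deinde structura secunda"
pattern = re.compile(r"\bstructur\w*")

found = pattern.findall(text)  # ['structura', 'structura'], no positions

# str.find returns the lowest index of the substring, so both lookups
# return the position of the first "structura".
naive = [text.find(word) for word in found]  # [0, 0]

# finditer yields one match object per occurrence, each with its own span.
positions = [m.start() for m in pattern.finditer(text)]  # [0, 24]

def context(text, m, x, y):
    """Hypothetical helper: x words before and y words after match m."""
    before = text[:m.start()].split()
    after = text[m.end():].split()
    # Slicing from len(before) - x (not -x) keeps x == 0 correct;
    # max() guards against x being larger than len(before).
    return before[max(0, len(before) - x):], m.group(), after[:y]

for m in pattern.finditer(text):
    print(context(text, m, 1, 1))
```

Slicing the text at `m.start()` and `m.end()` gives the x/y word windows of step c directly from each match, with no second search.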

Many thanks in advance

Max


More information about the Python-list mailing list