[Tutor] (no subject) [search engines and vector space models]

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Mon May 5 01:29:06 2003


> > How are search engines created for example google or the word searches
> > in a help index?
>
> This is quite a general question and I am not sure how this is
> connected with learning more about Python.


No problem, we can fix that.  *grin*

Cameron, there's a pretty nice article on IBM's developerWorks by David
Mertz that covers the fundamentals of writing a search engine in
Python.

    http://www-106.ibm.com/developerworks/xml/library/l-pyind.html
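The data structure at the heart of most simple search engines is an
"inverted index": a table that maps each word to the documents that
contain it.  Here's a minimal sketch of the idea.  It is NOT the code
from David's article, and the names (tokenize, build_index, search) are
just hypothetical ones I picked for illustration:

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query word (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first word's postings, then intersect with the others.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

docs = {
    1: "The cat sat on the mat.",
    2: "Dogs and cats make good pets.",
    3: "The dog chased the cat.",
}
index = build_index(docs)
print(sorted(search(index, "the cat")))   # [1, 3]
```

Notice that searching for "cat" won't find document 2, which only says
"cats".  That's exactly the kind of problem stemming addresses.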

There is also a very cool article from the Perl folks on a different
approach to search engines, one that uses a "vector space" model:

    http://www.perl.com/pub/a/2003/02/19/engine.html?page=1
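The basic idea of the vector space model is to treat each document as a
vector of word counts, and to rank documents by how small the angle is
between their vector and the query's vector (the cosine of that angle).
Here's a small sketch of that idea.  It's my own simplification, not a
translation of Maciej's Perl: it uses raw word counts with no term
weighting, and the function names are hypothetical:

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Turn a document into a bag-of-words vector: word -> count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(v1, v2):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(count * v2[word] for word, count in v1.items() if word in v2)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0   # an empty document matches nothing
    return dot / (norm1 * norm2)

docs = [
    "the cat sat on the mat",
    "dogs and cats make good pets",
    "the dog chased the cat",
]
query = term_vector("cat on a mat")
vectors = [term_vector(d) for d in docs]
# Rank document indices from most to least similar to the query.
ranked = sorted(range(len(docs)),
                key=lambda i: cosine(query, vectors[i]), reverse=True)
print(ranked)   # [0, 2, 1]
```

Unlike the boolean search above a yes/no match, this gives every
document a score, so partial matches still show up in the ranking.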

The engine that Maciej Ceglowski describes sounds really cool; I think
you might like it a lot.  (It might make a fun project to implement that
Perl code in Python!)  It would also give you a chance to play with some
new Python modules.

The article mentions using a "stemmer" function, which reduces words to
a common root form, like:

    cats --> cat
    pets --> pet
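To give a feel for what a stemmer does, here's a toy one that only
strips a few English plural endings.  This is a deliberately tiny rule
set I made up for illustration; it is NOT the Porter or Lovins
algorithm, and real stemmers use many more rules:

```python
def naive_stem(word):
    """Strip a few common English plural endings (toy rules only)."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"      # ponies -> pony
    if word.endswith("sses"):
        return word[:-2]            # classes -> class
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]            # cats -> cat, pets -> pet
    return word                     # glass, is: left alone

for w in ["cats", "pets", "ponies", "classes", "glass", "is"]:
    print(w, "->", naive_stem(w))
```

If you run every word through the stemmer both when you build the index
and when you process a query, then "cats" and "cat" end up as the same
index entry.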

I've ported a similar stemmer, the "Lovins stemmer", to Python; it does
the same sort of thing:

    http://hkn.eecs.berkeley.edu/~dyoo/python/py_lovins/

Maciej also mentions PDL, a Perl module for doing matrix calculations,
which he uses for the vector space computations.  Python has an
equivalent module called Numeric Python:

    http://www.pfdubois.com/numpy/
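Why matrices?  If you stack every document's word-count vector as a row
of a "term-document matrix" and scale each row to unit length, then
scoring a query against every document at once is a single
matrix-vector product.  Here's a sketch of that in plain Python lists so
it runs without any extra modules; the function names are hypothetical,
and it assumes the documents are already split into words:

```python
import math

def unit(vec):
    """Scale a vector to length 1 (all-zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def build_matrix(docs):
    """Return (vocab, rows): rows[i] is doc i's unit vector over vocab."""
    vocab = sorted({word for doc in docs for word in doc})
    position = {word: i for i, word in enumerate(vocab)}
    rows = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for word in doc:
            vec[position[word]] += 1.0
        rows.append(unit(vec))
    return vocab, rows

def query_vector(words, vocab):
    """Unit vector for a query; words not in vocab are ignored."""
    position = {word: i for i, word in enumerate(vocab)}
    vec = [0.0] * len(vocab)
    for word in words:
        if word in position:
            vec[position[word]] += 1.0
    return unit(vec)

def scores(rows, qvec):
    """Matrix-vector product: one cosine score per document."""
    return [sum(a * b for a, b in zip(row, qvec)) for row in rows]

docs = [["cat", "sat", "mat"], ["dog", "pet"], ["cat", "dog"]]
vocab, rows = build_matrix(docs)
print(scores(rows, query_vector(["cat"], vocab)))
```

With a matrix module (Numeric, or its modern successor NumPy), `rows`
would be a 2-D array and the whole `scores` loop collapses into one
fast matrix-vector multiply, which is why Maciej reaches for PDL.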

The second page of Maciej's article has lots of awesome references to
other introductory material on search engines and document indexing:

    http://www.perl.com/pub/a/2003/02/19/engine.html?page=2

Anyway, I hope these links give you something to chew on.  Good luck to
you!