[Tutor] (no subject) [search engines and vector space models]
Danny Yoo
dyoo@hkn.eecs.berkeley.edu
Mon May 5 01:29:06 2003
> > How are search engines created for example google or the word searches
> > in a help index?
>
> This is quite a general question and I am not sure how this is
> connected with learning more about Python.
No problem, we can fix that. *grin*
Cameron, there's a pretty nice article on IBM's developerWorks by David
Mertz that covers the fundamentals of writing a search engine in
Python:
http://www-106.ibm.com/developerworks/xml/library/l-pyind.html
There is also a very cool article from the Perl folks on a different
approach to search engines, one that uses a "vector space" model:
http://www.perl.com/pub/a/2003/02/19/engine.html?page=1
The engine that Maciej Ceglowski describes sounds really neat; I think you
might like it a lot. (It might make a fun project to reimplement that Perl
code in Python!) I think you'll find that it'll give you a chance to play
with some new Python modules.
The article mentions the use of a "stemmer" function to transform things
like:
    cats --> cat
    pets --> pet
I've ported a similar stemmer, the "Lovins stemmer", over to Python; it
does the same sort of thing:
http://hkn.eecs.berkeley.edu/~dyoo/python/py_lovins/
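Just to give a feel for what a stemmer does, here's a toy sketch. It is
NOT the Lovins algorithm (that one uses a long table of suffixes and
recoding rules); it only strips a trailing plural "s". The name
naive_stem() is made up for this example.

```python
def naive_stem(word):
    """Toy stemmer: lowercase the word and strip a plural 's'.

    This is only an illustration, not the Lovins algorithm.
    """
    word = word.lower()
    # Leave short words ("bus") and words ending in "ss" ("class") alone.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


if __name__ == "__main__":
    for w in ["cats", "pets", "class", "bus"]:
        print(w, "-->", naive_stem(w))
```

A real stemmer handles many more endings ("running", "ponies", and so
on), which is why it's worth grabbing an existing one.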
Maciej also mentions a Perl module for doing matrix calculations called
PDL, and he uses it to do the vector space stuff. Python has an
equivalent module called Numeric Python:
http://www.pfdubois.com/numpy/
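If you want to see the basic idea before pulling in a matrix module,
here's a rough sketch in plain Python. I'm assuming the usual textbook
setup: each document becomes a vector of word counts, and documents are
ranked by the cosine of the angle between their vector and the query's
vector. The function names (term_vector, cosine, search) are my own
inventions for this sketch and are not taken from Maciej's code.

```python
import math
from collections import Counter


def term_vector(text):
    """Count how often each lowercased word occurs in the text."""
    return Counter(text.lower().split())


def cosine(v1, v2):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(v1[t] * v2[t] for t in v1 if t in v2)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def search(query, documents):
    """Return (score, document) pairs, best match first.

    Documents that share no words with the query are left out.
    """
    qv = term_vector(query)
    scored = [(cosine(qv, term_vector(doc)), doc) for doc in documents]
    return sorted([pair for pair in scored if pair[0] > 0], reverse=True)


if __name__ == "__main__":
    docs = ["the cat sat on the mat",
            "dogs chase cats",
            "python is a programming language"]
    for score, doc in search("cat mat", docs):
        print("%.2f  %s" % (score, doc))
```

Notice that the query "cat mat" won't match "dogs chase cats", because
"cat" and "cats" count as different words here. That's exactly the
problem a stemmer is meant to fix.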
The second page of Maciej's article has lots of awesome references to
other introductory material on search engines and document indexing:
http://www.perl.com/pub/a/2003/02/19/engine.html?page=2
Anyway, I hope these links give you something to chew on. Good luck to
you!