[Tutor] Re: [quicky intro to vector search engines]
Danny Yoo
dyoo@hkn.eecs.berkeley.edu
Mon Jul 21 13:37:19 2003
On Sun, 20 Jul 2003, Alexandre Ratti wrote:
> Vector search engines looked fun; I just had to give it a try :-) I
> uploaded a basic implementation to:
>
> http://www.gabuzomeu.net/alex/py/vsse/SearchEngine.zip
Hi Alexandre,
Very cool; I will have to take a look at this!
> >>> app.search("beowulf cluster", 0.1)
> Searching in 304 files...
> ---------------------
> Beowulf-HOWTO.txt 34.56%
> SSI-UML-HOWTO.txt 26.92%
> openMosix-HOWTO.txt 18.16%
> Cluster-HOWTO.txt 12.80%
> Parallel-Processing-HOWTO.txt 11.69%
> CPU-Design-HOWTO.txt 10.59%
>
> Memory usage is quite high (about 100 MB for the PythonWin process).
> When saving the index instance to a file as a binary pickle, the file is
> quite large too (70 MB).
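For anyone following along on the list, here is a minimal sketch of the idea behind a search like the one above. It is only my guess at the general approach, not Alexandre's actual code: the names to_vector, cosine and search are made up for illustration, the three "documents" are toy strings, and I am assuming the second argument to app.search is a minimum score. Each text becomes a sparse word-count vector (a dict that stores only the words that occur), and documents are ranked by the cosine of the angle between their vector and the query's.

```python
import math
from collections import Counter

def to_vector(text):
    """Map a text to a sparse term-count vector (word -> count)."""
    return Counter(text.lower().split())

def cosine(v1, v2):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(count * v2[word] for word, count in v1.items() if word in v2)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)

def search(query, docs, threshold=0.1):
    """Return (name, score) pairs scoring at least `threshold`, best first.

    `threshold` is an assumed meaning for the 0.1 in the quoted session.
    """
    qvec = to_vector(query)
    hits = [(name, cosine(qvec, to_vector(text))) for name, text in docs.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy corpus, invented for this example.
docs = {
    "a.txt": "building a beowulf cluster from spare machines",
    "b.txt": "cluster management and job scheduling",
    "c.txt": "brewing coffee at home",
}

for name, score in search("beowulf cluster", docs):
    print("%s %.2f%%" % (name, score * 100))
```

A real engine would build the document vectors once and keep them in an index rather than recomputing them per query, and would probably weight rare words more heavily than common ones; this sketch skips both.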
I've been reading a little more about Maciej Ceglowski's work on vector
search engines; I've been collecting some of my notes here:
http://hkn.eecs.berkeley.edu/~dyoo/python/svd/
The "Latent Semantic Analysis" technique that Maciej briefly mentions at
the end of his article is a way of compressing the vector space using
singular value decomposition (SVD). At the moment, I don't feel comfortable
enough with the linear algebra to understand SVD yet, but I can collect
links pretty well. *grin* If I have time, I'll see if I can cook up a
wrapper module for SVDPACK.
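To make the compression idea a bit more concrete, here is a rough sketch of a truncated SVD on a tiny term-by-document matrix. Take it with a grain of salt: it assumes NumPy is installed and uses numpy.linalg.svd rather than SVDPACK, the word counts are invented, and the choice of k = 2 is arbitrary. The point is only that keeping the k largest singular values gives each document a short k-number description instead of one number per term.

```python
import numpy as np

# Rows are terms, columns are documents. Counts are made up for illustration.
terms = ["beowulf", "cluster", "parallel", "coffee", "espresso"]
A = np.array([
    [2, 0, 1, 0],
    [1, 2, 1, 0],
    [0, 1, 2, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 1],
], dtype=float)

# Thin SVD: A == U @ diag(s) @ Vt, with singular values s in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of dimensions to keep (an arbitrary choice for this toy case)

# Each document is now described by k numbers instead of len(terms) numbers.
doc_coords = (np.diag(s[:k]) @ Vt[:k, :]).T

# Rank-k approximation of the original matrix, rebuilt from the kept parts.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print("singular values:", s)
print("reduced document coordinates:")
print(doc_coords)
```

Queries would have to be projected into the same k-dimensional space before comparing them against doc_coords, which this sketch leaves out.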
Talk to you later!