Comments: full-text indexing of RDBMS

Thomas Weholt thomas at cintra.no
Tue Jun 27 06:15:34 EDT 2000


Hi,

I've cooked up some plans for a simple (note: simple!) full-text
index engine and would like some comments. I have little or no
experience in the matter, some with databases, and I'm a pretty happy
Python camper, so the things here might be dumb and inefficient, but
since it's not out there yet I have to do it myself.

I use PostgreSQL 7.0.2 and the latest version of Darcy's database
module, all under Linux.

I want to do simple searches like "Alien,Widescreen", and the result
should be all records where both "Alien" and "Widescreen" occur.

This is how I thought it could be done:

I get a list of all the tables in a specified database from the
database module, remove all the database system tables, and go through
each remaining table with "select * from table". The result ends up in
a dictionary object. I check all the values for plain text; integers
and numbers in general are ignored. A function takes the plain text
and returns a list of the words found in that record. I put each word
in a new table and give it a unique id. A record in PostgreSQL has an
OID, a unique object identifier. The OID is mapped against the ids of
the words that occurred in that record.
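
To make the indexing step concrete, here is a minimal in-memory
sketch of it. It is only an illustration under some assumptions: the
names extract_words and index_record are made up for this example, the
index lives in a Python dictionary rather than in a PostgreSQL table,
the sample rows are invented, and "plain text" is taken to mean any
string value.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of letters or digits, compared in lowercase.
WORD_RE = re.compile(r"[a-z0-9]+")


def extract_words(text):
    """Return the set of lowercase words found in a piece of plain text."""
    return set(WORD_RE.findall(text.lower()))


def index_record(index, oid, record):
    """Add one record (a dict of column name -> value) to the index.

    Only string values are indexed; integers, floats and None are skipped.
    """
    for value in record.values():
        if isinstance(value, str):
            for word in extract_words(value):
                index[word].add(oid)


# Hypothetical rows standing in for the result of "select * from table".
index = defaultdict(set)
index_record(index, 1001, {"title": "Alien", "format": "Widescreen", "year": 1979})
index_record(index, 1002, {"title": "Aliens", "format": "Fullscreen", "year": 1986})
```

Each word maps to the set of OIDs it occurs in, which is the same
word-to-record mapping described above, just without the separate
word-id table.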

When I look for a set of records, I take the intersection of the sets
of OIDs in which each of those words appears. I go through the
resulting list of OIDs and fetch those records from the database.
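
The lookup could be sketched roughly like this. Again, it is just an
illustration: the function name search, the hand-built index and the
OID values are assumptions made for the example, and the query is
assumed to be a comma-separated list of words as in "Alien,Widescreen".

```python
def search(index, query):
    """Return the set of OIDs whose records contain every query word."""
    terms = [term.strip().lower() for term in query.split(",") if term.strip()]
    if not terms:
        return set()
    # Start from the smallest set so the intersection shrinks quickly.
    postings = sorted((index.get(term, set()) for term in terms), key=len)
    result = set(postings[0])
    for oids in postings[1:]:
        result &= oids
    return result


# Hypothetical index: word -> set of OIDs of the records containing it.
index = {
    "alien": {1001, 1003},
    "widescreen": {1001, 1002},
    "fullscreen": {1003},
}
matches = search(index, "Alien,Widescreen")
```

A word that is not in the index gives an empty set, so the whole
search comes back empty, which is what an "all words must occur"
search should do.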

I might have to keep track of which table each OID appears in too. I
don't know whether PostgreSQL can fetch a record based on an OID alone,
or whether the Python module supports that.
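
One way around this, if the table name is stored next to each OID, is
to build one query per table. As far as I know, PostgreSQL exposes the
OID as a hidden "oid" column that can be used in a WHERE clause against
a specific table, but that should be checked against the version in
use. The sketch below only builds the SQL strings and does not call
any database module; build_fetch_queries and the sample data are
invented for the example, and the table names are assumed to come from
the database's own table list, not from user input.

```python
from collections import defaultdict


def build_fetch_queries(hits):
    """Group (table, oid) pairs by table and return one SELECT per table."""
    by_table = defaultdict(list)
    for table, oid in hits:
        # int() guards against anything but a numeric OID ending up in the SQL.
        by_table[table].append(int(oid))
    queries = []
    for table in sorted(by_table):
        oid_list = ", ".join(str(oid) for oid in sorted(by_table[table]))
        queries.append("SELECT * FROM %s WHERE oid IN (%s)" % (table, oid_list))
    return queries


# Hypothetical search result: (table name, OID) pairs.
hits = {("movies", 1001), ("movies", 1007), ("books", 2040)}
queries = build_fetch_queries(hits)
```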

The Python part should be pretty simple, but I'd like to hear how
others would do a thing like this too.

I can email any working code to interested parties.

Thanks.

Thomas


