Graham's spam filter
oren-py-l at hishome.net
Thu Aug 22 10:24:39 CEST 2002
On Wed, Aug 21, 2002 at 10:48:59PM -0700, Erik Max Francis wrote:
> > What this program currently tries to implement is a client/server
> > based protocol with authentication that allows some program to contact
> > the server for classifying text that is passed in, working around the
> > limitation that was discussed on the mailing-list that it is quite bad
> > for response time to always have to reload the database on scanning.
> I don't think that this is necessarily true; certainly and without a doubt,
> reloading the _entire_ database each time is a non-starter. The
> possibility of using a gdbm or similar database system might shorten
> those times to very reasonable amounts, but this is something I haven't
> researched yet.
Reloading the entire database is not necessarily a non-starter. If the
database is represented as some kind of hash table in a linear memory block
without using any pointers, it can be mmapped. The page cache will take
care of the rest. I think this is easier to implement and manage than a
client-server solution. I wouldn't be surprised if it's faster, too.
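
To make the idea concrete, here is a rough sketch of such a pointer-free
table. Everything in it is my own assumption rather than an existing format:
the file layout (an 8-byte header followed by fixed 16-byte slots), the names
build() and ProbTable, and the choice to store only a 64-bit hash of each
token instead of the token itself, which means two tokens with the same hash
would be conflated. Lookups use open addressing with linear probing, so a slot
index is all that is needed to find an entry and no pointers appear in the
file.

```python
import hashlib
import mmap
import struct

# Hypothetical on-disk layout: header, then fixed-size slots.
HEADER = struct.Struct("<4sI")   # magic bytes, number of slots
SLOT = struct.Struct("<Qd")      # 64-bit token hash, spam probability
MAGIC = b"SPM1"


def token_hash(token):
    """Return a non-zero 64-bit hash; zero is reserved for empty slots."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") or 1


def build(path, probs):
    """Write a dict of token -> probability as a flat hash table file."""
    nslots = max(8, 2 * len(probs))          # keep the table half empty
    table = bytearray(HEADER.size + nslots * SLOT.size)
    HEADER.pack_into(table, 0, MAGIC, nslots)
    for token, p in probs.items():
        h = token_hash(token)
        i = h % nslots
        while True:                          # linear probing
            off = HEADER.size + i * SLOT.size
            stored, _ = SLOT.unpack_from(table, off)
            if stored in (0, h):
                SLOT.pack_into(table, off, h, p)
                break
            i = (i + 1) % nslots
    with open(path, "wb") as f:
        f.write(table)


class ProbTable:
    """Read-only view of the table; pages are faulted in on demand."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        magic, self._nslots = HEADER.unpack_from(self._map, 0)
        if magic != MAGIC:
            raise ValueError("not a probability table")

    def get(self, token, default=None):
        h = token_hash(token)
        i = h % self._nslots
        for _ in range(self._nslots):
            off = HEADER.size + i * SLOT.size
            stored, p = SLOT.unpack_from(self._map, off)
            if stored == h:
                return p
            if stored == 0:                  # empty slot: token is absent
                return default
            i = (i + 1) % self._nslots
        return default

    def close(self):
        self._map.close()
        self._file.close()
```

Opening the table costs one mmap() call no matter how large the file is. A
lookup only touches the pages holding the slots it probes, and those pages
stay in the page cache between runs of the scanner.
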
This assumes that updating the probabilities database is a batch operation
done periodically that creates a new database and then does a rename and