On Thu, Nov 7 2002 Just van Rossum wrote:
François Granger wrote:
after each message you have to wait (up to 10 seconds on my machine with my database) before you can continue. Maybe an explicit "Save database" button is an idea?
With the -d parameter, you can use an anydbm database instead of a pickle. With some hacking it can probably use gdbm as the anydbm backend.
Ok, so I did it. With my current setup anydbm uses dbhash/bsddb, and training (on a single message) performance seems _worse_ than with the pickle (about 20 seconds now, around 10 with pickle). Don't know whether the training itself is slower or updating the database. Training with my entire corpus took many times longer as well. Not to mention that the database is now 20 megs instead of 5... Would gdbm be expected to work faster? (I currently don't even have it.)
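To separate "saving is slow" from "training is slow", the save step can be timed on its own. The following is only a rough sketch, not the project's code: the function names are made up for illustration, it assumes Python 3 (where anydbm became the dbm package), and it uses dbm.dumb purely because it is always available. Timings from dbm.dumb say nothing about bsddb or gdbm; a different backend module would have to be swapped in to measure those.

```python
import dbm.dumb
import os
import pickle
import tempfile
import time


def save_with_pickle(counts, path):
    # One monolithic write: the whole dict is serialized in a single dump.
    with open(path, "wb") as f:
        pickle.dump(counts, f, protocol=pickle.HIGHEST_PROTOCOL)


def save_with_dbm(counts, path):
    # One record write per word. dbm.dumb is an assumption made for
    # portability; it stands in for whatever backend anydbm would pick.
    db = dbm.dumb.open(path, "n")
    try:
        for word, n in counts.items():
            db[word.encode("utf-8")] = str(n).encode("ascii")
    finally:
        db.close()


def compare(counts):
    """Return the wall-clock seconds each storage method needs to save counts."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, save in (("pickle", save_with_pickle), ("dbm", save_with_dbm)):
            path = os.path.join(tmp, name)
            start = time.perf_counter()
            save(counts, path)
            results[name] = time.perf_counter() - start
    return results


if __name__ == "__main__":
    # Hypothetical word counts, just to have something to store.
    fake_counts = {"word%d" % i: i for i in range(5000)}
    for name, seconds in compare(fake_counts).items():
        print("%-6s %.3f s" % (name, seconds))
```

Timing the training call separately with the same time.perf_counter() pattern would show which of the two phases accounts for the extra seconds.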
The problem with training is that the update_probabilities() method, which is called at the end, goes through the whole database and updates just about every word. So the whole database is touched and needs to be written to disk, even when only a single message was trained.

-- 
Sjoerd Mullender <sjoerd@acm.org>
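A small sketch of why a per-key database does not help here. It is not the real classifier code: train_one_message() and its "rewrite every record" loop are hypothetical stand-ins for training followed by update_probabilities(), and dbm.dumb (Python 3) is assumed only so the example runs anywhere. It simply counts how many record writes one trained message causes.

```python
import dbm.dumb
import os
import tempfile


def train_one_message(db, tokens):
    """Count the record writes caused by training on one message.

    Hypothetical stand-in for the real training code.
    """
    writes = 0
    # Step 1: bump the count for each token in the message.
    for tok in tokens:
        key = tok.encode("utf-8")
        old = int(db[key]) if key in db else 0
        db[key] = str(old + 1).encode("ascii")
        writes += 1
    # Step 2: stand-in for update_probabilities(): every word in the
    # database is visited and stored again, whether or not it was in
    # the message.
    for key in list(db.keys()):
        db[key] = db[key]
        writes += 1
    return writes


def demo(vocab_size=1000):
    with tempfile.TemporaryDirectory() as tmp:
        db = dbm.dumb.open(os.path.join(tmp, "words"), "c")
        try:
            # Pre-populate a vocabulary as if a corpus had been trained.
            for i in range(vocab_size):
                db[("word%d" % i).encode("utf-8")] = b"1"
            # Train on a three-token message: two known words, one new.
            return train_one_message(db, ["word1", "word2", "newword"])
        finally:
            db.close()


if __name__ == "__main__":
    print(demo(1000))
```

With a 1000-word vocabulary and a three-token message, the count comes out at 1004 writes: 3 for the message itself and 1001 for the full pass. The write volume scales with the size of the database, not the size of the message, so thousands of small record updates can easily cost more than one pickle dump.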