My storage method is most efficient when given a pre-sorted list of words, so in _getclues I would want wordstream to be sorted first. I guess I'll have to override _getclues, add_msg and friends in my subclass ;-)

Which .py file in CVS generates the comparative time test for db and pickle training/classifying? If it's not in CVS, could someone email it to me?

Thanks

Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax AOL-IM: BKClements
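The override Brad describes could be sketched roughly as follows. This is only an illustration under stated assumptions: BaseClassifier is a hypothetical stand-in for the real classifier class, the _getclues signature shown here is assumed rather than taken from the project's source, and removing duplicate tokens before sorting is a choice made for this sketch.

```python
class BaseClassifier:
    """Hypothetical stand-in for the real classifier base class."""

    def _getclues(self, wordstream):
        # Stand-in behaviour: materialise the tokens in the order received.
        return list(wordstream)


class SortedClassifier(BaseClassifier):
    """Subclass that hands tokens to the base class in sorted order."""

    def _getclues(self, wordstream):
        # Assumption: duplicates are not needed, so drop them, then sort
        # so that a sorted-key store sees ascending lookups.
        return super()._getclues(sorted(set(wordstream)))
```

Sorting costs O(n log n) per message, so whether it pays off would depend on how much the storage layer gains from ordered access.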
[Brad]
Which .py file in CVS generates the comparative time test for db and pickle training/classifying?
I don't know whether such a thing exists - I produced my results the old-fashioned way, with a command prompt and a watch. 8-)
-- Richie Hindle richie@entrian.com
Oh, OK. Which test modules did you time?

On 2 Dec 2002 at 22:29, Richie Hindle wrote:
[Brad]
Which .py file in CVS generates the comparative time test for db and pickle training/classifying?
I don't know whether such a thing exists - I produced my results the old-fashioned way, with a command prompt and a watch. 8-)
-- Richie Hindle richie@entrian.com
Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax AOL-IM: BKClements
[Brad]
which test modules did you time?
For training, I ran:

    hammiebulk.py -g 500-hams.mbox -s 500-spams.mbox -d -p temp.bsddb3
    hammiebulk.py -g 500-hams.mbox -s 500-spams.mbox -D -p temp.pickle

For classifying, I ran:

    hammiebulk.py -u 500-hams.mbox -d -p richie-500.bsddb3
    hammiebulk.py -u 500-hams.mbox -D -p richie-500.pickle

(because I didn't have an mbox of 500 random ham/spam messages to hand). In each of the four cases I ran the command twice and timed the second one.

I'm using a hacked version of the software that uses bsddb3 - if you need my patches, let me know.
-- Richie Hindle richie@entrian.com
So then, Richie Hindle <richie@entrian.com> is all like:
[Brad]
which test modules did you time?
For training, I ran:
    hammiebulk.py -g 500-hams.mbox -s 500-spams.mbox -d -p temp.bsddb3
    hammiebulk.py -g 500-hams.mbox -s 500-spams.mbox -D -p temp.pickle
For classifying, I ran:
    hammiebulk.py -u 500-hams.mbox -d -p richie-500.bsddb3
    hammiebulk.py -u 500-hams.mbox -D -p richie-500.pickle
That is what I did, too. Unix has a "time" command you can put in front of a command line, which will tell you all sorts of neat statistics. I did five runs of each (pickle and non) and averaged the times by hand.

Neale
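The run-several-times-and-average procedure described in this thread could also be scripted. The following is a minimal sketch, not something that exists in CVS: the function name time_command and its parameters are made up for illustration, and it measures wall-clock time only, whereas the Unix "time" command also reports user and system CPU time.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=5, warmup=1):
    """Run cmd `warmup` times untimed, then `runs` times timed.

    Returns (mean_seconds, list_of_timings), wall-clock only.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Untimed warm-up runs, so caches are primed before measuring.
    for _ in range(warmup):
        subprocess.run(cmd, check=True)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings), timings


if __name__ == "__main__":
    # Placeholder command; substitute one of the hammiebulk.py command
    # lines quoted above, split into a list of arguments.
    mean, timings = time_command([sys.executable, "-c", "pass"], runs=3)
    print("mean %.3fs over %d runs" % (mean, len(timings)))
```

With warmup=1 and runs=1 this matches running the command twice and timing the second run; with warmup=0 and runs=5 it matches averaging five runs.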
Participants (3): Brad Clements, Neale Pickett, Richie Hindle