[spambayes-dev] Mozilla SpamBayes "porting"

Miguel miguel at vargas.com
Fri Feb 20 18:13:50 EST 2004


OK, I made all the suggested changes and re-tested.  The false-negative (fn) rate dropped by half, which is amazing
considering that it was already about half of the original.  Unfortunately, the false-positive (fp) rate did not
improve and might even have gone up a bit.

To pinpoint the problem, I've been trying to step through classifier.py in a debugger and feed it some numbers.
Unfortunately, I don't know my way around the Python debugger very well, so I haven't been able to pull this off.

Is there a kind Python soul in here who could help me with this?
Could you feed these numbers into classifier.py and see if you get the same results?

ngood = 861, nbad = 759

spam score = 0.809734

token 1: hamcount = 13 spamcount = 103, prob=0.898333
token 2: hamcount = 44 spamcount = 99, prob=0.717812
token 3: hamcount = 22 spamcount = 96, prob=0.830673
token 4: hamcount = 0 spamcount = 0, discarded
token 5: hamcount = 5802 spamcount = 4680, discarded
token 6: hamcount = 0 spamcount = 0, discarded
token 7: hamcount = 1 spamcount = 3, prob=0.745295
token 8: hamcount = 513 spamcount = 1353, prob=0.749430
token 9: hamcount = 0 spamcount = 1, prob=0.844828
token 10: hamcount = 2440 spamcount = 908, prob=0.296862
token 11: hamcount = 1079 spamcount = 901, discarded
token 12: hamcount = 320 spamcount = 305, discarded
token 13: hamcount = 1 spamcount = 0, prob=0.155172
token 14: hamcount = 1 spamcount = 1, discarded
token 15: hamcount = 0 spamcount = 0, discarded
token 16: hamcount = 2986 spamcount = 6224, prob=0.702770
token 17: hamcount = 2272 spamcount = 852, prob=0.298469
token 18: hamcount = 3774 spamcount = 2822, discarded
token 19: hamcount = 1878 spamcount = 1929, discarded
token 20: hamcount = 23 spamcount = 17, discarded
token 21: hamcount = 25 spamcount = 15, discarded
token 22: hamcount = 4374 spamcount = 4524, discarded
token 23: hamcount = 231 spamcount = 215, discarded
token 24: hamcount = 0 spamcount = 0, discarded
token 25: hamcount = 32 spamcount = 120, prob=0.808753
token 26: hamcount = 1075 spamcount = 1231, discarded
token 27: hamcount = 995 spamcount = 628, discarded
token 28: hamcount = 2 spamcount = 0, prob=0.091837
token 29: hamcount = 0 spamcount = 0, discarded
token 30: hamcount = 915 spamcount = 514, prob=0.389251
token 31: hamcount = 0 spamcount = 0, discarded
token 32: hamcount = 6051 spamcount = 5895, discarded
token 33: hamcount = 6 spamcount = 30, prob=0.845796
token 34: hamcount = 2409 spamcount = 1251, prob=0.370725
token 35: hamcount = 324 spamcount = 620, prob=0.684528
token 36: hamcount = 791 spamcount = 735, discarded
token 37: hamcount = 355 spamcount = 1660, prob=0.841306
token 38: hamcount = 7425 spamcount = 4191, prob=0.390359
token 39: hamcount = 311 spamcount = 734, prob=0.727963
token 40: hamcount = 0 spamcount = 0, discarded
token 41: hamcount = 1029 spamcount = 934, discarded
token 42: hamcount = 0 spamcount = 0, discarded
token 43: hamcount = 0 spamcount = 0, discarded
token 44: hamcount = 548 spamcount = 735, prob=0.603372
token 45: hamcount = 3 spamcount = 0, prob=0.065217
token 46: hamcount = 217 spamcount = 132, discarded
token 47: hamcount = 106 spamcount = 58, prob=0.383304
token 48: hamcount = 0 spamcount = 0, discarded
token 49: hamcount = 0 spamcount = 3, prob=0.934783
token 50: hamcount = 5 spamcount = 1, prob=0.206905
token 51: hamcount = 28 spamcount = 234, prob=0.903889
token 52: hamcount = 6939 spamcount = 5645, discarded
token 53: hamcount = 135 spamcount = 502, prob=0.808147
token 54: hamcount = 0 spamcount = 0, discarded
token 55: hamcount = 323 spamcount = 501, prob=0.637544
token 56: hamcount = 10034 spamcount = 9013, discarded
token 57: hamcount = 7 spamcount = 2, prob=0.256930
token 58: hamcount = 0 spamcount = 0, discarded
token 59: hamcount = 10 spamcount = 6, discarded
token 60: hamcount = 21 spamcount = 28, prob=0.601065
token 61: hamcount = 736 spamcount = 1119, prob=0.632955
token 62: hamcount = 784 spamcount = 3206, prob=0.822622
token 63: hamcount = 2428 spamcount = 16895, prob=0.887550
token 64: hamcount = 0 spamcount = 0, discarded
token 65: hamcount = 689 spamcount = 272, prob=0.309399
token 66: hamcount = 80 spamcount = 341, prob=0.828279
token 67: hamcount = 3190 spamcount = 3140, discarded
token 68: hamcount = 0 spamcount = 0, discarded
token 69: hamcount = 116 spamcount = 631, prob=0.860326
token 70: hamcount = 477 spamcount = 442, discarded
token 71: hamcount = 2538 spamcount = 2167, discarded
token 72: hamcount = 30 spamcount = 43, prob=0.618456
token 73: hamcount = 26 spamcount = 7, prob=0.237537
token 74: hamcount = 0 spamcount = 0, discarded
token 75: hamcount = 22 spamcount = 50, prob=0.719156
token 76: hamcount = 210 spamcount = 15, prob=0.075803
token 77: hamcount = 15 spamcount = 172, prob=0.927581
token 78: hamcount = 1557 spamcount = 2412, prob=0.637313
token 79: hamcount = 23 spamcount = 2, prob=0.097039
token 80: hamcount = 1343 spamcount = 326, prob=0.215985
token 81: hamcount = 750 spamcount = 881, discarded
token 82: hamcount = 846 spamcount = 1160, prob=0.608651
token 83: hamcount = 48 spamcount = 36, discarded
token 84: hamcount = 55 spamcount = 47, discarded
token 85: hamcount = 27 spamcount = 28, discarded
token 86: hamcount = 1585 spamcount = 641, prob=0.314526
token 87: hamcount = 114 spamcount = 36, prob=0.264453
token 88: hamcount = 896 spamcount = 724, discarded
token 89: hamcount = 0 spamcount = 0, discarded
token 90: hamcount = 0 spamcount = 0, discarded
token 91: hamcount = 92 spamcount = 76, discarded
token 92: hamcount = 11 spamcount = 178, prob=0.947273
token 93: hamcount = 62 spamcount = 79, discarded
token 94: hamcount = 1937 spamcount = 1618, discarded
token 95: hamcount = 16 spamcount = 44, prob=0.755341
token 96: hamcount = 606 spamcount = 620, discarded
token 97: hamcount = 552 spamcount = 265, prob=0.352659
token 98: hamcount = 473 spamcount = 302, discarded
token 99: hamcount = 3002 spamcount = 5390, prob=0.670692
token 100: hamcount = 0 spamcount = 0, discarded
token 101: hamcount = 8 spamcount = 3, prob=0.306362
token 102: hamcount = 13 spamcount = 2, prob=0.158824
token 103: hamcount = 732 spamcount = 826, discarded
token 104: hamcount = 474 spamcount = 3408, prob=0.890738
token 105: hamcount = 0 spamcount = 0, discarded
token 106: hamcount = 0 spamcount = 0, discarded
token 107: hamcount = 1191 spamcount = 5958, prob=0.850161
token 108: hamcount = 337 spamcount = 3033, prob=0.910735
token 109: hamcount = 258 spamcount = 222, discarded
token 110: hamcount = 1 spamcount = 1, discarded
token 111: hamcount = 300 spamcount = 865, prob=0.765750
token 112: hamcount = 526 spamcount = 1327, prob=0.740998
token 113: hamcount = 7 spamcount = 25, prob=0.797846
token 114: hamcount = 956 spamcount = 1957, prob=0.698961
token 115: hamcount = 2 spamcount = 0, prob=0.091837
token 116: hamcount = 0 spamcount = 0, discarded
token 117: hamcount = 863 spamcount = 1208, prob=0.613559
token 118: hamcount = 469 spamcount = 282, discarded
token 119: hamcount = 230 spamcount = 57, prob=0.219879
token 120: hamcount = 0 spamcount = 0, discarded
token 121: hamcount = 32 spamcount = 33, discarded
token 122: hamcount = 736 spamcount = 840, discarded
token 123: hamcount = 1 spamcount = 0, prob=0.155172
token 124: hamcount = 41 spamcount = 14, prob=0.280994
token 125: hamcount = 3 spamcount = 2, discarded
token 126: hamcount = 10 spamcount = 46, prob=0.836477
token 127: hamcount = 12 spamcount = 115, prob=0.914295
token 128: hamcount = 48 spamcount = 28, prob=0.398815
token 129: hamcount = 388 spamcount = 308, discarded
token 130: hamcount = 2466 spamcount = 1232, prob=0.361746
token 131: hamcount = 0 spamcount = 0, discarded
token 132: hamcount = 61 spamcount = 473, prob=0.897584
token 133: hamcount = 3 spamcount = 11, prob=0.796645
token 134: hamcount = 0 spamcount = 0, discarded
token 135: hamcount = 471 spamcount = 561, discarded
token 136: hamcount = 7218 spamcount = 8674, discarded
token 137: hamcount = 43 spamcount = 34, discarded
token 138: hamcount = 21 spamcount = 29, prob=0.609385
token 139: hamcount = 8 spamcount = 39, prob=0.843574
token 140: hamcount = 0 spamcount = 0, discarded
token 141: hamcount = 27 spamcount = 32, discarded
token 142: hamcount = 23 spamcount = 3, prob=0.135206
token 143: hamcount = 5895 spamcount = 4656, discarded
token 144: hamcount = 11 spamcount = 2, prob=0.181994
token 145: hamcount = 1718 spamcount = 2754, prob=0.645181
token 146: hamcount = 32 spamcount = 4, prob=0.128828
token 147: hamcount = 721 spamcount = 388, prob=0.379109
token 148: hamcount = 45 spamcount = 125, prob=0.758415
token 149: hamcount = 767 spamcount = 1010, discarded
token 150: hamcount = 319 spamcount = 338, discarded
token 151: hamcount = 1071 spamcount = 1628, prob=0.632918
token 152: hamcount = 36 spamcount = 109, prob=0.773655
token 153: hamcount = 188 spamcount = 160, discarded
token 154: hamcount = 0 spamcount = 0, discarded
token 155: hamcount = 87 spamcount = 276, prob=0.782200
token 156: hamcount = 16 spamcount = 3, prob=0.182902
token 157: hamcount = 0 spamcount = 0, discarded
token 158: hamcount = 1106 spamcount = 510, prob=0.343484
token 159: hamcount = 861 spamcount = 759, discarded
token 160: hamcount = 354 spamcount = 326, discarded
token 161: hamcount = 154 spamcount = 791, prob=0.853346
token 162: hamcount = 5 spamcount = 6, discarded
token 163: hamcount = 8 spamcount = 20, prob=0.735524
token 164: hamcount = 543 spamcount = 1595, prob=0.769110
token 165: hamcount = 180 spamcount = 1293, prob=0.890575
token 166: hamcount = 0 spamcount = 0, discarded
token 167: hamcount = 1730 spamcount = 5246, prob=0.774751
token 168: hamcount = 87 spamcount = 19, prob=0.199825
token 169: hamcount = 22 spamcount = 1, prob=0.057689
token 170: hamcount = 35 spamcount = 225, prob=0.878753
token 171: hamcount = 0 spamcount = 0, discarded
token 172: hamcount = 475 spamcount = 495, discarded
token 173: hamcount = 192 spamcount = 86, prob=0.337182
token 174: hamcount = 1723 spamcount = 1518, discarded
token 175: hamcount = 2990 spamcount = 1730, prob=0.396273
token 176: hamcount = 539 spamcount = 4562, prob=0.905636
token 177: hamcount = 0 spamcount = 0, discarded
token 178: hamcount = 1156 spamcount = 1529, prob=0.600049
token 179: hamcount = 0 spamcount = 0, discarded
token 180: hamcount = 3 spamcount = 0, prob=0.065217
token 181: hamcount = 0 spamcount = 0, discarded
token 182: hamcount = 4 spamcount = 2, prob=0.371550
token 183: hamcount = 670 spamcount = 882, discarded
token 184: hamcount = 873 spamcount = 687, discarded
token 185: hamcount = 1 spamcount = 3, prob=0.745295
token 186: hamcount = 5678 spamcount = 8736, prob=0.635741
token 187: hamcount = 4 spamcount = 12, prob=0.765425
token 188: hamcount = 1 spamcount = 0, prob=0.155172
token 189: hamcount = 81 spamcount = 76, discarded
token 190: hamcount = 355 spamcount = 427, discarded
token 191: hamcount = 1149 spamcount = 1266, discarded
token 192: hamcount = 2034 spamcount = 611, prob=0.254198
token 193: hamcount = 110 spamcount = 16, prob=0.142908
token 194: hamcount = 0 spamcount = 0, discarded
token 195: hamcount = 1 spamcount = 0, prob=0.155172
token 196: hamcount = 806 spamcount = 710, discarded
token 197: hamcount = 0 spamcount = 0, discarded
token 198: hamcount = 24 spamcount = 10, prob=0.323296
token 199: hamcount = 55 spamcount = 85, prob=0.636341
token 200: hamcount = 585 spamcount = 342, prob=0.398791
token 201: hamcount = 3 spamcount = 0, prob=0.065217
token 202: hamcount = 28 spamcount = 37, discarded
token 203: hamcount = 1 spamcount = 4, prob=0.793041
token 204: hamcount = 2375 spamcount = 1331, prob=0.388667
token 205: hamcount = 233 spamcount = 43, prob=0.173642
token 206: hamcount = 0 spamcount = 0, discarded
token 207: hamcount = 24 spamcount = 57, prob=0.728036
token 208: hamcount = 28 spamcount = 23, discarded
token 209: hamcount = 4 spamcount = 2, prob=0.371550
token 210: hamcount = 87 spamcount = 98, discarded
token 211: hamcount = 30 spamcount = 222, prob=0.892853
token 212: hamcount = 1167 spamcount = 1133, discarded
token 213: hamcount = 303 spamcount = 168, prob=0.386223
token 214: hamcount = 384 spamcount = 557, prob=0.621935
token 215: hamcount = 10 spamcount = 0, prob=0.021531
token 216: hamcount = 1722 spamcount = 1738, discarded
token 217: hamcount = 3000 spamcount = 2582, discarded
token 218: hamcount = 134 spamcount = 449, prob=0.791487
token 219: hamcount = 138 spamcount = 982, prob=0.889617
token 220: hamcount = 4561 spamcount = 5959, discarded
token 221: hamcount = 966 spamcount = 4736, prob=0.847570
token 222: hamcount = 641 spamcount = 419, discarded
token 223: hamcount = 204 spamcount = 27, prob=0.131259
token 224: hamcount = 13681 spamcount = 10641, discarded
token 225: hamcount = 60 spamcount = 436, prob=0.891457
token 226: hamcount = 862 spamcount = 759, discarded
token 227: hamcount = 56 spamcount = 141, prob=0.740131
token 228: hamcount = 8 spamcount = 5, discarded
token 229: hamcount = 872 spamcount = 758, discarded
token 230: hamcount = 4986 spamcount = 3565, discarded
token 231: hamcount = 8932 spamcount = 8166, discarded
token 232: hamcount = 1090 spamcount = 823, discarded
token 233: hamcount = 29 spamcount = 16, prob=0.386083
token 234: hamcount = 1457 spamcount = 1461, discarded
token 235: hamcount = 472 spamcount = 1564, prob=0.789802
token 236: hamcount = 4052 spamcount = 2179, prob=0.378901
token 237: hamcount = 2325 spamcount = 7625, prob=0.788136
token 238: hamcount = 10 spamcount = 42, prob=0.823721
token 239: hamcount = 11 spamcount = 0, prob=0.019651
token 240: hamcount = 90 spamcount = 21, prob=0.210466
token 241: hamcount = 0 spamcount = 0, discarded
token 242: hamcount = 516 spamcount = 828, prob=0.645379
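
For anyone who wants to check the dump without a debugger, here is a minimal standalone sketch, not the actual
classifier.py code. It assumes the Robinson-style per-token adjustment and chi-squared combining, with the
constants 0.5 (unknown word probability), 0.45 (unknown word strength) and 0.1 (minimum probability strength);
those values and all function names are assumptions for illustration, and a port may be configured differently.

```python
import math

# Assumed option values; check them against the configuration actually in use.
UNKNOWN_WORD_PROB = 0.5
UNKNOWN_WORD_STRENGTH = 0.45
MINIMUM_PROB_STRENGTH = 0.1


def token_prob(hamcount, spamcount, ngood, nbad):
    """Adjusted spam probability for one token (hypothetical helper)."""
    n = hamcount + spamcount
    if n == 0:
        # Never-seen token: fall back to the unknown-word probability.
        return UNKNOWN_WORD_PROB
    hamratio = hamcount / float(ngood or 1)
    spamratio = spamcount / float(nbad or 1)
    prob = spamratio / (hamratio + spamratio)
    # Pull the raw estimate toward the prior, weighted by the evidence count n.
    s, x = UNKNOWN_WORD_STRENGTH, UNKNOWN_WORD_PROB
    return (s * x + n * prob) / (s + n)


def is_discarded(prob):
    """True when the token is too close to 0.5 to count as a clue."""
    return abs(prob - 0.5) < MINIMUM_PROB_STRENGTH


def chi2q(x2, v):
    """Probability that a chi-squared variable with even v d.o.f. is >= x2."""
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Chi-squared combination of the surviving token probabilities."""
    n = len(probs)
    if n == 0:
        return 0.5
    # Summing logs avoids underflow from multiplying many small numbers.
    ln_s = sum(math.log(1.0 - p) for p in probs)
    ln_h = sum(math.log(p) for p in probs)
    s = 1.0 - chi2q(-2.0 * ln_s, 2 * n)
    h = 1.0 - chi2q(-2.0 * ln_h, 2 * n)
    return (s - h + 1.0) / 2.0


if __name__ == "__main__":
    ngood, nbad = 861, 759
    # A few (hamcount, spamcount) pairs copied from the list above.
    for ham, spam in [(13, 103), (0, 1), (1, 0), (1, 1)]:
        p = token_prob(ham, spam, ngood, nbad)
        print(ham, spam, "%.6f" % p, "discarded" if is_discarded(p) else "")
```

Under these assumptions token_prob agrees with the listed values for tokens 1, 9 and 13. To check the overall
score, run every line through token_prob, drop the discarded ones, and pass the rest to combine. The sketch does
not limit how many clues are used, and it does not check that hamcount <= ngood or spamcount <= nbad; several
tokens in the dump exceed those totals (token 5 has hamcount = 5802 against ngood = 861).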



