[spambayes-bugs] [ spambayes-Bugs-1051081 ] uncaught socket timeout exception slurping URLs

SourceForge.net noreply at sourceforge.net
Mon Oct 25 18:26:17 CEST 2004


Bugs item #1051081, was opened at 2004-10-20 16:49
Message generated for change (Comment added) made by jmgilligan
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=1051081&group_id=61702

Category: None
Group: Source code - CVS
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Jonathan M. Gilligan (jmgilligan)
Assigned to: Tony Meyer (anadelonbrin)
Summary: uncaught socket timeout exception slurping URLs

Initial Comment:
I occasionally get the following failure when spambayes 
tries to slurp a URL while classifying and the request 
times out.

It would be better to catch the exception and skip 
slurping that URL gracefully rather than bail out of the 
whole script.

Slurping bestsellingreplicas.info
Traceback (most recent call last):
  File "C:\Python23\Scripts\sb_imapfilter.py", line 1020, in ?
    run()
  File "C:\Python23\Scripts\sb_imapfilter.py", line 1010, in run
    imap_filter.Filter()
  File "C:\Python23\Scripts\sb_imapfilter.py", line 879, in Filter
    self.unsure_folder, self.ham_folder)
  File "C:\Python23\Scripts\sb_imapfilter.py", line 767, in Filter
    evidence=True)
  File "C:\Python23\Lib\site-packages\spambayes\classifier.py", line 246, in slurping_spamprob
    slurp_tokens = list(self._generate_slurp())
  File "C:\Python23\Lib\site-packages\spambayes\classifier.py", line 559, in _generate_slurp
    tokens = self.slurp(*slurp_wordstream)
  File "C:\Python23\Lib\site-packages\spambayes\classifier.py", line 725, in slurp
    page = f.read()
  File "C:\Python23\lib\socket.py", line 283, in read
    data = self._sock.recv(recv_size)
socket.timeout: timed out

Running CVS version under python 2.3.4 (#53) under 
Win2K.
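
To illustrate the kind of graceful failure I mean, here is a rough sketch. It is not the SpamBayes code and not necessarily how a fix would look in classifier.py: safe_slurp and its opener parameter are hypothetical names made up for this example, and it is written for a current Python 3 rather than the 2.3 I am running. The idea is only that a timeout while reading the page should cost the extra tokens, not the whole run.

```python
import socket
import urllib.request


def safe_slurp(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if the fetch fails.

    Hypothetical helper for illustration only. `opener` is injectable
    so the failure path can be exercised without touching the network.
    """
    try:
        f = opener(url, timeout=timeout)
        try:
            return f.read()
        finally:
            f.close()
    except OSError as exc:
        # In Python 3, socket.timeout and urllib.error.URLError are both
        # subclasses of OSError, so this covers timeouts during read()
        # as well as connection and DNS failures.
        print("Slurp of %s failed: %s" % (url, exc))
        return None


class StalledResponse:
    """Stand-in for a response whose read() times out (assumed for the demo)."""

    def read(self):
        raise socket.timeout("timed out")

    def close(self):
        pass


if __name__ == "__main__":
    page = safe_slurp("http://example.invalid/",
                      opener=lambda url, timeout: StalledResponse())
    # The caller can treat None as "no slurped tokens" and carry on
    # classifying with the tokens it already has.
    print(page is None)
```

With something along these lines, a caller would check for None and simply skip the slurped tokens for that message.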



----------------------------------------------------------------------

>Comment By: Jonathan M. Gilligan (jmgilligan)
Date: 2004-10-25 11:26

Message:
Logged In: YES 
user_id=11595

You're right. The latest CVS version does not experience this 
problem.

I'm not sure how much slurping helps. I'm getting very good 
discrimination using it, but I haven't compared to working 
without the slurping, so I don't have any good measure of 
how much difference it makes.

My false-negative rate (SPAM misclassified as HAM) is a few 
messages per week and my false-positive rate (HAM 
misclassified as SPAM) is much smaller, maybe a few per 
month, with traffic of one to two hundred messages per day, 
consisting of about 50 SPAM and 50-150 HAM.

Importantly, the false positives are almost entirely opt-in 
bulk email advertising that I want to receive. I have yet to 
see a personal email misclassified as SPAM in over six 
months of SpamBayes use.

This is quite acceptable to me. The only significant problems 
I'm seeing are various things that make the scripts crash.

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2004-10-20 17:10

Message:
Logged In: YES 
user_id=552329

This should be fixed in classifier.py 1.27.  If you could
CVS up once anon-CVS catches up and try it out, that would
be great.

BTW, do you find that the slurping helps?  It's still an
experimental option for a number of reasons, but one of
those is that we don't really know whether it helps
anyone.  Feedback would be great!

----------------------------------------------------------------------
