[spambayes-bugs] [ spambayes-Bugs-1051081 ] uncaught socket timeout exception slurping URLs

SourceForge.net noreply at sourceforge.net
Mon Oct 25 18:26:17 CEST 2004

Bugs item #1051081, was opened at 2004-10-20 16:49
Message generated for change (Comment added) made by jmgilligan

Category: None
Group: Source code - CVS
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Jonathan M. Gilligan (jmgilligan)
Assigned to: Tony Meyer (anadelonbrin)
Summary: uncaught socket timeout exception slurping URLs

Initial Comment:
I occasionally get the following failure when spambayes 
tries to slurp a URL while classifying and the URL times 
out.

It would be better to catch the exception and skip 
slurping that URL gracefully rather than bailing out of the 
whole script.

Slurping bestsellingreplicas.info
Traceback (most recent call last):
  File "C:\Python23\Scripts\sb_imapfilter.py", line 1020, in ?
  File "C:\Python23\Scripts\sb_imapfilter.py", line 1010, in run
  File "C:\Python23\Scripts\sb_imapfilter.py", line 879, in Filter
    self.unsure_folder, self.ham_folder)
  File "C:\Python23\Scripts\sb_imapfilter.py", line 767, in Filter
  File "C:\Python23\Lib\site-packages\spambayes\classifier.py", line 246, in slu
    slurp_tokens = list(self._generate_slurp())
  File "C:\Python23\Lib\site-packages\spambayes\classifier.py", line 559, in _ge
    tokens = self.slurp(*slurp_wordstream)
  File "C:\Python23\Lib\site-packages\spambayes\classifier.py", line 725, in slu
    page = f.read()
  File "C:\Python23\lib\socket.py", line 283, in read
    data = self._sock.recv(recv_size)
socket.timeout: timed out
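
For illustration only, the graceful failure requested above might look 
roughly like the sketch below. It is not the SpamBayes code and not the 
fix that went into classifier.py. The helper name fetch_or_skip, the 
opener parameter and the 5 second default are all assumptions made up for 
this example, and it is written for a current Python 3 rather than the 
Python 2.3 shown in the traceback.

```python
import socket
import urllib.error
import urllib.request


def fetch_or_skip(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if the fetch fails.

    Hypothetical helper for illustration; not part of SpamBayes.
    `opener` is injectable only so the function can be exercised
    without touching the network.
    """
    try:
        f = opener(url, timeout=timeout)
        try:
            # The traceback shows the timeout surfacing in f.read(),
            # so read() has to sit inside the try block as well.
            return f.read()
        finally:
            f.close()
    except (socket.timeout, urllib.error.URLError, OSError) as exc:
        # Report the failure and carry on instead of aborting the run.
        print("Could not slurp %s: %s" % (url, exc))
        return None
```

A caller would treat a None result as "no extra tokens from this URL" 
and go on classifying the message with whatever tokens it already has.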

Running CVS version under python 2.3.4 (#53) under 


>Comment By: Jonathan M. Gilligan (jmgilligan)
Date: 2004-10-25 11:26

Logged In: YES 

You're right. The latest CVS version does not experience this 

I'm not sure how much slurping helps. I'm getting very good 
discrimination using it, but I haven't compared to working 
without the slurping, so I don't have any good measure of 
how much difference it makes.

My false-negative rate (SPAM misclassified as HAM) is a few 
messages per week and my false-positive rate (HAM 
misclassified as SPAM) is much smaller, maybe a few per 
month, with traffic of one to two hundred messages per day, 
consisting of about 50 SPAM and 50-150 HAM.

Importantly, the false positives are almost entirely opt-in 
bulk email advertising that I want to receive. I have yet to 
see a personal email misclassified as SPAM in over six 
months of SpamBayes use.

This is quite acceptable to me. The only significant problems 
I'm seeing are various things that make the scripts crash.


Comment By: Tony Meyer (anadelonbrin)
Date: 2004-10-20 17:10

Logged In: YES 

This should be fixed in classifier.py 1.27.  If you could
CVS up once anon-CVS catches up and try it out, that would
be great.

BTW, do you find that the slurping helps?  It's still an
experimental option for a number of reasons, but one of
those is that we don't know whether it really helps
anyone.  Feedback would be great!

