[spambayes-bugs] [ spambayes-Patches-830290 ] url detection

SourceForge.net noreply at sourceforge.net
Sat Oct 25 19:30:09 EDT 2003


Patches item #830290, was opened at 2003-10-26 00:30
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=830290&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Toby Dickenson (htrd)
Assigned to: Nobody/Anonymous (nobody)
Summary: url detection

Initial Comment:
I've been looking into a couple of unsures that generated 
surprisingly few tokens. My mail reader detects some text 
as links because it begins "www.", but spambayes needed 
the http:// prefix too. Replacing such text with a skip token 
was a big loss. 
 
In fixing that, I found that this re had always matched a 
little too much: it would match urls that start in the 
middle of words. It always generated the tokens "xxx" and 
"url:www" for messages that contained "xxxhttp://www". 
That wasn't so bad, but I guess we should avoid 
generating the same tokens for messages that contain 
"xxxwww." 
 
To address this I have also changed the re to require that 
urls must start following a non-alphanumeric character. 
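
As a rough illustration only (this is not the re from the patch, and the 
names URL_RE and find_urls are made up for the example), the two rules 
described above could be sketched like this: accept either an explicit 
scheme or a bare "www." prefix, and refuse to start a match right after 
an alphanumeric character. 

```python
import re

# Hypothetical pattern for illustration; not the regex used by spambayes.
URL_RE = re.compile(
    r"""
    (?<![a-z0-9])                    # must not directly follow an alphanumeric
    (?:
        (?P<scheme>https?|ftp) ://   # explicit scheme, as before
      |
        (?=www\.)                    # or a bare "www." prefix (not consumed)
    )
    (?P<rest>[^\s<>"']+)             # run up to whitespace or a quote/bracket
    """,
    re.VERBOSE | re.IGNORECASE,
)


def find_urls(text):
    """Return (scheme, rest) pairs; scheme is '' for bare "www." matches."""
    return [(m.group("scheme") or "", m.group("rest"))
            for m in URL_RE.finditer(text)]


if __name__ == "__main__":
    for sample in ("see www.example.com now",
                   "see http://example.com/a",
                   "xxxwww.example.com",
                   "xxxhttp://example.com"):
        print(repr(sample), find_urls(sample))
```

The negative lookbehind is what stops a match from starting in the middle 
of a word, and it also lets a url at the very start of the text match. 
The sketch makes no attempt to trim trailing punctuation. 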
 
Sadly my end result is a much messier re.  
 
 

----------------------------------------------------------------------

