[spambayes-bugs] [ spambayes-Patches-830290 ] url detection
SourceForge.net
noreply at sourceforge.net
Sat Oct 25 19:30:09 EDT 2003
Patches item #830290, was opened at 2003-10-26 00:30
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=830290&group_id=61702
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Toby Dickenson (htrd)
Assigned to: Nobody/Anonymous (nobody)
Summary: url detection
Initial Comment:
I've been looking into a couple of unsures that generated
surprisingly few tokens. My mail reader detects some text
as links because it begins with "www.", but spambayes needed
the http:// prefix too. Replacing that text with a skip token
was a big loss.
In fixing that, I found that this re had always matched a
little too much: it would match urls that start in the
middle of words. It always generated the tokens "xxx" and
"url:www" for messages that contained "xxxhttp://www".
That wasn't so bad, but I guess we should avoid
generating the same tokens for messages that contain
"xxxwww."
To address this I have also changed the re to require that
a url starts immediately after a non-alphanumeric character
(or at the start of the text).
Sadly my end result is a much messier re.
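
As a rough illustration of the two changes described above (this is a
sketch, not the re from the attached patch), a pattern along these lines
accepts a bare "www." prefix and refuses to start in the middle of a word.
The names URL_RE and find_urls, the list of schemes, and the choice to
report "http" for a bare "www." match are all assumptions made for the
example.

```python
import re

# Hypothetical pattern for illustration only; not the regex from the patch.
URL_RE = re.compile(r"""
    (?<![a-z0-9])                 # must not directly follow an alphanumeric
    (?:
        (?P<proto>https?|ftp)://  # either an explicit scheme ...
      |
        (?=www\.)                 # ... or a bare "www." with no scheme
    )
    (?P<rest>[^\s<>]+)            # run of characters up to whitespace or <>
""", re.VERBOSE | re.IGNORECASE)


def find_urls(text):
    """Return (proto, rest) pairs; proto is assumed to be 'http' for bare www."""
    return [(m.group("proto") or "http", m.group("rest"))
            for m in URL_RE.finditer(text)]


print(find_urls("visit www.example.com today"))
print(find_urls("xxxwww.example.com"))
```

Note that in this sketch "xxxhttp://www.example.com" still produces a match
on the "www." part, because that part follows a "/" rather than an
alphanumeric character.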
----------------------------------------------------------------------