April 17, 2013
9:10 a.m.
On 17 April 2013 21:02, Terri Oda <terri@zone12.com> wrote:
On 13-04-17 6:56 AM, Avik Pal wrote:
Meanwhile, it would be much appreciated if someone could direct
me to a labeled dataset available online.
Leaving aside entirely the question of whether we should (or will)
support any project that requires learning on this scale, as a former anti-spam researcher, I can at least answer this question.
Unfortunately, the answer is largely "good luck with that" -- good labelled email data is surprisingly hard to come by, and that challenge is one of the reasons I stopped doing research in that area.
Don't lose hope, Terri. After digging for a couple of hours, I came
across this, and it's pretty much up to date: http://untroubled.org/spam/