[Spambayes] full o' spaces

bill parducci bill at parducci.net
Fri Mar 7 17:19:01 EST 2003


i know that fixed length delimiting has been tried, but i wonder how well it would work for something like this if all the chars outside 'a-zA-Z0-9' were removed first (basically creating 1 'superword' per region). it would seem to address a number of issues like:

s p a c e s  i n  p l a c e s

l.o.w..p.r.o.f.i.l.e,,c,h,a,r,s
and_low_profile_chars

CamelCaseTyping

(bracketing){and}[bracketing] 
(a)(n)(d) (b)(r)(a)(c)(k)(e)(t)(i)(n)(g)

fence|posting|!fence!posting
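
a minimal sketch of the idea, not anything taken from the spambayes code: strip everything outside a-zA-Z0-9 from a region to get one 'superword', then cut it into fixed-length pieces. the names superword and fixed_length_tokens, the width of 5, the overlapping windows and the lowercasing are all assumptions made for illustration; a "region" is assumed to be any string, such as a subject line or a body.

```python
import re

# Anything that is not a-z, A-Z or 0-9 is treated as noise to be dropped.
_NON_ALNUM = re.compile(r"[^a-zA-Z0-9]+")


def superword(region):
    """Collapse a region of text into a single run of a-zA-Z0-9 chars."""
    return _NON_ALNUM.sub("", region)


def fixed_length_tokens(region, width=5):
    """Yield overlapping fixed-width slices of the region's superword.

    Lowercasing is an assumption added here so that CamelCaseTyping and
    camelcasetyping produce the same tokens. The width is arbitrary.
    """
    word = superword(region).lower()
    if not word:
        return
    if len(word) <= width:
        # Shorter than one window: emit what is there as a single token.
        yield word
        return
    for start in range(len(word) - width + 1):
        yield word[start:start + width]


if __name__ == "__main__":
    samples = [
        "s p a c e s  i n  p l a c e s",
        "l.o.w..p.r.o.f.i.l.e,,c,h,a,r,s",
        "(a)(n)(d) (b)(r)(a)(c)(k)(e)(t)(i)(n)(g)",
        "fence|posting|!fence!posting",
    ]
    for sample in samples:
        print(superword(sample), list(fixed_length_tokens(sample)))
```

with this, "s p a c e s  i n  p l a c e s" and "spacesinplaces" reduce to the same superword and therefore the same tokens. whether overlapping or non-overlapping windows would do better is left open here.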

this is the direction of thinking that i started down when i was first confronted with this, because of the power of wetware to absorb a MEME; it led me to many hours of fruitless examination of delimiter selection. this is not at all to say that this will be the case here, but as new ideas are bandied about, i posit that it is a good idea to make sure that previously discarded methodologies are reexamined periodically.

b



