[Spambayes] Back to language issue (long)

Matthew Dixon Cowles matt at mondoinfo.com
Sat Mar 29 19:13:25 EST 2003


Dear Tim,

> How interesting.  I wonder if a weakness of spambayes is to include
> a bunch of gibberish tokens that would almost surely not be in
> someone's database, which would tend to drive the spamprob strongly
> towards unknown prob, which is .5 by default...

I don't think it is. The point of ignoring all the clues but the most
extreme ones is that bland or gibberish words are unlikely to be
counted.
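
To make that concrete, here is a minimal sketch of the idea. It is not
the actual Spambayes code: the function names, the 0.1 cutoff, the limit
of 150 clues, and the toy database are all assumptions made up for
illustration. Only the 0.5 probability for unknown words comes from the
discussion above.

```python
def lookup(tokens, database, unknown_prob=0.5):
    """Map each distinct token to its spam probability.

    Tokens missing from the (hypothetical) database get unknown_prob.
    """
    return {t: database.get(t, unknown_prob) for t in set(tokens)}


def select_clues(token_probs, unknown_prob=0.5, min_strength=0.1, max_clues=150):
    """Keep only the tokens whose probability is far from neutral.

    min_strength and max_clues are illustrative values, not settings
    taken from any real configuration.
    """
    clues = []
    for token, prob in token_probs.items():
        strength = abs(prob - unknown_prob)
        if strength >= min_strength:
            clues.append((strength, token, prob))
    # Most extreme clues first; cut off after max_clues.
    clues.sort(reverse=True)
    return [(token, prob) for _, token, prob in clues[:max_clues]]


# Toy database: made-up probabilities for someone who HAS seen French spam.
database = {"argent": 0.97, "gouvernement": 0.91, "python": 0.03, "le": 0.52}

# "xqzvw" and "blorp" stand in for gibberish that is in nobody's database.
message = ["argent", "gouvernement", "le", "xqzvw", "blorp"]

clues = select_clues(lookup(message, database))
print(clues)
```

In this sketch the gibberish tokens sit exactly at 0.5 and the bland word
"le" sits very close to it, so none of them survive the cutoff. Only the
strongly indicative words are left to be combined into a score.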

I think that the problem in this case is that Francois doesn't get
much spam in French. If he did, the bland French words (which are
almost all of the ones listed in the clues) would likely be ignored and
the ones that are indicative of this sort of spam ("argent", "tué",
"gouvernement", etc.) would be scored correctly.

I suspect that the error is just a matter of spambayes not
recognizing a sort of spam that it hasn't been trained on.

Regards,
Matt



