[spambayes-dev] siickkk and deprrravved stufff totallly grossssse

Glenn Brown gbrown at alumni.caltech.edu
Mon Dec 22 11:49:23 EST 2003


Over the past week my mailbox has been seeing what I fear is a reliable
spam attack on Bayesian filters: spam tokens are being disguised by
repeating characters.

If spammers add 0-3 extra repetitions of each letter, a 10-letter spam token
like "investment" can be spelled 4^10 (about a million) different ways.  I
don't want to suffer a million spam messages to train my filter for this one
word.
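
To make the arithmetic concrete, here is a rough Python sketch.  It assumes
"0-3 repetitions" means each letter appears 1 to 4 times, chosen
independently; the function names are made up for illustration and are not
part of SpamBayes.

```python
from itertools import product

def count_variants(word, max_extra=3):
    """Number of spellings if each letter gets 0..max_extra extra copies.

    Assumes no two adjacent letters of `word` are equal, so that every
    choice of repeat counts gives a distinct string.
    """
    return (max_extra + 1) ** len(word)

def variants(word, max_extra=3):
    """Yield every spelling of `word` under the same assumption."""
    for counts in product(range(1, max_extra + 2), repeat=len(word)):
        yield "".join(ch * n for ch, n in zip(word, counts))

print(count_variants("investment"))  # 4 ** 10
```

Enumerating variants("investment") in full is possible but slow; a short
word like "spam" is enough to check that the generator and the count agree.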

 

A simple solution would be to collapse each run of repeated characters to a
single character before tokens go into the spam database.  Applied to the
25143 words in the Solaris /usr/dict/words list of English words, this
produces 163 ambiguities, but probably none of these are spam tokens.  I've
appended a list of the ambiguous tokens below.  For example, "be" represents
both "be" and "bee".
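
The idea could be sketched in Python roughly as follows.  This is only an
illustration under my own assumptions: collapse_repeats and
ambiguous_groups are hypothetical names, not SpamBayes functions, and the
matching is case-sensitive.

```python
import re
from collections import defaultdict

# A character followed by one or more copies of itself.
_RUN = re.compile(r"(.)\1+")

def collapse_repeats(token):
    """Collapse each run of a repeated character to a single character."""
    return _RUN.sub(r"\1", token)

def ambiguous_groups(words):
    """Map each collapsed token to the words sharing it, keeping only
    tokens that stand for more than one distinct word."""
    groups = defaultdict(set)
    for word in words:
        groups[collapse_repeats(word)].add(word)
    return {tok: ws for tok, ws in groups.items() if len(ws) > 1}

print(collapse_repeats("siickkk"))           # sick
print(ambiguous_groups(["be", "bee", "bat"]))
```

Feeding a word list such as /usr/dict/words (one word per line) to
ambiguous_groups is one way to produce a list like the one appended below.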

 

I won't be implementing this feature myself, but I would sure like to see it
in my favorite spam filter.

 

Cheers to all the SpamBayes developers,

--Glenn

 

Alan

Alison

Barnet

Bela

Burt

De

Diane

Douglas

Eliot

Eliot

Emanuel

Gary

Godwin

Greg

Haley

Herman

Kaufman

Kenan

Liget

Lilian

Marieta

Mathews

Matson

McConel

NW

Nichols

Paterson

Philip

SE

SW

Scot

Shafer

Shepard

Simons

Wals

Whitaker

ad

advise

apointe

as

bare

bat

be

bel

below

bel

below

bet

bib

bit

bled

boby

bogy

bon

both

bred

bus

but

canister

canon

canvas

carton

chery

chose

col

coma

con

con

cop

coral

cot

desert

desicate

devise

devote

discus

divorce

dragon

drol

drop

duly

el

el

escape

fed

fel

fiance

filet

fogy

fury

gable

gal

glom

god

gripe

grove

hel

his

hop

hot

i

i

in

inbred

invite

ken

knel

later

legate

lop

lose

lot

mana

marque

mate

met

milenia

mortgage

mot

ne

non

nose

of

pal

parole

pep

pepy

per

pol

pol

pop

pose

put

red

refuge

retire

rifle

robin

rod

rot

salon

sen

shot

slop

son

sped

step

stop

tapa

ten

the

til

to

todle

tol

tor

tot

very

vi

vi

we

wed

whop

willful

 
