URGENT: REALLY NEED HELP: Feel Helpless

dont bother dontbotherworld at yahoo.com
Tue Mar 9 04:25:05 EST 2004


Hey friends,
I am stuck. I have to finish a class project in the
middle of finals. I tried out Python in the few days I
had, and now I feel a bit nervous about it. The only
problem I have is generating feature vectors for spam
classification. The rest I can do myself with a C
engine.
Here is the Python code for generating a dictionary and
a vector. When a new email arrives, I have to parse it,
remove the HTML tags, and compare the words in the
payload with the words in the dictionary (which my
program is doing). If there is a match, I want the
exact index of the word in the dictionary. That is the
only thing I have to figure out; the rest is not so
difficult.
I am including my previous email below, and I would be
really grateful if someone could help me get around
this.
I promise myself to be a big Python player in the days
ahead.
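
To make the parsing step concrete, here is a rough sketch of
stripping the HTML tags and splitting the payload into words.
It is written for present-day Python 3, not the Python 2 used
in the listings below, and the names `TextExtractor` and
`payload_words` are made up for illustration:

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags and discards the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by HTMLParser for every run of text outside a tag.
        self.chunks.append(data)


def payload_words(payload):
    """Return the lowercase alphabetic words in an HTML or plain-text payload."""
    extractor = TextExtractor()
    extractor.feed(payload)
    extractor.close()
    text = " ".join(extractor.chunks)
    return re.findall(r"[a-z]+", text.lower())


# Hypothetical payload, only to show the call.
words = payload_words("<html><body><b>Buy</b> cheap pills NOW!</body></html>")
print(words)
```

Note that this keeps the text inside `<script>` and `<style>`
elements as well; a real filter would probably want to drop
those.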

Thanks
Dont
----------------------------------------------------
# Python code for creating a dictionary of words from an
# input file
import string, StringIO
import mailbox, email, re
import os
import sys
import email.Parser
import email.Message
import getopt


fp=open(sys.argv[1], 'r')

msg=email.message_from_file(fp)

msg=msg.get_payload()

dictpos={}
wordcount={}
# get rid of anything that isn't a letter, and make it
# all lowercase (256-entry table: A-Z and a-z map to a-z,
# every other character maps to a space):
lower = ''.join(map(chr, range(97, 123)))
fixed_body = msg.translate(65*' ' + lower + 6*' ' +
                           lower + 133*' ')

#words_in_body = fixed_body.split()

msg = fixed_body.split()


for i, w in enumerate(file('dictionary_index')):
	dictpos[w.strip()]=i
	#print i
	#print w

for w in msg:
	try:
		wordcount[w]+=1
		#print wordcount
	except KeyError:
		wordcount[w]=1
		#print wordcount

for w, c in wordcount.iteritems():
	try:
		print dictpos[w],':',c
	except KeyError:
		pass



#print wordcount
#print dictpos
#print '\n'

But this does not give me anything; I get no output at
all. I do not really understand whether this is matching
the words in the email message against the words in the
dictionary and, if it is, why it does not give me the
corresponding index.
I have another piece of code which does check for
matches, but, as I mentioned, the problem is that I need
the index of the word in the dictionary, not the index
of the word in the message.
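
To show what I am after, here is a minimal sketch of the
lookup with made-up data. It uses present-day Python 3 syntax
rather than the Python 2 of the listings, and
`dictionary_lines` and `message_words` are hypothetical
stand-ins for the `dictionary_index` file and the cleaned
words of one message:

```python
from collections import Counter

# Hypothetical stand-in for the lines of the 'dictionary_index' file.
dictionary_lines = ["free\n", "money\n", "lottery\n", "meeting\n"]

# Hypothetical stand-in for the cleaned, lowercased words of one message.
message_words = ["free", "money", "now", "free", "offer"]

# Map each dictionary word to its position (line number) in the dictionary.
dictpos = {word.strip(): index for index, word in enumerate(dictionary_lines)}

# Count how often each word occurs in the message.
wordcount = Counter(message_words)

# Keep only the words that are in the dictionary, keyed by dictionary index.
matches = {dictpos[w]: c for w, c in wordcount.items() if w in dictpos}

for index in sorted(matches):
    print(index, ":", matches[index])
```

With this data, "free" is at index 0 and occurs twice, and
"money" is at index 1 and occurs once; "now" and "offer" are
not in the dictionary and are skipped.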

Here is the code which gives me the vector by matching
the words in the email message against the words in the
dictionary:


import string, StringIO
import mailbox, email, re
import os
import sys
import email.Parser
import email.Message
import getopt



#load up external dictionary:
words = open('dictionary_index', 'r').read().split()
dct = {}
for i in xrange(len(words)):
     dct[words[i]] = i

print dct.values()

#make vector:
vector = {}

fp=open(sys.argv[1], 'r')

msg=email.message_from_file(fp)

msg=msg.get_payload()

#a = float(len(fp))

#a = float(len(words_in_body))


# get rid of anything that isn't a letter, and make it
# all lowercase (256-entry table: A-Z and a-z map to a-z,
# every other character maps to a space):
lower = ''.join(map(chr, range(97, 123)))
fixed_body = msg.translate(65*' ' + lower + 6*' ' +
                           lower + 133*' ')

#words_in_body = fixed_body.split()

msg = fixed_body.split()

a = float(len(msg))
print a

for i in msg:
     if i in dct:
         try:
             vector[i] += 1
         except KeyError:
             # first time this dictionary word is seen
             vector[i] = 1

for v,i in enumerate(vector):
    vector[i] /= a
    print v,i, vector[i]
    # the print above also shows the word that was common
    #print v, ":", vector[i]


    #print "\n"

#1.write(s)
#1.close()
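
What the vector is meant to look like can be sketched as
follows: one relative frequency per dictionary word, stored
at that word's position in the dictionary. This is only a
sketch in present-day Python 3 syntax; the function name
`feature_vector` and the sample words are assumptions for
illustration, not part of my program:

```python
def feature_vector(message_words, dictionary_words):
    """Return one relative frequency per dictionary word, in dictionary order."""
    # Position of every dictionary word, so lookups give the dictionary index.
    position = {word: i for i, word in enumerate(dictionary_words)}

    # Count occurrences of each dictionary word in the message.
    counts = [0] * len(dictionary_words)
    for word in message_words:
        i = position.get(word)
        if i is not None:
            counts[i] += 1

    # Normalise by the total number of words in the message.
    total = len(message_words)
    if total == 0:
        return [0.0] * len(dictionary_words)
    return [c / total for c in counts]


# Hypothetical data, only to show the call.
vec = feature_vector(["free", "money", "now", "free"],
                     ["free", "money", "lottery"])
print(vec)
```

A dense list like this is easy to write out one number per
line for a C engine; for a large dictionary, printing only
the non-zero `index : value` pairs would be more compact.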

-----------------------------------------------
