CAN You help Re: Writing dictionary output to a file
Ruud de Jong
ruud.de.jong at consunet.nl
Sat Mar 6 08:57:40 EST 2004
dont bother schreef:
> Hi Jong,
> Yes I really want the location of the number matching
> in the dictionary.
> This is because I have to input these feature vectors
> to another program which takes [index: value ] where
> index: is the value specific to dictionary.
> I dont care about the addition/extension of the words
> in the dictionary but for now, I really want the index
> of the word in the dictionary. This is also equivalent
> to the line number of the word in the dictionary.
OK. Back to basics. You have:
- a dictionary with one word per line
- a message with words that may or may not be
words from the dictionary
- another program that takes [index: value] as input,
and presumably does something useful.
So, you want to have a program that does the following:
for each dictionary word that is present in the message,
output the "index: count", where index is the position of the
word in the dictionary, and count is the number of times
the word occurs in the message.
Side note: your original program divides the count by the
total number of words in the message. Since both are integers,
this division will always give 0. I will ignore this division
for now, but in the actual program you'll need to address that.
Assuming your dictionary is too large to do a search to find the
position of an individual word, you basically need two mappings,
both keyed by actual words:
dictpos = {}, which maps dictionary words to dictionary positions
wordcount = {}, which maps message words to frequence counts
dictpos you can fill from your dictionary file:
for i, w in file('dictionary')
dictpos[w.strip()] = i
(strip removes the trailing newline)
wordcount can be filled from the message, like:
for w in msg.split():
try:
wordcount[w] += 1
except KeyError:
wordcount[w] = 1
Now the output can be generated by:
for w, c in wordcount.iteritems():
try:
print dictpos[w], ':', c
except KeyError:
pass
This output is not sorted according to dictionary position.
If you need such sorting, that you'll have to capture everything
in a list first, and sort list that before printing.
Hope this helps.
Ruud.
More information about the Python-list
mailing list