CAN You help Re: Writing dictionary output to a file

Ruud de Jong ruud.de.jong at consunet.nl
Sat Mar 6 14:57:40 CET 2004


dont bother wrote:
> Hi Jong,
> Yes I really want the location of the number matching
> in the dictionary.
> This is because I have to input these feature vectors
> to another program which takes [index: value ] where 
> index: is the value specific to dictionary.
> I dont care about the addition/extension of the words
> in the dictionary but for now, I really want the index
> of the word in the dictionary. This is also equivalent
> to the line number of the word in the dictionary.

OK. Back to basics. You have:

- a dictionary with one word per line
- a message with words that may or may not be
   words from the dictionary
- another program that takes [index: value] as input,
   and presumably does something useful.

So, you want to have a program that does the following:

   for each dictionary word that is present in the message,
   output the "index: count", where index is the position of the
   word in the dictionary, and count is the number of times
   the word occurs in the message.

Side note: your original program divides the count by the
total number of words in the message. Since both are integers,
this is integer division, which gives 0 whenever the count is
smaller than the total. I will ignore this division
for now, but in the actual program you'll need to address that.
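
To illustrate the side note, here is a minimal sketch of the division pitfall, written so that it behaves the same under Python 2 and Python 3. The function name `relative_frequency` is made up for this example; it is not part of your program.

```python
def relative_frequency(count, total):
    """Return count / total as a float.

    Converting one operand to float first avoids integer (floor)
    division, which truncates e.g. 3 / 10 to 0 in Python 2.
    """
    return float(count) / total


# Floor division, spelled // so it behaves the same in Python 2 and 3:
truncated = 3 // 10                    # 0
fraction = relative_frequency(3, 10)   # 0.3
```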

Assuming your dictionary is too large to do a search to find the
position of an individual word, you basically need two mappings,
both keyed by actual words:

dictpos = {}, which maps dictionary words to dictionary positions
wordcount = {}, which maps message words to frequency counts

dictpos you can fill from your dictionary file:

for i, w in enumerate(file('dictionary')):
     dictpos[w.strip()] = i

(enumerate numbers the lines starting from 0; strip removes
the trailing newline)
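
A self-contained sketch of the same idea, assuming Python 3 (the snippets in this message are Python 2). The helper name `build_dictpos` and its `start` parameter are hypothetical; `start=1` would give line numbers the way a text editor counts them, if that is what the other program expects.

```python
import io


def build_dictpos(lines, start=0):
    """Map each dictionary word to its position in the word list."""
    dictpos = {}
    for i, line in enumerate(lines, start):
        dictpos[line.strip()] = i
    return dictpos


# io.StringIO stands in for the real dictionary file here.
sample = io.StringIO("apple\nbanana\ncherry\n")
dictpos = build_dictpos(sample, start=1)
# dictpos == {'apple': 1, 'banana': 2, 'cherry': 3}
```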

wordcount can be filled from the message, like:

for w in msg.split():
     try:
         wordcount[w] += 1
     except KeyError:
         wordcount[w] = 1
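
The same counting can also be written without try/except, using `dict.get` with a default of 0. A small sketch; the function name `count_words` is invented for illustration.

```python
def count_words(msg):
    """Count how often each whitespace-separated word occurs in msg."""
    wordcount = {}
    for w in msg.split():
        # get() returns 0 for a word that has not been seen before
        wordcount[w] = wordcount.get(w, 0) + 1
    return wordcount


counts = count_words("the cat saw the other cat")
# counts == {'the': 2, 'cat': 2, 'saw': 1, 'other': 1}
```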

Now the output can be generated by:

for w, c in wordcount.iteritems():
     try:
         print dictpos[w], ':', c
     except KeyError:
         pass

This output is not sorted according to dictionary position.
If you need such sorting, then you'll have to capture everything
in a list first, and sort that list before printing.
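
One possible way to do that sorting, sketched for Python 3 (the `print` statement and `iteritems` above are Python 2). `sorted_features` is a hypothetical name, and the "index: count" output format is only a guess at what the other program wants.

```python
def sorted_features(dictpos, wordcount):
    """Return 'index: count' strings ordered by dictionary position."""
    pairs = []
    for w, c in wordcount.items():
        if w in dictpos:          # skip words that are not in the dictionary
            pairs.append((dictpos[w], c))
    pairs.sort()                  # tuples sort by position first
    return ["%d: %d" % (pos, c) for pos, c in pairs]


# Made-up sample data, just to show the call.
dictpos = {"apple": 0, "banana": 1, "cherry": 2}
wordcount = {"cherry": 2, "kiwi": 5, "apple": 1}
for line in sorted_features(dictpos, wordcount):
    print(line)
```

With the sample data above, "kiwi" is dropped because it is not in the dictionary, and the remaining entries come out in dictionary order.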

Hope this helps.

Ruud.



