CAN You help Re: Writing dictionary output to a file

Ruud de Jong ruud.de.jong at consunet.nl
Sat Mar 6 09:03:23 EST 2004


A small but essential correction to my previous post.

Ruud de Jong wrote:

> dont bother wrote:
> 
>> Hi Jong,
>> Yes I really want the location of the number matching
>> in the dictionary.
>> This is because I have to input these feature vectors
>> to another program which takes [index: value], where index is the 
>> value specific to the dictionary.
>> I don't care about the addition/extension of the words
>> in the dictionary but for now, I really want the index
>> of the word in the dictionary. This is also equivalent
>> to the line number of the word in the dictionary.
> 
> 
> OK. Back to basics. You have:
> 
> - a dictionary with one word per line
> - a message with words that may or may not be
>   words from the dictionary
> - another program that takes [index: value] as input,
>   and presumably does something useful.
> 
> So, you want to have a program that does the following:
> 
>   for each dictionary word that is present in the message,
>   output the "index: count", where index is the position of the
>   word in the dictionary, and count is the number of times
>   the word occurs in the message.
> 
> Side note: your original program divides the count by the
> total number of words in the message. Since both are integers,
> this division truncates, so it gives 0 whenever the count is
> smaller than the total. I will ignore this division
> for now, but in the actual program you'll need to address that.
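
A minimal sketch of the division issue, assuming Python 3 syntax (the
thread itself uses Python 2, where int / int behaves like the // shown
here). The numbers are made up for illustration; converting one operand
with float() avoids the truncation in both versions.

```python
count = 3     # hypothetical: occurrences of one word in the message
total = 12    # hypothetical: total number of words in the message

truncated = count // total        # floor division, like Python 2's int / int
relative = float(count) / total   # float division in both Python 2 and 3

print(truncated)  # 0
print(relative)   # 0.25
```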
> 
> Assuming your dictionary is too large to do a search to find the
> position of an individual word, you basically need two mappings,
> both keyed by actual words:
> 
> dictpos = {}, which maps dictionary words to dictionary positions
> wordcount = {}, which maps message words to frequency counts
> 
> dictpos you can fill from your dictionary file:
> 
> for i, w in file('dictionary')
>     dictpos[w.strip()] = i

This should obviously be:

for i, w in enumerate(file('dictionary')):
    dictpos[w.strip()] = i
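
For readers on Python 3, where file() no longer exists, the same loop
might look like the sketch below. The helper name build_dictpos and the
three sample words are assumptions for illustration; io.StringIO stands
in for open('dictionary'). Note that enumerate counts from 0 by default;
if the other program expects 1-based line numbers, pass 1 as the start
value.

```python
import io

def build_dictpos(lines, start=0):
    """Map each stripped dictionary word to its position in `lines`."""
    dictpos = {}
    for i, w in enumerate(lines, start):
        dictpos[w.strip()] = i   # strip removes the trailing newline
    return dictpos

# Stand-in for open('dictionary'); the words are made up.
fake_file = io.StringIO("apple\nbanana\ncherry\n")
dictpos = build_dictpos(fake_file)
print(dictpos)  # {'apple': 0, 'banana': 1, 'cherry': 2}
```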

> 
> (strip removes the trailing newline)
> 
> wordcount can be filled from the message, like:
> 
> for w in msg.split():
>     try:
>         wordcount[w] += 1
>     except KeyError:
>         wordcount[w] = 1
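
As an alternative sketch, not what the original post used: on Python 2.7
or later, collections.Counter does the same bookkeeping as the
try/except loop. The message text here is made up.

```python
from collections import Counter

msg = "the cat saw the other cat"   # hypothetical message

# Counter counts each word produced by split()
wordcount = Counter(msg.split())

print(wordcount["the"])  # 2
print(wordcount["cat"])  # 2
print(wordcount["dog"])  # 0, missing words count as zero
```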
> 
> Now the output can be generated by:
> 
> for w, c in wordcount.iteritems():
>     try:
>         print dictpos[w], ':', c
>     except KeyError:
>         pass
> 
> This output is not sorted according to dictionary position.
> If you need such sorting, then you'll have to capture everything
> in a list first, and sort that list before printing.
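
A minimal sketch of that capture-and-sort step, in Python 3 syntax. The
helper name index_count_pairs and the sample mappings are assumptions
for illustration, and the exact "index: count" spacing the other program
expects is a guess.

```python
# Made-up stand-ins for the two mappings described above.
dictpos = {"apple": 0, "banana": 1, "cherry": 2}
wordcount = {"cherry": 2, "zebra": 1, "apple": 5}

def index_count_pairs(dictpos, wordcount):
    """Return (index, count) pairs for dictionary words, sorted by index."""
    # Words that are not in the dictionary (here "zebra") are skipped.
    pairs = [(dictpos[w], c) for w, c in wordcount.items() if w in dictpos]
    pairs.sort()
    return pairs

for index, count in index_count_pairs(dictpos, wordcount):
    print("%d: %d" % (index, count))
# prints:
# 0: 5
# 2: 2
```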
> 
> Hope this helps.
> 
> Ruud.
> 



