CAN You help Re: Writing dictionary output to a file
Ruud de Jong
ruud.de.jong at consunet.nl
Sat Mar 6 09:03:23 EST 2004
Small but essential correction on my previous post
Ruud de Jong wrote:
> dont bother wrote:
>
>> Hi Jong,
>> Yes I really want the location of the number matching
>> in the dictionary.
>> This is because I have to input these feature vectors
>> to another program which takes [index: value], where index is the
>> value specific to the dictionary.
>> I don't care about the addition/extension of the words
>> in the dictionary but for now, I really want the index
>> of the word in the dictionary. This is also equivalent
>> to the line number of the word in the dictionary.
>
>
> OK. Back to basics. You have:
>
> - a dictionary with one word per line
> - a message with words that may or may not be
>   words from the dictionary
> - another program that takes [index: value] as input,
>   and presumably does something useful.
>
> So, you want to have a program that does the following:
>
> for each dictionary word that is present in the message,
> output the "index: count", where index is the position of the
> word in the dictionary, and count is the number of times
> the word occurs in the message.
>
> Side note: your original program divides the count by the
> total number of words in the message. Since both are integers,
> this division truncates, so it gives 0 whenever the count is
> smaller than the total. I will ignore this division
> for now, but in the actual program you'll need to address that.
>
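To illustrate the side note: in the Python 2 of this thread, `/` on two integers truncates. A minimal sketch of one way around it, assuming a relative frequency is what is wanted. The numbers are made up for illustration, and `//` is used here because it reproduces the truncating behaviour in both Python 2 and Python 3:

```python
count = 3    # hypothetical: occurrences of one word in the message
total = 12   # hypothetical: total number of words in the message

# Truncating integer division, which is what `/` does on two ints in Python 2.
truncated = count // total

# Converting one operand to float first yields the fraction instead.
relative = float(count) / total

print(truncated)
print(relative)
```

Whether the other program wants raw counts or fractions is not stated in the thread, so treat the float conversion as one option rather than the required fix.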
> Assuming your dictionary is too large to do a search to find the
> position of an individual word, you basically need two mappings,
> both keyed by actual words:
>
> dictpos = {}, which maps dictionary words to dictionary positions
> wordcount = {}, which maps message words to frequency counts
>
> dictpos you can fill from your dictionary file:
>
> for i, w in file('dictionary'):
>     dictpos[w.strip()] = i
This should obviously be:
for i, w in enumerate(file('dictionary')):
    dictpos[w.strip()] = i
>
> (strip removes the trailing newline)
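For anyone who wants to try the dictionary-position mapping without a dictionary file at hand, here is a small self-contained sketch in current Python. The function name `build_dictpos` and the three sample words are made up for illustration; `io.StringIO` stands in for the file, and with a real file one would pass `open('dictionary')` (`file(...)` in the Python 2 used in this thread):

```python
import io

def build_dictpos(lines):
    """Map each dictionary word to the 0-based index of its line."""
    dictpos = {}
    for i, w in enumerate(lines):
        # strip() removes the trailing newline (and surrounding whitespace)
        dictpos[w.strip()] = i
    return dictpos

# Hypothetical three-word dictionary, one word per line.
sample = io.StringIO("apple\nbanana\ncherry\n")
dictpos = build_dictpos(sample)
print(dictpos)
```

Note that enumerate counts from 0. If the other program expects 1-based line numbers, that would need an offset; the thread does not say which convention it uses.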
>
> wordcount can be filled from the message, like:
>
> for w in msg.split():
>     try:
>         wordcount[w] += 1
>     except KeyError:
>         wordcount[w] = 1
>
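The counting step can also be written without try/except. This is a sketch of an alternative idiom using `dict.get` with a default of 0, not the code from the post; the function name `count_words` and the sample message are invented for the example:

```python
def count_words(msg):
    """Count how often each whitespace-separated word occurs in msg."""
    wordcount = {}
    for w in msg.split():
        # get() returns 0 for a word not seen yet, so no KeyError handling is needed
        wordcount[w] = wordcount.get(w, 0) + 1
    return wordcount

wordcount = count_words("the cat saw the other cat")
print(wordcount)
```

Both versions treat words case-sensitively and keep any punctuation attached to a word, since split() only breaks on whitespace.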
> Now the output can be generated by:
>
> for w, c in wordcount.iteritems():
>     try:
>         print dictpos[w], ':', c
>     except KeyError:
>         pass
>
> This output is not sorted according to dictionary position.
> If you need such sorting, then you'll have to capture everything
> in a list first, and sort that list before printing.
>
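A minimal sketch of the sorted variant, written for current Python (`items()` and the `print()` function instead of `iteritems()` and the print statement). The helper name `index_counts` and the sample data are hypothetical, and the "index: count" output format is an assumption based on the description in this thread:

```python
def index_counts(dictpos, wordcount):
    """Return (index, count) pairs for message words found in the dictionary,
    sorted by dictionary position."""
    pairs = []
    for w, c in wordcount.items():
        if w in dictpos:          # skip words that are not in the dictionary
            pairs.append((dictpos[w], c))
    pairs.sort()                  # tuples sort by index first
    return pairs

# Hypothetical data: 'kiwi' is in the message but not in the dictionary.
dictpos = {'apple': 0, 'banana': 1, 'cherry': 2}
wordcount = {'cherry': 2, 'apple': 1, 'kiwi': 5}

for index, count in index_counts(dictpos, wordcount):
    print('%d: %d' % (index, count))
```

Collecting the pairs first costs one extra list, but it keeps the lookup and the printing separate, which makes it easy to change the output format later.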
> Hope this helps.
>
> Ruud.
>