[Tutor] Extracting words.. [sort | uniq]
dman
dsh8290@rit.edu
Sun, 24 Mar 2002 16:01:38 -0600
On Sun, Mar 24, 2002 at 01:48:23PM -0800, Danny Yoo wrote:
| > | My first question:
| > |
| > | What do I have to do so that each word appears only once in the list, i.e. is
| > | found only once?
| > l = ['JavaScript', 'MacWeek', 'MacWeek', 'CompuServe', 'CompuServe' ]
| > m = []
| > for item in l :
| >     # keep only the first occurrence of each word
| >     if item not in m :
| >         m.append( item )
| > l = m
| > print(l)
|
| To offer a counterpoint: the list structure doesn't stop us from finding
| unique elements effectively. We can also do this uniqueness filter by
| first sorting the words. What this does is bring all the duplicates right
| next to each other. Once we have this sorted list, taking its unique
| members is just a matter of picking them out:
Good point! I wasn't thinking of this.
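For anyone following along, here is a rough sketch of how the
sort-then-pick idea might look. It is not Danny's original code, and the
function name unique_sorted is made up for illustration. Note that,
unlike the loop above, the result comes back in sorted order rather than
in order of first appearance.

```python
def unique_sorted(words):
    """Return the distinct items of `words` in sorted order."""
    result = []
    for word in sorted(words):
        # After sorting, duplicates sit next to each other, so comparing
        # against the last item kept is enough to spot a repeat.
        if not result or word != result[-1]:
            result.append(word)
    return result


words = ['JavaScript', 'MacWeek', 'MacWeek', 'CompuServe', 'CompuServe']
print(unique_sorted(words))  # ['CompuServe', 'JavaScript', 'MacWeek']
```

The sort costs O(n log n) and the scan is a single pass, whereas the
"if item not in m" loop rescans m for every word.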
| People who run Unix will recognize this as the "sort | uniq" approach.
For a while (when I read the subject) I thought you were going to
suggest piping the words out to 'uniq' :-).
| For people who are interested: here's a similar problem: "Given a text
| file and an integer K, you are to print the K most common words in the
| file (and the number of their occurrences) in decreasing frequency."
<cheater's hint>
Search the tutor archives. With only a couple minor modifications the
answer is already there.
</cheater's hint>
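If you would rather not dig through the archives, one possible sketch
(not the archived answer) is below. It assumes a "word" is a run of
letters and apostrophes, lowercased, which is my own simplification. The
name top_k_words is hypothetical, and it takes the text as a string, so
reading the file is left to the caller.

```python
import re
from collections import Counter


def top_k_words(text, k):
    """Return the k most common words in `text` as (word, count) pairs."""
    # Assumption: a word is a run of letters/apostrophes, case-insensitive.
    words = re.findall(r"[a-z']+", text.lower())
    # most_common() orders the pairs by decreasing count.
    return Counter(words).most_common(k)


sample = "the cat and the hat and the bat"
for word, count in top_k_words(sample, 2):
    print(word, count)
```

For the sample string this prints "the 3" and then "and 2".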
| It's a fun problem to work on, and useful when one is writing an essay and
| wants to avoid overusing words. *grin*
Oh, you want to count words in an essay ... that means filtering out
the LaTeX markup too :-).
mostly just babbling at the moment,
-D
--
I tell you the truth, everyone who sins is a slave to sin. Now a slave
has no permanent place in the family, but a son belongs to it forever.
So if the Son sets you free, you will be free indeed.
John 8:34-36