[Tutor] Extracting words.. [sort | uniq]

Sun, 24 Mar 2002 16:01:38 -0600

On Sun, Mar 24, 2002 at 01:48:23PM -0800, Danny Yoo wrote:
| > | My first question:
| > |
| > | What do I have to do so that each word appears only once in the list, i.e. is
| > | found only once?

| > l = ['JavaScript', 'MacWeek', 'MacWeek', 'CompuServe', 'CompuServe' ]
| > m = []
| > for item in l :
| >     if item not in m :
| >         m.append( item )
| > l = m
| > print l
| 
| To offer a counterpoint: the list structure doesn't stop us from finding
| unique elements effectively.  We can also do this uniqueness filter by
| first sorting the words.  What this does is bring all the duplicates right
| next to each other.  Once we have this sorted list, taking its unique
| members is just a matter of picking them out:

Good point!  I wasn't thinking of this.
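
For the archives, here is a rough sketch of that sort-then-scan idea as I
understand it.  The function name unique_sorted is my own, not something
from Danny's post, and I'm writing it for a current Python.  Note that,
unlike the "not in" loop quoted earlier, it hands the words back in sorted
order rather than in the order they were first seen.

```python
def unique_sorted(words):
    """Return the distinct items of words, in sorted order."""
    result = []
    for word in sorted(words):
        # After sorting, duplicates sit next to each other, so comparing
        # against the last item we kept is enough to spot a repeat.
        if not result or result[-1] != word:
            result.append(word)
    return result


words = ['JavaScript', 'MacWeek', 'MacWeek', 'CompuServe', 'CompuServe']
print(unique_sorted(words))
```

The sort costs O(n log n) and the scan is a single pass, whereas the
"not in" loop rescans the growing result list for every item, so the
sorted version should hold up better on long word lists.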

| People who run Unix will recognize this as the "sort | uniq" approach.

For a while (when I read the subject) I thought you were going to
suggest piping the words out to 'uniq' :-).

| For people who are interested: here's a similar problem: "Given a text
| file and an integer K, you are to print the K most common words in the
| file (and the number of their occurrences) in decreasing frequency."

<cheater's hint>
Search the tutor archives.  With only a couple of minor modifications the
answer is already there.
</cheater's hint>
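
If you would rather not dig, here is one possible sketch.  It is not the
answer from the archives, just one way to do it with collections.Counter
from the standard library.  The function name most_common_words and the
rule for what counts as a "word" (a run of letters or apostrophes, case
ignored) are my own assumptions, and it works on a string rather than a
file to keep the example self-contained.

```python
import re
from collections import Counter


def most_common_words(text, k):
    """Return the k most frequent words in text as (word, count) pairs."""
    # Assumption: a "word" is a run of letters or apostrophes, case ignored.
    words = re.findall(r"[a-z']+", text.lower())
    # most_common() returns the pairs ordered from highest count to lowest.
    return Counter(words).most_common(k)


sample = "the cat and the hat and the bat"
for word, count in most_common_words(sample, 2):
    print(word, count)
```

To run it on a file, read the contents into a string first (for example
with open(filename).read()) and pass that in as text.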

| It's a fun problem to work on, and useful when one is writing an essay and
| wants to avoid overusing words.  *grin*

Oh, you want to count words in an essay ... that means filtering out
the LaTeX markup too :-).

mostly just babbling at the moment,
-D

-- 

I tell you the truth, everyone who sins is a slave to sin.  Now a slave
has no permanent place in the family, but a son belongs to it forever.
So if the Son sets you free, you will be free indeed.
        John 8:34-36