how to remove the same words in the paragraph
s.selvamsiva at gmail.com
Wed Nov 4 15:04:27 CET 2009
On Wed, Nov 4, 2009 at 4:27 AM, Tim Chase <python.list at tim.thechases.com> wrote:
> kylin wrote:
>> I need to remove the word if it appears in the paragraph twice. Could
>> someone give me a clue, or point me to a useful function in Python?
> Sounds like homework. To fail your class, use this one:
> >>> p = "one two three four five six seven three four eight"
> >>> s = set()
> >>> print(' '.join(w for w in p.split() if not (w in s or s.add(w))))
> one two three four five six seven eight
> which is absolutely horrible because it mutates the set inside the generator
> expression. The passable solution would use a for-loop to iterate over
> each word in the paragraph, emitting it if it hadn't already been seen.
> Maintain those words in a set, so your words know how not to be seen. ("Mr.
> Nesbitt, would you please stand up?")
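The for-loop approach described above could look roughly like this. It is a minimal sketch, assuming a paragraph of whitespace-separated words with no punctuation and case-sensitive matching; `dedupe_words` is a hypothetical name, not something from the thread.

```python
def dedupe_words(paragraph):
    """Return the paragraph with repeated words removed.

    Keeps the first occurrence of each word. Assumes words are separated
    by whitespace only (no punctuation handling, case-sensitive).
    """
    seen = set()  # words already emitted
    kept = []     # output words, in their original order
    for word in paragraph.split():
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return ' '.join(kept)


print(dedupe_words("one two three four five six seven three four eight"))
# one two three four five six seven eight
```

The set makes each "already seen?" check a constant-time lookup on average, and the list preserves the original word order.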
Can we use inp_paragraph.count(iter_word) to make it simpler?
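Counting can work, but two details are worth checking: `str.count()` counts substring occurrences rather than whole words, and a filter based on counts drops every copy of a repeated word instead of keeping the first one. A small sketch (the sample paragraph is made up for illustration):

```python
paragraph = "the other theme park and the other ride"

# str.count() counts substrings, so "the" also matches inside
# "other" (twice) and "theme".
substring_hits = paragraph.count("the")
print(substring_hits)  # 5

# Counting over the list of words matches whole words only.
words = paragraph.split()
word_hits = words.count("the")
print(word_hits)  # 2

# Keeping only words whose count is 1 removes *all* copies of a repeated
# word (not just the later ones), and calling count() once per word makes
# this quadratic in the number of words.
unique_only = ' '.join(w for w in words if words.count(w) == 1)
print(unique_only)  # theme park and ride
```

So `count()` answers "does this word repeat?", but keeping the first occurrence still needs a record of the words already seen.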
> This also assumes your paragraph consists only of words and whitespace. But
> since you posted your previous homework-sounding question on stripping out
> non-word/whitespace characters, you'll want to look into using a regexp like
> "[\w\s]" (negated as "[^\w\s]" to match the cruft) to clean up the paragraph.
> Neither solution above preserves non-whitespace/non-word characters, for which
> I'd recommend using re.sub() with a callback. Such a callback class might look
> something like this:
> >>> class Dedupe:
> ...     def __init__(self):
> ...         self.s = set()
> ...     def __call__(self, m):
> ...         w = m.group(0)
> ...         if w in self.s: return ''
> ...         self.s.add(w)
> ...         return w
> ...
> >>> r.sub(Dedupe(), p)
> where I leave the definition of "r" to the student. Also beware of
> case differences, which you might have to normalize.
> You'll also want to use more descriptive variable names than my one-letter ones.
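The pattern `r` is deliberately left to the reader in the quoted reply, so the sketch below picks one possible choice, `\s*(\w+)`, which is an assumption and not from the post. Capturing the word in group 1 while group 0 also covers the preceding whitespace means a dropped word takes its leading space with it, instead of leaving runs of spaces behind. It also lowercases each word before the set lookup to deal with the case differences mentioned above. `CaseInsensitiveDedupe` and `word_re` are hypothetical names.

```python
import re


class CaseInsensitiveDedupe:
    """Callback for re.sub(): keep the first occurrence of each word.

    Assumes the pattern captures the word in group 1 and that group 0
    also includes any whitespace immediately before the word.
    """

    def __init__(self):
        self.seen = set()  # lowercased words already emitted

    def __call__(self, match):
        key = match.group(1).lower()  # "The" and "the" count as the same word
        if key in self.seen:
            return ''  # drop the repeat together with its leading whitespace
        self.seen.add(key)
        return match.group(0)  # keep the word and its original spacing


# Hypothetical choice for "r": optional leading whitespace, then a word.
word_re = re.compile(r'\s*(\w+)')

paragraph = "One two, three! four five six seven three four, eight."
print(word_re.sub(CaseInsensitiveDedupe(), paragraph))
# One two, three! four five six seven, eight.
```

Punctuation never matches the pattern, so `re.sub()` leaves it in place; only the repeated words (and the whitespace before them) disappear. A fresh callback instance is needed for each paragraph, since it remembers the words it has seen.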
More information about the Python-list