how to remove the same words in the paragraph

Tim Chase python.list at
Mon Nov 9 13:13:30 CET 2009

> I think a simple regex may come in handy,
>   p=re.compile(r'(.+) .*\1')    #note the space
>   s=p.search("python and i love python")
>   s.groups()
>   (' python',)
> But that matches only one double word. Someone else could light up here
> to extract all the double words. Then they can be removed from the original
> paragraph.

This has multiple problems:

 >>> import re
 >>> p = re.compile(r'(.+) .*\1')
 >>> s = p.search("python one two one two python")
 >>> s.groups()
 ('python',)
 >>> s = p.search("python one two one two python one")
 >>> s.groups() # guess what happened to the 2nd "one"...
 ('python one',)

Even once you have the list of theoretical duplicates (by
changing the regexp to r'\b(\w+)\b.*?\1' perhaps), you still have
to worry about emitting the first instance but not subsequent
instances.
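
One way to handle the "keep the first, drop the rest" bookkeeping is to skip backreferences and track seen words in a set while substituting. This is only a sketch, not something proposed in the thread: the name remove_repeated_words is made up for illustration, and it assumes a "word" is a run of \w characters, that comparison is case-insensitive, and that collapsing the leftover whitespace is acceptable.

```python
import re

def remove_repeated_words(text):
    """Keep the first occurrence of each word and drop later repeats.

    Hypothetical helper for illustration only.
    """
    seen = set()

    def keep_first(match):
        word = match.group(0)
        key = word.lower()  # assumption: "Python" and "python" count as the same word
        if key in seen:
            return ""       # a repeat: emit nothing
        seen.add(key)
        return word         # first instance: emit unchanged

    # Assumption: a word is a run of \w characters between word boundaries.
    deduped = re.sub(r"\b\w+\b", keep_first, text)
    # Removed words leave extra spaces behind; collapse them.
    return re.sub(r"\s+", " ", deduped).strip()

print(remove_repeated_words("python one two one two python one"))
# prints: python one two
```

Because the set remembers every word already emitted, it does not matter how far apart the repeats are or how many there are, which is where the single-backreference pattern falls short. Punctuation next to a removed word is left in place, so real paragraphs may need extra cleanup.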

