how to remove the same words in the paragraph
Tim Chase
python.list at tim.thechases.com
Mon Nov 9 07:13:30 EST 2009
> I think simple regex may come handy,
>
> p=re.compile(r'(.+) .*\1') #note the space
> s=p.search("python and i love python")
> s.groups()
> (' python',)
>
> But that matches for only one double word. Someone else could light up here
> to extract all the double words. Then they can be removed from the original
> paragraph.
This has multiple problems:
>>> import re
>>> p = re.compile(r'(.+) .*\1')
>>> s = p.search("python one two one two python")
>>> s.groups()
('python',)
>>> s = p.search("python one two one two python one")
>>> s.groups() # guess what happened to the 2nd "one"...
('python one',)
And even once you have the list of candidate duplicates (perhaps
by changing the regexp to r'\b(\w+)\b.*?\b\1\b', so that the
backreference only matches a whole word), you still have to worry
about emitting the first instance but not subsequent instances.
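
One way to sidestep the backreference altogether is to walk the words
in order and remember which ones have already been emitted. The
following is only a minimal sketch, not something from the original
thread: the function name remove_repeated_words is made up for
illustration, and it assumes that words are runs of \w characters,
that comparison should be case-insensitive, and that collapsing all
whitespace (including newlines) to single spaces is acceptable.

```python
import re


def remove_repeated_words(text):
    """Keep the first occurrence of each word and drop later ones.

    Hypothetical helper for illustration; assumes case-insensitive
    matching and that whitespace may be normalised.
    """
    seen = set()

    def keep_first(match):
        word = match.group(0)
        key = word.lower()  # assumption: "Python" and "python" are the same word
        if key in seen:
            return ""  # a repeat: emit nothing
        seen.add(key)
        return word  # first sighting: emit unchanged

    # Run every word through the callback, then collapse the gaps
    # left behind by the removed words.
    stripped = re.sub(r"\w+", keep_first, text)
    return " ".join(stripped.split())


print(remove_repeated_words("python one two one two python one"))
```

With the test string from above this prints "python one two".
Punctuation next to a removed word is left in place, so real
paragraphs would likely need some extra cleanup.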
-tkc