[Tutor] Avoiding repetitive pattern match in re module

Intercodes intercodes at gmail.com
Thu Jan 5 11:41:22 CET 2006

Hello everyone,

    I am new to this mailing list as well as to Python (uptime: 3 weeks). Today I
learnt about RE from
one was really helpful. I started working through a few examples on my own.
The first one was to collect all the HTML tags used in an HTML file. I wrote
this code.

import re
file1=open(raw_input("\nEnter The path of the HTML file: "),"r")
while 1:
    if data=="":

ans1=re.sub(r' .*?',">",ans)  # to make tags such as <link rel..> to
print match

I get the output, but with tags repeated. I want to display all the tags used
in a file, but with no repetitions. Say the output for one of the HTML files
was: "<html><link><a><br><a><br>"

Instead of writing a new 'for, if' loop to filter the repeated tags from
the list, is there something that I can add to the re itself to match the
pattern only once?

Thank You
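
For reference, here is a minimal sketch of one way to get each tag only once. It is not the code from the post above: the pattern TAG_NAME, the function unique_tags and the sample string are all hypothetical names made up for illustration. It assumes that re.findall reports every match it finds, so the duplicates are dropped after matching rather than inside the pattern. A set remembers which tags were already seen, and a list keeps them in order of first appearance.

```python
import re

# Hypothetical pattern (an assumption, not the original poster's regex):
# captures the name of each opening tag. Closing tags such as </a> are
# skipped because "<" must be followed directly by a letter.
TAG_NAME = re.compile(r'<([A-Za-z][A-Za-z0-9]*)')

def unique_tags(html):
    """Return each opening tag once, in order of first appearance."""
    seen = set()      # tags already reported
    result = []       # tags in the order they first occur
    for name in TAG_NAME.findall(html):
        tag = "<%s>" % name.lower()   # treat <BR> and <br> as the same tag
        if tag not in seen:
            seen.add(tag)
            result.append(tag)
    return result

# Made-up sample input, similar in shape to the output quoted above.
sample = ('<html><link rel="stylesheet"><a href="#">x</a><br>'
          '<a href="#">y</a><br></html>')
print("".join(unique_tags(sample)))   # prints <html><link><a><br>
```

If the order of the tags does not matter, set(TAG_NAME.findall(html)) alone is enough. Note that a regex like this can be fooled by "<" characters inside attribute values or scripts, so for real pages an HTML parser is likely the safer choice.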