[Tutor] Avoiding repetitive pattern match in re module

Thu Jan 5 11:41:22 CET 2006

Hello everyone,

    I am new to this mailing list as well as to Python (uptime: 3 weeks). Today I
learnt about REs from http://www.amk.ca/python/howto/regex/, which was really
helpful, and I started working through a few examples of my own.
The first one was to collect all the HTML tags used in an HTML file. I wrote
this code:

------------------------------
import re
file1=open(raw_input("\nEnter The path of the HTML file: "),"r")
ans=""
while 1:
    data=file1.readline()
    if data=="":
        break
    ans=ans+data

ans1=re.sub(r' .*?',">",ans)  # turns tags such as <link rel..> into <link>rel..
match=re.findall(r'<[^/]?[a-zA-Z]+.*?>',ans1)  # opening tags only; </...> is not matched
print match
---------------------------------
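A possible simplification, offered as a sketch in Python 3 (the code in this post is Python 2): a capture group around the tag name makes `re.findall` return just the names, so the `re.sub` step that rewrites spaces is not needed. The pattern assumes a tag name is a letter followed by letters or digits. Because `<` must be followed directly by a letter, it skips closing tags, comments and `<!DOCTYPE>`. `tag_names` and `sample` are illustrative names, not part of the original code.

```python
import re

def tag_names(html):
    """Return every opening tag name in order of appearance (duplicates kept)."""
    # '<' followed directly by a letter: '</a>', '<!--' and '<!DOCTYPE' are skipped.
    # The capture group returns only the name, so attributes need no pre-processing.
    return re.findall(r'<([a-zA-Z][a-zA-Z0-9]*)', html)

sample = '<html><link rel="stylesheet" href="s.css"><a href="#">x</a><br><a href="#">y</a><br></html>'
print(tag_names(sample))  # ['html', 'link', 'a', 'br', 'a', 'br']
```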

I get the output, but with tags repeated. I want to display all the tags used
in a file, but without repetitions. For example, the output for one of my HTML
files was "<html><link><a><br><a><br>", where I want "<html><link><a><br>".

Instead of writing a separate for/if loop to filter the repeated tags out of
the list, is there something I can add to the regular expression itself so
that it matches each tag only once?
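One possible answer, sketched in Python 3 (the code in this post is Python 2) and assuming a tag name is a letter followed by letters or digits: removing duplicates is simplest after `findall`, and `dict.fromkeys` does it without an explicit for/if loop while keeping first-appearance order (dicts preserve insertion order from Python 3.7). A regex-only variant is possible with a negative lookahead plus a backreference, but it rescans the rest of the text for every tag and keeps each name at its *last* occurrence, so it is shown mainly for comparison. The function names are hypothetical.

```python
import re

# Assumption: a tag name is a letter followed by letters or digits.
# '<' must be followed directly by a letter, so '</a>' and '<!--' are skipped.
TAG = r'<([a-zA-Z][a-zA-Z0-9]*)'

def unique_tags(html):
    """Opening tag names, each listed once, in order of first appearance."""
    names = re.findall(TAG, html)
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so this drops repeats without writing a for/if loop.
    return list(dict.fromkeys(names))

def unique_tags_regex_only(html):
    """Opening tag names, each listed once, at the position of its last occurrence."""
    # \b stops 'a' from matching the start of 'abbr'. The negative lookahead
    # rejects a name if '<name' appears again anywhere later in the text,
    # so only the last occurrence of each name survives. Every match attempt
    # scans to the end of the text, so this is roughly quadratic.
    return re.findall(TAG + r'\b(?![\s\S]*<\1\b)', html)

sample = '<html><link rel="stylesheet" href="s.css"><a href="#">x</a><br><a href="#">y</a><br></html>'
print(unique_tags(sample))             # ['html', 'link', 'a', 'br']
print(unique_tags_regex_only(sample))  # ['html', 'link', 'a', 'br']
print(''.join('<%s>' % name for name in unique_tags(sample)))  # <html><link><a><br>
```

The two functions can disagree on order: for `'<p><b><p><i>'`, `unique_tags` returns `['p', 'b', 'i']`, while `unique_tags_regex_only` returns `['b', 'p', 'i']` because it keeps each tag where it last appears. If order does not matter, `sorted(set(names))` is another option.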

Thank You
--
Intercodes