[Tutor] Avoiding repetitive pattern match in re module
Intercodes
intercodes at gmail.com
Thu Jan 5 11:41:22 CET 2006
Hello everyone,
I am new to this mailing list as well as to Python (uptime: 3 weeks). Today I
learnt about regular expressions from
http://www.amk.ca/python/howto/regex/. This
one was really helpful. I started working through a few examples on my own.
The first one was to collect all the HTML tags used in an HTML file. I wrote
this code.
------------------------------
import re

file1 = open(raw_input("\nEnter The path of the HTML file: "), "r")
ans = ""
while 1:
    data = file1.readline()
    if data == "":
        break
    ans = ans + data
# to make tags such as <link rel..> to <link>rel
ans1 = re.sub(r' .*?', ">", ans)
match = re.findall(r'<[^/]?[a-zA-Z]+.*?>', ans1)
print match
---------------------------------
I get the output, but with tags repeated. I want to display all the tags used
in a file, but with no repetitions. Say the output for one of the HTML files I got
was: "<html><link><a><br><a><br>"
Instead of writing a new 'for, if' loop to filter the repeated tags from
the list, is there something that I can add to the RE itself to match each
pattern only once?
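
For reference, the 'for, if' loop I would like to avoid might look roughly like
this. It is only a sketch: the names sample and unique_tags are made up for
illustration, and the sample string stands in for the contents of a real file.

```python
import re

# Hypothetical sample input standing in for the HTML file contents.
sample = "<html><link><a><br><a><br>"

# Same pattern as in the code above.
tags = re.findall(r'<[^/]?[a-zA-Z]+.*?>', sample)

# The 'for, if' loop: keep a tag only the first time it is seen,
# so the original order of first appearance is preserved.
unique_tags = []
for tag in tags:
    if tag not in unique_tags:
        unique_tags.append(tag)

print(unique_tags)
```

With the sample above, each of the four tags ends up in the list exactly once.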
Thank You
--
Intercodes