Regular expression help needed
Torkil Grindstein
torkil at fast.no
Tue Sep 17 06:56:52 EDT 2002
Harvey Thomas wrote:
> 1) There are no comments or processing instructions that could
> contain similar data
> 2) The attributes always come in the order specified i.e. name
> first, content second, any others following
> 3) The attributes are always double quoted and there are no
> spaces surrounding the =
Thank you very much!
I have set up a slightly extended version:
re.compile(r"""<meta\s+name\s*=\s*["']?michael["']?\s+content\s*=\s*["']?(\w*)["']?\s*/\s*>""",
           re.IGNORECASE)
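This should lift the restrictions in point 3): quotes may be single, double
or absent, and whitespace around the = is allowed (the catch is that it also
accepts mismatched quotes such as "michael' and michael'). I think I can live
with 1) and 2) for my application, since I know the documents.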
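
The mismatched-quote case could probably be tightened with a backreference.
What follows is only a sketch under my own assumptions, not a tested
replacement: the names META_RE and extract_content are made up for
illustration, and it still expects the tag to be closed with />, as the
pattern above does. Each opening quote (possibly empty) is captured and the
same character is required at the end of the value.

```python
import re

# Sketch: group 1 and group 2 capture the opening quote of each attribute
# value (or the empty string when unquoted); \1 and \2 then require the
# identical character to close it, so "michael' no longer matches.
META_RE = re.compile(
    r"""<meta\s+name\s*=\s*(["']?)michael\1\s+content\s*=\s*(["']?)(\w*)\2\s*/\s*>""",
    re.IGNORECASE,
)


def extract_content(html):
    """Return the content value of the 'michael' meta tag, or None."""
    match = META_RE.search(html)
    # Group 3 is the content value; groups 1 and 2 are the quote characters.
    return match.group(3) if match else None
```

Note that the content value is now group 3 rather than group 1, because the
two quote groups were added.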
Btw, I have no knowledge of SAX. Perhaps I should dig into that?
Torkil.