Regular expression help needed

Torkil Grindstein torkil at fast.no
Tue Sep 17 06:56:52 EDT 2002


Harvey Thomas wrote:
> 1) There are no comments or processing instructions that could
> contain similar data
> 2) The attributes always come in the order specified i.e. name
> first, content second, any others following
> 3) The attributes are always double quoted and there are no
> spaces surrounding the =

Thank you very much!
I have set up a slightly extended version:

re.compile(r"""<meta\s+name\s*=\s*["']?michael["']?\s+content\s*=\s*["']?(\w*)["']?\s*/\s*>""",
           re.IGNORECASE)

This should lift restriction 3), except that it also accepts mismatched
or unbalanced quotes such as "michael' and michael'. I think I can live
with 1) and 2) for my application, since I know the documents.
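
The mismatched-quote case could probably be closed with a backreference, so that whatever quote opens a value must also close it. A minimal sketch, assuming Python's re module; the names META_RE and find_michael_content are made up for illustration and the pattern has only been checked against the few inputs shown:

```python
import re

# Hypothetical pattern: (?P<q1>["']?) captures the opening quote (or nothing),
# and (?P=q1) requires the very same text again as the closing quote.
META_RE = re.compile(
    r"""<meta\s+name\s*=\s*(?P<q1>["']?)michael(?P=q1)
        \s+content\s*=\s*(?P<q2>["']?)(?P<content>\w*)(?P=q2)
        \s*/\s*>""",
    re.IGNORECASE | re.VERBOSE,  # VERBOSE lets the pattern span several lines
)


def find_michael_content(text):
    """Return the content value of the first matching meta tag, or None."""
    match = META_RE.search(text)
    return match.group("content") if match else None


print(find_michael_content('<meta name="michael" content="abc" />'))
print(find_michael_content('<meta name="michael\' content="abc" />'))
```

The first call should find abc; the second has a double quote closed by a single quote and should find nothing. Restrictions 1) and 2) still apply, since this remains a regular expression rather than a parser.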

Btw, I have no knowledge of SAX. Perhaps I should dig into that?

Torkil.


