[Tutor] re question

tpc at csua.berkeley.edu
Fri Aug 8 11:11:46 EDT 2003


Hello Jonathan, you should use re.findall: re.match only tries to match
at the start of the string and returns at most one match object.  By the
way, I would recommend the HTMLParser class in the htmllib module
instead of reinventing the wheel.
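
A minimal sketch of both suggestions follows, assuming Python 3, where
htmllib no longer exists and html.parser.HTMLParser is the standard
library parser. The names get_tag_contents, TagCollector and sample are
made up for illustration and are not from the original code.

```python
import re
from html.parser import HTMLParser


def get_tag_contents(tag, text):
    """Return the text enclosed by every <tag>...</tag> pair via re.findall."""
    # re.escape guards against regex metacharacters in the tag name;
    # \b keeps "b" from matching "<body>"; [^>]* skips any attributes.
    pattern = re.compile(
        r"<%s\b[^>]*>(.*?)</%s\s*>" % (re.escape(tag), re.escape(tag)),
        re.IGNORECASE | re.DOTALL,
    )
    # With exactly one capturing group, findall returns a list of that group.
    return pattern.findall(text)


class TagCollector(HTMLParser):
    """Collect the text inside each outermost occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()  # HTMLParser reports tag names in lowercase
        self.depth = 0          # > 0 while inside the tag of interest
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self.results.append("")  # start a new entry
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.results[-1] += data


# Hypothetical sample input, only for demonstration.
sample = "<p><strong>one</strong> and <STRONG class='x'>two</STRONG></p>"
regex_result = get_tag_contents("strong", sample)

collector = TagCollector("strong")
collector.feed(sample)
collector.close()
parser_result = collector.results
```

The regex version is fine for simple, well-formed input. The parser
version is likely the safer choice for real-world HTML, since it copes
with attributes, mixed case and nested tags without a more elaborate
pattern.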

On Fri, 8 Aug 2003, Jonathan Hayward http://JonathansCorner.com wrote:

> I'm trying to use regexps to find the contents of all foo tags. So, if I
> gave the procedure I'm working on an HTML document and asked for
> "strong" tags, it would return a list of strings enclosed in <strong>
> </strong> in the original.
>
> I'm having trouble with the re; at the moment the re seems to return
> only the first instance. What am I doing wrong?
>
>     def get_tag_contents_internal(self, tag, file_contents):
>         result = []
>         # At present only matches first occurrence. Regexp should be
>         # worked on.
>         my_re = re.compile(".*?(<" + tag + ".*?>(.*?)</" + tag + \
>           ".*?>.*?)+.*?", re.IGNORECASE)
>         if my_re.match(file_contents) is not None:
>             result = my_re.match(file_contents).group(2)
>         return result
>
> --
> ++ Jonathan Hayward, jonathan.hayward at pobox.com
> ** To see an award-winning website with stories, essays, artwork,
> ** games, and a four-dimensional maze, why not visit my home page?
> ** All of this is waiting for you at http://JonathansCorner.com