Undocumented regex behaviour in re module

Darrell Gallion darrell at dorb.com
Wed Jul 5 02:56:21 EDT 2000


I think you need to match all the attributes as one group, then split them:
>>> re.findall("<\?(\w+)([^>]+?)\?>", s)
[('browse', ' start=0 num=25')]


>>> res=re.findall("<\?(\w+)([^>]+?)\?>", s)
>>> string.split(res[0][1])
['start=0', 'num=25']

That said, re.split or a second findall would be more robust than
string.split if the tag contains unexpected white space, such as spaces
around the '='.
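Here is a minimal Python 3 sketch of the "second findall" approach. It assumes the attribute values are plain `\w+` tokens, as in Dave's own pattern. The input string `s` and the function name `parse_special_tags` are hypothetical, since the post never shows `s`. The tag pattern also differs slightly from the one above: it uses `*?` instead of `+?`, so a tag with no attributes still matches with its full name.

```python
import re

# Hypothetical input; the original post never shows the value of `s`.
s = "<html><?browse start=0 num=25?></html>"

# Pass 1: capture the tag name and the raw attribute text.
# `*?` (not `+?`) lets an attribute-less tag like <?browse?> match cleanly.
TAG_RE = re.compile(r"<\?(\w+)([^>]*?)\?>")

# Pass 2: pull each name=value pair out of the attribute text.
# `\s*` around '=' tolerates white space a plain split would mishandle.
ATTR_RE = re.compile(r"(\w+)\s*=\s*(\w+)")


def parse_special_tags(text):
    """Return a list of (tag_name, {attr: value}) tuples (hypothetical helper)."""
    result = []
    for name, attrs in TAG_RE.findall(text):
        result.append((name, dict(ATTR_RE.findall(attrs))))
    return result


print(parse_special_tags(s))
# [('browse', {'start': '0', 'num': '25'})]
```

With `"start = 0"` in the tag, `str.split` would give `['start', '=', '0']`, while the second findall still yields `('start', '0')`.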

--Darrell
----- Original Message -----
From: "Dave Cole" <djc at object-craft.com.au>
>
> The idea here is that I want to be able to extract the special tag
> names and their attributes.  Everything works fine, except for the tag:
>
>         <?browse start=0 num=25?>
>
> The match object only saves the last attr=value matched.  The only way
> I can think of to get all of the attr=value pairs returned is to change
> the regex to:
>
>         <\?(\w+)((?:\s+\w+=\w+)*)\?>
>
> Unfortunately, that is not as useful as I would like since it returns
> a string which needs further processing: ' start=0 num=25'
>
> Any hint / explanation at this point would be gratefully accepted.
>
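For reference, here is a small Python 3 reproduction of the behaviour the question runs into. Dave's original pattern is not quoted, so `REPEATED` is an assumed reconstruction: a capturing group inside a `*` repetition. When a group matches several times, Python's `re` keeps only the last repetition. Current `re` documentation for `Match.group` describes this. Wrapping the whole repetition in one group, as `WRAPPED` does with the pattern from the question, keeps all of the text, but only as a single string.

```python
import re

# Assumed reconstruction: capturing groups inside a repeated group.
REPEATED = re.compile(r"<\?(\w+)(?:\s+(\w+)=(\w+))*\?>")

m = REPEATED.match("<?browse start=0 num=25?>")
print(m.groups())
# ('browse', 'num', '25')  -- only the last repetition survives

# The workaround from the question: one group around the whole repetition.
WRAPPED = re.compile(r"<\?(\w+)((?:\s+\w+=\w+)*)\?>")
print(WRAPPED.match("<?browse start=0 num=25?>").groups())
# ('browse', ' start=0 num=25')
```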

More information about the Python-list mailing list