[Tutor] regex help
Kent Johnson
kent37 at tds.net
Mon Feb 23 12:45:23 CET 2009
On Sun, Feb 22, 2009 at 10:49 PM, ish_ling <ish_ling at yahoo.com> wrote:
> I have a string:
>
> 'a b c<H d e f gH> h<H i j kH>'
>
> I would like a regex to recursively match all alpha letters that are between <H and [a-z]H>. That is, I would like the following list of matches:
>
> ['d', 'e', 'f', 'i', 'j']
>
> I do not want the 'g' or the 'k' matched.
>
> I have figured out how to do this in a multiple-step process, but I would like to do it in one step using only one regex (if possible). My multiple step process is first to use the regex
>
> '(?<=H )[a-z][^H]+(?!H)'
I would use a slightly different regex; it seems more explicit to me:
r'<H([a-z ]+?)[a-z]H>'
>
> with re.findall() in order to find two strings
>
> ['d e f ', 'i j ']
>
> I can then use another regex to extract the letters out of the strings.
str.split() will pull the individual letters out of each matched
string. You can still write it as a one-liner if you want:
In [1]: import re
In [2]: s = 'a b c<H d e f gH> h<H i j kH>'
In [3]: regex = r'<H([a-z ]+?)[a-z]H>'
In [5]: [m.split() for m in re.findall(regex, s)]
Out[5]: [['d', 'e', 'f'], ['i', 'j']]
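Note that the result above is one list per <H ...H> group, while the
original question asked for a single flat list. A minimal sketch of two
ways to get there follows. The names flat, one_step_regex and one_step
are made up for illustration. The second variant is an assumption about
what "one regex" could look like: it relies on the letters being
separated by single spaces, and it only checks for the closing [a-z]H>,
not for the opening <H, because Python's re module does not allow
variable-width lookbehind.

```python
import re

s = 'a b c<H d e f gH> h<H i j kH>'

# Variant 1: same regex as above, with a nested comprehension that
# flattens the per-group lists into a single list.
regex = r'<H([a-z ]+?)[a-z]H>'
flat = [letter for m in re.findall(regex, s) for letter in m.split()]
print(flat)

# Variant 2 (assumption: single-space separators, closing H> only):
# match a letter preceded by a space and followed by zero or more
# "letter + space" pairs and then the final letter plus H>.
one_step_regex = r'(?<= )[a-z](?= (?:[a-z] )*[a-z]H>)'
one_step = re.findall(one_step_regex, s)
print(one_step)
```

Both variants give ['d', 'e', 'f', 'i', 'j'] for the sample string. The
first is probably the safer choice, since it checks both delimiters.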
Kent