how to write this re?

Bengt Richter bokr at oz.net
Thu Nov 7 01:51:42 EST 2002


On Wed, 6 Nov 2002 21:14:10 -0800 (PST), Mindy <csshi99 at yahoo.com> wrote:

>Hey, I want to find strings which includes "string1"
>but not includes "string2" in a bunch of strings
>called "lines", how can I write this regular
>expression in Python?
>For "axystring1bdestring2lkj",
>"aaastring1bbbccc","aaabbbstring2poi","aaabbbcccddd" I
>only want to get the string "aaastring1bbbccc" but my
>program got two strings: "axystring1bdestring2lkj" and
>"aaastring1bbbccc". So in my regular expression,
>(?!string2) didn't work at all. I know I shouldn't
>write this,but I really don't know how to express "not
>including string2".
>
>My regular expression is like:
>r=re.search(r'(.*)string1(.*)(?!string2)(.*)',lines)
>if r:
>	string1 = r.string
>
 >>> sl = ["axystring1bdestring2lkj","aaastring1bbbccc","aaabbbstring2poi","aaabbbcccddd"]
 >>> import re
 >>> rx = re.compile(r'^.*string2.*$|(^.*string1.*$)')
 >>> for s in sl:
 ...     m = rx.search(s)
 ...     if m: m = m.group(1)
 ...     if m is not None: print m
 ...
 aaastring1bbbccc

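I.e., the pattern first tries a greedy match on what you want to exclude, outside the group parens,
and puts what you want to find and keep inside the group parens. A line containing string2 matches
the first alternative and leaves group 1 unset (None); a line containing neither string matches nothing.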
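
As for why the original (?!string2) seemed to do nothing: the (.*) in front of it
is free to swallow the rest of the line, and at the end of the line the lookahead
trivially succeeds. If you do want a lookahead, it has to be anchored at the start
and scan the whole line. Here is a rough sketch comparing the two approaches. The
names rx_alt, rx_look, keep_alt and keep_look are made up for illustration, and
the lookahead pattern is my assumption of what was intended, not something tested
beyond this data:

```python
import re

sl = ["axystring1bdestring2lkj", "aaastring1bbbccc",
      "aaabbbstring2poi", "aaabbbcccddd"]

# Alternation approach from above: a line containing string2 matches the
# first branch, so group 1 does not participate and is None.
rx_alt = re.compile(r'^.*string2.*$|(^.*string1.*$)')

# Lookahead variant (an assumption, not part of the session above):
# (?!.*string2) is checked at the start of the line and scans all of it.
rx_look = re.compile(r'^(?!.*string2).*string1')


def keep_alt(strings):
    """Return strings that contain string1 but not string2 (alternation)."""
    result = []
    for s in strings:
        m = rx_alt.search(s)
        if m and m.group(1) is not None:
            result.append(s)
    return result


def keep_look(strings):
    """Same selection, using the anchored negative lookahead."""
    return [s for s in strings if rx_look.search(s)]


print(keep_alt(sl))
print(keep_look(sl))
```

Both calls should pick out only "aaastring1bbbccc" from the sample data.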

Or, if your strings were in a file, you could have read them all at once with

    file_text = open('the_file').read()

which we'll simulate using the above data by

 >>> file_text ='\n'.join(sl+[''])
 >>> file_text
 'axystring1bdestring2lkj\naaastring1bbbccc\naaabbbstring2poi\naaabbbcccddd\n'

then

 >>> rx = re.compile(r'^.*string2.*$|(^.*string1.*$)', re.MULTILINE)
 >>> filter(None, rx.findall(file_text))
 ['aaastring1bbbccc']

should give you the list of lines, unless I forgot something ;-)
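
The session above relies on filter() returning a list. On a Python where
filter() returns an iterator, a list comprehension does the same job. A minimal
sketch, with wanted_lines and plain_check as hypothetical helper names; the
plain substring test is only a cross-check and may well be all you need if the
real patterns are fixed strings:

```python
import re

# Same simulated file contents as above.
file_text = ("axystring1bdestring2lkj\n"
             "aaastring1bbbccc\n"
             "aaabbbstring2poi\n"
             "aaabbbcccddd\n")

rx = re.compile(r'^.*string2.*$|(^.*string1.*$)', re.MULTILINE)


def wanted_lines(text):
    """Lines containing string1 but not string2, via the alternation regex."""
    # findall() returns group 1 for every match; matches of the exclusion
    # branch show up as empty strings, so drop those.
    return [line for line in rx.findall(text) if line]


def plain_check(text):
    """Cross-check without regular expressions (assumes fixed substrings)."""
    return [line for line in text.splitlines()
            if 'string1' in line and 'string2' not in line]


print(wanted_lines(file_text))
print(plain_check(file_text))
```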

Regards,
Bengt Richter


