Match First Sequence in Regular Expression?

Christos Georgiou tzot at sil-tec.gr
Thu Jan 26 11:47:09 EST 2006


On Thu, 26 Jan 2006 16:26:57 GMT, rumours say that "Roger L. Cauvin"
<roger at deadspam.com> might have written:

>"Christos Georgiou" <tzot at sil-tec.gr> wrote in message 
>news:boqht19rs7946mtk5s64hqrieq44he5aq7 at 4ax.com...

>> On Thu, 26 Jan 2006 14:09:54 GMT, rumours say that "Roger L. Cauvin"
>> <roger at deadspam.com> might have written:

>>>Say I have some string that begins with an arbitrary sequence of 
>>>characters
>>>and then alternates repeating the letters 'a' and 'b' any number of times,
>>>e.g.
>>>
>>>"xyz123aaabbaabbbbababbbbaaabb"
>>>
>>>I'm looking for a regular expression that matches the first, and only the
>>>first, sequence of the letter 'a', and only if the length of the sequence 
>>>is
>>>exactly 3.
>>>
>>>Does such a regular expression exist?  If so, any ideas as to what it 
>>>could
>>>be?
>>
>> Is this what you mean?
>>
>> ^[^a]*(a{3})(?:[^a].*)?$
>
>Close, but the pattern should allow "arbitrary sequence of characters" that 
>precede the alternating a's and b's to contain the letter 'a'.  In other 
>words, the pattern should accept:
>
>"xayz123aaabbab"
>
>since the 'a' between the 'x' and 'y' is not directly followed by a 'b'.
>
>Your proposed pattern  rejects this string.

1.

(a{3})(?:b[ab]*)?$

This finds the first (leftmost) "aaa" either at the end of the string or
followed by 'b' and then arbitrary sequences of 'a' and 'b'.

Note that this will also match inside "aaaa" (from the second 'a' on), so it
does not enforce "exactly 3".
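
A quick way to see both behaviours, as a rough sketch: it assumes the pattern
is applied with re.search (the pattern has no leading anchor), and the helper
name first_aaa is made up for illustration only.

```python
import re

# Pattern 1 from above, applied unanchored with re.search (an assumption).
PATTERN_1 = re.compile(r"(a{3})(?:b[ab]*)?$")

def first_aaa(text):
    """Return the start index of the captured 'aaa', or None if no match."""
    m = PATTERN_1.search(text)
    return m.start(1) if m else None

print(first_aaa("xayz123aaabbab"))  # the run of three a's after "123"
print(first_aaa("xyzaaaab"))        # four a's: still matches, from the second 'a'
print(first_aaa("aaaa"))            # likewise matches, from the second 'a'
```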

2.

If you insist on exactly three 'a's and you can add the constraint that:

* let s be the "arbitrary sequence of characters" at the start of your
searched text
* len(s) >= 1 and not s.endswith('a')

then you can use this regex:

(?<=[^a])(a{3})(?:b[ab]*)?$
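
As a sketch of what the lookbehind buys you (again assuming re.search; the
helper name exactly_three is hypothetical):

```python
import re

# Pattern 2 from above: the lookbehind demands a non-'a' character
# immediately before the three a's.
PATTERN_2 = re.compile(r"(?<=[^a])(a{3})(?:b[ab]*)?$")

def exactly_three(text):
    """Return the start index of the captured 'aaa', or None if no match."""
    m = PATTERN_2.search(text)
    return m.start(1) if m else None

print(exactly_three("xayz123aaabbab"))  # three a's preceded by '3': matches
print(exactly_three("xyzaaaab"))        # four a's: rejected
print(exactly_three("aaab"))            # empty prefix: rejected (len(s) >= 1)
```

The last case is the price of the constraint: with nothing in front of the
'a's, the lookbehind has no character to look at and the search fails.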

3.

If you want to allow for a possibly empty "arbitrary sequence of characters"
at the start, and you don't mind search speed, use:

^(?:.*?[^a])?(a{3})(?:b[ab]*)?$

This should cover you:

>>> import re
>>> s="xayzbaaa123aaabbab"
>>> r=re.compile(r"^(?:.*?[^a])?(a{3})(?:b[ab]*)?$")
>>> m= r.match(s)
>>> m.group(1)
'aaa'
>>> m.start(1)
11
>>> s[11:]
'aaabbab'
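
A few more edge cases for the same pattern, as a minimal sketch (the helper
name check is made up; the inputs are just illustrations):

```python
import re

# Pattern 3 from above, anchored at both ends and applied with match().
PATTERN_3 = re.compile(r"^(?:.*?[^a])?(a{3})(?:b[ab]*)?$")

def check(text):
    """Return the start index of the captured 'aaa', or None if no match."""
    m = PATTERN_3.match(text)
    return m.start(1) if m else None

print(check("aaabbab"))         # empty prefix is accepted
print(check("xayz123aaabbab"))  # a stray 'a' in the prefix is accepted
print(check("aaaab"))           # four a's: rejected
print(check("xyzaaaab"))        # four a's after a prefix: rejected
```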
-- 
TZOTZIOY, I speak England very best.
"Dear Paul,
please stop spamming us."
The Corinthians