Match First Sequence in Regular Expression?

Roger L. Cauvin roger at deadspam.com
Thu Jan 26 12:11:39 EST 2006


"Christos Georgiou" <tzot at sil-tec.gr> wrote in message 
news:t9uht1l1c37c6plgvc7senk1h6tnktg419 at 4ax.com...
> On Thu, 26 Jan 2006 16:26:57 GMT, rumours say that "Roger L. Cauvin"
> <roger at deadspam.com> might have written:
>
>>"Christos Georgiou" <tzot at sil-tec.gr> wrote in message
>>news:boqht19rs7946mtk5s64hqrieq44he5aq7 at 4ax.com...
>
>>> On Thu, 26 Jan 2006 14:09:54 GMT, rumours say that "Roger L. Cauvin"
>>> <roger at deadspam.com> might have written:
>
>>>>Say I have some string that begins with an arbitrary sequence of
>>>>characters and then alternates repeating the letters 'a' and 'b' any
>>>>number of times, e.g.
>>>>
>>>>"xyz123aaabbaabbbbababbbbaaabb"
>>>>
>>>>I'm looking for a regular expression that matches the first, and only
>>>>the first, sequence of the letter 'a', and only if the length of the
>>>>sequence is exactly 3.
>>>>
>>>>Does such a regular expression exist?  If so, any ideas as to what it
>>>>could be?
>>>
>>> Is this what you mean?
>>>
>>> ^[^a]*(a{3})(?:[^a].*)?$
>>
>>Close, but the pattern should allow the "arbitrary sequence of
>>characters" that precedes the alternating a's and b's to contain the
>>letter 'a'.  In other words, the pattern should accept:
>>
>>"xayz123aaabbab"
>>
>>since the 'a' between the 'x' and 'y' is not directly followed by a 'b'.
>>
>>Your proposed pattern rejects this string.
>
> 1.
>
> (a{3})(?:b[ab]*)?$
>
> This finds the first (leftmost) "aaa" either at the end of the string or
> followed by 'b' and then arbitrary sequences of 'a' and 'b'.
>
> Note that this will also match inside "aaaa" (starting from its second
> 'a').
>
> 2.
>
> If you insist on only three 'a's and you can add the constraint that:
>
> * let s be the "arbitrary sequence of characters" at the start of your
> searched text
> * len(s) >= 1 and not s.endswith('a')
>
> then you'll have this reg.ex.
>
> (?<=[^a])(a{3})(?:b[ab]*)?$
>
> 3.
>
> If you want to allow for a possibly empty "arbitrary sequence of
> characters" at the start and you don't mind search speed:
>
> ^(?:.*?[^a])?(a{3})(?:b[ab]*)?$
>
> This should cover you:
>
> >>> import re
> >>> s = "xayzbaaa123aaabbab"
> >>> r = re.compile(r"^(?:.*?[^a])?(a{3})(?:b[ab]*)?$")
> >>> m = r.match(s)
> >>> m.group(1)
> 'aaa'
> >>> m.start(1)
> 11
> >>> s[11:]
> 'aaabbab'
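
For anyone who wants to compare the quoted patterns side by side, here is a
minimal sketch using Python's re module.  The dictionary labels ("first",
"1", "2", "3") and the helper find_aaa are names made up for this
illustration and do not appear in the thread; the patterns themselves are
copied from the quoted message, with pattern 3 taken in the form that is
compiled in the interactive session.

```python
import re

# Patterns quoted in the thread. The labels are hypothetical, chosen only
# for this sketch.
PATTERNS = {
    "first": r"^[^a]*(a{3})(?:[^a].*)?$",          # the earlier proposal
    "1": r"(a{3})(?:b[ab]*)?$",                    # leftmost "aaa" before the a/b tail
    "2": r"(?<=[^a])(a{3})(?:b[ab]*)?$",           # requires a preceding non-'a'
    "3": r"^(?:.*?[^a])?(a{3})(?:b[ab]*)?$",       # prefix may be empty
}


def find_aaa(label, text):
    """Return the start index of group 1, or None if the pattern fails."""
    m = re.search(PATTERNS[label], text)
    return m.start(1) if m else None


if __name__ == "__main__":
    samples = ["xayz123aaabbab", "xayzbaaa123aaabbab", "aaab", "xaaaab"]
    for text in samples:
        results = {label: find_aaa(label, text) for label in PATTERNS}
        print(text, results)
```

With these definitions, find_aaa("3", "xayzbaaa123aaabbab") gives 11, the
same index as in the session above, and find_aaa("first", "xayz123aaabbab")
gives None, which is the rejection described earlier in the thread.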

Thanks for continuing to follow up, Christos.  Please see my reply to your 
other post (in which you applied the test cases).

-- 
Roger L. Cauvin
nospam_roger at cauvin.org (omit the "nospam_" part)
Cauvin, Inc.
Product Management / Market Research
http://www.cauvin-inc.com