reusing parts of a string in RE matches?
John Salerno
johnjsal at NOSPAMgmail.com
Thu May 11 10:16:06 EDT 2006
Mirco Wahab wrote:
> Py:
> import re
> tx = 'a1a2a3A4a35a6b7b8c9c'
> rg = r'(\w)(?=(.\1))'
> print re.findall(rg, tx)
The only problem seems to be (and I ran into this with my original
example too) that what this code returns isn't exactly what you are
looking for, i.e. the numbers '1', '2', etc. You get a list of tuples,
and the second item in each tuple contains the number but also the \w
character that follows it.
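For reference, here is a small sketch of what that pattern returns on
Mirco's test string. The name `pairs` is just mine, and print is written
with parentheses so the snippet also runs on newer Pythons:

```python
import re

tx = 'a1a2a3A4a35a6b7b8c9c'
rg = r'(\w)(?=(.\1))'

# findall returns one tuple per match, with one item per capture group.
# Group 2 sits inside the look-ahead and spans '.' plus the back-reference,
# so it holds the number *and* the repeated character after it.
pairs = re.findall(rg, tx)
print(pairs)
# [('a', '1a'), ('a', '2a'), ('b', '7b'), ('c', '9c')]
```

Note that 'a3A' doesn't match, since \1 is case-sensitive here.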
So there still seems to be some work that must be done when dealing with
overlapping patterns/look-ahead/behind.
Oh wait, a thought just hit me. Instead of doing it as you did:
rg = r'(\w)(?=(.\1))'
Could you do:
rg = r'(\w)(?=(.)\1)'
That would at least isolate the number, although you'd still have to get
it out of the list/tuple.
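A quick sketch of that idea, assuming the same test string as above. The
names `matches` and `numbers` are only for illustration, and the list
comprehension is one possible way to pull the numbers out:

```python
import re

tx = 'a1a2a3A4a35a6b7b8c9c'
rg = r'(\w)(?=(.)\1)'

# Group 2 now captures only the single character between the two
# identical \w characters; the back-reference stays outside the group.
matches = re.findall(rg, tx)
print(matches)
# [('a', '1'), ('a', '2'), ('b', '7'), ('c', '9')]

# Keep just the second item of each tuple.
numbers = [num for _char, num in matches]
print(numbers)
# ['1', '2', '7', '9']
```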