regular expression question: last occurrence

Tim Peters tim_one at email.msn.com
Sat Jun 12 19:35:32 CEST 1999


[Gerrit Holl]
> I want to get the _last_ occurrence of some string in some other
> string.

The best thing is to use string.rfind ("reverse find"):

>>> import string
>>> string.rfind("abcabcabc", "abc")
6
>>> string.rfind("abcabcabc", "xyz")
-1
>>>

Note that (like string.find, its forward-searching relative) it returns -1
if it can't find the string it's looking for.

This is faster and much more obvious than regexp tricks.

> What I want exactly, is having the current directory (for example,
> /home/gerrit/linuxgames/links/) and then put everything after the last
> "/linuxgames" in a string.

>>> have = "/home/gerrit/linuxgames/links/"
>>> want = "/linuxgames"
>>> i = string.rfind(have, want)
>>> have[i+len(want):]
'/links/'
>>>

> But when someone is in the directory
> "/home/gerrit/linuxgames/cvs/linuxgames",
> my current regexp is getting /cvs/linuxgames, and that's not what I want.

What you do want is an empty string?  Continuing the above,

>>> have = "/home/gerrit/linuxgames/cvs/linuxgames"
>>> i = string.rfind(have, want)
>>> have[i+len(want):]
''
>>>
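
One caveat about the slicing used in both sessions above: if "want" is not
in "have" at all, rfind returns -1, and have[i+len(want):] then quietly
slices from the wrong place instead of complaining.  A minimal sketch of a
guarded version follows.  The helper name after_last is made up for
illustration, and it uses the string-method spelling have.rfind(want),
which is how current Python spells string.rfind(have, want):

```python
def after_last(have, want):
    """Return the part of have after the last want, or None if want is absent."""
    i = have.rfind(want)   # same -1 convention as string.rfind
    if i == -1:
        return None        # not found: don't slice with a bogus index
    return have[i + len(want):]

print(repr(after_last("/home/gerrit/linuxgames/links/", "/linuxgames")))
print(repr(after_last("/home/gerrit/linuxgames/cvs/linuxgames", "/linuxgames")))
print(repr(after_last("/home/gerrit/other", "/linuxgames")))
```

The three calls should print '/links/', '' and None respectively.  Returning
None for the not-found case is just one choice; raising an exception would
work as well.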


> Now, I have:
>
> currentdir = os.getcwd()
> mo = re.search('/linuxgames', currentdir)
> eind = mo.end()
> subdir = currentdir[eind:] + '/'
>
> But that doesn't solve my problem.

Right, each part of a regexp matches at the leftmost position possible:

>>> import re
>>> m = re.search("a", "aaaaaaaaaaaa")
>>> m.span()
(0, 1)
>>>

The way to *trick* it is to stick ".*" at the front:

>>> m = re.search(".*a", "aaaaaaaaaaaa")
>>> m.span()
(0, 12)
>>>

First the ".*" part matches at the leftmost position possible (which is the
start of the string!).  Then the other obscure part of regexps kicks in:
each part of a regexp matches the *longest* string possible such that the
*rest* of the regexp is still able to match.  So, above, ".*" matches the
first 11 "a"s, and the "a" in the regexp matches the last "a" in the string.
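
To see that split directly, one can wrap each part of the regexp in a group
and ask where each group matched.  This is only an illustrative sketch: the
parentheses are added here for inspection and don't change what the pattern
matches.

```python
import re

# Same pattern as ".*a", with each part wrapped in a group for inspection.
m = re.search("(.*)(a)", "a" * 12)

print(m.span(1))   # where ".*" matched
print(m.span(2))   # where the trailing "a" matched
print(m.span())    # the whole match
```

This should print (0, 11), (11, 12) and (0, 12): ".*" first grabs all 12
characters, then gives one back so the final "a" has something to match.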

This is subtle!  That's why I recommend string.rfind <wink>.  Still, you can
trick regexps into working for this:

>>> pattern = re.compile(".*/linuxgames")
>>> for current in ("/home/gerrit/linuxgames/links/",
                    "/home/gerrit/linuxgames/cvs/linuxgames"):
        mo = pattern.search(current)
        eind = mo.end()
        subdir = current[eind:] + '/'
        print current, "->", subdir

/home/gerrit/linuxgames/links/ -> /links//
/home/gerrit/linuxgames/cvs/linuxgames -> /
>>>
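
The session above assumes the pattern always matches.  If it doesn't,
pattern.search returns None and mo.end() blows up.  Here is a rough sketch
of the same trick with that case handled, written for current Python.  The
function name subdir_after_last and its marker parameter are invented for
illustration, re.escape is added on the assumption that the marker should be
taken literally, and the sketch returns the remainder as-is rather than
appending '/':

```python
import re

def subdir_after_last(current, marker="/linuxgames"):
    """Return what follows the last marker in current, or None if no match."""
    # ".*" is greedy, so the literal marker ends up matching its last occurrence.
    pattern = re.compile(".*" + re.escape(marker))
    mo = pattern.search(current)
    if mo is None:
        return None
    return current[mo.end():]

for current in ("/home/gerrit/linuxgames/links/",
                "/home/gerrit/linuxgames/cvs/linuxgames",
                "/home/gerrit/other"):
    print(current, "->", repr(subdir_after_last(current)))
```

The remainders come out as '/links/', '' and None, the same answers the
rfind approach gives, which is another reason to prefer rfind.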

regexps-the-international-corrupter-of-youth-ly y'rs  - tim





