regular expression question: last occurrence
Tim Peters
tim_one at email.msn.com
Sat Jun 12 13:35:32 EDT 1999
[Gerrit Holl]
> I want to get the _last_ occurrence of some string in some other
> string.
The best thing is to use string.rfind ("reverse find"):
>>> import string
>>> string.rfind("abcabcabc", "abc")
6
>>> string.rfind("abcabcabc", "xyz")
-1
>>>
Note that (like string.find, its forward-searching relative) it returns -1
if it can't find the string it's looking for.
This is faster and much more obvious than regexp tricks.
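For readers on a later Python, here is a minimal sketch of the same lookup, assuming Python 3, where the string-module functions are gone and rfind is a method on the string itself:

```python
# Assumes Python 3: rfind is called on the str object, not via the string module.
text = "abcabcabc"

last = text.rfind("abc")      # index where the last occurrence starts
missing = text.rfind("xyz")   # -1 when the substring is absent

print(last)      # 6
print(missing)   # -1
```

The return values are the same as in the session above; only the calling convention differs.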
> What I want exactly, is having the current directory (for example,
> /home/gerrit/linuxgames/links/) and then put everything after the last
> "/linuxgames" in a string.
>>> have = "/home/gerrit/linuxgames/links/"
>>> want = "/linuxgames"
>>> i = string.rfind(have, want)
>>> have[i+len(want):]
'/links/'
>>>
> But when someone is in the directory
> "/home/gerrit/linuxgames/cvs/linuxgames",
> my current regexp is getting /cvs/linuxgames, and that's not what I want.
What you do want is an empty string? Continuing the above,
>>> have = "/home/gerrit/linuxgames/cvs/linuxgames"
>>> i = string.rfind(have, want)
>>> have[i+len(want):]
''
>>>
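One caveat with the slicing above: when rfind returns -1, the slice starts at len(want) - 1 and quietly hands back an unrelated tail. A small sketch of a guarded helper follows, assuming Python 3; the name tail_after_last and the choice to return None for "not found" are hypothetical, not anything from the original question:

```python
def tail_after_last(have, want):
    """Return the part of `have` after the last occurrence of `want`.

    Returns None when `want` does not occur at all (a hypothetical
    convention chosen for this sketch).
    """
    i = have.rfind(want)
    if i == -1:
        return None
    return have[i + len(want):]


print(repr(tail_after_last("/home/gerrit/linuxgames/links/", "/linuxgames")))
# '/links/'
print(repr(tail_after_last("/home/gerrit/linuxgames/cvs/linuxgames", "/linuxgames")))
# ''
print(repr(tail_after_last("/home/gerrit", "/linuxgames")))
# None
```

Whether None, an empty string, or an exception is the right answer for the missing case depends on the caller, so treat that part as a placeholder.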
> Now, I have:
>
> currentdir = os.getcwd()
> mo = re.search('/linuxgames', currentdir)
> eind = mo.end()
> subdir = currentdir[eind:] + '/'
>
> But that doesn't solve my problem.
Right, each part of a regexp matches at the leftmost position possible:
>>> import re
>>> m = re.search("a", "aaaaaaaaaaaa")
>>> m.span()
(0, 1)
>>>
The way to *trick* it is to stick ".*" at the front:
>>> m = re.search(".*a", "aaaaaaaaaaaa")
>>> m.span()
(0, 12)
>>>
First the ".*" part matches at the leftmost position possible (which is the
start of the string!). Then the other obscure part of regexps kicks in:
each part of a regexp matches the *longest* string possible such that the
*rest* of the regexp is still able to match. So, above, ".*" matches the
first 11 "a"s, and the "a" in the regexp matches the last "a" in the string.
This is subtle! That's why I recommend string.rfind <wink>. Still, you can
trick regexps into working for this:
>>> pattern = re.compile(".*/linuxgames")
>>> for current in ("/home/gerrit/linuxgames/links/",
                    "/home/gerrit/linuxgames/cvs/linuxgames"):
        mo = pattern.search(current)
        eind = mo.end()
        subdir = current[eind:] + '/'
        print current, "->", subdir
/home/gerrit/linuxgames/links/ -> /links//
/home/gerrit/linuxgames/cvs/linuxgames -> /
>>>
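To make the leftmost-then-longest behaviour visible, here is a sketch, assuming Python 3, that puts each part of the pattern in its own group so the spans can be inspected separately. The helper name subdir_after_last and its None return for "no match" are hypothetical additions, and unlike the loop above it does not append a trailing '/':

```python
import re

# Group 1 is the greedy ".*", group 2 is the literal "a".
m = re.search("(.*)(a)", "aaaaaaaaaaaa")
print(m.span(1))   # (0, 11): ".*" takes the first 11 characters
print(m.span(2))   # (11, 12): "a" gets only the final character

pattern = re.compile(".*/linuxgames")


def subdir_after_last(current):
    """Return what follows the last "/linuxgames", or None if it is absent."""
    mo = pattern.search(current)
    if mo is None:
        return None
    return current[mo.end():]


print(repr(subdir_after_last("/home/gerrit/linuxgames/links/")))
# '/links/'
print(repr(subdir_after_last("/home/gerrit/linuxgames/cvs/linuxgames")))
# ''
```

Note that "." does not match a newline by default, so this trick as written only looks within a single line, which is fine for directory names like these.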
regexps-the-international-corrupter-of-youth-ly y'rs - tim