find nth occurrence in a string?

Tue Jan 15 16:59:04 EST 2002

I often want to find the nth occurrence of a pattern in a string. I don't
know of a built-in way to do that with string methods. It is probably easy
with regular expressions, but that has its own problems. A common task is
finding the offset of, say, line 3 in a string, which I could do with
splitlines, but that could be a memory hog with a large string. So, my first
hack looks like this:

    def findNth(self, pattern, n, txt):
        """find the nth occurrence of pattern in txt and return the offset"""
        start = 0
        offset = -1
        i = 1
        while i <= n:
            offset = txt.find(pattern, start)
            # if the pattern isn't found before the end of the text is
            # reached, return -1
            if offset == -1:
                break
            # resume one character past this match, so overlapping
            # matches are counted too
            start = offset + 1
            i = i + 1
        return offset

This probably doesn't deal with all the boundary conditions; also, n is
1-based, not 0-based. It does seem to work for the minimal tests I've done.
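As a rough sketch of how the boundary conditions might be pinned down (not a tested library routine), here is a standalone version that rejects n < 1 and an empty pattern, and lets the caller decide whether matches may overlap. The names `find_nth` and `line_offset` and the `overlapping` parameter are hypothetical. `line_offset` covers the line-3 case by locating the right newline instead of calling splitlines, so no list of lines is built.

```python
def find_nth(txt, pattern, n, overlapping=True):
    """Return the offset of the nth (1-based) occurrence of pattern in txt, or -1.

    Hypothetical helper, not a library function. With overlapping=True the
    search resumes one character after each match, as findNth does, so 'aa'
    occurs three times in 'aaaa'; with overlapping=False it resumes after
    the end of each match, so 'aa' occurs only twice.
    """
    if n < 1:
        raise ValueError("n must be 1 or greater")
    if not pattern:
        raise ValueError("pattern must not be empty")
    step = 1 if overlapping else len(pattern)
    start = 0
    offset = -1
    for _ in range(n):
        offset = txt.find(pattern, start)
        if offset == -1:
            return -1  # fewer than n occurrences
        start = offset + step
    return offset


def line_offset(txt, lineno):
    """Offset where line `lineno` (1-based) starts, or -1 if txt has fewer lines.

    Hypothetical helper: line k starts just after the (k - 1)th newline, so
    no list of lines is built. If txt ends with a newline, asking for the
    line after it returns len(txt).
    """
    if lineno == 1:
        return 0
    newline = find_nth(txt, "\n", lineno - 1)
    return -1 if newline == -1 else newline + 1


text = "one\ntwo\nthree\n"
print(line_offset(text, 3))                          # 8
print(find_nth("aaaa", "aa", 3))                     # 2
print(find_nth("aaaa", "aa", 3, overlapping=False))  # -1
```

Whether overlapping matches should count is a real design choice: the `start = offset + 1` step in findNth counts them, while a search that resumes after the whole match does not.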

I'm seriously considering a PEP to have a method like this added to the
string type so a faster C version could be put in the library, but I want to
make sure I'm not just being dumb. Also, a general version should probably
support the optional [start] and [end] arguments, like find.
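A general version with the optional [start] and [end] arguments could lean on str.find itself, which already interprets them as in slice notation. This is a sketch under that assumption; `find_nth_slice` is a hypothetical name, and, as with find, the returned offset is relative to the whole string rather than to the slice.

```python
def find_nth_slice(s, sub, n, start=0, end=None):
    """Offset of the nth (1-based) occurrence of sub within s[start:end], or -1.

    Hypothetical sketch modelled on str.find: start and end are interpreted as
    in slice notation (negative values count from the end), each match must lie
    entirely inside s[start:end], and the result is an offset into the whole
    string. Matches may overlap, as in findNth.
    """
    if n < 1:
        raise ValueError("n must be 1 or greater")
    if end is None:
        end = len(s)
    pos = start
    offset = -1
    for _ in range(n):
        # str.find applies its slice rules to pos and end on every call
        offset = s.find(sub, pos, end)
        if offset == -1:
            break
        pos = offset + 1  # offset is absolute, so pos is non-negative from here on
    return offset


s = "abc def abc ghi abc jkl abc"
print(find_nth_slice(s, "abc", 2, 1))      # 16: the match at offset 0 is skipped
print(find_nth_slice(s, "abc", 3, 0, 18))  # -1: the third match needs s[16:19]
```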

Mark Pilgrim pointed me towards a thread from April 2001 (sorry about the
long URL):

http://groups.google.com/groups?hl=en&threadm=mailman.987695340.21312.python-list%40python.org&rnum=1&prev=/groups%3Fq%3Dnth%2Boccurrence%26hl%3Den%26btnG%3DGoogle%2BSearch%26meta%3Dgroup%253Dcomp.lang.python.*

Before he found that thread, Mark provided an alternative solution:
"This is the best I can do.  You can use an optional second parameter in
split that only splits a certain number of times (thus reducing some
overhead).  It will still suck for large strings, though, since you're
creating a list whose total element size is as big as the original
string.

>>> def nthOccur(n, searchString, theString):
...   "finds nth occurrence of searchString in theString, or len(theString) if < n occurrences"
...   return len(searchString.join(theString.split(searchString, n)[:n]))
...
>>> s = 'abc def abc ghi abc jkl abc'
>>> nthOccur(3, 'abc', s)
16
>>> nthOccur(2, 'def', s)
27

Don't use regular expressions if you can help it.  They're evil, and
they'll probably take longer and take more memory too."
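For comparison with the find loop and the split version, a regular-expression sketch might look like this. It assumes a literal pattern (hence `re.escape`), and `find_nth_re` is a hypothetical name. Because `re.finditer` produces matches lazily, it avoids building the list that split creates, but it counts only non-overlapping matches and, unlike nthOccur, returns -1 rather than len(theString) when there are fewer than n matches.

```python
import re
from itertools import islice


def find_nth_re(txt, pattern, n):
    """Offset of the nth (1-based) non-overlapping match of a literal pattern, or -1.

    Hypothetical sketch for comparison only. re.escape treats pattern as
    plain text; islice skips the first n - 1 matches without storing them.
    """
    if n < 1:
        raise ValueError("n must be 1 or greater")
    matches = re.finditer(re.escape(pattern), txt)
    match = next(islice(matches, n - 1, None), None)
    return -1 if match is None else match.start()


s = "abc def abc ghi abc jkl abc"
print(find_nth_re(s, "abc", 3))       # 16
print(find_nth_re("aaaa", "aa", 2))   # 2 (non-overlapping, unlike findNth)
```

Whether this beats the plain find loop for a simple literal pattern is worth measuring with timeit on realistic data rather than assuming either way.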

Could some language experts please explain the relative merits of the
various approaches? Would this make a good addition to the string methods?

ka