[Python-Dev] string find(substring) vs. substring in string

A.M. Kuchling amk at amk.ca
Wed Feb 16 21:54:31 CET 2005


On Wed, Feb 16, 2005 at 01:34:16PM -0700, Mike Brown wrote:
> time. But I would also hope that it would be smart enough to know that it 
> doesn't need to look past the 2nd character in 'not the xyz' when it is 
> searching for 'not there' (due to the lengths of the sequences).

Assuming stringobject.c:string_contains is the right function, the
code looks like this:

	size = PyString_GET_SIZE(el);
	rhs = PyString_AS_STRING(el);
	lhs = PyString_AS_STRING(a);

	/* optimize for a single character */
	if (size == 1)
		return memchr(lhs, *rhs, PyString_GET_SIZE(a)) != NULL;

	end = lhs + (PyString_GET_SIZE(a) - size);
	while (lhs <= end) {
		if (memcmp(lhs++, rhs, size) == 0)
			return 1;
	}

So it's doing a zillion memcmp()s.  I don't think there's a more
efficient way to do this with ANSI C; memmem() is a GNU extension that
searches for blocks of memory.  Perhaps saving some memcmps by writing

	 if ((*lhs  == *rhs) && memcmp(lhs++, rhs, size) == 0)

would help.

--amk



More information about the Python-Dev mailing list