[Python-Dev] string find(substring) vs. substring in string
Guido van Rossum
gvanrossum at gmail.com
Wed Feb 16 22:03:10 CET 2005
> Assuming stringobject.c:string_contains is the right function, the
> code looks like this:
>
> size = PyString_GET_SIZE(el);
> rhs = PyString_AS_STRING(el);
> lhs = PyString_AS_STRING(a);
>
> /* optimize for a single character */
> if (size == 1)
> return memchr(lhs, *rhs, PyString_GET_SIZE(a)) != NULL;
>
> end = lhs + (PyString_GET_SIZE(a) - size);
> while (lhs <= end) {
> if (memcmp(lhs++, rhs, size) == 0)
> return 1;
> }
>
> So it's doing a zillion memcmp()s. I don't think there's a more
> efficient way to do this with ANSI C; memmem() is a GNU extension that
> searches for blocks of memory. Perhaps saving some memcmps by writing
>
> if ((*lhs == *rhs) && memcmp(lhs++, rhs, size) == 0)
>
> would help.
Which is exactly how s.find() wins this race. (I guess it loses when
it's found by having to do the "find" lookup.) Maybe string_contains
should just call string_find_internal()?
And then there's the question of how the re module gets to be faster
still; I suppose it doesn't bother with memcmp() at all.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
More information about the Python-Dev
mailing list