[Python-ideas] string codes & substring equality

Steven D'Aprano steve at pearwood.info
Fri Nov 29 01:12:48 CET 2013


On Thu, Nov 28, 2013 at 12:50:50PM -0800, Ethan Furman wrote:

> >startswith and endswith are not suitable for arbitrary substring
> >comparisons.
> 
> Sure they are.  Just do it right.  :)

At the point you "do it right" you've lost any advantage over slicing, 
except in the extreme case where the slice you are interested in is so 
enormous that the copying cost dominates. And that is the point of the 
OP's post: he believes that slicing is inefficient because it creates a 
new string object. (I'm giving him the benefit of the doubt that he has 
profiled his application and that this actually is the case.)

Having said that, I suppose it's up to me to prove what I say with some 
benchmarks...


py> def match(s, substr, start=0, end=None):
...     if end is None: end = len(s)
...     if start < 0: start += len(s)
...     if end < 0: end += len(s)
...     if len(substr) != end - start: return False
...     return s.startswith(substr, start, end)
...
py> match("abcde", "bcd", 1, -1)
True
py> match("abcdefg", "bcd", 1, -1)
False


And some benchmarks:

py> from timeit import Timer
py> setup = "from __main__ import match"
py> t1 = Timer("match('abcdef', 'cde', 2, -1)", setup)
py> t2 = Timer("s[2:-1] == 'cde'", "s = 'abcdef'")
py> min(t1.repeat(repeat=5))
1.2987589836120605
py> min(t2.repeat(repeat=5))
0.25656223297119141

Slicing is about five times faster.
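
For anyone who wants to re-run this outside the interactive interpreter, 
here is a rough script version of the same comparison. It is only a 
sketch: the `globals=` argument to Timer is an assumption about your 
Python version (it needs 3.5 or later), the `number` value is an 
arbitrary choice of mine, and the ratio on your machine will differ from 
the figures quoted above.

```python
from timeit import Timer

def match(s, substr, start=0, end=None):
    # Same helper as in the session above: True only if s[start:end]
    # is exactly equal to substr.
    if end is None: end = len(s)
    if start < 0: start += len(s)
    if end < 0: end += len(s)
    if len(substr) != end - start: return False
    return s.startswith(substr, start, end)

# Hand the function to timeit through globals= instead of importing it
# from __main__, so the script also works when imported as a module.
t1 = Timer("match('abcdef', 'cde', 2, -1)", globals={"match": match})
t2 = Timer("s[2:-1] == 'cde'", "s = 'abcdef'")

number = 100000  # loops per measurement; arbitrary, chosen to keep the run short
best_match = min(t1.repeat(repeat=5, number=number))
best_slice = min(t2.repeat(repeat=5, number=number))

print("match():", best_match)
print("slice  :", best_slice)
print("ratio  :", best_match / best_slice)
```

Taking the min of the repeats, as in the session above, gives the 
measurement least disturbed by other activity on the machine.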


[...]
> Which simply shows an easy mistake to make.  The proper way, if using 
> startswith (or endswith) is to be careful of the length of both pieces.

My point exactly. startswith does not test *substring equality*, which 
is what the OP is asking for, but *prefix equality*. Only in the case 
where the length of the substring equals the length of the prefix are 
they the same thing.
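
To make the difference concrete, here is a small sketch comparing the 
two tests on the same window of the same string. The variable names are 
just for illustration.

```python
s = "abcdefg"

# Prefix equality: the window s[1:-1] is "bcdef", and "bcd" is a
# prefix of it, so startswith is satisfied.
prefix_ok = s.startswith("bcd", 1, -1)

# Substring equality: the whole window must equal "bcd", and it doesn't.
equal_ok = s[1:-1] == "bcd"

print(prefix_ok, equal_ok)  # True False
```

The two only agree once you also check that the window is exactly as 
long as the substring.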


-- 
Steven

