[Python-Dev] memcmp performance
Richard Saunders
richismyname at me.com
Fri Oct 21 20:23:24 CEST 2011
>>> Richard Saunders
>>> I have been doing some performance experiments with memcmp, and I was
>>> surprised by how slow memcmp was in Python. I did a long analysis and
>>> came up with some very simple results.
>>
>>Antoine Pitrou, 20.10.2011 23:08:
>> Thanks for the analysis. Non-bugfix work now happens on Python 3, where
>> the str type is Python 2's unicode type. Your recommendations would
>> have to be revisited under that light.
>
> Stefan Behnel <stefan_ml at behnel.de>
>Well, Py3 is quite a bit different now that PEP393 is in. It appears to use
>memcmp() or strcmp() a lot less than before, but I think unicode_compare()
>should actually receive an optimisation to use a fast memcmp() if both
>string kinds are equal, at least when their character unit size is less
>than 4 (i.e. especially for ASCII strings). Funny enough, tailmatch() has
>such an optimisation.
I started looking at the most recent 3.x baseline: in a lot of places
(zlib, arraymodule, datetime, xmlparse) the memcmp analysis still appears
relevant; they all use memcmp in about the same way. But I agree that
there are some major differences in the unicode portion.
As long as the two strings are the same unicode "kind", you can compare
them with a single memcmp. In that case, I would almost argue some memcmp
optimization is even more important: unicode strings are potentially 2 to
4 times larger, so the amount of time spent in memcmp may be greater
(i.e., I am still rooting for -fno-builtin-memcmp on the compile lines).
I went ahead and wrote a quick string_test3.py for comparing strings
(similar to what I did in Python 2.7):
    # Simple Python string comparison test for Python 3.3
    a = []; b = []; c = []; d = []
    for x in range(0, 1000):
        a.append("the quick brown fox" + str(x))
        b.append("the wuick brown fox" + str(x))
        c.append("the quick brown fox" + str(x))
        d.append("the wuick brown fox" + str(x))

    count = 0
    for x in range(0, 200000):
        if a == b: count += 1
        if a == c: count += 2
        if a == d: count += 3
        if b == c: count += 5
        if b == d: count += 7
        if c == d: count += 11
    print(count)
Timings on my FC14 machine (Intel Xeon W3520 @ 2.67 GHz):

    29.18 seconds: vanilla build of Python 3.3
    29.17 seconds: Python 3.3 compiled with -fno-builtin-memcmp
No change. A little investigation shows that unicode_compare() is where
all the work goes. Here is the current main loop inside unicode_compare():
    for (i = 0; i < len1 && i < len2; ++i) {
        Py_UCS4 c1, c2;
        c1 = PyUnicode_READ(kind1, data1, i);
        c2 = PyUnicode_READ(kind2, data2, i);
        if (c1 != c2)
            return (c1 < c2) ? -1 : 1;
    }
    return (len1 < len2) ? -1 : (len1 != len2);
If both strings are the same unicode kind, we can add a memcmp fast path
to unicode_compare():
    Py_ssize_t len = (len1 < len2) ? len1 : len2;

    /* use memcmp if both strings are the same kind */
    if (kind1 == kind2) {
        int result = memcmp(data1, data2, ((int) kind1) * len);
        if (result != 0)
            return result < 0 ? -1 : +1;
    }
Rerunning the test with this small change to unicode_compare():

    17.84 seconds: -fno-builtin-memcmp
    36.25 seconds: standard memcmp
The standard memcmp is WORSE than the original unicode_compare code, but
if we compile with -fno-builtin-memcmp, we get that wonderful 2x
performance increase again.
I am still rooting for -fno-builtin-memcmp in both Python 2.7 and 3.3 ...
(after we put memcmp in unicode_compare)
Gooday,
Richie