[Python-Dev] memcmp performance
Richard Saunders
richismyname at me.com
Fri Oct 21 20:23:24 CEST 2011
>>> Richard Saunders
>>> I have been doing some performance experiments with memcmp, and I was
>>> surprised by how slow memcmp was in Python. I did a long analysis and
>>> came up with some very simple results.
>>
>>Antoine Pitrou, 20.10.2011 23:08:
>> Thanks for the analysis. Non-bugfix work now happens on Python 3, where
>> the str type is Python 2's unicode type. Your recommendations would
>> have to be revisited under that light.
>
> Stefan Behnel <stefan_ml at behnel.de>
>Well, Py3 is quite a bit different now that PEP393 is in. It appears to use
>memcmp() or strcmp() a lot less than before, but I think unicode_compare()
>should actually receive an optimisation to use a fast memcmp() if both
>string kinds are equal, at least when their character unit size is less
>than 4 (i.e. especially for ASCII strings). Funny enough, tailmatch() has
>such an optimisation.
I started looking at the most recent 3.x baseline: in a lot of places
(zlib, arraymodule, datetime, xmlparse) the memcmp analysis still appears
relevant; they all use memcmp in about the same way. But I agree that
there are some major differences in the unicode portion.
As long as the two strings are the same unicode "kind", you can compare
them with a single memcmp. In that case, I would almost argue some memcmp
optimization is even more important: unicode strings are potentially 2 to
4 times larger, so the amount of time spent in memcmp may be greater
(i.e., I am still rooting for -fno-builtin-memcmp on the compile lines).
I went ahead and wrote a quick string_test3.py for comparing strings
(similar to what I did in Python 2.7):
    # Simple Python string comparison test for Python 3.3
    a = []; b = []; c = []; d = []
    for x in range(0, 1000):
        a.append("the quick brown fox" + str(x))
        b.append("the wuick brown fox" + str(x))
        c.append("the quick brown fox" + str(x))
        d.append("the wuick brown fox" + str(x))

    count = 0
    for x in range(0, 200000):
        if a == b: count += 1
        if a == c: count += 2
        if a == d: count += 3
        if b == c: count += 5
        if b == d: count += 7
        if c == d: count += 11
    print(count)
Timings on my FC14 machine (Intel Xeon W3520 @ 2.67 GHz):

    29.18 seconds: vanilla build of Python 3.3
    29.17 seconds: Python 3.3 compiled with -fno-builtin-memcmp
No change. A little investigation shows that unicode_compare() is where
all the work goes. Here is the current main loop inside unicode_compare():
    for (i = 0; i < len1 && i < len2; ++i) {
        Py_UCS4 c1, c2;
        c1 = PyUnicode_READ(kind1, data1, i);
        c2 = PyUnicode_READ(kind2, data2, i);
        if (c1 != c2)
            return (c1 < c2) ? -1 : 1;
    }
    return (len1 < len2) ? -1 : (len1 != len2);
If both strings are the same unicode kind, we can add a memcmp fast path
to unicode_compare():
    Py_ssize_t len = (len1 < len2) ? len1 : len2;

    /* use memcmp if both strings are the same kind */
    if (kind1 == kind2) {
        int result = memcmp(data1, data2, ((int) kind1) * len);
        if (result != 0)
            return result < 0 ? -1 : +1;
    }
Rerunning the test with this small change to unicode_compare():

    17.84 seconds: -fno-builtin-memcmp
    36.25 seconds: standard memcmp
The standard memcmp is WORSE than the original unicode_compare code, but
if we compile with -fno-builtin-memcmp, we get that wonderful 2x
performance increase again.
I am still rooting for -fno-builtin-memcmp in both Python 2.7 and 3.3 ...
(after we put memcmp in unicode_compare)
Gooday,
Richie