<html><body><div><div>Hey,</div><div><br></div><div>> I have been doing some performance experiments with memcmp, and I was</div><div>> surprised that memcmp wasn't faster than it was in Python. I did a whole, </div><div>> long analysis and came up with some very simple results.</div><div><br></div><div>Paul Svensson suggested I post as much as I can as text, as people would be more likely to read it.</div><div>So, here are the basic ideas:</div><div><br></div><div>(1) memcmp is surprisingly slow on some Intel gcc platforms (Linux)</div><div> On several Linux/Intel platforms, memcmp was 2-3x slower than </div><div> a simple, portable C function (with some optimizations).</div><div><br></div><div>(2) The problem: if you compile C programs with gcc with any optimization on, </div><div> gcc replaces every memcmp call with an inline assembly stub (rep cmpsb)</div><div> instead of calling the library memcmp.</div><div><br></div><div>(3) rep cmpsb seems like it would be faster, but it really isn't: </div><div> it completely bypasses memcmp.S, memcmp_sse3.S</div><div> and memcmp_sse4.S in glibc, which are typically faster.</div><div><br></div><div>(4) The basic conclusion is that the Python baseline on </div><div> Intel gcc platforms should probably be compiled with -fno-builtin-memcmp</div><div> so we "avoid" gcc's memcmp optimization.</div><div><br></div><div>The numbers are all in the paper: I will try to generate a text version</div><div>of all the tables so they are easier to read. This is my first time in the Python dev</div><div>arena, so I went a little overboard with my paper below. ;)</div><div><br></div><div> Gooday,</div><div><br></div><div> Richie</div><div><br></div><div>> Before I put in a tracker bug report, I wanted to present my findings</div><div>> and make sure they were repeatable by others (isn't that the nature</div><div>> of science? 
;) as well as invite discussion.</div><div>></div><div>> The analysis is a pdf and is here: </div><div>> http://www.picklingtools.com/study.pdf</div><div>> The testcases are a tarball here:</div><div>> http://www.picklingtools.com/PickTest5.tar.gz</div><div>></div><div>> I have three basic recommendations in the study: I am</div><div>> curious what other people think.</div></div><div><br></div></body></html>