Re: [Python-Dev] Slices and "==" optimization

Maybe I'm missing some context here: where is this fragment supposed to go to? into ceval.c:COMPARE_OP? What is the "return 0;" doing then? In any case, I think measurements should show how much speed improvement is gained by taking this short-cut. It sounds nice in theory, but ... I just added the block else if ((v == w) && (oparg == EQ) && PyString_CheckExact(v)) { x = Py_True; Py_INCREF(x); } into the code. In an application that almost exclusively does COMPARE_OPs on identical strings, I got a 30% speed-up. OTOH, this same code caused a 10% slowdown if I converted the "==" into "<>". In a real application, total speed-up will depend on two things: - how many COMPARE_OPs are done in the code? - how many of those compare identical strings for equality? Running the PyXML test suite, I counted 120000 cases where slow_compare was done, and only 700 cases identical strings were compared for equality. Regards, Martin

Martin v. Loewis writes:
It's very questionable whether we'll find an optimization that works well for any serious variety of applications. Perhaps there's some way to build up a lookup table of fast paths, and have the slow path try that first, but I'm sceptical.
I wouldn't expect XML applications to gain from such an optimization, since many compares will involve Unicode objects, either to each other or to 8-bit strings. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation

"Martin v. Loewis" wrote:
It's pseudo code I made up. The real thing will look very much like it though.
You should first test for oparg == EQ, then for v == w and PyString_CheckExact().
into the code. In an application that almost exclusively does COMPARE_OPs on identical strings, I got a 30% speed-up.
Nice :-)
OTOH, this same code caused a 10% slowdown if I converted the "==" into "<>".
This sounds like an alignment problem caused by the compiler. 10% speedups/slowdowns can easily be produced by randomly moving about a few cases in the ceval loop. All depends on the platform and compiler though.
As I explained in my other reply, I have code which does: if variable == 'constant string': ... Since the compiler interns the 'constant string' and I can make sure that variable is interned too (by calling intern()), I can easily take advantage of the optimization without giving away any semantics. If variable happens to be some other constant which is used in a Python module (e.g. some kind of flag or parameter), then it will be interned per-se too. Note that the dictionary implementation relies heavily on the same optimization (interning was added to enhance string compare performance in dict lookups).
As Fred mentioned, this is probably due to the fact that Unicode objects are not interned. Neither are typical XML tag names; could make a difference... don't know. I believe that PyXML spends most of the time in Python calls and not so much in string compares (and that's what I'm trying to avoid in my XML parser approach ;-). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

[MvL]
OTOH, this same code caused a 10% slowdown if I converted the "==" into "<>".
[MAL]
I expect a more obvious cause is at work: every one of Martin's "<>" compares was slowed by the new batch of failing "is it an EQ and are these strings and are they identical?" tests and branches. Fast paths lose when when they don't win -- they aren't neutral then. Indeed, we could speed your particular app by getting rid of the int-compare fast path already there (but I expect that one is a *net* win across apps, so don't want to get rid of it <wink>).
Whatever it's due to, it means the fast-path code cost extra cycles in the 119300 (of 120000) cases it didn't pay off. If it costs F extra cycles when it fails, and saves S cycles when it succeeds, the question for this test is whether 119300*F < 700*S. S has to about about 170X larger than F for a net win in this case, else there's a net loss.

Tim Peters wrote:
I'd be interested in the overall performance change in the PyXML test suite run. If you look at the code that gets executed for the checking the fast path criteria down in slow_compare, then your equation doesn't look that bad anymore. I'll run pybench on this and post the results here. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

Before we special-case strings more, we should fix what we've got: Martin (IIRC) inserted a fast path at the start of do_richcmp early in the 2.2 cycle: if (v->ob_type == w->ob_type && (f = v->ob_type->tp_compare) != NULL && !PyInstance_Check(v)) { int c; richcmpfunc f1; if ((f1 = RICHCOMPARE(v->ob_type)) != NULL) { /* If the type has richcmp, try it first. try_rich_compare would try it two-sided, which is not needed since we've a single type only. */ res = (*f1)(v, w, op); if (res != Py_NotImplemented) return res; Py_DECREF(res); } c = (*f)(v, w); if (c < 0 && PyErr_Occurred()) return NULL; return convert_3way_to_object(op, c); } Unfortunately, strings lost their tp_compare slot (due to some other "optimization"?), so despite that this is *trying* to special-case rich compares of same-type builtin objects, for strings the v->ob_type->tp_compare != NULL guard at the start fails today, so this fast path isn't taken for string compares of any kind anymore. If it were taken, string_richcompare would get out quickly (without a strcmp) for EQ compare of identical string objects. Note that saying x == x is true regardless of the type of x would prevent defining a correct IEEE-754 equality operator (IOW, if we're saying __eq__ is user-defined, we have to let users define it any way they want -- and at least one major std requires an insane <0.3 wink> definition of equality). BTW, PyInstance_Check() there looks suspect now too, since instances of new-style classes don't pass PyInstance_Check() (so *can* get into this fast path).

Martin v. Loewis writes:
It's very questionable whether we'll find an optimization that works well for any serious variety of applications. Perhaps there's some way to build up a lookup table of fast paths, and have the slow path try that first, but I'm sceptical.
I wouldn't expect XML applications to gain from such an optimization, since many compares will involve Unicode objects, either to each other or to 8-bit strings. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation

"Martin v. Loewis" wrote:
It's pseudo code I made up. The real thing will look very much like it though.
You should first test for oparg == EQ, then for v == w and PyString_CheckExact().
into the code. In an application that almost exclusively does COMPARE_OPs on identical strings, I got a 30% speed-up.
Nice :-)
OTOH, this same code caused a 10% slowdown if I converted the "==" into "<>".
This sounds like an alignment problem caused by the compiler. 10% speedups/slowdowns can easily be produced by randomly moving about a few cases in the ceval loop. All depends on the platform and compiler though.
As I explained in my other reply, I have code which does: if variable == 'constant string': ... Since the compiler interns the 'constant string' and I can make sure that variable is interned too (by calling intern()), I can easily take advantage of the optimization without giving away any semantics. If variable happens to be some other constant which is used in a Python module (e.g. some kind of flag or parameter), then it will be interned per-se too. Note that the dictionary implementation relies heavily on the same optimization (interning was added to enhance string compare performance in dict lookups).
As Fred mentioned, this is probably due to the fact that Unicode objects are not interned. Neither are typical XML tag names; could make a difference... don't know. I believe that PyXML spends most of the time in Python calls and not so much in string compares (and that's what I'm trying to avoid in my XML parser approach ;-). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

[MvL]
OTOH, this same code caused a 10% slowdown if I converted the "==" into "<>".
[MAL]
I expect a more obvious cause is at work: every one of Martin's "<>" compares was slowed by the new batch of failing "is it an EQ and are these strings and are they identical?" tests and branches. Fast paths lose when when they don't win -- they aren't neutral then. Indeed, we could speed your particular app by getting rid of the int-compare fast path already there (but I expect that one is a *net* win across apps, so don't want to get rid of it <wink>).
Whatever it's due to, it means the fast-path code cost extra cycles in the 119300 (of 120000) cases it didn't pay off. If it costs F extra cycles when it fails, and saves S cycles when it succeeds, the question for this test is whether 119300*F < 700*S. S has to about about 170X larger than F for a net win in this case, else there's a net loss.

Tim Peters wrote:
I'd be interested in the overall performance change in the PyXML test suite run. If you look at the code that gets executed for the checking the fast path criteria down in slow_compare, then your equation doesn't look that bad anymore. I'll run pybench on this and post the results here. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

Before we special-case strings more, we should fix what we've got: Martin (IIRC) inserted a fast path at the start of do_richcmp early in the 2.2 cycle: if (v->ob_type == w->ob_type && (f = v->ob_type->tp_compare) != NULL && !PyInstance_Check(v)) { int c; richcmpfunc f1; if ((f1 = RICHCOMPARE(v->ob_type)) != NULL) { /* If the type has richcmp, try it first. try_rich_compare would try it two-sided, which is not needed since we've a single type only. */ res = (*f1)(v, w, op); if (res != Py_NotImplemented) return res; Py_DECREF(res); } c = (*f)(v, w); if (c < 0 && PyErr_Occurred()) return NULL; return convert_3way_to_object(op, c); } Unfortunately, strings lost their tp_compare slot (due to some other "optimization"?), so despite that this is *trying* to special-case rich compares of same-type builtin objects, for strings the v->ob_type->tp_compare != NULL guard at the start fails today, so this fast path isn't taken for string compares of any kind anymore. If it were taken, string_richcompare would get out quickly (without a strcmp) for EQ compare of identical string objects. Note that saying x == x is true regardless of the type of x would prevent defining a correct IEEE-754 equality operator (IOW, if we're saying __eq__ is user-defined, we have to let users define it any way they want -- and at least one major std requires an insane <0.3 wink> definition of equality). BTW, PyInstance_Check() there looks suspect now too, since instances of new-style classes don't pass PyInstance_Check() (so *can* get into this fast path).
participants (5)
-
Fred L. Drake, Jr.
-
M.-A. Lemburg
-
Martin v. Loewis
-
Tim Peters
-
Tim Peters