[Python-Dev] other "magic strings" issues

Fri Nov 7 17:25:29 EST 2003

On Friday 07 November 2003 22:03, Fred L. Drake, Jr. wrote:
> Alex Martelli writes:
>  > Very interesting!  To me, this suggests fixing this performance bug --
>  > there is no reason that I can see why the .is* methods should be
>  > _slower_.  Would a performance bugfix (no implementation change, just
>  > a speedup) be OK for 2.3.3, I hope?  That would motivate me to work on
>  > it soonest...
>
> People keep hinting that these methods should be faster, but I see no
> reason to think they would be.  Think about it: using the method
> requires the creation of a bound method object.  No matter how fast
> PyMalloc is, that's still a fair bit of work.

Good point!  So, a first little trick to accelerate this might be to use 
getsets (unfortunately this gives a marginally Python-level-observable
alteration for e.g. "print 'x'.isdigit.__name__", so perhaps it's only 
suitable for 2.4, not 2.3.3, alas... I dunno...).  I tried a little 
experiment adding a new test .isabit() that says if a string is entirely
made up of '0' and '1':

static PyGetSetDef string_getsets[] = {
	{"isabit", (getter)string_isabit, 0, 0},
	{0}
};
    ...
	string_getsets,				/* tp_getset */

where:

/* shared callables, created lazily on first use in string_isabit */
static PyObject * _return_true = 0;
static PyObject * _return_false = 0;
/* METH_NOARGS functions still receive (self, args); both are ignored */
static PyObject * _true_returner(PyObject* ignore_self, PyObject* ignore_args)
{
	Py_RETURN_TRUE;
}
static PyObject * _false_returner(PyObject* ignore_self, PyObject* ignore_args)
{
	Py_RETURN_FALSE;
}
static PyMethodDef _str_bool_returners[] = {
	{"_str_return_false", (PyCFunction)_false_returner, METH_NOARGS},
	{"_str_return_true", (PyCFunction)_true_returner, METH_NOARGS},
	{0}
};

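static PyObject *
string_isabit(PyStringObject *s)
{
	char* p = PyString_AS_STRING(s);
	int len = PyString_GET_SIZE(s);
	int i;

	for(i=0; i<len; ++i) {
		if(p[i]!='0' && p[i]!='1') {
			/* lazily create the shared "return False" callable */
			if(!_return_false) {
				_return_false = PyCFunction_New(
				    _str_bool_returners+0, 0);
				if(!_return_false)
					return NULL;	/* creation failed */
			}
			Py_INCREF(_return_false);
			return _return_false;
		}
	}
	/* every character is '0' or '1' (vacuously so for "") */
	if(!_return_true) {
		_return_true = PyCFunction_New(
		    _str_bool_returners+1, 0);
		if(!_return_true)
			return NULL;	/* creation failed */
	}
	Py_INCREF(_return_true);
	return _return_true;
}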

i.e., exploit the peculiarity of strings' .is...() methods -- called on 
immutable objects, w/o args, so at lookup time, when the bound method would
otherwise be constructed, they might almost as well be replaced by the
C-coded equivalent of "lambda: False" or "lambda: True".  Of course, we'd
still have to supply str.is... unbound methods (the tp_getset isn't looked
at for class-level access, right...?) for compatibility with idioms such as
filter(str.isdigit, words).

The performance does get some increase this way, though it does not become 
quite as good as an 'in' test yet -- about, I'd say, in-between...:

[alex@lancelot src]$ ./python ~/bin/timeit.py -c '"0".isdigit()'
1000000 loops, best of 3: 0.52 usec per loop
[alex@lancelot src]$ ./python ~/bin/timeit.py -c '"0".isabit()'
1000000 loops, best of 3: 0.39 usec per loop
[alex@lancelot src]$ ./python ~/bin/timeit.py -c '"0" in "01"'
1000000 loops, best of 3: 0.25 usec per loop

and about the same for failed tests:

[alex@lancelot src]$ ./python ~/bin/timeit.py -c '"z" in "01"'
1000000 loops, best of 3: 0.25 usec per loop
[alex@lancelot src]$ ./python ~/bin/timeit.py -c '"z".isabit()'
1000000 loops, best of 3: 0.39 usec per loop
[alex@lancelot src]$ ./python ~/bin/timeit.py -c '"z".isdigit()'
1000000 loops, best of 3: 0.55 usec per loop
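
The stock-interpreter half of these measurements can also be repeated from within Python using the timeit module's Timer class. This is a small sketch assuming nothing beyond the standard library; NUMBER is an arbitrary loop count chosen here, .isabit() is left out because it exists only in the patched build, and absolute timings will of course vary by machine and Python version.

```python
import timeit

NUMBER = 100000  # loops per measurement; arbitrary choice for this sketch
results = {}
for stmt in ('"0".isdigit()', '"0" in "01"', '"z".isdigit()', '"z" in "01"'):
    # take the best of 3 runs, as timeit.py reports from the command line
    best = min(timeit.Timer(stmt).repeat(3, NUMBER))
    results[stmt] = best / NUMBER * 1e6  # microseconds per loop
    print("%-16s %.3f usec per loop" % (stmt, results[stmt]))
```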

Even though to fix the 'x'.is....__name__ issue we'd have to
keep several PyCFunctions corresponding to _true_returner
and _false_returner w/different names and docs, maybe this
is still worth doing for 2.3.something, not just for 2.4... opinions?

Alex