[Python-bugs-list] [ python-Bugs-604716 ] faster [None]*n or []*n

noreply@sourceforge.net noreply@sourceforge.net
Fri, 06 Sep 2002 11:38:47 -0700


Bugs item #604716, was opened at 2002-09-04 15:20
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=604716&group_id=5470

Category: Python Interpreter Core
>Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Gregory Smith (gregsmith)
Assigned to: Nobody/Anonymous (nobody)
Summary: faster [None]*n or []*n

Initial Comment:
This is a suggestion to speed up multiplying
a list by an integer, in the case where
the length of the list is 0 or 1. Currently there
is a lot of per-item overhead in the loop for these cases.
The current loop (in list_repeat, in listobject.c):

	p = np->ob_item;
	for (i = 0; i < n; i++) {
		for (j = 0; j < a->ob_size; j++) {
			*p = a->ob_item[j];
			Py_INCREF(*p);
			p++;
		}
	}
	return (PyObject *) np;

Suggested replacement (where Py_INCREF_N(o, n) bumps
the refcount of 'o' by n):

	p = np->ob_item;
	if (a->ob_size <= 1) {
		if (a->ob_size) {	/* size is 1 */
			PyObject *a0 = a->ob_item[0];
			for (i = 0; i < n; i++)
				*p++ = a0;
			Py_INCREF_N(a0, n);
		}
	} else {
		for (i = 0; i < n; i++) {
			for (j = 0; j < a->ob_size; j++) {
				*p = a->ob_item[j];
				Py_INCREF(*p);
				p++;
			}
		}
	}
	return (PyObject *) np;


You could also add special cases for len == 2, etc.
(with *p++ = a0; *p++ = a1; inside the loop), but
len == 1 is the common case, given idioms like
[None]*1000 used to create lists.

The best approach in general is to check whether
the list being replicated is sufficiently smaller than
the replication count; if so, use a different set
of loops nested in the other order, so that the
outer loop runs less often.

With the loops inverted, you make
'a->ob_size' calls to Py_INCREF_N instead of
'a->ob_size * n' calls to Py_INCREF.


The exact same comments apply to tuplerepeat().

If there is any interest, I will code this and test it.



----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2002-09-06 13:38

Message:
Logged In: YES 
user_id=80475

There was a typo in my code fragment. It should have
read:
	memcpy(p + i*n, a->ob_item, a->ob_size * sizeof(PyObject *));

The memcpy speeds up cases where there are many
items in the list. It is actually slower for a list of length
one, which should still be special-cased as the OP
suggested.

Also, his idea of adding Py_INCREF_N is great. It can be
used throughout Python. Since Py_INCREF has lots of
macro magic for updating debug statistics, the OP's idea
may have to be implemented as a function.

I think there is interest in the OP's ideas and that he
should go ahead and develop a tested patch along with
timing statistics.


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-09-06 04:53

Message:
Logged In: YES 
user_id=21627

Can you report what speed-up you get, for what example?

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-09-04 20:45

Message:
Logged In: YES 
user_id=80475

Here's a faster approach that doesn't involve a special
case:

	for (i = 0; i < n; i++)
		memcpy(p + i*n, p, size);
	for (j = 0; j < a->ob_size; j++){
		Py_INCREF_N(*p, n);
		p++;
	}

----------------------------------------------------------------------
