[Tutor] List Splicing

Luke Paireepinart rabidpoobear at gmail.com
Thu Jun 18 07:34:19 CEST 2009


Robert Berman wrote:
> Emille,
>
> Thank you for the example of list splicing.  Do you know if this is  
> faster than a more conventional loop statement as in my code for 
> primearray which is in my original post (reprinted here)
As has been mentioned, you will want to profile your code to know what 
is truly faster.
however, the main idea in speeding up your code is to move as much of 
the actual operations to C functions as possible.
For example,
x = []
for i in range(10000):
    x.append(0)

is going to be quite a bit slower than
x = [0] * 10000


 >>> t = Timer("""x = [0] * 1000""")
 >>> t.timeit()
6.623384202829115
 >>> t = Timer("""x = []
for i in range(1000):
    x.append(0)""")
 >>> t.timeit()
141.14222554321785

As you can see, the for loop version is about 20 times slower than the 
other version.
But they both accomplish the same thing.
The key here is that the first one, using the "[] * var" syntax, does 
not iterate over items individually.
This is a gross oversimplification, but think of the python interpreter 
like this:
you take your code to a copy shop (a la Kinkos).  however, they do not 
have any copy machines... instead they will rewrite it by hand, line by 
line.
and say they're going to charge you 20 cents per line.
Therefore you'd want to minimize the lines of code to save money, right?

However, in Python, the interpreter isn't executing just line-by-line 
(otherwise we'd have no loops)
so what it's essentially seeing  in the two examples are, first:
x = [0] * 1000

and for the second example:
x = []
x.append(0)
x.append(0)
x.append(0)
x.append(0)
x.append(0)
..... about 992 more times ...
x.append(0)
x.append(0)
x.append(0)

So you are getting charged to "copy" each of these lines.  So what 
originally looked like only 2 or 3 more lines of code is in fact many 
hundreds more instructions.
This analogy refers to the overhead of the interpreter - just to call 
functions and do other basic things, the overhead is much higher than it 
would be in a compiled language such as C.

Now you may be thinking "but behind the scenes, x = [0] * 1000 is a lot 
of instructions as well!"  and you're correct.
However, here's where the key of Python comes in: that "behind the 
scenes" stuff is written over a period of years by very experienced 
coders, and has been optimized to heck.  Also, it's typically going to 
be written in C and compiled to native code, which is *a lot* faster 
than interpreted Python is.

So basically when you're trying to optimize Python, remember that the 
cost of executing each individual line is not that high, for the most 
part (of course there are exceptions).
What you want to do is minimize the amount of lines that are interpreted 
and maximize the amount of code that is executed by the interpreter's 
libraries, which are highly efficient C code.  While in another language 
you may think "I can do this faster if I do it myself, because the 
language's libraries are probably too generic to be faster than a custom 
implementation" but if you want to write fast Python, you have to think 
"I want to try to do as little as possible most of the time, so I can 
transfer most of the computational load over to the libraries and away 
from my code."

Just some generic speculation on making stuff more efficient...



More information about the Tutor mailing list