How naive is Python?

Sun Jan 14 23:08:08 EST 2007

    John> How naive (in the sense that compiler people use the term) is the
    John> current Python system?  For example:

    John>       def foo() :
    John>           s = "This is a test"
    John>           return(s)

    John>          s2 = foo()

    John> How many times does the string get copied?

Never.  s and s2 just refer to the same object (strings are immutable).
Executing this:

    def foo() :
        print id("This is a test")
        s = "This is a test"
        print id(s)
        return(s)

    s2 = foo()
    print id(s2)

prints the same value three times.

    John> Or, for example:

    John>       s1 = "Test1"
    John>       s2 = "Test2"
    John>       s3 = "Test3"
    John>       s = s1 + s2 + s3

    John> Any redundant copies performed, or is that case optimized?

Not optimized.  You can see that using the dis module:

  4           0 LOAD_CONST               1 ('Test1')
              3 STORE_FAST               0 (s1)

  5           6 LOAD_CONST               2 ('Test2')
              9 STORE_FAST               1 (s2)

  6          12 LOAD_CONST               3 ('Test3')
             15 STORE_FAST               2 (s3)

  7          18 LOAD_FAST                0 (s1)
             21 LOAD_FAST                1 (s2)
             24 BINARY_ADD          
             25 LOAD_FAST                2 (s3)
             28 BINARY_ADD          
             29 STORE_FAST               3 (s)
             32 LOAD_CONST               0 (None)
             35 RETURN_VALUE        

The BINARY_ADD opcode creates a new string.

    John> How about this?

    John>       kcount = 1000
    John>       s = ''
    John>       for i in range(kcount) :
    John>           s += str(i) + ' '

    John> Is this O(N) or O(N^2) because of recopying of "s"?

O(N).  Here's a demonstration of that:

    #!/usr/bin/env python

    from __future__ import division

    def foo(kcount):
        s = ''
        for i in xrange(kcount) :
            s += str(i) + ' '

    import time

    for i in xrange(5,25):
        t = time.time()
        foo(2**i)
        t = time.time() - t
        print 2**i, t, t/2**i

On my laptop t roughly doubles for each iteration and prints around 5e-06
for t/2**i in all cases.

Skip