Performance of list comprehensions vs. map

Thu Sep 6 18:39:14 EDT 2001

    >> No, this is Python's principle of least astonishment at work.
    >> Python's language is dynamic all over the place.  Placing arbitrary
    >> restrictions on those semantics because they happen to occur between
    >> "[" and "]" would have been wrong.

    Chris> but it's not wrong to put in the same restrictions when map is
    Chris> called?  The list comprehension syntax is new.. it's not legal in
    Chris> older version of Python. Therefore, you could put any
    Chris> restrictions you wanted in there, and the restiction that the
    Chris> lists being comprehended don't change in the middle ofhte process
    Chris> sounds like a pretty darn good one to me.

I think you're still missing what I was saying.  map() *can't* possibly have
the same semantics as for loops precisely because it doesn't return control
to the interpreter where the name->object lookup takes place.  Could list
comprehensions behave that way?  Yes, but in my opinion (and apparently in
the opinion of the various powers that be), that would be wrong.  If you
don't like all that dynamism, maybe Python's not the language for you.  I
like it and am interested in speeding up the language as it exists, not some
other language with Python's syntax but less dynamic semantics.

    Chris> If it's OK from a design standpoint to require that the function
    Chris> in a map is bound once, it seems pretty reasonable to have the
    Chris> same requirement for a list comprehension. In fact, it seems like
    Chris> a darn good idea! The fact that the current implimentation
    Chris> doesn't enforce it is a pretty poor design justification...or am
    Chris> I missing something?

    >> map() is like any other function written in C.  It does its thing,
    >> then returns.

    Chris> The fact that it is written in C should ideally not determine
    Chris> it's behaviour. Ideally the behavious or map(), and list
    Chris> comprehensions, should be determined by some sensible design
    Chris> criteria, including performance, of course.

This is not an ideal world.  The fact that map() is a function (defined in C
or Python makes no difference) determines that.  The mapping from the name
"str" to the function object that performs stringification takes place one
time only, when map()'s input parameters are evaluated.  As far as map() is
concerned the name of the function may as well be
"the_moon_is_made_of_green_cheese".

    >> >> But there is no guarantee that the list you are operating on isn't
    >> >> modified by another thread during execution.
    >> 
    Chris> but there should be !!
    >> 
    >> Why here and not elsewhere in the language?  

    Chris> I thought you said that map() doesn't allow it?

map() gets a snapshot at the time it's called.  The binding of the name
"str" can change during map()'s execution.  It just won't notice.  For loops
and list comprehensions reevaluate the name-to-object mapping each time
around the loop.  Let me see if I can cook up a simple example...  Okay,
toss this in a .py file and run it from the command line:

    def a(i):
        global c
        c = b
        return "a"

    def b(i):
        global c
        c = a
        return "b"

    def pymap(f, l):
        newl = []
        for elt in l:
            newl.append(f(elt))
        return newl

    n = 5

    c = a
    print "maploop:", pymap(c, range(n))

    c = a
    l = []
    for i in range(n):
        l.append(c(i))
    print "forloop:", l

    c = a
    print "lcloop:", [c(i) for i in range(n)]

and execute it.  You should see

    maploop: ['a', 'a', 'a', 'a', 'a']
    forloop: ['a', 'b', 'a', 'b', 'a']
    lcloop: ['a', 'b', 'a', 'b', 'a']

Note that I don't call the real builtin map(), but a pseudo-map written in
Python.  It still doesn't notice the change in binding of the name "c",
because what c referred to at the point it was called was bound to pymap's
formal parameter "f".  The same thing happens with the real map() or with
any other function.  In the for loop and list comprehension examples, c is
not a parameter, so they look it up each time around the loop.

    >> I'm afraid you're tilting at windmills.

    Chris> Perhaps I am. It may be that a highly dynamic language isn't well
    Chris> suited to multi-threading, but this stuff sounds pretty scary to
    Chris> me... I'll have to remember to be very careful with any
    Chris> multi-threaded code I write.

Python is as well-suited to multi-threading as any other scripting language
I'm at all familiar with.  Actually, this discussion we're having doesn't
really have anything to do with threads at all.  I suggested threads as a
way that object bindings could change during the execution of a chunk of
code, but the example above demonstrates it just as well without using
threads.

    >> One approach might be to propose that the array object be "brought
    >> into the fold" as a builtin object

    Chris> This is in PEP 209 (with NumPY arrays, whcih is much better that
    Chris> the library array). I'm looking forward to it, I wish I could do
    Chris> more to help out. Note that PEP 211 and 225 are relevent too. The
    Chris> three together would do a lot to make Python more
    Chris> "array-oriented" which I think would be wonderful.

I'm not familiar with PEP 209 (one can't read everything).  In scanning the
abstract it looks like the authors propose a redesign of Numeric, not
incorporation of Numeric into the language.  I was suggesting that perhaps
the array object could be elevated to the same status as lists, tuples and
dicts, with appropriate changes made to the virtual machine to take
advantage of arrays' homegeneity.  Moving array into the language would
probably be a lot more modest bit of work than moving Numeric into the
language.  Of course, if that step is to be taken, the array object could
probably benefit from experience with Numeric.

    >> and modifications be made to the virtual machine so that homogeneity
    >> can be assumed when operating on arrays.

    Chris> That would be a great next step.... it's not in the PEP yet,
    Chris> maybe I'll send a note to authors.

    Chris> further off. Personally, I think a little static type
    Chris> declarations could help Py2C and the like a lot.
    >> 
    >> Sure, but then it wouldn't be Python. ;-)

    Chris> I'm not so sure about that... be definintion any change or
    Chris> enhancement makes something that is not python. As long as full
    Chris> dynamic typing is the default, it would still be Python.

If I wrote a module in Python then compiled it to C, I would sure want it to
behave the same.  Otherwise, the compilation process would probably
introduce subtle bugs into the code.  I'd hate to have to debug the
generated C code.

-- 
Skip Montanaro (skip at pobox.com)
http://www.mojam.com/
http://www.musi-cal.com/