split an iteration

Raymond Hettinger vze4rx4y at verizon.net
Thu Mar 31 16:51:26 EST 2005


[Robin Becker]
> This function from texlib in oedipus.sf.net is a real cpu hog and I determined
> to see if it could be optimized.
>
> def add_active_node(self, active_nodes, node):
>      """Add a node to the active node list.
>      The node is added so that the list of active nodes is always
>      sorted by line number, and so that the set of (position, line,
>      fitness_class) tuples has no repeated values.
>      """

If you can change the data structure to be an actual list of tuples, then the
bisect module can be used directly:

insert_index = bisect.bisect_left(active_nodes, node)
if active_nodes[insert_index] == node:
    return        # avoid creating a duplicate entry
active_nodes.insert(insert_index, node)

If the data structure cannot be changed to tuples, then try adding a custom
compare operation to the node class:

    def __cmp__(self, other):
        return cmp((self.line, self.position, self.fitness_class),
                           (other.line, other.position, other.fitness_class))



>      insert_index = nan
>      for index, a in enumerate(active_nodes):
>          if a.line>=node_line:
>              insert_index = index
>              break
>      index = insert_index

This loop can be squeezed a bit more using itertools.imap() and
operator.attrgetter() for the attribute lookup:

    for index, aline in enumerate(imap(attrgetter('line'), active_nodes):
        if aline > node_line:
            . . .



> Is there a fast way to get enumerate to operate over a slice of an iterable?

enumerate(s) is the same as izip(count(), s).
So, to start from position i, write:

    for index, value in izip(count(i), s[i:]):
         . . .

That being said, your best bet is to eliminate the initial linear search which
is likely consuming most of the clock cycles.

Also, I noticed that the code does not reference self.  Accordingly, it is a
good candidate for being a staticmethod or standalone function.



Raymond Hettinger





More information about the Python-list mailing list