split an iteration
Raymond Hettinger
vze4rx4y at verizon.net
Thu Mar 31 16:51:26 EST 2005
[Robin Becker]
> This function from texlib in oedipus.sf.net is a real cpu hog and I determined
> to see if it could be optimized.
>
> def add_active_node(self, active_nodes, node):
> """Add a node to the active node list.
> The node is added so that the list of active nodes is always
> sorted by line number, and so that the set of (position, line,
> fitness_class) tuples has no repeated values.
> """
If you can change the data structure to be an actual list of tuples, then the
bisect module can be used directly:
insert_index = bisect.bisect_left(active_nodes, node)
if active_nodes[insert_index] == node:
return # avoid creating a duplicate entry
active_nodes.insert(insert_index, node)
If the data structure cannot be changed to tuples, then try adding a custom
compare operation to the node class:
def __cmp__(self, other):
return cmp((self.line, self.position, self.fitness_class),
(other.line, other.position, other.fitness_class))
> insert_index = nan
> for index, a in enumerate(active_nodes):
> if a.line>=node_line:
> insert_index = index
> break
> index = insert_index
This loop can be squeezed a bit more using itertools.imap() and
operator.attrgetter() for the attribute lookup:
for index, aline in enumerate(imap(attrgetter('line'), active_nodes):
if aline > node_line:
. . .
> Is there a fast way to get enumerate to operate over a slice of an iterable?
enumerate(s) is the same as izip(count(), s).
So, to start from position i, write:
for index, value in izip(count(i), s[i:]):
. . .
That being said, your best bet is to eliminate the initial linear search which
is likely consuming most of the clock cycles.
Also, I noticed that the code does not reference self. Accordingly, it is a
good candidate for being a staticmethod or standalone function.
Raymond Hettinger
More information about the Python-list
mailing list