Sequence splitting

Fri Jul 3 13:03:32 EDT 2009

On Jul 3, 12:57 am, Steven D'Aprano <st... at REMOVE-THIS-
cybersource.com.au> wrote:
> I've never needed such a split function, and I don't like the name, and
> the functionality isn't general enough. I'd prefer something which splits
> the input sequence into as many sublists as necessary, according to the
> output of the key function.
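
For reference, here is a rough sketch of what that more general split might look like. The name split_by_key and the choice of a dict as the return type are my own assumptions, not anything from the quoted message:

```python
from collections import defaultdict


def split_by_key(seq, key):
    """Group the items of seq into lists, one list per distinct key(item)."""
    groups = defaultdict(list)
    for item in seq:
        # Items that produce the same key land in the same sublist,
        # in their original relative order.
        groups[key(item)].append(item)
    return dict(groups)


by_length = split_by_key(['this', 'is', 'a', 'bunch', 'of', 'words'], len)
```

With a key function that returns only True or False, this reduces to the two-way split discussed below.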

That's not a bad idea, I'll have to experiment with the alternatives.
My thought process for this, however, was that filter itself already
splits the sequence and it would have been more useful had it not
thrown away "half" of what it discovers. It could have been written to
return two sequences with very little perf hit for all but very
large input sequences, and been useful in more situations. What I
*really* wanted was a way to make filter itself more useful, since it
seems a bit silly to have two very similar functions.

Maybe this would be difficult to get into the core, but how about this
idea: Rename the current filter function to something like "split" or
"partition" (which I agree may be a better name) and modify it to
return the desired true and false sequences. Then recreate the
existing "filter" function with a wrapper that throws away the false
sequence.
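
A minimal sketch of that idea in pure Python, assuming the (sequence, predicate) argument order used in the examples further down and plain lists as results. Nothing like this exists in the core today, and filter_via_partition is just a hypothetical name for the wrapper:

```python
def partition(seq, pred):
    """Split seq into (items where pred is true, items where pred is false)."""
    true_items, false_items = [], []
    for item in seq:
        # One pass and one predicate call per item; order is preserved.
        (true_items if pred(item) else false_items).append(item)
    return true_items, false_items


def filter_via_partition(pred, seq):
    """Hypothetical filter-style wrapper: keep the true side, drop the rest."""
    return partition(seq, pred)[0]
```

The wrapper keeps filter's (function, sequence) argument order, but it is only an approximation: it always returns a list and does not handle the filter(None, seq) form.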

Here are two simplified re-creations of situations where I could have
used partition (aka split):

words = ['this', 'is', 'a', 'bunch', 'of', 'words']
short, long = partition(words, lambda w: len(w) < 3)

d = {1: 'w', 2: 'x', 3: 'y', 4: 'z'}
keys = [1, 3, 4, 9]
found, missing = partition(keys, d.has_key)
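
To make the two examples runnable on their own, here they are with a stand-in partition included. The stand-in is only an assumed pure-Python implementation, not an existing builtin. I renamed short/long to short_words/long_words (long is a builtin name in Python 2) and used d.__contains__ in place of d.has_key, which exists only in Python 2:

```python
def partition(seq, pred):
    """Assumed stand-in: return (items passing pred, items failing pred)."""
    true_items, false_items = [], []
    for item in seq:
        (true_items if pred(item) else false_items).append(item)
    return true_items, false_items


words = ['this', 'is', 'a', 'bunch', 'of', 'words']
short_words, long_words = partition(words, lambda w: len(w) < 3)
# short_words == ['is', 'a', 'of']
# long_words == ['this', 'bunch', 'words']

d = {1: 'w', 2: 'x', 3: 'y', 4: 'z'}
keys = [1, 3, 4, 9]
# d.__contains__ does the same membership test as d.has_key.
found, missing = partition(keys, d.__contains__)
# found == [1, 3, 4]
# missing == [9]
```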

There are probably a dozen other approaches, but the existing "filter"
is fast, clear, and *almost* good enough. So when is this useful in
general? Whenever filter itself is useful, but you also want the items
it rejects, so that both sides of the partitioning work it already does
get used.

-Brad