how to flatten & lump lists
Tim Peters
tim.one at home.com
Sun Dec 31 18:25:08 EST 2000
[cpsoct at my-deja.com]
/F covered the flatten part, so let's look at the other one:
> ...
> additionally i would like to do the opposite, take a list like
> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>
> and have it lump up into subgroups (randomly) and return:
> [[1],[2],[3, 4, 5], [6, 7], [8, 9, 10]]
Well, the problem seems ill-specified. Try this for a start:
import random
def clump(seq):
result = []
i, n = 0, len(seq)
# invariant: result already contains a clumping of seq[:i]
while i < n: # while seq[i:] still needs to be clumped
j = random.randint(i+1, n) # assuming empty clumps no good
result.append(seq[i:j])
i = j
return result
Then, e.g.,
for i in range(5):
print clump(range(1, 11))
may print
[[1], [2, 3, 4, 5, 6, 7, 8], [9], [10]]
[[1, 2, 3, 4, 5, 6, 7, 8, 9], [10]]
[[1, 2, 3, 4, 5, 6, 7], [8], [9, 10]]
[[1, 2, 3, 4, 5, 6, 7, 8], [9], [10]]
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
As that suggests, you're much more likely to get "large" clumps at the start
than at the end with this algorithm. That may or may not suit your needs --
you have to define "random" in a sense appropriate for your application.
Fair warning: generating random instances of large structured objects
"fairly" can be extremely subtle. In this case, if you want every possible
way of clumping to be equally likely, there's a pretty simple way:
def clump(seq):
if not seq:
return []
result = [seq[:1]]
for e in seq[1:]:
# either extend last clump or start a new one, with
# equal odds
if random.random() < 0.5:
# extend last clump
result[-1].append(e)
else:
# start a new clump
result.append([e])
return result
Proof sketch: let the total number of ways to clump a sequence of length N
be C(N). C(1) == 1 by inspection. Suppose N > 1, and consider the first
element E of the sequence. Either E is in a clump by itself, or it isn't.
If it is by itself, the rest of the clumping is a clump of a list of N-1
elements, and there are C(N-1) ways to do that. Else E is not by itself, so
there's something other than E in the first piece of the clump. Removing E
from the first piece of the clump thus again yields a clumping of a list of
N-1 elements, and again there are C(N-1) ways to do that. So E is an a
clump by itself in exactly half the cases.
BTW, this shows that C(N)==2**(N-1). A more direct way to see that is to
picture the elements in a line:
e0 e1 e2 e3
and draw vertical bars at the clump boundaries. For example,
[[e0], [e1, e2], [e3]]
corresponds to
e0|e1 e2|e3
If there are N elements, there are N-1 positions at which we *could* draw a
vertical bar. At each position, we can either "draw a bar or not", so there
are 2**(N-1) total ways of choosing which bars to draw. That's very
directly what the "if" test in the algorithm is doing: choosing to draw a
bar (start a new clump) or not (extend the last clump) with equal odds.
good-programming-reduces-to-drawing-lines<wink>-ly y'rs - tim
More information about the Python-list
mailing list