[Tutor] problem solving with lists

Thu Mar 17 12:38:04 EDT 2022

On Thu, 17 Mar 2022 15:36:57 +0100, <marcus.luetolf at bluewin.ch> declaimed
the following:

	My comments are directly in order of your presented text, so may be
superseded by later text from you...

>
>As a remark in between: 
>° all_letters = 'abcdefghi' instead of all_letters = list('abcdefghi') threw
>an error.

	What error? Don't paraphrase -- cut&paste the exact traceback along
with the relevant code.

>° The problem with 6  instead of 9 letters cannot be solved for mathematical
>reasons (below).
>

	What is the source for these "mathematical reasons"... That is, some
text (wikipedia, other source) besides your less than clear description.

>Problem description:
>Distribute  letters (or items) of a list of n letter (or items) to sublists
>containig r letters (or items)
>in a manner that pairs of letters, p.e. 'a', 'b' or 'd', 'f' can occur only
>once.

	This is still an unclear statement. Provide actual examples showing
sets of letters which violate the constraint, and those that do not.

ABC
ABD	<-	AB are adjacent and in the same column
ACD	<-	AC appears in ABC, is this in violation
BCE		<-	BC is adjacent, but in different column

>As a border or constraint  r can only be 3 or 4 and a solution requires (n -
>1)  % ( r - 1) == 0.
>
	And this is the constraint that most intrigues me? Where does it come
from, what really is its purpose? After all -- you've opened the question
with predefined values of n and r, so this constraint has no effect. It is
only of meaning if one is starting from scratch and needs to find values
for n and r in the first place.

	However, n=16, r=4 gives 15 % 3, so passes.

>Algorithm for n = 9, r = 3:
>1. Step: Create sublists of all possible combinations of letters
>[['a','a','a'], ['a', 'a', 'b'].......['i', 'i', 'i']], a total of 9^3 = 729
>sublists.

	Terminology! "combinations" has a specific mathematical meaning. As
does "permutations". And variant "combinations with replacement" (after
selecting an element from the source pool, you put it back in so it can be
selected again).

	Permutations don't require ordered selection ("ABC" and "ACB" are
different). Combinations are ordered ("ABC" and "ACB" -> "ABC" are the same
value). You only get duplicate letters if 1) the source pool of letters has
duplicates or 2) you are using "combinations with replacement"

	9^3 implies "combinations with replacement"... BUT if...

>2. Step: Remove all sublists containing the same letter more than once.

... you are then going to remove any element with duplicate letters WHY DID
YOU GENERATE IT!

>>> math.perm(9, 3)
504
>>> math.comb(9, 3)
84
>>> 

There are 504 permutations (no duplicate letters, but the letter order
differs) and only 84 combinations (order is "sorted"). That's a lot of
elements you are throwing away after generation -- going from 729 -> 84!

>3. Step: Sort all remaining sublists.

	A good combinatorial generator creates sorted items.

>4. Step: Remove all duplicates.
>
>These four steps can accomplished by  itertools funcion combinations:

	Okay -- now we get down to updated logic... <G>
>
>>from itertools import combinations
>
>>iterable = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

	You are still focused on using LISTS rather than STRINGS. A string IS
an iterable.

>>> import itertools
>>> pool = "abcdefghi"
>>> list(itertools.combinations(pool, 3))

	Granted... .combinations() has the bad habit (in my mind) of returning
the combinations as tuples rather than rebuilding strings when the pool was
a string.

[('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'b', 'e'), ('a', 'b', 'f'), ('a',
'b', 'g'), ('a', 'b', 'h'), ('a', 'b', 'i'), ('a', 'c', 'd'), ('a', 'c',
'e'), ('a', 'c', 'f'), ('a', 'c', 'g'), ('a', 'c', 'h'), ('a', 'c', 'i'),
('a', 'd', 'e'), ('a', 'd', 'f'), ('a', 'd', 'g'), ('a', 'd', 'h'), ('a',
'd', 'i'), ('a', 'e', 'f'), ('a', 'e', 'g'), ('a', 'e', 'h'), ('a', 'e',
'i'), ('a', 'f', 'g'), ('a', 'f', 'h'), ('a', 'f', 'i'), ('a', 'g', 'h'),
('a', 'g', 'i'), ('a', 'h', 'i'), ('b', 'c', 'd'), ('b', 'c', 'e'), ('b',
'c', 'f'), ('b', 'c', 'g'), ('b', 'c', 'h'), ('b', 'c', 'i'), ('b', 'd',
'e'), ('b', 'd', 'f'), ('b', 'd', 'g'), ('b', 'd', 'h'), ('b', 'd', 'i'),
('b', 'e', 'f'), ('b', 'e', 'g'), ('b', 'e', 'h'), ('b', 'e', 'i'), ('b',
'f', 'g'), ('b', 'f', 'h'), ('b', 'f', 'i'), ('b', 'g', 'h'), ('b', 'g',
'i'), ('b', 'h', 'i'), ('c', 'd', 'e'), ('c', 'd', 'f'), ('c', 'd', 'g'),
('c', 'd', 'h'), ('c', 'd', 'i'), ('c', 'e', 'f'), ('c', 'e', 'g'), ('c',
'e', 'h'), ('c', 'e', 'i'), ('c', 'f', 'g'), ('c', 'f', 'h'), ('c', 'f',
'i'), ('c', 'g', 'h'), ('c', 'g', 'i'), ('c', 'h', 'i'), ('d', 'e', 'f'),
('d', 'e', 'g'), ('d', 'e', 'h'), ('d', 'e', 'i'), ('d', 'f', 'g'), ('d',
'f', 'h'), ('d', 'f', 'i'), ('d', 'g', 'h'), ('d', 'g', 'i'), ('d', 'h',
'i'), ('e', 'f', 'g'), ('e', 'f', 'h'), ('e', 'f', 'i'), ('e', 'g', 'h'),
('e', 'g', 'i'), ('e', 'h', 'i'), ('f', 'g', 'h'), ('f', 'g', 'i'), ('f',
'h', 'i'), ('g', 'h', 'i')]
>>> 

	Doesn't this look a lot easier to read?

>>> ["".join(tpl) for tpl in itertools.combinations(pool, 3)]
['abc', 'abd', 'abe', 'abf', 'abg', 'abh', 'abi', 'acd', 'ace', 'acf',
'acg', 'ach', 'aci', 'ade', 'adf', 'adg', 'adh', 'adi', 'aef', 'aeg',
'aeh', 'aei', 'afg', 'afh', 'afi', 'agh', 'agi', 'ahi', 'bcd', 'bce',
'bcf', 'bcg', 'bch', 'bci', 'bde', 'bdf', 'bdg', 'bdh', 'bdi', 'bef',
'beg', 'beh', 'bei', 'bfg', 'bfh', 'bfi', 'bgh', 'bgi', 'bhi', 'cde',
'cdf', 'cdg', 'cdh', 'cdi', 'cef', 'ceg', 'ceh', 'cei', 'cfg', 'cfh',
'cfi', 'cgh', 'cgi', 'chi', 'def', 'deg', 'deh', 'dei', 'dfg', 'dfh',
'dfi', 'dgh', 'dgi', 'dhi', 'efg', 'efh', 'efi', 'egh', 'egi', 'ehi',
'fgh', 'fgi', 'fhi', 'ghi']
>>> 

>>r = 3
>>all_tuples = list(combinations(iterable, 3))
>
>>all_sub_lists = []
>>for i in all_tuples:
>    >all_sub_lists.append(list(i))
>
	What purpose? All you are doing is converting each tuple element into a
list element. Lots of noise signifying nothing -- you can index a tuple
just as easily as a list or string... Look above to the one-liner that
produced strings -- no intermediate list of tuples needed, the generator
returns one tuple at a time, which is converted to a string element, then
repeats until the generator is out of values.

>>> ("a", "b", "c")[1]
'b'
>>> ["a", "b", "c"][1]
'b'
>>> "abc"[1]
'b'
>>> 
>>print(all_sub_lists, len(all_sub_lists))

>
>5. and most crucial step: Remove all sublists containing duplicates of pairs
>of letters:
>P.e. if pairs of sublist[0]and  sublist[1], or sublist[0] and sublist[2] or
>sublist[1] and sublist[2] were 

	And here is one source of confusion to the reader... Your use of
"lists", "sublists", et al makes it very difficult to follow just what you
mean here.

	SHOW us the actual "sublist" in question. My initial interpretation is
that "sublist[0]" is "ABC", sublist[1] is "ABD" and sublist[2] is "ABE". If
you mean the /letters/ in the tuples (again, string, list, tuple are all
indexable, so avoiding nested lists of lists would help).

	Taking the second interpretation, where you are accessing the LETTERS
in an element, then I have to interpret it to mean that "matching pairs" DO
NOT have to be ADJACENT; that is, ACD conflicts with ABC.

	If so, the easiest way to test is to create a SET from each element,
perform set INTERSECTION, and reject an element if the intersection has 2
or more members (ie; letters in common).

	All my working code has been written on the assumption that letter
pairs must be adjacent (especially the ooREXX implementation, done as an
exercise).

>already seen, remove that sublist. There should result 12 sublists.
>

	Where did this "12" come from? Note that, for 9/3 and using ADJACENT
letters, I do get 12 passing elements. **** THIS IS ERRONEOUS, first
element was hard-coded to 4 letters!

C:\Users\Wulfraed\Documents\_Hg-Repositories\REXX>letters.rex 9 3
Accepted items: 12
abcd ace ade aef
afg agh ahi bdf
beg bfh bgi cfi

C:\Users\Wulfraed\Documents\_Hg-Repositories\REXX>

16/4 produces 23 passing elements

C:\Users\Wulfraed\Documents\_Hg-Repositories\REXX>letters.rex 16 4
Accepted items: 23
abcd acef adeg aehi
afgh agij ahjk aikl
ajlm akmn alno amop
bdfh beil bfim bgjm
bhkn binp cfjn cgko
chlo dglp dhmp

C:\Users\Wulfraed\Documents\_Hg-Repositories\REXX>

	If I revert to a version that uses set intersection, 9/3 only produces
8 passing elements, and 16/4 produces 15 passing elements.

>It's that last step I struggle with and which would allow to apply the
>algorithm for n = 16 and r = 4, 

	For ADJACENT pairs, it works a lot easier with STRINGS. You only need
to loop over the "new" element taking substrings of 2 characters, and
testing if that substring is IN any of the previously accepted elements
(especially since substrings just require [strt:end] indexing in Python)

	I'm not going to present working Python -- so the working ooREXX filter
procedure will be shown <G>. Hints: accepted. is a "stem" variable, the
stuff following the . serves as an index or key (it can be anything). There
is no way to get a length from a stem variable, so I store the number of
items as accepted.0 (REXX tends to index from 1..., not 0... as Python).
substr(item, j, 2) is equivalent to item[j:j+2], pos() returns where the
match is (0 meaning no matches), Python "in" just says "it is there/it is
not there"

filter: procedure   
    use arg     item, accepted.
    rc = 1      /*  assume item will be accepted    */

    /*  for each accepted item                      */
    do i = 1 to accepted.0
        aitem = accepted.i
        do j = 1 to length(item) - 1
            pair = substr(item, j, 2)
            if pos(pair, aitem) > 0 then
            do
                /*  reject this item, and leave do i loop!  */
                rc = 0
                leave   i
            end
        end j
    end i

    return rc

>my final problem or further. I possibly could do it by hand but not for n =
>16, r = 4 (1820 sublists !)

	And as you can see by my earlier ooREXX output, the algorithm is not
specific to n/r.

C:\Users\Wulfraed\Documents\_Hg-Repositories\REXX>letters.rex 26 5
Accepted items: 36
abcd acefg adegh aehij
afhjk agikl ahkmn ailmo
ajlno aknpq alopr ampst
anqrs aoqsu aptuv aqtvw
artwx asvxy auwyz bdfim
beinr bfjmq bgjns bhlpu
bioru bjosw bkotx blqux
bmrvy cfkpv cgkqv chmsx
cipwz cjpxz dglrw dhnty

C:\Users\Wulfraed\Documents\_Hg-Repositories\REXX>

Hmmm, I do seem to have a minor bug... the first entry should have been
abcde

C:\Users\Wulfraed\Documents\_Hg-Repositories\REXX>letters.rex 26 5
Accepted items: 37
abcde acefg adfhi aeghj
afijk agikl ahkmn ailmo
ajlno aknpq alopr ampst
anqrs aoqsu aptuv aqtvw
artwx asvxy auwyz bdgjm
behlp bfjnr bgkor bhmqu
bimru bjosw bkpux blqvy
bmsxz cfkqw cglrv chnsy
cintx cjpvz ckrwz dhoty
diouy

C:\Users\Wulfraed\Documents\_Hg-Repositories\REXX>

That's better... Had a 4 hard-coded rather than using sublen. Which also
invalidated the 9/3 case above. Corrected 9/3 is

C:\Users\Wulfraed\Documents\_Hg-Repositories\REXX>letters.rex 9 3
Accepted items: 13
abc acd ade aef
afg agh ahi bdf
beg bfh bgi ceh
cfi

C:\Users\Wulfraed\Documents\_Hg-Repositories\REXX>

WHERE 13 ELEMENTS PASS, NOT YOUR DECREED 12!

-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
	wlfraed at ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/