# Best attack order for groups of numbers trying to destroy each other, given a victory chance for number to number attack.

skybuck2000 at hotmail.com skybuck2000 at hotmail.com
Fri Dec 9 07:26:21 EST 2016

```Hello,

(This problem is probably too computationally intensive to solve with Python, though Python + Cuda could be interesting, and also Python has some interesting expressive powers, so it could be interesting to see how Python programmers might be able to express this problem with Python code, so I am going to give this Python group a try... maybe it will surprise me ! :) At least now you have a nice computational problem for those boring rainy days (when the net is down?and the offline games bore you ;) LOL :))

There are two groups of numbers which try to destroy each other:

Group X = 1,2,3,4
Group Y = 1,2,3,4

There is a 4 by 4 victory table which describes the chance of a number
destroying another number:

Victory table =
50,  3, 80, 70
90, 60, 20, 40
30, 90, 55, 65
75, 90, 98, 60

(Currently implemented as a chance by diving it by 100 and storing as
floating point, but since these are subtracted only from 1.0 I guess they
can be stored as integers instead, even bytes)

This leads to the following computational problem as far as I am concerned:

Each number has an attack order permutation as follows (factorial 4 =
1x2x3x4 = 24)

1,2,3,4    // 1
1,2,4,3    // 2
1,3,2,4    // 3
1,3,4,2    // 4
1,4,2,3    // 5
1,4,3,2    // 6
2,1,3,4    // 7
2,1,4,3    // 8
2,3,1,4    // 9
2,3,4,1    // 10
2,4,1,3    // 11
2,4,3,1    // 12
3,1,2,4    // 13
3,1,4,2    // 14
3,2,1,4    // 15
3,2,4,1    // 16
3,4,1,2    // 17
3,4,2,1    // 18
4,1,2,3    // 19
4,1,3,2    // 20
4,2,1,3    // 21
4,2,3,1    // 22
4,3,1,2    // 23
4,3,2,1   // 24

(These attack orders can be numbered from 1 to 24 or 0 to 23 and then it's
attack order/permutation can be looked up to safe memory.)

Each number has it's own attack order and thus this leads to the following
combinational computational problem:

All combinations of permutations in which order group X can attack Group Y
and vice versa:

Group X = 24 x 24 x 24 x 24
Group Y = 24 x 24 x 24 x 24

So this is 24 possibility to the power of 8.

Final computation complexity at the very minimum is (24 to the power of 8)
multiplied by roughly 4 attacks perhaps even 5 or 6 to finally destroy a
group of numbers.

24 to the power of 8 = 110.075.314.176

I have already written a computer program which can solve this, however the
computer program estimates it takes 19 hours on a 2.0 gigahertz AMD Athlon
X2 core.

Using dual core could solve the problem over night, though I do not feel
comfortable running my PC at night unattended.

So now I have the idea to make this program run when my computer is idling
during the day, it should also be able to store it's progress so that it can
continue after it was shutdown.

(Idea for now is to make it multi threaded and assign a low thread priority
so it can run during the day when I use my computer and it's not doing much
so it can use the reserve computational horse power).
(I still have to try these "idle/reverse" ideas to see which one works best
without interrupting my web browsing or music listening too much ;))

My system has 4 GB of ram, so other ideas could be to store a data structure
partially which could keep some computations so that it doesn't have to be
done again... Though memory lookups might be a bit slow so not sure if that
makes any sense.

I might also try GPU/Cuda since there seems to be lots of loops/reoccurences
of the same computations that will happen over and over again... So maybe
cuda can detect "same branch execution" and some "computations" and might
speed it up, not sure about that.

Just the 8 index loops already cost a lot of instructions. Since there are
only 24 permutation it would be enough to store it in 5 bits. Perhaps a
rounded floating point which increases by 1/24 might be enough to trigger
the 4th bit from incrementing when it actually needs to.
2x2x2x2x2 = 32  (it should increment bit 6 when the 5 bits reach 24).
So the idea here was to create 8 indexes from just 1 index being incremented
to create the 8 combinations of indexes "instruction cheaply".
Not sure if this will work, just an idea I might try :)

Then those bits would still need to be extract and makes. So perhaps on
older systems this is not efficient.

The 8 indexes need at least 3 instructions, 1 index increment, 1
comparision, 1 jump.

The inner loop contains some while loops to increment attack index per
number.

Each number has a "alive" variable which starts at 1.0 and is decreased
everytime it's attacked.

Once a number is dead below 0.0000001 it's considered dead and can no longer
attack.

(Since victory table was described as integers above this can also be read
as: Alive starts at 100 and once it goes zero or negative it's dead).

Anyway the main reason why I wrote to this/these groups is that the numbers
themselfes are not that larger and can fit in a byte, even a few bits.

Thus they will fit into SSE registers and such and I also know SSE has some
permutations instructions.

I am not expert at SSE but I am wondering if there are perhaps some
SSE1/2/3/4/5 instructions which can help at solving/computing this problem ?

Now the question you might be wondering about is ? What am I actually trying
to compute ?

Well the final output could simply be the number of victories that each
attack order has had per number. I am particullary interested in victories
of number 4. So that would be sufficient for now.

Number 4 has 24 permutations.

So I want to know how each permutation of number 4 performs under all other
circumstances/permutations of all other numbers/combinations and so forth.

This basically requires the entire set to be computed and then record number
of victories for for example Group X number 4.

Only the results of one group needs to be outputted since the two groups are
a mirror of each other.

Also during the development of the computer program I finally had a
successfull implementation by keeping it as simple as possible and working
with direct variables, instead of much more complex arrays.

Later on I made a more code efficient version which uses arrays/lookups and
such. It was much harder to try the array approach at first since it becomes
mindly complex and can obscure "vision to a solution".

So I suggest anybody trying to implement a solver for this to keep the code
as simple as possible at first, since this is already an 8 dimensional
problem with a 9th dimension.

Also it was interesting to see the Delphi for loops not allowing array
fields as loop variables, so those loops will need to be fleshed out anyway,
or one big linear loop could be used and then 8 indexes calculated from
that.

That approach will probably be necessary for a cuda solution anyway... 32
bit might be sufficient if remainders are reincluded in the conversion from
thread/block dimensions to linear dimension back to 8 dimensional indexes.

Perhaps such computation might even be a bit quicker than the 3 instructions
per loop.

For example 8 divisions and 8 remainders will be necessary to compute 8
indexes from one big linear indexes. Since div/mod can be the same
instruction this might only require 8 instructions to compute these loop
indexes from a big linear index.
However computing the big linear index already requires between 6 or 10
instructions.

So just to compute those 100+ billion data entries requires something like
20 instructions just to be able to compute those loop indexes.

This is my finding with bigger data sets which consists out of small little
data units... The closer the number of entries/data items get to the
actually processing computing power the more time/processing/instructions
are wasted on just computing these loop indexes.

It makes sense from this perspective: A problem of 100 items only needs 100
loops. Thus lots of processing power remains for the data. But for 2 billion
data items, the number of items already matches the gigaherts of the core...
so the core is already swamped... with just loop index calculations.

I hope this reasoning/explanation convinces some CPU/GPU designers to
include more instructions to compute loop indexes in all kinds of ways that
could speed that up somewhat and safe some processing time.

Anyway... let me know if you think SSE could be of some help in solving this
permutation/combination problem ?!

Also for the python newsgroup:

Maybe python is more powerfull than I thought and it has some weird/kinda crazy/cool new permutation/combination algorithmic classes that can solve these kinds of problems faster than just the average run of the mill general code ! ;)
(Or perhaps build in solver support, that will be my next try/posting :P :))

Bye,
Skybuck.
```