[Tutor] separately updating parameters

Zsolt Turi zsoltturi at gmail.com
Thu May 31 16:43:53 CEST 2012


Dear Pythonists,

I'm using Python 2.7 on Windows 7.

Problem description:
Currently, I am working on a reinforcement learning paradigm, where I would
like to update Qa values with
alfaG [if decision_input = 1 and feedback_input = 1] or with
alfaL [if decision_input = 1 and feedback_input = 0].

   (1) I have two input lists (with two values each):

        decision_input = [1,1] - each element could be 1, 2, 3, 4, 5 or 6
        feedback_input = [1,0] - each element is either 1 or 0

    (2) The update equations are the following:

        for gain: Qa = Qa+(alfaG*(feedback_input[i]-Qa))
            I would like to use alfaG only if the i-th element of
            feedback_input is 1.
        for loss: Qa = Qa+(alfaL*(feedback_input[i]-Qa))
            I would like to use alfaL only if the i-th element of
            feedback_input is 0.

        The Qa value is initialized to zero.
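
As an illustration, the update rule in (2) could be written as a small helper. This is only a minimal sketch: the function name `update_q` and its argument names are made up here, and it assumes the feedback is always 0 or 1. A single-argument `print(...)` is used so the snippet runs on both Python 2.7 and Python 3.

```python
def update_q(qa, feedback, alfa_g, alfa_l):
    """Return the updated Qa after one trial (hypothetical helper)."""
    # Gain trials (feedback == 1) use alfaG, loss trials (feedback == 0) use alfaL.
    alfa = alfa_g if feedback == 1 else alfa_l
    return qa + alfa * (feedback - qa)


qa = 0.0                          # Qa is initialized to zero
qa = update_q(qa, 1, 0.01, 0.01)  # gain trial: Qa becomes 0.01
qa = update_q(qa, 0, 0.01, 0.01)  # loss trial: Qa shrinks towards 0
print(round(qa, 4))               # 0.0099
```

Keeping the choice of learning rate in one place means the calling loop does not need separate branches for gain and loss.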


    (3) alfaG and alfaL should be incremented independently of each other
after the Qa value has been updated:

         alfaG = 0.01 - initial value
         alfaL = 0.01 - initial value
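
One point that may matter for (3): repeatedly adding 0.01 to a float can accumulate small rounding errors, so a test such as `alfaG < value` may behave unexpectedly near the boundary. A possible alternative, sketched below, is to build the list of learning rates up front. The helper name `make_alfa_values` and the choice of three steps are assumptions made for this example.

```python
def make_alfa_values(start=0.01, step=0.01, count=3):
    """Return `count` learning rates beginning at `start` (hypothetical helper)."""
    # Multiply by the step index instead of adding repeatedly, then round,
    # so that floating-point errors do not pile up from one step to the next.
    return [round(start + k * step, 10) for k in range(count)]


alfaG_values = make_alfa_values()
alfaL_values = make_alfa_values()
print(alfaG_values)  # [0.01, 0.02, 0.03]
```

Because the two lists are built separately, alfaG and alfaL can be varied independently of each other.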

    (4) The problematic code :(

decision_input = [1,1]
feedback_input = [1,0]
a = []
alfaG = 0.01
alfaL = 0.01
value = 0.04

for i in range(len(decision_input)):
    if decision_input[i] == 1 and feedback_input[i] == 1:
        while alfaG < value:
            Qa = 0
            for feedb in feedback_input:
                Qa = Qa+(alfaG*(feedb-Qa))
                a.append(Qa)
                if decision_input[i] == 1 and feedback_input[i] == 0:
                    while alfaL < value:
                        for feedb in feedback_input:
                            Qa = Qa+(alfaL*(feedb-Qa))
                            a.append(Qa)
                        alfaL += 0.01
            alfaG += 0.01
print a

After this, I get the following output:
[0.01, 0.0099], [0.02, 0.0196], [0.03, 0.0291]


    (5) I have no idea how to get the following output:

[0.01, 0.0099], [0.01, 0.0098], [0.01, 0.0097]   --> alfaG = 0.01, alfaL = 0.01, 0.02, 0.03
[0.02, 0.0198], [0.02, 0.0196], [0.02, 0.0194]   --> alfaG = 0.02, alfaL = 0.01, 0.02, 0.03
[0.03, 0.0297], [0.03, 0.0294], [0.03, 0.0291]   --> alfaG = 0.03, alfaL = 0.01, 0.02, 0.03

Since both alfaG and alfaL take 3 values, I expect 3x3 = 9 lists.
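
One possible way to obtain such a 3x3 layout is to loop over every (alfaG, alfaL) combination and restart Qa from zero for each one. The following is a minimal sketch, not a definitive solution: the names `alfa_grid` and `results` are invented for this example, the values are rounded to four decimals for display, and trials where the decision is not 1 are simply skipped.

```python
decision_input = [1, 1]
feedback_input = [1, 0]
alfa_grid = [0.01, 0.02, 0.03]  # assumed values for both alfaG and alfaL

results = []
for alfaG in alfa_grid:
    row = []
    for alfaL in alfa_grid:
        Qa = 0.0  # restart from zero for every (alfaG, alfaL) pair
        pair = []
        for decision, feedb in zip(decision_input, feedback_input):
            if decision != 1:
                continue  # only option 1 is tracked in this sketch
            # Pick the learning rate from the feedback of this trial.
            alfa = alfaG if feedb == 1 else alfaL
            Qa = Qa + alfa * (feedb - Qa)
            pair.append(round(Qa, 4))
        row.append(pair)
    results.append(row)

for row in results:
    print(row)
```

Each printed row corresponds to one alfaG value, and each pair inside a row corresponds to one alfaL value.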

Does anyone have an idea how to modify the code?

Best regards,
Zsolt