[Tutor] separately updating parameters
Zsolt Turi
zsoltturi at gmail.com
Thu May 31 16:43:53 CEST 2012
Dear Pythonists,
I'm using Python 2.7 on Windows 7.
Problem description:
Currently, I am working on a reinforcement learning paradigm in which I would
like to update Qa values with
alfaG [if decision_input = 1 and feedback_input = 1] or with
alfaL [if decision_input = 1 and feedback_input = 0].
(1) I have two input lists (with two values each):
decision_input = [1,1] - each value could be 1, 2, 3, 4, 5 or 6
feedback_input = [1,0] - each value is either 1 or 0
(2) The update equations are the following:
for a gain: Qa = Qa + (alfaG*(feedback_input[i] - Qa)), i.e. alfaG is used
only if the i-th element of feedback_input is 1
for a loss: Qa = Qa + (alfaL*(feedback_input[i] - Qa)), i.e. alfaL is used
only if the i-th element of feedback_input is 0
Qa is initialized to zero.
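The two update rules in (2) can be sketched as a single Python function that picks the learning rate from the feedback value. This is a minimal illustration, not the original code: the function name `update_q` and its parameter names are hypothetical, and it assumes feedback is exactly 1 (gain) or 0 (loss).

```python
def update_q(qa, feedback, alfa_gain, alfa_loss):
    """Hypothetical helper: return Qa after one trial.

    Uses alfa_gain when feedback == 1 and alfa_loss when feedback == 0,
    following Qa = Qa + alfa * (feedback - Qa).
    """
    if feedback == 1:
        alfa = alfa_gain
    elif feedback == 0:
        alfa = alfa_loss
    else:
        raise ValueError("feedback must be 0 or 1, got %r" % (feedback,))
    return qa + alfa * (feedback - qa)


# Example: a gain trial followed by a loss trial, starting from Qa = 0.
qa = 0.0
qa = update_q(qa, 1, alfa_gain=0.02, alfa_loss=0.01)  # Qa = 0.02
qa = update_q(qa, 0, alfa_gain=0.02, alfa_loss=0.01)  # approx. 0.02 - 0.01*0.02 = 0.0198
print(qa)
```

Keeping the rate choice inside one function means the loop that walks over the trials never has to duplicate the update formula for gains and losses.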
(3) After updating the Qa value, alfaG and alfaL should be incremented
independently of each other (in steps of 0.01):
alfaG = 0.01 - initial value
alfaL = 0.01 - initial value
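If "independently" means that every alfaG value should be combined with every alfaL value, the combinations can be generated up front instead of incrementing two floats inside nested loops. A minimal sketch, assuming three steps of 0.01 starting at 0.01 as in (3) and (5); `alfa_values` is a hypothetical name. Building the values from integers avoids the rounding drift that repeated `+= 0.01` can accumulate.

```python
import itertools

# Hypothetical grid of learning rates: 0.01, 0.02, 0.03.
# Dividing integers avoids the drift that repeated "+= 0.01" can cause.
alfa_values = [k / 100.0 for k in range(1, 4)]

# itertools.product pairs every alfaG with every alfaL (3 x 3 = 9 pairs);
# the first element (alfaG) varies slowest, so output is grouped by alfaG.
for alfaG, alfaL in itertools.product(alfa_values, alfa_values):
    print("alfaG = %.2f, alfaL = %.2f" % (alfaG, alfaL))
```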
(4) The problematic code :(
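decision_input = [1, 1]
feedback_input = [1, 0]
a = []          # collects the Qa values
alfaG = 0.01    # learning rate for gains (feedback 1)
alfaL = 0.01    # learning rate for losses (feedback 0)
value = 0.04    # both rates are increased while they stay below this

for i in range(len(decision_input)):
    # gain trial: decision 1 with feedback 1
    if decision_input[i] == 1 and feedback_input[i] == 1:
        while alfaG < value:
            Qa = 0
            for feedb in feedback_input:
                Qa = Qa + (alfaG * (feedb - Qa))
                a.append(Qa)
            # loss trial: decision 1 with feedback 0
            if decision_input[i] == 1 and feedback_input[i] == 0:
                while alfaL < value:
                    for feedb in feedback_input:
                        Qa = Qa + (alfaL * (feedb - Qa))
                        a.append(Qa)
                    alfaL += 0.01
            alfaG += 0.01
print(a)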
After running it, I get the following output:
[0.01, 0.0099], [0.02, 0.0196], [0.03, 0.0291]
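The numbers in this output can be checked by hand. Writing Q for Qa and α for a single rate that the code applies to both trials (feedback 1, then feedback 0), starting from Q = 0:

```latex
\begin{aligned}
Q_1 &= 0 + \alpha\,(1 - 0) = \alpha\\
Q_2 &= Q_1 + \alpha\,(0 - Q_1) = \alpha\,(1 - \alpha)
\end{aligned}
```

So α = 0.01, 0.02, 0.03 gives Q_2 = 0.0099, 0.0196 and 0.0291. Using α_G (alfaG) for the gain trial and α_L (alfaL) for the loss trial instead gives Q_1 = α_G and Q_2 = α_G(1 - α_L), which reproduces the 3 x 3 values listed in (5), e.g. 0.02 x (1 - 0.03) = 0.0194.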
(5) I have no idea how to get the following output:
[0.01, 0.0099], [0.01, 0.0098], [0.01, 0.0097] --> thus: alfaG = 0.01, alfaL = 0.01, 0.02, 0.03
[0.02, 0.0198], [0.02, 0.0196], [0.02, 0.0194] --> thus: alfaG = 0.02, alfaL = 0.01, 0.02, 0.03
[0.03, 0.0297], [0.03, 0.0294], [0.03, 0.0291] --> thus: alfaG = 0.03, alfaL = 0.01, 0.02, 0.03
Since alfaG and alfaL each take 3 values, I need 3 x 3 = 9 lists.
Does anyone have an idea how to modify the code?
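One possible way to get the 3 x 3 result, sketched under the assumption that each pair is [Qa after the gain trial, Qa after the loss trial] for one (alfaG, alfaL) combination: loop over alfaG in the outer loop and alfaL in the inner loop, reset Qa to zero for every combination, and choose the rate from each feedback value. The names `alfa_values`, `results`, `row` and `pair` are hypothetical.

```python
decision_input = [1, 1]
feedback_input = [1, 0]

# Hypothetical grid of rates (0.01, 0.02, 0.03), built from integers
# so that repeated float additions cannot drift.
alfa_values = [k / 100.0 for k in range(1, 4)]

results = []  # one row per alfaG, each row holding one pair per alfaL
for alfaG in alfa_values:
    row = []
    for alfaL in alfa_values:
        Qa = 0.0  # every (alfaG, alfaL) combination starts from scratch
        pair = []
        for decision, feedb in zip(decision_input, feedback_input):
            if decision != 1:
                continue  # only trials where decision 1 was chosen update Qa
            alfa = alfaG if feedb == 1 else alfaL  # gain vs. loss rate
            Qa = Qa + alfa * (feedb - Qa)
            pair.append(Qa)
        row.append(pair)
    results.append(row)

# Round only for display; the stored values keep full precision.
for row in results:
    print([[round(q, 4) for q in pair] for pair in row])
```

Printed to four decimals, the first row comes out as [[0.01, 0.0099], [0.01, 0.0098], [0.01, 0.0097]]. Resetting Qa for every combination is an assumption: it treats each (alfaG, alfaL) pair as a separate run, which is what the listed values imply, since the first element of every pair equals alfaG.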
Best regards,
Zsolt