% work-storage-presentations-sejnClass-sejnClassTalk foragingAndBasalGanglia

\documentclass{seminar}

\newpagestyle{mypagestyle}{}{\hbox{Bayle Shanks}} \pagestyle{mypagestyle}

\begin{document}

Preliminaries


Reinforcement Learning

``The basic paradigm of reinforcement learning is as follows: The learning agent observes an input state or input pattern, it produces an output signal (most commonly thought of as an `action' or `control signal'), and then it receives a scalar `reward' or `reinforcement' feedback signal from the environment indicating how good or bad its output was.

The goal of learning is to generate the optimal actions leading to maximal reward.

In many cases the reward is also delayed (i.e., is given at the end of a long sequence of inputs and outputs). In this case the learner has to solve what is known as the `temporal credit assignment' problem (i.e., it must figure out how to apportion credit and blame to each of the various inputs and outputs leading to the ultimate final reward signal).'' \begin{scriptsize}(from http://www.research.ibm.com/massive/tdl.html)\end{scriptsize}


Elements of Reinforcement Learning


Goals of Reinforcement Learning


Estimated reward is $-V'$
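
One way to read this slide, as a sketch under two assumptions not stated in the source: no discounting ($\gamma = 1$), and a reward rate $r(t)$ collected over a short time step $\Delta t$. If the value estimate is self-consistent, then

\[ V(t) = r(t)\,\Delta t + V(t + \Delta t) \]

Solving for the reward rate and letting the step shrink:

\[ r(t) = -\frac{V(t + \Delta t) - V(t)}{\Delta t} \;\longrightarrow\; -V'(t) \quad \mbox{as } \Delta t \to 0 \]

Under these assumptions, the reward the agent expects at each moment equals the rate at which its value estimate falls.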

% \includegraphics{2127.eps}{.1}


Equation defining V(t)

\[ V(t) = E[r(t) + \gamma V(t + 1)] \]
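
Unrolling this recursion (a sketch, assuming $0 \le \gamma < 1$ and bounded rewards so that the sum converges) shows that $V(t)$ is the expected discounted sum of future rewards:

\[ V(t) = E[r(t) + \gamma r(t+1) + \gamma^2 r(t+2) + \cdots] = E\left[\sum_{k=0}^{\infty} \gamma^k r(t+k)\right] \]

For example, with the illustrative values $\gamma = 0.9$ and a single reward of 1 arriving three steps ahead (all other rewards zero), $V(t) = 0.9^3 = 0.729$.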

Rearrange to get equation for $\delta(t)$:

\[ 0 = r(t) + \gamma \hat{V}(t + 1) - \hat{V}(t) \]

\[ \delta(t) = r(t) + \gamma \hat{V}(t + 1) - \hat{V}(t) \]
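
The right-hand side averages to zero only when the estimate $\hat{V}$ is exact; $\delta(t)$ measures how far a single step deviates from that. As a worked example with illustrative numbers (assumed here, not from the source), take $\gamma = 0.9$, $r(t) = 0$, $\hat{V}(t+1) = 1$ and $\hat{V}(t) = 0.5$:

\[ \delta(t) = 0 + 0.9 \times 1 - 0.5 = 0.4 \]

A positive $\delta(t)$ means the outcome was better than predicted, so $\hat{V}(t)$ was too low. The standard temporal-difference update (not stated on this slide; the learning rate $\alpha$ is an assumed parameter) moves the estimate toward the observed target:

\[ \hat{V}(t) \leftarrow \hat{V}(t) + \alpha\,\delta(t) \]

With $\alpha = 0.1$ this gives $\hat{V}(t) = 0.5 + 0.1 \times 0.4 = 0.54$.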


Sequences


Open questions

\end{document}