1 |
What is Reinforcement Learning? |
|
2 |
Introduction |
|
3 |
Evaluative Feedback |
|
4 |
The n-Armed Bandit Problem |
|
5 |
The Reinforcement Learning Problem |
|
6 |
The Reward Hypothesis |
|
7 |
The Markov Property |
|
8 |
Markov Decision Process |
|
9 |
Value Functions |
|
10 |
Dynamic Programming |
|
11 |
Policy Iteration |
|
12 |
Value Iteration |
|
13 |
Asynchronous DP |
|
14 |
Monte Carlo Methods |
|
15 |
Random Walk Problem |
|
16 |
On-policy and Off-policy |
|
17 |
Temporal Difference Learning |
|
18 |
TD-Learning vs. Monte Carlo Learning |
|
19 |
Sarsa: On-Policy TD Control |
|
20 |
Q-Learning: Off-Policy TD Control |
|
21 |
Eligibility Traces |
|
22 |
TD-Lambda |
|
23 |
Presentation |
|