| 1 |
What is Reinforcement Learning? |
|
| 2 |
Introduction |
|
| 3 |
Evaluative Feedback |
|
| 4 |
The n-Armed Bandit Problem |
|
| 5 |
The Reinforcement Learning Problem |
|
| 6 |
The Reward Hypothesis |
|
| 7 |
The Markov Property |
|
| 8 |
Markov Decision Process |
|
| 9 |
Value Functions |
|
| 10 |
Dynamic Programming |
|
| 11 |
Policy Iteration |
|
| 12 |
Value Iteration |
|
| 13 |
Asynchronous DP |
|
| 14 |
Monte Carlo Methods |
|
| 15 |
Random Walk Problem |
|
| 16 |
On-policy and Off-policy |
|
| 17 |
Temporal Difference Learning |
|
| 18 |
TD-Learning vs. Monte Carlo Learning |
|
| 19 |
Sarsa: On-Policy TD Control |
|
| 20 |
Q-Learning: Off-Policy TD Control |
|
| 21 |
Eligibility Traces |
|
| 22 |
TD-Lambda |
|
| 23 |
Presentation |
|