editor, Junyeob Baek
Robotics Software Engineer /RL, Motion Planning and Control, SLAM, Vision
original repo : github.com/CUN-bjy/rl-paper-review
Related pages:
[whitebot/Reinforcement Learning Stories] - DDPG Review: Continuous control with deep reinforcement learning
[whitebot/Reinforcement Learning Stories] - TRPO Review: Trust region policy optimization
[whitebot/Reinforcement Learning Stories] - PPO Review: Proximal policy optimization algorithms
[whitebot/Reinforcement Learning Stories] - PER Review: Prioritized Experience Replay
[whitebot/Reinforcement Learning Stories] - TD3 Review: Addressing Function Approximation Error in Actor-Critic Methods
RL Roadmap
Policy Gradient
(1) Vanilla PG (Sutton)
[Policy gradient methods for reinforcement learning with function approximation]
Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour (1999)
(2) DPG
[Deterministic policy gradient algorithms]
Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014).
(3) DDPG
[Continuous control with deep reinforcement learning]
Timothy P. Lillicrap*, Jonathan J. Hunt*, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver & Daan Wierstra (2016)
(4) NPG
[A natural policy gradient]
Sham Kakade (2002)
(5) TRPO
[Trust region policy optimization]
John Schulman, Sergey Levine, Philipp Moritz, Michael Jordan, Pieter Abbeel (2015)
(6) GAE
[High-Dimensional Continuous Control Using Generalized Advantage Estimation]
John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan and Pieter Abbeel (2016)
(7) PPO
[Proximal policy optimization algorithms]
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov (2017)
(8) TD3
[Addressing Function Approximation Error in Actor-Critic Methods]
Scott Fujimoto, Herke van Hoof, David Meger (2018)
(9) SAC
[Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor]
Exploration
(1) PER
[Prioritized Experience Replay]
Tom Schaul, John Quan, Ioannis Antonoglou and David Silver, Google DeepMind (2015)
(2) HER
[Hindsight Experience Replay]
Marcin Andrychowicz*, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba, OpenAI (2018)