Editor: Junyeob Baek
Robotics Software Engineer / RL, Motion Planning and Control, SLAM, Vision


Original repo: github.com/CUN-bjy/rl-paper-review

 


Related pages:

[whitebot/Reinforcement Learning Stories] - DDPG Review: Continuous control with deep reinforcement learning

[whitebot/Reinforcement Learning Stories] - TRPO Review: Trust region policy optimization

[whitebot/Reinforcement Learning Stories] - PPO Review: Proximal policy optimization algorithms

[whitebot/Reinforcement Learning Stories] - PER Review: Prioritized Experience Replay

[whitebot/Reinforcement Learning Stories] - TD3 Review: Addressing Function Approximation Error in Actor-Critic Methods


RL Roadmap

GITMIND < Link!!


Policy Gradient

(1) Vanilla PG (Sutton)

[Policy gradient methods for reinforcement learning with function approximation]

Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour (1999)

REVIEW | PAPER

(2) DPG

[Deterministic policy gradient algorithms]

Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014).

REVIEW | PAPER

(3) DDPG

[Continuous control with deep reinforcement learning]

Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra (2016)

REVIEW | PAPER | CODE

(4) NPG

[A natural policy gradient]

Sham Kakade (2002)

REVIEW | PAPER

(5) TRPO

[Trust region policy optimization]

John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel (2015)

REVIEW | PAPER

(6) GAE

[High-Dimensional Continuous Control Using Generalized Advantage Estimation]

John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel (2016)

REVIEW | PAPER

(7) PPO

[Proximal policy optimization algorithms]

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov (2017)

REVIEW | PAPER

(8) TD3

[Addressing Function Approximation Error in Actor-Critic Methods]

Scott Fujimoto, Herke van Hoof, David Meger (2018)

REVIEW | PAPER | CODE

(9) SAC

[Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor]

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine (2018)

REVIEW | PAPER


Exploration

(1) PER

[Prioritized Experience Replay]

Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, Google DeepMind (2015)

REVIEW | PAPER

(2) HER

[Hindsight Experience Replay]

Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba, OpenAI (2017)

REVIEW | PAPER


Reference

Key Papers in Deep RL

PG Travel Guide

utilForever/rl-paper-study

Khanrc's blog
