A Personally Curated RL Roadmap

whitebot 2021. 2. 2. 22:33

Editor: Junyeob Baek
Robotics Software Engineer / RL, Motion Planning and Control, SLAM, Vision

Original repo: github.com/CUN-bjy/rl-paper-review

RL Roadmap

GITMIND < Link!!

Policy Gradient

(1) Vanilla PG (Sutton)

[Policy gradient methods for reinforcement learning with function approximation]

Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour (1999)

REVIEW | PAPER

(2) DPG

[Deterministic policy gradient algorithms]

David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller (2014)

REVIEW | PAPER

(3) DDPG

[Continuous control with deep reinforcement learning]

Timothy P. Lillicrap*, Jonathan J. Hunt*, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra (2016)

REVIEW | PAPER | CODE

(4) NPG

[A natural policy gradient]

Sham Kakade (2002)

REVIEW | PAPER

(5) TRPO

[Trust region policy optimization]

John Schulman, Sergey Levine, Philipp Moritz, Michael Jordan, Pieter Abbeel (2015)

REVIEW | PAPER

(6) GAE

[High-Dimensional Continuous Control Using Generalized Advantage Estimation]

John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel (2016)

REVIEW | PAPER

(7) PPO

[Proximal policy optimization algorithms]

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov (2017)

REVIEW | PAPER

(8) TD3

[Addressing Function Approximation Error in Actor-Critic Methods]

Scott Fujimoto, Herke van Hoof, David Meger (2018)

REVIEW | PAPER | CODE

(9) SAC

[Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor]

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine (2018)

REVIEW | PAPER

Exploration

(1) PER

[Prioritized Experience Replay]

Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, Google DeepMind (2015)

REVIEW | PAPER

(2) HER

[Hindsight Experience Replay]

Marcin Andrychowicz*, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba, OpenAI (2018)

REVIEW | PAPER

Reference

Key Papers in Deep RL

PG Travel Guide

utilForever/rl-paper-study

Khanrc's blog
