Author: 王小惟
Original (Zhihu article): Reinforcement learning resources, from getting started to giving up (2018-03-15 edition)
Article URL: https://zhuanlan.zhihu.com/p/34918639
Introduction
When chatting with classmates and other people interested in RL, I found that many of them are keen on RL but have no idea how to go about learning it. A large part of the reason is that they do not know where to look for learning materials, so here I list some resources and books that I think are good. I will keep modifying the list, for example by adding conference slides that I find valuable, so interested readers should star/watch the GitHub repository; the Zhihu version will not be updated as quickly.
The GitHub repo is just a plain collection; here on Zhihu I will venture a sentence or two of commentary on each resource.
GitHub repository: https://github.com/wwxFromTju/awesome-reinforcement-learning-zh
If you look hard enough you can probably find all of these materials online yourself; I have simply listed the ones I consider good first (there are still some scattered items I have not organized yet). RL is, after all, still mostly an overseas field, and only a handful of professors in China work on it (unlike CV, NLP and the like), so I also welcome anyone who is genuinely interested in RL to exchange ideas; it pays to keep a broad view.
Whether I keep updating depends on my mood, though. I have opened quite a few other projects, such as an introduction to MARL (multi-agent reinforcement learning) and a tutorial on SC2 (reinforcement learning for StarCraft II), and those holes need filling too.
Resource Index
Books
[Reinforcement Learning: An Introduction](#Reinforcement Learning: An Introduction)
[Algorithms for Reinforcement Learning](#Algorithms for Reinforcement Learning)
Courses
Basic courses
[Rich Sutton's RL Course (Alberta)](#Rich Sutton's RL Course (Alberta))
[David Silver's RL Course (UCL)](#David Silver's RL Course (UCL))
[Stanford RL Course](#Stanford RL Course)
Deep RL courses
[UCB Deep RL Course](#UCB Deep RL Course)
[CMU Deep RL Course](#CMU Deep RL Course)
Books
Reinforcement Learning: An Introduction
Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction
link:http://incompleteideas.net/book/bookdraft2018jan1.pdf
Written by elder statesmen of the RL field. It is easy to read and understand, but rather wordy (lots of examples and long discussions: accessible, but long-winded).
Algorithms for Reinforcement Learning
Csaba Szepesvari, Algorithms for Reinforcement Learning
link:https://sites.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf
Simple and direct; well suited to readers with some background who want to review or pick things up quickly.
Courses
Basic Courses
Rich Sutton's RL Course (Alberta)
Course homepage link: http://incompleteideas.net/rlai.cs.ualberta.ca/RLAI/RLAIcourse/RLAIcourse2006.html
This version is fairly old; there is a newer one on Google Drive, and I will find time to organize it.
A quick introduction: Sutton is one of the giants of the RL field (and the author of the book above). I once emailed him and he actually replied (haha, paying respects to the grandmaster); he seemed very approachable. Honestly, though, Silver's slides are better suited for getting started; I think they are very well written.
A bit of trivia: David Silver, the first author of the DQN paper, Aja Huang (the person who physically played AlphaGo's moves) and a large part of the RL field's core researchers all have close ties to Alberta, which is probably why their slides feel so similar.
David Silver's RL Course (UCL)
In short, a leading figure of the new generation and first author of the AlphaGo paper; his name shows up in many DRL (and RL) papers, such as DQN, DDPG and NFSP, and in most of DeepMind's RL papers. (A tiny value-iteration sketch follows the slide list below.)
Course homepage link: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
Slides: Lecture 1: Introduction to Reinforcement Learning link
Lecture 2: Markov Decision Processes link
Lecture 3: Planning by Dynamic Programming link
Lecture 4: Model-Free Prediction link
Lecture 5: Model-Free Control link
Lecture 6: Value Function Approximation link
Lecture 7: Policy Gradient Methods link
Lecture 8: Integrating Learning and Planning link
Lecture 9: Exploration and Exploitation link
Lecture 10: Case Study: RL in Classic Games link
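To make Lecture 3 (Planning by Dynamic Programming) a bit more concrete, here is a minimal value-iteration sketch on a made-up random MDP of my own; it is only an illustration of the Bellman optimality backup, not code from the course.

```python
# Minimal value iteration on a hypothetical random MDP (my own toy example, not from the UCL slides).
# With a known model P and reward R, repeatedly apply the Bellman optimality backup:
#   V(s) <- max_a ( R[s, a] + gamma * sum_s' P[s, a, s'] * V(s') )
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

# Random but valid model: P[s, a] is a probability distribution over next states.
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=-1, keepdims=True)
R = rng.random((n_states, n_actions))        # reward for taking action a in state s

V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * (P @ V)                  # Q[s, a] = R[s, a] + gamma * E[V(s')]
    V_new = Q.max(axis=1)                    # greedy (optimality) backup
    if np.max(np.abs(V_new - V)) < 1e-8:     # stop once the values have converged
        break
    V = V_new

print(V, Q.argmax(axis=1))                   # optimal state values and the greedy policy
```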
Stanford RL Course
Also suitable for getting started, I suppose. I do not know much about the instructor; probably another RL heavyweight (my own focus is on DRL and MAS). In any case it works well as a supplement to the UCL course. (A minimal tabular Q-learning sketch follows the slide list below.)
Course homepage link: http://web.stanford.edu/class/cs234/schedule.html
Slides: Introduction to Reinforcement Learning link
How to act given know how the world works. Tabular setting. Markov processes. Policy search. Policy iteration. Value iteration link
Learning to evaluate a policy when don't know how the world works. link
Model-free learning to make good decisions. Q-learning. SARSA. link
Scaling up: value function approximation. Deep Q Learning. link
Deep reinforcement learning continued. link
Imitation Learning. link
Policy search. link
Policy search. link
Midterm review. link
Fast reinforcement learning (Exploration/Exploitation) Part I. link
Fast reinforcement learning (Exploration/Exploitation) Part II. link
Batch Reinforcement Learning. link
Monte Carlo Tree Search. link
Human in the loop RL with a focus on transfer learning. link
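Since the list above covers Q-learning and SARSA, here is a minimal tabular Q-learning sketch on a toy chain MDP of my own invention (not course code); it only shows the epsilon-greedy behaviour policy and the off-policy TD(0) update.

```python
# Minimal tabular Q-learning on a hypothetical 5-state chain (my own toy example, not from CS234).
# The update is the standard one:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
import numpy as np

n_states, n_actions = 5, 2                    # chain of 5 states; actions: 0 = left, 1 = right
gamma, alpha, epsilon = 0.95, 0.1, 0.1

def step(s, a):
    """Deterministic toy dynamics: moving right from the last state pays 1 and resets."""
    if a == 1:                                # go right
        if s == n_states - 1:
            return 0, 1.0                     # reward, then back to the start
        return s + 1, 0.0
    return max(s - 1, 0), 0.0                 # go left

Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)
s = 0
for _ in range(5000):
    # epsilon-greedy behaviour policy
    a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
    s_next, r = step(s, a)
    # off-policy TD(0) update: bootstrap with the greedy value at the next state
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next

print(Q)  # the learned Q-values should prefer action 1 (right) in every state
```

Replacing the max in the target with the value of the action actually taken next would turn this into SARSA.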
Deep RL Courses
UCB Deep RL Course
Strongly recommended; a star-studded lineup. The course has close ties to OpenAI and Google Brain, so its explanations of some algorithms are far easier to follow than the papers; the slide deck on TRPO and PPO, for instance, left me in awe. (I once emailed John Schulman, the first author of TRPO and PPO, and the exchange taught me a hard lesson about the importance of random seeds... sad!) (A rough sketch of the PPO clipped loss follows the slide list below.)
Course homepage link: http://rail.eecs.berkeley.edu/deeprlcourse/
Slides: Introduction and course overview link
Supervised learning and imitation link
Reinforcement learning introduction link
Policy gradients introduction link
Actor-critic introduction link
Value functions introduction link
Advanced Q-learning algorithms link
Optimal control and planning link
Learning dynamical systems from data link
Learning policies by imitating optimal controllers link
Advanced model learning and images link
Connection between inference and control link
Inverse reinforcement learning link
Advanced policy gradients (natural gradient, importance sampling) link
Exploration link
Exploration (part 2) and transfer learning link
Multi-task learning and transfer link
Meta-learning and parallelism link
Advanced imitation learning and open problems link
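As a companion to the TRPO/PPO slides mentioned above, here is a rough sketch of just the PPO clipped surrogate loss, assuming the log-probabilities and advantages have already been computed elsewhere; a real implementation also needs a value loss, an entropy bonus and minibatch epochs.

```python
# Sketch of the PPO clipped surrogate loss (a simplification for illustration only):
#   L^CLIP = -E[ min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t) ],
# where r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t) is the importance-sampling ratio.
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    ratio = np.exp(logp_new - logp_old)                         # importance-sampling ratio r_t
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))             # maximize the surrogate => minimize its negative

# Toy usage with made-up numbers, purely illustrative.
logp_old = np.array([-1.0, -0.5, -2.0])
logp_new = np.array([-0.8, -0.7, -1.5])
adv = np.array([1.0, -0.5, 2.0])
print(ppo_clip_loss(logp_new, logp_old, adv))
```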
CMU Deep RL Course
A supplement to the UCB course; it also connects more naturally to the UCL course.
Course homepage link: https://katefvision.github.io/
Slides: Introduction link
Markov decision processes (MDPs), POMDPs link
Solving known MDPs: Dynamic Programming link
Monte Carlo learning: value function (VF) estimation and optimization link
Temporal difference learning: VF estimation and optimization, Q learning, SARSA link
Planning and learning: Dyna, Monte Carlo tree search link
VF approximation, MC, TD with VF approximation, Control with VF approximation link
Deep Q Learning : Double Q learning, replay memory link
Policy Gradients I, Policy Gradients II link link
Continuous Actions, Variational Autoencoders, multimodal stochastic policies link
Imitation Learning I: Behavior Cloning, DAGGER, Learning to Search link
Imitation Learning II: Inverse RL, MaxEnt IRL, Adversarial link
Imitation learning III: imitating controllers, learning local models link
Optimal control, trajectory optimization link
End-to-end policy optimization through back-propagation link
Exploration and Exploitation (Russ) link
Hierarchical RL and Transfer Learning link
Recitation: Trajectory optimization - iterative LQR link
Transfer learning(2): Simulation to Real World link
Memory Augmented RL link
Learning to learn, one shot learning link