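Activities

1. Tuesday study group: a study group held every other Tuesday near Tokyo Station (textbook reading or paper presentations)
2. Output activities: output that grows out of study group discussions (publishing books and papers)

The Tuesday study group (item 1) is our main activity.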

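Output activities

Our ultimate goal is to produce output that grows out of the study group's activities and discussions. So far, our output includes the following.

速習強化学習 ―基礎理論とアルゴリズム―

A Japanese translation by reading-group members, 「速習強化学習 ―基礎理論とアルゴリズム―」, was published by Kyoritsu Shuppan on September 21, 2017.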

Papers

Study group participants are shown in bold.

S. Koyamada, Y. Kikuchi, A. Kanemura, S. Maeda, and S. Ishii: “Neural sequence model training via α-divergence minimization.” LGNL, ICML Workshop, 2017.
- Derived from the discussion at the paper presentation in session 56

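Textbook reading (Wednesday study group)

We read “Algorithms for Reinforcement Learning” together as a group, and finished after three full rounds.

Szepesvári 2010 “Algorithms for Reinforcement Learning” (Morgan & Claypool)
About 5 pages per session

Round 1: from 2015/11/20
Round 2: from 2016/04/06
Round 3: from 2017/03/01

The book 速習強化学習 ―基礎理論とアルゴリズム― was published based on the materials from this reading group.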

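Paper presentations (Wednesday study group)

The presenter picks one paper they would like to introduce and presents it. Preparing materials is optional. We ask presenters to prioritize understanding the paper itself over preparing materials, and explaining it on a whiteboard on the day is perfectly fine. We spend 1 to 2 hours reading a single paper carefully.

Activity history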

#	Date	Presenter	Topic	Materials
1	2015/10/22	@sotetsuk / @RodeoBoy24420	Kickoff: @sotetsuk explained the group's aims and @RodeoBoy24420 gave an overview of reinforcement learning	pdf
2	2015/11/02	@smochi	Textbook: p.3-p.6
3	2015/11/12	@nnnnishi	Textbook: p.7-p.12
4	2015/11/17	@RodeoBoy24420	Textbook: p.12-p.17
5	2015/11/24	@takayukisekine	Textbook: p.17-p.23
6	2015/12/02	@ikki407	Textbook: p.23-p.29	pdf
7	2015/12/08	@sotetsuk	Textbook: p.29-p.35
8	2015/12/16	@fullflu	Textbook: p.35-p.40	pdf
9	2016/01/18	@smochi	Textbook: p.40-p.45
10	2016/01/27	@nnnnishi	Textbook: p.45-p.50
11	2016/02/08	@RodeoBoy24420	Textbook: p.50-p.56
12	2016/02/22	@takayukisekine	Textbook: p.56-p.62
13	2016/02/29	@sotetsuk	Textbook: p.62-p.67
14	2016/03/28	@ikki407	Textbook: p.68-p.74
15	2016/04/06	@sotetsuk	Textbook: p.3-p.6
16	2016/04/13	@muupan	Paper: Mastering the game of Go with deep neural networks and tree search	pdf
17	2016/04/20	@smochi	Textbook: p.7-p.12
18	2016/04/27	@takayukisekine	Paper: Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions	pdf
19	2016/05/11	@smochi	Textbook: p.7-p.12 (continued)
20	2016/05/18	@RodeoBoy24420	Paper: Algorithms for Inverse Reinforcement Learning	pdf
21	2016/05/25	@nnnnishi	Textbook: p.12-p.17
22	2016/06/01	@sotetsuk	Paper: Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in the Reward-Learning Network	slideshare
23	2016/06/08	@YuriCat	Textbook: p.17-p.23
24	2016/06/15	@ikki407	Paper: A Contextual-Bandit Approach to Personalized News Article Recommendation	pdf
25	2016/06/22	@fullflu	Textbook: p.23-p.29
26	2016/06/29	Ueki-san	Paper: True Online TD(λ)
27	2016/07/06	@muupan	Textbook: p.29-p.35
28	2016/07/20	@smochi	Paper: Dynamic pricing policies for interdependent perishable products or services using reinforcement learning
29	2016/08/03	@takayukisekine	Textbook: p.35-p.40
30	2016/08/10	@nnnnishi	Paper: Ensemble Contextual Bandits for Personalized Recommendation	slideshare
31	2016/08/24	@RodeoBoy24420	Textbook: p.40-p.45
32	2016/08/31	@sotetsuk	Paper: Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation	slideshare
33	2016/09/14	@ikki407	Textbook: p.45-p.50
34	2016/09/21	@YuriCat	Paper: Unifying Count-Based Exploration and Intrinsic Motivation	slideshare
35	2016/09/28	@shiba24	Textbook: p.50-p.56
36	2016/10/05	@takayukisekine	Paper: SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient	slideshare
37	2016/10/19	@shiba24	Textbook: p.50-p.56
38	2016/10/26	@shiba24	Paper: Hybrid computing using a neural network with dynamic external memory	pdf
39	2016/11/02	@muupan	Textbook: p.56-p.62
40	2016/11/09	@fullflu	Paper: DCM Bandits: Learning to Rank with Multiple Clicks	pdf
41	2016/11/30	@fullflu	Textbook: p.62-p.67
43	2016/12/07	@shiba24	Textbook: p.68-p.74
44	2017/01/11	@muupan	Paper: Safe and Efficient Off-Policy Reinforcement Learning	pdf
45	2017/01/18	@RodeoBoy24420	Paper: Application of fuzzy Q-learning for electricity market modeling by considering renewable power penetration	pdf
46	2017/02/01	@STRatANG	Paper: Sample Efficient Actor-Critic with Experience Replay	pdf
47	2017/02/08	@ororoku	Paper: The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems	pdf
48	2017/02/15	@smochi	Paper: Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting	pdf
49	2017/03/01	@sotetsuk	Textbook: p.3-p.12
50	2017/03/08	@nnnnishi	Paper: Optimal Asset Allocation using Adaptive Dynamic Programming / Enhancing Q-Learning for Optimal Asset Allocation	pdf
51	2017/03/15	@STRatANG	Textbook: p.12-p.17
52	2017/03/22	@YuriCat	Paper: Combining policy gradient and Q-learning	slideshare
53	2017/03/29	@pacocat	Textbook: p.17-p.23
54	2017/04/05	@takayukisekine	Paper: Evolution Strategies as a Scalable Alternative to Reinforcement Learning	pdf
55	2017/04/12	@kiyukuta	Textbook: p.23-p.29
56	2017/04/19	@sotetsuk	Paper: Reward Augmented Maximum Likelihood for Neural Structured Prediction	slideshare
57	2017/04/26	Kume-san	Textbook: p.29-p.35
58	2017/05/10	Yamada-san	Paper: Overcoming catastrophic forgetting in neural networks	pdf
59	2017/05/17	@ororoku	Textbook: p.35-p.40
60	2017/05/24	@smochi	Paper: Prioritized Experience Replay	google slide
61	2017/05/31	@eratostennis	Textbook: p.40-p.45
62	2017/06/07	@sotetsuk	Paper: Bridging the Gap Between Value and Policy Based Reinforcement Learning	Dropbox paper
63	2017/06/14	@STRatANG	Textbook: p.45-p.50	session notes
64	2017/06/21	@YuriCat	The Predictron: End-To-End Learning and Planning	Dropbox paper
65	2017/06/28	@fullflu	Textbook: p.50-p.56	session notes
66	2017/07/05	@shiba24	Trust Region Policy Optimization	google slide / session notes
67	2017/07/12	Kume-san	Textbook: p.56-p.62	session notes
68	2017/07/19	@ikki407	Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games	pdf
69	2017/07/26	@kiyukuta	Textbook: p.62-p.67	google docs
70	2017/08/02	Kohno-san	FeUdal Networks for Hierarchical Reinforcement Learning	pdf
71	2017/08/09	@rkawajiri	Textbook: p.68-p.74	google docs
72	2017/08/23	@sotetsuk	ICML 2017 summary	google slide
73	2017/08/30	@YuriCat	Learning in POMDPs with Monte Carlo Tree Search	Dropbox paper
74	2017/09/06	@maeyon	A Distributional Perspective on Reinforcement Learning	Dropbox paper
75	2017/09/20	@rkawajiri	Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution	Dropbox paper
76	2017/10/04	@takayukisekine	Deep Reinforcement Learning from Human Preferences	Dropbox paper
77	2017/10/18	@fullflu	Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees	Dropbox paper
78	2017/11/01	@pacocat	Recent trends in poker AI	pdf
79	2017/11/15	@ikki407	AlphaGo Zero	TBA
80	2017/12/13	Kume-san	Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning/The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously	pdf
81	2018/01/17	@sotetsuk	RL x structured (sequence) prediction papers at NIPS 2017	Dropbox paper
82	2018/01/28	@muupan	Model-Based Reinforcement Learning @NIPS2017	Slideshare
83	2018/02/14	arakawa-kun	Emergence of Linguistic Communication from Referential Games with Symbolic and Pixel Input	google docs
84	2018/02/28	Okumura	From DQN to Rainbow: recent trends in deep reinforcement learning	Slideshare
85	2018/03/14	Ohto-kun	Learning to Search with MCTSnets	Dropbox paper
86	2018/03/28	Takayama-kun	Q-learning with censored data	Dropbox paper
87	2018/04/11	Nishiura-kun	Active Neural Localization	google slide
88	2018/04/25	Kikuchi-san	Learning an Embedding Space for Transferable Robot Skills	Dropbox paper
89	2018/05/08	@rkawajiri	MAML	Dropbox paper
90	2018/05/22	Maeda-san	Towards Symbolic Reinforcement Learning with Common Sense	Dropbox paper
91	2018/06/05	@toslunar	Learning Deep Mean Field Games for Modeling Large Population Behavior	None
92	2018/06/19	Yamaguchi	GAIL	google docs
93	2018/07/03	Arakawa-kun	RUDDER	google slide
94	2018/07/17	Ohto-kun	TBA	TBA
95	2018/07/31	@muupan	TBA	TBA
96	2018/08/21	@jun.okumura	TBA	TBA
97	2018/09/11	@ichi	TBA	TBA
98	2018/10/09	@kawajiri	TBA	TBA
99	2018/10/23	@maso	TBA	TBA
100	2018/11/06	@kume	TBA	TBA
101	2018/11/20	@kikuchi	TBA	TBA
102	2018/12/04	@takahashi	TBA	TBA

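(Note) Textbook page numbers refer to the pages of the “Algorithms for Reinforcement Learning” PDF.
(Note) In the presenter column, people whose GitHub accounts are known are listed by GitHub account with an @ prefix.

Contact

Please contact us via github.com/rl-tokyo/rl-tokyo.github.io/issues. We are always looking for new participants. However, participation is limited to those willing to take their turn presenting at the study group. Thank you for your understanding.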

RL-Tokyo

RL-Tokyo is a community of engineers, researchers, and students studying reinforcement learning in Tokyo.