Correspondence: |
|
|
|
Mail: |
|
Weiwei Wang
National Key Laboratory for Novel Software Technology
Nanjing University, Nanjing 210093, China |
|
Laboratory: |
|
403, Meng Minwei Building, Gulou Campus of Nanjing University |
|
URL: |
|
http://cs.nju.edu.cn/rl/weiweiwang/ |
|
Email: |
|
ww.wang.cs [at] gmail [dot] com |
|
|
|
|
Supervisor: |
|
|
Professor Yang Gao |
|
|
|
Biography:
|
|
|
I am currently a second-year graduate student in the Department of Computer Science and Technology at Nanjing University and a member of the RL Group, led by Professor Yang Gao.
I received my B.Sc. degree in Computer Science in June 2007.
You can find my Job Hunter C.V. here. |
|
|
|
Research Interest: |
|
|
Reinforcement Learning, Relational Reinforcement Learning, Graphical Models, Gaussian Processes |
|
|
|
Publications: |
|
|
Weiwei Wang, Tianyin Xu, Xingguo Chen, Yang Gao and Sanglu Lu. Probabilistic Seeking Prediction in P2P VoD Systems. In Proceedings of AI 2009, LNAI 5866, pp. 676–685, 2009. [Code]
Weiwei Wang, Yang Gao and Xingguo Chen. Reinforcement Learning with Markov Logic Networks. In Proceedings of MICAI 2008, LNAI 5317, pp. 230–243, 2008.
Weiwei Wang, Xingguo Chen and Yang Gao. Reinforcement Learning with Markov Logic Networks. EWRL'08 Presentation, 2008/7.
Weiwei Wang, Xingguo Chen and Yang Gao. Approximation methods based on average reward learning with Tile Coding. Pattern Recognition and Artificial Intelligence, Vol. 21, No. 4, pp. 446–452, 2008/8.
Shen Ge, Weiwei Wang, Yang Gao and Shifu Chen. Reinforcement Learning for POMDP. In Proceedings of 2007 China National Conference on Artificial Intelligence, Beijing: Beijing University of Posts and Telecommunications Press, pp. 196–202, 2007. |
|
|
|
Reinforcement Learning Introduction: |
|
|
Inspired by related theories in psychology, reinforcement learning is a sub-area of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. In economics and game theory, reinforcement learning is considered a boundedly rational interpretation of how equilibrium may arise.
The environment is typically formulated as a finite-state Markov decision process (MDP), and reinforcement learning algorithms for this setting are closely related to dynamic programming techniques. State transitions and rewards in the MDP are typically stochastic, but their probabilities are stationary over the course of the problem.
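To make the link to dynamic programming concrete, here is a minimal sketch of value iteration on a tiny finite-state MDP. The two-state, two-action MDP, its probabilities and rewards, and the names `P`, `q_value` and `value_iteration` are all made up for illustration; this assumes the transition model is fully known, which is the dynamic programming setting rather than a learning algorithm.

```python
# Hypothetical MDP: P[state][action] is a list of
# (probability, next_state, reward) triples. All numbers are illustrative.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}


def q_value(P, V, s, a, gamma):
    """Expected discounted return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Repeat Bellman optimality backups until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update of the state value
        if delta < tol:
            break
    # The greedy policy maps each state to an action with the highest value.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


if __name__ == "__main__":
    V, policy = value_iteration(P)
    print(V, policy)
```

The returned `policy` is exactly the kind of state-to-action mapping described above; reinforcement learning methods aim for the same object without being handed `P`.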
Reinforcement learning differs from supervised learning in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off in reinforcement learning has been studied mostly through the multi-armed bandit problem. |
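The exploration vs. exploitation trade-off can be illustrated with a short sketch of an epsilon-greedy agent on a multi-armed bandit. Epsilon-greedy is only one simple strategy among many; the function name `epsilon_greedy_bandit`, the Gaussian reward model, and all parameter values below are assumptions chosen for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a bandit with Gaussian rewards.

    true_means: hypothetical mean reward of each arm (hidden from the agent).
    Returns the agent's estimates, pull counts, and average reward per step.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the best current estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-mean update of the chosen arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total / steps


if __name__ == "__main__":
    estimates, counts, avg = epsilon_greedy_bandit([0.1, 0.5, 0.9])
    print(estimates, counts, avg)
```

Setting `epsilon=0` gives a purely exploiting agent that may lock onto a poor arm, while `epsilon=1` explores uniformly and never uses what it has learned; values in between trade the two off.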
|
|
|
Reinforcement Learning People (in alphabetical order):
|
|
|
Andrew Barto
Andrew Ng
Ben Van Roy
Csaba Szepesvári
Daan Wierstra
Dimitri Bertsekas
Jan Peters
Kurt Driessens
Mohammad Ghavamzadeh
Peter Dayan
Richard Sutton
Warren B. Powell
|
|
|
|
Friends:
|
|
|
Nanjing University: Tianyin Xu, Yangsheng Ji, Shujian Huang, Andong Zhan, Xingguo Chen, Liangdong Shi
McGill University: Jiayuan Yu
University of Alberta: Yuxi Li, Liang Yao, Hengshuai Yao, Wenye Li |
|