王巍巍
Weiwei Wang

M.Sc. Student

Reinforcement Learning Group
Department of Computer Science
Nanjing University, Nanjing, China


Sections:

Supervisor

Biography

Research Interest

Publications

RL Introduction

RL People

Friends

Correspondence

Links:

My Blog (Chinese)


Supervisor:

 

Professor Yang Gao

Biography:


 

I am currently a second-year graduate student in the Department of Computer Science and Technology at Nanjing University and a member of the RL Group, led by Professor Yang Gao.
I received my B.Sc. degree in Computer Science in June 2007.

You can find my Job Hunter C.V. here.


Research Interest:

 

Reinforcement Learning, Relational Reinforcement Learning, Graphical Models, Gaussian Processes


Publications:

 

Weiwei Wang, Tianyin Xu, Xingguo Chen, Yang Gao and Sanglu Lu. Probabilistic Seeking Prediction in P2P VoD Systems. In Proceedings of AI 2009, LNAI 5866, pp. 676–685, 2009. [Code]

Weiwei Wang, Yang Gao and Xingguo Chen. Reinforcement Learning with Markov Logic Networks. In Proceedings of MICAI 2008, LNAI 5317, pp. 230–243, 2008.

Weiwei Wang, Xingguo Chen and Yang Gao. Reinforcement Learning with Markov Logic Networks. EWRL'08 Presentation, 2008/7.

Weiwei Wang, Xingguo Chen and Yang Gao. Approximation methods based on average reward learning with Tile Coding. Pattern Recognition and Artificial Intelligence, Vol. 21, No. 4, pp. 446–452, 2008/8.

Shen Ge, Weiwei Wang, Gao Yang and Shifu Chen. Reinforcement Learning for POMDP. In Proceedings of 2007 China National Conference on Artificial Intelligence, Beijing: Beijing University of Posts and Telecommunications Press, pp. 196–202, 2007.


Reinforcement Learning Introduction:

 

Inspired by related theories in psychology, reinforcement learning is a sub-area of machine learning (within computer science) concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. In economics and game theory, reinforcement learning is considered a boundedly rational explanation of how equilibrium may arise.
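To make the idea of a learned policy concrete, here is a minimal sketch of tabular Q-learning, one standard reinforcement learning algorithm (not a description of any method from my own papers). The 5-state chain environment, the reward of 1 for reaching the last state, and the parameter values (`alpha`, `gamma`, `epsilon`, number of episodes) are all hypothetical choices made for illustration. The agent learns action values purely from trial-and-error interaction, and the policy it ends up with maps each state to the action with the highest learned value.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy dynamics: reward 1 only when the goal state is reached."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0
    return nxt, reward, done


def greedy(q, state, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            nxt, reward, done = step(state, action)
            # one-step Q-learning target; no bootstrapping from the terminal state
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state, steps = nxt, steps + 1
    return q


q = q_learning()
# The learned policy maps each non-terminal state to its greedy action.
policy = {s: max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)}
print(policy)
```

Because the only reward sits at the right end of the chain, the greedy policy should come out as "move right" (action 1) in every non-terminal state, even though no one ever told the agent which action was correct.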

The environment is typically formulated as a finite-state Markov decision process (MDP), and reinforcement learning algorithms for this setting are closely related to dynamic programming techniques. State transitions and rewards in the MDP are typically stochastic, but their probability distributions are stationary over the course of the problem.
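When the MDP's transition and reward probabilities are known, dynamic programming can compute the optimal values directly. The following is a minimal value iteration sketch on a hypothetical 2-state, 2-action MDP; the transition table `P`, the discount `gamma = 0.9`, and the stopping threshold `theta` are illustrative assumptions, not taken from any particular problem.

```python
# Hypothetical 2-state, 2-action MDP: P[s][a] is a list of
# (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)],                   # state 0, action 0: stay, no reward
        1: [(0.8, 1, 0.0), (0.2, 0, 0.0)]},   # state 0, action 1: usually reach state 1
    1: {0: [(1.0, 1, 1.0)],                   # state 1, action 0: stay, reward 1
        1: [(1.0, 0, 0.0)]},                  # state 1, action 1: go back to state 0
}


def q_value(P, V, s, a, gamma):
    """Expected one-step return of action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, theta=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place sweep
        if delta < theta:
            break
    # Greedy policy with respect to the converged values
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
print(V, policy)
```

Staying in state 1 earns 1 per step, so V(1) = 1 / (1 - 0.9) = 10, and taking action 1 in state 0 gives V(0) = (0.8 · 0.9 · 10) / (1 - 0.2 · 0.9) ≈ 8.78. Reinforcement learning methods aim at the same fixed point without being handed `P` in advance.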

Reinforcement learning differs from supervised learning in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected. Further, there is a focus on online performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off in reinforcement learning has been most thoroughly studied through the multi-armed bandit problem.
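The exploration/exploitation balance can be illustrated with the simple epsilon-greedy rule on a multi-armed bandit. This is a minimal sketch, assuming three hypothetical Bernoulli arms with success probabilities 0.2, 0.5 and 0.8 and an exploration rate `epsilon = 0.1`; epsilon-greedy is only one of many bandit strategies (UCB and Thompson sampling are common alternatives).

```python
import random


def epsilon_greedy_bandit(probs, steps=5000, epsilon=0.1, seed=0):
    """Sample-average epsilon-greedy on Bernoulli arms with success rates `probs`."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore
        else:
            arm = max(range(k), key=lambda i: estimates[i])   # exploit
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # incremental sample-average update
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return counts, estimates, total / steps


counts, estimates, avg = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(counts, [round(e, 2) for e in estimates], round(avg, 3))
```

With probability `epsilon` the agent pulls a random arm (exploration); otherwise it pulls the arm with the best estimated mean (exploitation). Most pulls should end up on the 0.8 arm, while the occasional random pulls keep the estimates for the other arms from going stale.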


Reinforcement Learning People:

 

Richard Sutton
Andrew Ng
Warren B. Powell
Peter Dayan
Ben Van Roy
Andrew Barto
Csaba Szepesvári
Dimitri Bertsekas
Daan Wierstra
Jan Peters
Mohammad Ghavamzadeh
Kurt Driessens


Friends:

 

Nanjing University: Tianyin Xu, Yangsheng Ji, Shujian Huang, Andong Zhan, Xingguo Chen, Liangdong Shi

McGill University: Jiayuan Yu

University of Alberta: Yuxi Li, Liang Yao, Hengshuai Yao, Wenye Li


Correspondence:

 

Mail:

Weiwei Wang
National Key Laboratory for Novel Software Technology
Nanjing University, Nanjing 210093, China

Laboratory:

403, Meng Minwei Building, Gulou Campus of Nanjing University

URL:

http://cs.nju.edu.cn/rl/weiweiwang/

Email:

ww.wang.cs at gmail dot com