Reinforcement learning introduction

Posted on 2020-01-25 | Edited on 2020-04-18 | In reinforcement learning

What is reinforcement learning

Reinforcement learning is everywhere. I learn to write blog posts in English. I learn to use Emacs for coding. I do some stock trading. Each of these is learned by trying things and seeing what pays off.

The main elements of a reinforcement learning system are:

Policy: the agent's behavior function
Value function: how good each state and/or action is
Reward signal: defines the goal of a reinforcement learning problem

Policy

A policy is the agent's behavior: a map from state to action.

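Deterministic policy: each state maps to exactly one action.
Stochastic policy: each state maps to a probability distribution over actions.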
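
In the notation commonly used in reinforcement learning texts, the two kinds of policy are usually written as follows. The symbols $s$, $a$, $\pi$, $S_t$ and $A_t$ (state, action, policy, and the state and action at time step $t$) are standard conventions assumed here, not something defined earlier in this post.

```latex
% Deterministic policy: one action per state
a = \pi(s)

% Stochastic policy: a probability distribution over actions for each state
\pi(a \mid s) = \mathbb{P}\left[ A_t = a \mid S_t = s \right]
```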

Value function

The value function is a prediction of future reward. It is used to evaluate how good or bad each state is, and therefore to select between actions.
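
One way to make "a prediction of future reward" concrete is to add up the rewards that follow a state, weighting later rewards less, and then average that sum over several episodes. The sketch below is only an illustration under assumptions: the discount factor `gamma`, the helper names `discounted_return` and `estimate_value`, and the reward lists are all made up for this example.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma ** (steps into the future)."""
    total = 0.0
    # Walk backwards so each step is: reward now + gamma * (return from next step).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_value(episodes, gamma=0.9):
    """Average the discounted returns of several episodes from the same state."""
    returns = [discounted_return(rewards, gamma) for rewards in episodes]
    return sum(returns) / len(returns)


# Two made-up episodes that start from the same state (illustrative data only).
episodes = [[1.0, 0.0, 2.0], [0.0, 0.0, 1.0]]
value = estimate_value(episodes, gamma=0.5)
```

A state with a higher estimate is "better" in the sense used above, so an agent comparing actions can prefer the one that leads to the higher-valued state.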

Exploration and Exploitation

To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. But to discover such actions, it has to try actions that it has not selected before. The agent has to exploit what it has already experienced in order to obtain reward, but it also has to explore in order to make better action selections in the future.
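
A common, simple way to balance the two is an epsilon-greedy rule: with a small probability `epsilon` the agent tries a random action, and otherwise it picks the action with the best estimated reward so far. This is a minimal sketch rather than a method described in this post; the function names, the three-armed bandit and its payout probabilities are all made up for illustration.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore
    return max(range(len(estimates)), key=lambda a: estimates[a])  # exploit


def run_bandit(payout_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a made-up Bernoulli bandit and return (estimates, pull counts)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(payout_probs)
    counts = [0] * len(payout_probs)
    for _ in range(steps):
        action = epsilon_greedy(estimates, epsilon, rng)
        reward = 1.0 if rng.random() < payout_probs[action] else 0.0
        counts[action] += 1
        # Incremental average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 0.8])
```

With numbers like these, most pulls should tend to go to the arm with the highest payout probability, while the occasional random pull keeps the estimates for the other arms from going stale. Setting `epsilon` to 0 gives pure exploitation, and setting it to 1 gives pure exploration.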

As with bias and variance in machine learning, there is always a trade-off to make. So, to be or not to be: that is the question.

I used to use vi as my coding tool. I want to explore a new tool called Emacs. It may bring more reward in the future, but it seems very hard at the beginning.