Reinforcement learning introduction
What is reinforcement learning
Reinforcement learning is everywhere in the world. I learn to write blogs in English. I learn to use emacs for coding. I do some stock trading.
A reinforcement learning system includes these main elements:
- Policy: agent's behavior function
- Value function: how good is each state and/or action
- Reward signal: defines the goal of a reinforcement learning problem
Policy
A policy is the agent's behavior: it is a map from states to actions.
- Deterministic policy:
- Stochastic policy:
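In the usual textbook notation (not given in this post), a deterministic policy is written a = π(s) and a stochastic policy is written π(a | s) = P[A = a | S = s]. The two kinds of policy can be sketched in Python as below. This is only a minimal illustration: the state names `s0` and `s1`, the actions, and the probabilities are all made up for the example.

```python
import random

# Hypothetical toy actions, used only for illustration.
ACTIONS = ["left", "right"]


def deterministic_policy(state):
    """a = pi(s): every state maps to exactly one action."""
    table = {"s0": "right", "s1": "left"}  # made-up lookup table
    return table[state]


def stochastic_policy(state, rng):
    """pi(a | s): sample an action from a per-state distribution."""
    # Made-up probabilities over ACTIONS for each state.
    probs = {"s0": [0.2, 0.8], "s1": [0.5, 0.5]}
    return rng.choices(ACTIONS, weights=probs[state], k=1)[0]


rng = random.Random(0)
print(deterministic_policy("s0"))    # always the same action for s0
print(stochastic_policy("s0", rng))  # "right" about 80% of the time
```

Calling `deterministic_policy` twice with the same state always gives the same action, while repeated calls to `stochastic_policy` can give different actions for the same state.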
Value function
A value function is a prediction of future reward. It is used to evaluate how good or bad states are, and therefore to select between actions.
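One common way to make "prediction of future reward" concrete is to average the discounted return observed from a state over several episodes. The sketch below assumes a discount factor `gamma`, which this post does not introduce, and the helper names `discounted_return` and `estimate_value` are hypothetical, chosen only for the example.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r_1 + gamma * r_2 + gamma^2 * r_3 + ... for one episode."""
    g = 0.0
    # Work backwards so each step applies the discount exactly once.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def estimate_value(episodes, gamma=0.9):
    """Estimate a state's value as the mean return over sampled episodes.

    `episodes` is a list of reward sequences, each observed after
    starting from the same state (an assumption of this sketch).
    """
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)


# Two made-up reward sequences starting from the same state.
print(estimate_value([[1, 1, 1], [0, 0, 4]], gamma=0.5))
```

A state with a higher estimate is "better", so an agent can prefer actions that lead to such states.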
Exploration and Exploitation
To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. But to discover such actions, it has to try actions that it has not selected before. The agent has to exploit what it has already experienced in order to obtain reward, but it also has to explore in order to make better action selections in the future.
As with bias and variance in machine learning, we always need to make a trade-off. To be or not to be, that is the question.
I used to use vi as my coding tool. I want to explore a new tool called emacs. It may bring more reward in the future, but it seems very hard at the beginning.
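A simple way to balance the two is an epsilon-greedy rule: most of the time pick the action that looks best so far (exploit), and with a small probability pick a random one (explore). The sketch below applies it to a toy two-armed bandit. Epsilon-greedy is just one possible strategy, not something this post prescribes, and the function names, the Gaussian rewards, and the reward means are all assumptions made for the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-looking one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the highest estimate (ties go to the first index).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a toy bandit with Gaussian rewards (assumed)."""
    rng = random.Random(seed)
    q = [0.0] * len(true_means)  # estimated reward per action
    n = [0] * len(true_means)    # how often each action was tried
    for _ in range(steps):
        a = epsilon_greedy(q, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]  # incremental running average
    return q, n


# Think of action 0 as "keep using vi" and action 1 as "try emacs".
q, n = run_bandit([0.0, 1.0])
print(q, n)
```

With `epsilon=0` the agent never explores and can stay stuck on the first tool it tried. With a small positive `epsilon` it keeps sampling the other option often enough to notice if it pays off more.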