Introduction
- Agent and environment; history and state; agent state, environment state, information state
- Fully observable environments (the agent directly observes the environment state)
- Partially observable environments (the agent only receives indirect observations of the environment state)
- Policy: the agent's behaviour function, a map from state to action
- Value Function: how good each state/action is, a prediction of future reward
- Model: the agent's representation of the environment; it predicts what the environment will do next (the next state/reward)
- Categories: Value based, Policy based, Actor Critic, Model Free, Model based
- E&E: Exploration finds more information about the environment; Exploitation exploits known information to maximise reward.
- Prediction: evaluate the future. Control: optimise the future.
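The concepts above can be seen together in a minimal tabular Q-learning sketch. The chain MDP below is a hypothetical toy example (not from the lectures): the Q-table is the value function, the epsilon-greedy rule is the policy (balancing exploration and exploitation), and because the agent never builds a model of the transition dynamics, this is a model-free control method.

```python
import random

random.seed(0)

# Toy 5-state chain MDP (hypothetical example): states 0..4,
# actions 0 = left, 1 = right; reaching state 4 yields reward 1.
N_STATES, GOAL = 5, 4

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

# Value function: Q[s][a] estimates future reward from taking a in s.
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def policy(state, eps=0.1):
    """Epsilon-greedy policy: explore with probability eps, else exploit Q."""
    if random.random() < eps:
        return random.choice([0, 1])                   # exploration
    return 0 if Q[state][0] > Q[state][1] else 1       # exploitation

alpha, gamma = 0.5, 0.9
for episode in range(200):
    s, done = 0, False
    while not done:
        a = policy(s)
        s2, r, done = step(s, a)
        # Q-learning update: model-free control from sampled transitions.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned greedy policy should move right in every non-goal state.
print([0 if Q[s][0] > Q[s][1] else 1 for s in range(GOAL)])
```

Evaluating the fixed Q-table here is prediction; improving the policy while learning, as the loop does, is control.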
Reinforcement Learning Basics: Notes on David Silver's Course

This post introduces the basic concepts of reinforcement learning, including the agent and environment, history and state, and fully versus partially observable environments. It also covers the roles of the policy, value function, and model in reinforcement learning, as well as exploration vs. exploitation and prediction vs. control.
