AI with Python – Reinforcement Learning



In this chapter, you will learn in detail about the concepts of reinforcement learning in AI with Python.


Basics of Reinforcement Learning

This type of learning is used to reinforce or strengthen the network based on critic information. That is, a network being trained under reinforcement learning receives some feedback from the environment. However, the feedback is evaluative, not instructive as it would be in supervised learning. Based on this feedback, the network adjusts its weights so as to obtain better critic information in the future.


This learning process is similar to supervised learning, but we might have much less information. The following figure gives the block diagram of reinforcement learning −


(Figure: block diagram of reinforcement learning)
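
To make the idea of evaluative feedback concrete, the following is a minimal sketch, assuming a hypothetical two-action environment, assumed reward probabilities and an assumed learning rate (none of which are part of the original tutorial). The agent adjusts its action-value estimates purely from a scalar reward, in the spirit of the block diagram above −

import random

# Hypothetical two-action environment: it returns only a reward,
# never the "correct" action (evaluative, not instructive, feedback).
def environment(action):
   return 1.0 if random.random() < (0.8 if action == 1 else 0.2) else 0.0

values = [0.0, 0.0]   # the agent's current estimate of each action's value
alpha = 0.1           # learning rate (assumed)

for step in range(1000):
   # Epsilon-greedy: mostly exploit the current estimates, sometimes explore.
   if random.random() < 0.1:
      action = random.randrange(2)
   else:
      action = values.index(max(values))
   reward = environment(action)
   # Adjust the estimate toward the observed reward (the "weight adjustment").
   values[action] += alpha * (reward - values[action])

print(values)   # the estimate for action 1 should end up near 0.8

With only this evaluative signal the agent learns to prefer the action that yields a higher reward on average.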

Building Blocks: Environment and Agent

Environment and agent are the main building blocks of reinforcement learning in AI. This section discusses them in detail −


Agent

An agent is anything that can perceive its environment through sensors and act upon that environment through effectors.


  • A human agent has sensory organs such as eyes, ears, nose, tongue and skin, which act as sensors, and other organs such as hands, legs and mouth, which act as effectors.


  • A robotic agent has cameras and infrared range finders for sensors, and various motors and actuators for effectors.


  • A software agent has encoded bit strings as its programs and actions.


Agent Terminology

The following terms are frequently used in reinforcement learning in AI −


  • Performance Measure of Agent − It is the criterion that determines how successful an agent is.


  • Behavior of Agent − It is the action that the agent performs after any given sequence of percepts.


  • Percept − It is the agent’s perceptual input at a given instant.


  • Percept Sequence − It is the history of all that an agent has perceived till date.


  • Agent Function − It is a map from the percept sequence to an action; a minimal code sketch follows this list.

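As a minimal illustration of these terms, here is a sketch with assumed percept values and action names (not from the original tutorial); the agent function is an ordinary Python function mapping a percept sequence to an action −

# Hypothetical percepts (sensor readings) and actions, for illustration only.
def agent_function(percept_sequence):
   # Map the history of percepts to an action.
   latest = percept_sequence[-1]       # the current percept
   return "move_left" if latest < 0 else "move_right"

percept_sequence = []                  # the agent's percept history grows over time
for percept in [0.3, -0.7, 0.1]:       # assumed sensor readings
   percept_sequence.append(percept)
   action = agent_function(percept_sequence)
   print(percept, "->", action)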

Environment

Some programs operate in an entirely artificial environment confined to keyboard input, database, computer file systems and character output on a screen.


In contrast, some software agents, such as software robots or softbots, exist in rich and unlimited softbot domains. The simulator has a very detailed and complex environment, and the software agent needs to choose from a long array of actions in real time.


For example, a softbot designed to scan a customer’s online preferences and display interesting items to that customer works in a real as well as an artificial environment.


Properties of Environment

The environment has multifold properties as discussed below −


  • Discrete/Continuous − If there are a limited number of distinct, clearly defined states of the environment, the environment is discrete; otherwise it is continuous. For example, chess is a discrete environment and driving is a continuous environment. (A short code sketch after this list shows how this distinction appears in an environment’s interface.)


  • Observable/Partially Observable − If it is possible to determine the complete state of the environment at each time point from the percepts, it is observable; otherwise it is only partially observable.


  • Static/Dynamic − If the environment does not change while an agent is acting, then it is static; otherwise it is dynamic.


  • Single agent/Multiple agents − The environment may contain other agents which may be of the same or different kind as that of the agent.


  • Accessible/Inaccessible − If the agent’s sensory apparatus can have access to the complete state of the environment, then the environment is accessible to that agent; otherwise it is inaccessible.


  • Deterministic/Non-deterministic − If the next state of the environment is completely determined by the current state and the actions of the agent, then the environment is deterministic; otherwise it is non-deterministic.


  • Episodic/Non-episodic − In an episodic environment, each episode consists of the agent perceiving and then acting. The quality of its action depends just on the episode itself. Subsequent episodes do not depend on the actions in the previous episodes. Episodic environments are much simpler because the agent does not need to think ahead.

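Several of these properties can be read directly off an environment’s interface. The rough sketch below assumes the OpenAI Gym package introduced in the next section is already installed; a discrete action space shows up as a Discrete space, while a continuous observation space shows up as a Box space −

import gym

env = gym.make('CartPole-v0')
# CartPole has a discrete action space: push the cart left or right.
print(env.action_space)        # e.g. Discrete(2)
# Its observation space is continuous: cart position/velocity, pole angle/velocity.
print(env.observation_space)   # e.g. Box(4,)
env.close()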


Constructing an Environment with Python

For building the reinforcement learning agent, we will be using the OpenAI Gym package, which can be installed with the following command −



pip install gym

There are various environments in OpenAI Gym which can be used for various purposes. A few of them are CartPole-v0, Hopper-v1, and MsPacman-v0. They require different engines. The detailed documentation of OpenAI Gym can be found at https://gym.openai.com/docs/#environments.


The following code shows an example of Python code for the CartPole-v0 environment −



import gym

# Create the CartPole-v0 environment and reset it to its initial state
env = gym.make('CartPole-v0')
env.reset()

for _ in range(1000):
   env.render()                           # display the environment on screen
   env.step(env.action_space.sample())    # take a random action
env.close()


You can construct other environments in a similar way.
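
For example, a very similar sketch using MountainCar-v0, another environment that ships with Gym (the exact set of available environments depends on your installation), would look like this −

import gym

env = gym.make('MountainCar-v0')          # a different built-in environment
env.reset()
for _ in range(1000):
   env.render()
   env.step(env.action_space.sample())    # take random actions, as before
env.close()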


Constructing a Learning Agent with Python

For building the reinforcement learning agent, we will be using the OpenAI Gym package as shown below −



import gym

env = gym.make('CartPole-v0')
for _ in range(20):                        # run 20 episodes
   observation = env.reset()               # start each episode in the initial state
   for i in range(100):
      env.render()
      print(observation)                   # current percept: cart position/velocity, pole angle/velocity
      action = env.action_space.sample()   # choose a random action
      observation, reward, done, info = env.step(action)
      if done:                             # the pole fell over or the cart moved too far
         print("Episode finished after {} timesteps".format(i+1))
         break
env.close()


Observe how long the cartpole manages to balance itself in each episode; since the actions are chosen at random, most episodes finish after only a few timesteps.
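
The agent above only samples random actions, so it does not yet learn anything. As a hedged sketch of what a very simple learning step could look like for CartPole-v0 (random search over linear policies, an illustrative choice that is not part of the original tutorial), one could keep the policy parameters that achieve the longest episode −

import gym
import numpy as np

env = gym.make('CartPole-v0')

def run_episode(env, weights):
   # Run one episode with a linear policy and return its length.
   observation = env.reset()
   for t in range(200):
      # Push right if the weighted sum of the observation is positive.
      action = 1 if np.dot(weights, observation) > 0 else 0
      observation, reward, done, info = env.step(action)
      if done:
         return t + 1
   return 200

best_weights, best_length = None, 0
for _ in range(50):
   weights = np.random.uniform(-1.0, 1.0, 4)   # random candidate policy
   length = run_episode(env, weights)
   if length > best_length:
      best_weights, best_length = weights, length

print("Best episode length:", best_length)
env.close()

A longer best episode length simply means that the corresponding random linear policy kept the pole balanced for more timesteps.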


Translated from: https://www.tutorialspoint.com/artificial_intelligence_with_python/artificial_intelligence_with_python_reinforcement_learning.htm
