15. Reinforcement Learning: Tic-Tac-Toe and an A3C Implementation

transformer2023

Published 2025-11-08 14:16:24

License: CC 4.0 BY-SA

Column: Mastering Deep Learning from Scratch. Tags: reinforcement learning, tic-tac-toe, A3C algorithm

Article link: https://blog.youkuaiyun.com/transformer2023/article/details/154669870

1. Abstract Environment Definition

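First, we define an abstract environment object that encodes the state of the system as a list of NumPy objects. The environment class is defined as follows: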

class Environment(object):
    """An environment in which an actor performs actions to accomplish a task.
    An environment has a current state, which is represented as either a single NumPy
    array, or optionally a list of NumPy arrays.  When an action is taken, that causes
    the state to be updated.  Exactly what is meant by an "action" is defined by each
    subclass.  As far as this interface is concerned, it is simply an arbitrary object.
    The environment also computes a reward for each action, and reports when the task
    has been terminated (meaning that no more actions may be taken).
    """
    def __init__(self, state_shape, n_ac