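Reinforcement Learning: Tic-Tac-Toe and an A3C Implementation
1. Defining an Abstract Environment
First, we define an abstract environment object that encodes the state of the system as a list of NumPy arrays. The environment class is defined as follows: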
class Environment(object):
    """An environment in which an actor performs actions to accomplish a task.

    An environment has a current state, which is represented as either a single NumPy
    array, or optionally a list of NumPy arrays. When an action is taken, that causes
    the state to be updated. Exactly what is meant by an "action" is defined by each
    subclass. As far as this interface is concerned, it is simply an arbitrary object.
    The environment also computes a reward for each action, and reports when the task
    has been terminated (meaning that no more actions may be taken).
    """

    def __init__(self, state_shape, n_ac
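The docstring above gives an environment three jobs: hold a state, update that state when an action is taken, and report a reward along with whether the task has terminated. The following is a minimal sketch of that contract, not the original class. The names `SketchEnvironment`, `SoloTicTacToe`, `reset`, and `step` are assumptions made for illustration, and the toy task (one player filling a 3x3 board until a line is complete) is a simplification of tic-tac-toe with no opponent.

```python
import numpy as np


class SketchEnvironment:
    """Hypothetical minimal interface: a state, a terminated flag, and step()."""

    def __init__(self, state_shape, n_actions):
        self.state_shape = state_shape
        self.n_actions = n_actions
        self.state = np.zeros(state_shape)
        self.terminated = False

    def reset(self):
        """Return the environment to its initial state."""
        raise NotImplementedError

    def step(self, action):
        """Apply an action, update the state, and return the reward."""
        raise NotImplementedError


class SoloTicTacToe(SketchEnvironment):
    """Toy task (assumed): place marks until a row, column, or diagonal is full."""

    def __init__(self):
        super().__init__(state_shape=(3, 3), n_actions=9)

    def reset(self):
        self.state = np.zeros(self.state_shape)
        self.terminated = False

    def step(self, action):
        if self.terminated:
            raise RuntimeError("episode has ended; call reset() first")
        row, col = divmod(action, 3)
        if self.state[row, col] != 0:
            # Playing on an occupied square ends the episode with a penalty.
            self.terminated = True
            return -1.0
        self.state[row, col] = 1
        board = self.state
        # Three rows, three columns, and the two diagonals.
        lines = [*board, *board.T, board.diagonal(), np.fliplr(board).diagonal()]
        if any(line.sum() == 3 for line in lines):
            self.terminated = True
            return 1.0
        return 0.0


env = SoloTicTacToe()
rewards = [env.step(a) for a in (0, 4, 8)]  # fill the main diagonal
print(rewards, env.terminated)
```

Here the action is an integer from 0 to 8 naming a board square, which is one possible reading of "an arbitrary object" in the docstring. A subclass is free to choose any other action representation.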