A Small Q-learning Game Example

Latest recommended article published on 2025-01-16 10:22:25

Ordinary_yfz

License: CC 4.0 BY-SA

Column: Research Path: Mobile+AI+game theory. Tags: games, reinforcement learning

Article link: https://blog.youkuaiyun.com/csyifanZhang/article/details/104520934

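After going through the one-dimensional treasure-hunt game from Morvan Python (莫烦Python), I wrote a two-dimensional treasure-hunt game of my own to get a better grip on using a Q-table. The map is 9*9 (N_states = 9 in the code below), the player's initial location is generated randomly, and the treasure sits at the center of the map. The complete code is on GitHub and can be run directly, as long as the environment is set up correctly.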

Importing the libraries

This is not a very high-level game, so libraries such as gym and tensorflow are not used; plain numpy, pandas and a few others are enough.

import numpy as np
import pandas as pd
import time
import random
import matplotlib.pyplot as plt

Hyperparameters

N_states = 9  # side length of the two-dimensional world: N_states * N_states cells
POS = 40  # position of the treasure: cell (4, 4), the center of the map
ACTION = ["left", "right", "top", "bottom"]  # available actions
alpha = 0.1  # learning rate
gamma = 0.9  # discount factor
episodes = 20  # maximum number of iterations
interval = 0.3  # time required for each step
greedy = 0.9  # greedy factor: probability of taking the best-known action
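
The comment on POS implies that each cell of the map is stored as a single integer state. As a minimal sketch of that mapping, assuming row-major numbering (state = row * N_states + col), the helpers `to_coords` and `to_state` below are hypothetical names introduced for illustration and are not part of the original code:

```python
N_states = 9   # side length of the map, as in the hyperparameters
POS = 40       # flat index of the treasure

def to_coords(state, n=N_states):
    """Convert a flat state index into (row, col), assuming row-major order."""
    return divmod(state, n)

def to_state(row, col, n=N_states):
    """Convert (row, col) back into a flat state index."""
    return row * n + col

print(to_coords(POS))   # (4, 4)
print(to_state(4, 4))   # 40
```

Because the treasure sits at (4, 4), the same index results whichever axis is taken as the row, so the row-major choice only matters for the other cells.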

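During the treasure hunt the explorer can only move in four directions (up, down, left, right); the names used in ACTION are arbitrary.

Building the Q-table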

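def build_table(nStates, actions):
    '''
    A DataFrame is a tabular data structure made of rows and columns, with a row index
    and a column index; each column may hold values of a different type.
    Creating a DataFrame object:
    the index and columns parameters set the row index and the column index respectively.

    1. Pass in a multi-dimensional numpy array.

    2. Pass in a dict of equal-length lists; the dict keys become the column index by default.
    '''
    # one row per state (nStates * nStates cells), one column per action, all zeros
    table = pd.DataFrame(
            np.zeros((nStates*nStates, len(actions))),
            columns = actions)
    # q_table (first rows):
    """
        left  right  top  bottom
    0    0.0    0.0  0.0     0.0
    1    0.0    0.0  0.0     0.0
    2    0.0    0.0  0.0     0.0
    3    0.0    0.0  0.0     0.0
    4    0.0    0.0  0.0     0.0
    5    0.0    0.0  0.0     0.0
    6    0.0    0.0  0.0     0.0
    7    0.0    0.0  0.0     0.0
    8    0.0    0.0  0.0     0.0
    9    0.0    0.0  0.0     0.0
    """
    return table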
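
To make the shape of the table concrete, here is a small usage sketch. It repeats the same construction inline so that it runs on its own; the value 0.5 written into the table is made up purely for illustration.

```python
import numpy as np
import pandas as pd

ACTION = ["left", "right", "top", "bottom"]
N_states = 9

# Same construction as in build_table, repeated so this snippet is self-contained.
q_table = pd.DataFrame(np.zeros((N_states * N_states, len(ACTION))), columns=ACTION)

print(q_table.shape)               # (81, 4): one row per cell, one column per action
q_table.loc[40, "left"] = 0.5      # write one Q-value by row label and column name
print(q_table.iloc[40].idxmax())   # left: the column holding the largest value in row 40
```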

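Choosing the next action from the state and the Q-table

With some probability the action is picked at random instead of by argmax, which adds randomness to the samples.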
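
As a self-contained illustration of this idea, the sketch below implements an epsilon-greedy choice over one row of a Q-table. It is a sketch under assumptions, not the author's function: the name `choose_action_sketch`, the explicit `rng` argument and the example Q-values are all hypothetical. It also treats a row whose values are all still zero as unexplored and picks at random in that case.

```python
import numpy as np
import pandas as pd

ACTION = ["left", "right", "top", "bottom"]
greedy = 0.9  # probability of exploiting the best-known action

def choose_action_sketch(q_row, rng):
    """Epsilon-greedy choice over one Q-table row (a pandas Series)."""
    unexplored = (q_row == 0).all()        # True only if every Q-value is still 0
    if rng.uniform() >= greedy or unexplored:
        return str(rng.choice(ACTION))     # explore: uniformly random action
    return q_row.idxmax()                  # exploit: action with the largest Q-value

rng = np.random.default_rng(0)
row = pd.Series([0.0, 0.2, 0.0, 0.1], index=ACTION)  # made-up Q-values
picks = [choose_action_sketch(row, rng) for _ in range(1000)]
print(picks.count("right") / 1000)  # the large majority of picks are "right"
```

Note that `Series.all()` on its own reports whether every entry is non-zero, so the all-zero test is written as `(q_row == 0).all()`.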

def chooseActions(state, table):
    '''
    loc selects values by row and column labels
    iloc selects values by integer position
    '''
    #print(state)
    actions = table.iloc[state,:]  # Q-values of all actions for this state
    # explore when not greedy, or when no action of this state has been sampled yet
    if (np.random.uniform() >= greedy) or (actions == 0).all():
        actName = np.random.choice(ACTION)
    else:
        actName