计算纳什均衡

EXAMPLE1

[
	[0, 2, -1, 1],
	[-2, 0, 3, 1],
	[1, -3, 0, 1],
]

双人零和对称博弈(M=−MTM = -M^TM=MT)的所有纳什均衡都是对称纳什均衡.
max⁡pmin⁡qpMqT=min⁡qmax⁡ppMqT=pMpT=−pMpT=0\max\limits_p \min\limits_q pMq^T = \min\limits_q \max\limits_p pMq^T = pMp^T = -pMp^T = 0pmaxqminpMqT=qminpmaxpMqT=pMpT=pMpT=0

  • max min 问题如下:

    max⁡pmin⁡qpMqT  ⟺  max⁡pmin⁡{pM}  ⟺  {max⁡p,vvpM≽v1p≽0p1T=1\begin{aligned} & \max\limits_p \min\limits_q pMq^T \\ \iff& \max\limits_p \min\{pM\} \\ \iff& \begin{cases} \max\limits_{p,v} v \\ pM \succcurlyeq v1 \\ p \succcurlyeq 0 \\ p1^T = 1 \\ \end{cases} \\ \end{aligned}pmaxqminpMqTpmaxmin{pM}p,vmaxvpMv1p0p1T=1

      ⟺  {min⁡p,v−v2p1−p2+v⩽0−2p0+3p2+v⩽0p0−3p1+v⩽0−p0⩽0−p1⩽0−p2⩽0p0+p1+p2=1\iff \begin{cases} & \min\limits_{p,v} -v \\ & \begin{matrix} & 2p_1 & -p_2 & +v &\leqslant &0 \\ -2p_0 & & +3p_2 & +v &\leqslant &0 \\ p_0 & -3p_1 & & +v &\leqslant &0 \\ -p_0 & & & &\leqslant &0 \\ & -p_1 & & &\leqslant &0 \\ & & -p_2 & &\leqslant &0 \\ p_0 & +p_1 & +p_2 & &= &1 \\ \end{matrix} \\ \end{cases}p,vminv2p0p0p0p02p13p1p1+p1p2+3p2p2+p2+v+v+v=0000001

  • max min 问题解得 p∗=[12,16,13]p^*=[\frac{1}{2}, \frac{1}{6}, \frac{1}{3}]p=[21,61,31], v∗=0v^*=0v=0.

  • min max 问题如下:

    min⁡qmax⁡pqMTpT  ⟺  min⁡qmax⁡{qMT}  ⟺  {min⁡q,uuqMT≼u1q≽0q1T=1\begin{aligned} & \min\limits_q \max\limits_p qM^Tp^T \\ \iff& \min\limits_q \max\{qM^T\} \\ \iff& \begin{cases} \min\limits_{q, u} u \\ qM^T \preccurlyeq u1 \\ q \succcurlyeq 0 \\ q1^T = 1 \\ \end{cases} \\ \end{aligned}qminpmaxqMTpTqminmax{qMT}q,uminuqMTu1q0q1T=1

      ⟺  {min⁡q,uu2q1−q2−u⩽0−2q0+3q2−u⩽0q0−3q1−u⩽0−q0⩽0−q1⩽0−q2⩽0q0+q1+q2=1\iff \begin{cases} & \min\limits_{q,u} u \\ & \begin{matrix} & 2q_1 & -q_2 & -u &\leqslant &0 \\ -2q_0 & & +3q_2 & -u &\leqslant &0 \\ q_0 & -3q_1 & & -u &\leqslant &0 \\ -q_0 & & & &\leqslant &0 \\ & -q_1 & & &\leqslant &0 \\ & & -q_2 & &\leqslant &0 \\ q_0 & +q_1 & +q_2 & &= &1 \\ \end{matrix} \\ \end{cases}q,uminu2q0q0q0q02q13q1q1+q1q2+3q2q2+q2uuu=0000001

  • min max 问题解得 q∗=[12,16,13]q^*=[\frac{1}{2}, \frac{1}{6}, \frac{1}{3}]q=[21,61,31], u∗=0u^*=0u=0.

  • 结论:
    计算机求解结果满足 v∗=u∗v^*=u^*v=u, 因此 (p∗,q∗)(p^*,q^*)(p,q) 是一个混合策略纳什均衡.
    计算机求解结果满足 p∗=q∗p^*=q^*p=q, v∗=u∗=0v^*=u^*=0v=u=0, 与理论分析一致.


具体代码如下:

# https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linprog.html

import numpy
import scipy

problem_p_v = {
    'c': numpy.array([[0, 0, 0, -1]]),
    'A_ub': numpy.array([
        [0, 2, -1, 1],
        [-2, 0, 3, 1],
        [1, -3, 0, 1],
        [-1, 0, 0, 0],
        [0, -1, 0, 0],
        [0, 0, -1, 0],
    ]),
    'b_ub': numpy.array([
        [0],
        [0],
        [0],
        [0],
        [0],
        [0],
    ]),
    'A_eq': numpy.array([[1, 1, 1, 0]]),
    'b_eq': numpy.array([[1]]),
}
result_p_v = scipy.optimize.linprog(**problem_p_v, bounds=(None, None))
print(result_p_v)

problem_q_u = {
    'c': numpy.array([[0, 0, 0, 1]]),
    'A_ub': numpy.array([
        [0, 2, -1, -1],
        [-2, 0, 3, -1],
        [1, -3, 0, -1],
        [-1, 0, 0, 0],
        [0, -1, 0, 0],
        [0, 0, -1, 0],
    ]),
    'b_ub': numpy.array([
        [0],
        [0],
        [0],
        [0],
        [0],
        [0],
    ]),
    'A_eq': numpy.array([[1, 1, 1, 0]]),
    'b_eq': numpy.array([[1]]),
}
result_q_u = scipy.optimize.linprog(**problem_q_u, bounds=(None, None))
print(result_q_u)

EXAMPLE2

[
    [1, -2, 6, -4],
    [2, -7, 2, 4],
    [-3, 4, -4, -3],
    [-8, 3, -2, 3],
]
  • max min 问题如下:

    max⁡pmin⁡qpMqT  ⟺  max⁡pmin⁡{pM}  ⟺  {max⁡p,vvpM≽v1p≽0p1T=1  ⟺  {min⁡p,v−v[1−MT⋮1−10⋱⋮−10][pTv]≼0[1⋯10][pTv]=[1]\begin{aligned} & \max\limits_p \min\limits_q pMq^T \\ \iff& \max\limits_p \min\{pM\} \\ \iff& \begin{cases} \max\limits_{p,v} v \\ pM \succcurlyeq v1 \\ p \succcurlyeq 0 \\ p1^T = 1 \\ \end{cases} \\ \iff& \begin{cases} \min\limits_{p,v} -v \\ \begin{bmatrix} & & &1 \\ &-M^T & &\vdots \\ & & &1 \\ -1 & & &0 \\ &\ddots & &\vdots \\ & &-1 &0 \\ \end{bmatrix} \begin{bmatrix} \\ p^T \\ \\ v \end{bmatrix} \preccurlyeq 0 \\ \begin{bmatrix} 1 &\cdots &1 &0 \end{bmatrix} \begin{bmatrix} \\ p^T \\ \\ v \end{bmatrix} = [1] \\ \end{cases} \\ \end{aligned}pmaxqminpMqTpmaxmin{pM}p,vmaxvpMv1p0p1T=1p,vminv1MT11100pTv0[110]pTv=[1]

  • max min 问题解得 p∗=[0.14,0.35,0.51,0]p^*=[0.14,0.35,0.51,0]p=[0.14,0.35,0.51,0], v∗=−0.69v^*=-0.69v=0.69.

  • min max 问题如下:

    min⁡qmax⁡pqMTpT  ⟺  min⁡qmax⁡{qMT}  ⟺  {min⁡q,uuqMT≼u1q≽0q1T=1  ⟺  {min⁡q,uu[−1M⋮−1−10⋱⋮−10][qTu]≼0[1⋯10][qTu]=[1]\begin{aligned} & \min\limits_q \max\limits_p qM^Tp^T \\ \iff& \min\limits_q \max\{qM^T\} \\ \iff& \begin{cases} \min\limits_{q, u} u \\ qM^T \preccurlyeq u1 \\ q \succcurlyeq 0 \\ q1^T = 1 \\ \end{cases} \\ \iff& \begin{cases} \min\limits_{q,u} u \\ \begin{bmatrix} & & &-1 \\ &M & &\vdots \\ & & &-1 \\ -1 & & &0 \\ &\ddots & &\vdots \\ & &-1 &0 \\ \end{bmatrix} \begin{bmatrix} \\ q^T \\ \\ u \end{bmatrix} \preccurlyeq 0 \\ \begin{bmatrix} 1 &\cdots &1 &0 \end{bmatrix} \begin{bmatrix} \\ q^T \\ \\ u \end{bmatrix} = [1] \\ \end{cases} \\ \end{aligned}qminpmaxqMTpTqminmax{qMT}q,uminuqMTu1q0q1T=1q,uminu1M11100qTu0[110]qTu=[1]

  • min max 问题解得 q∗=[0.53,0.33,0,0.14]q^*=[0.53,0.33,0,0.14]q=[0.53,0.33,0,0.14], u∗=−0.69u^*=-0.69u=0.69.

  • 结论:
    计算机求解结果满足 v∗=u∗v^*=u^*v=u, 因此 (p∗,q∗)(p^*,q^*)(p,q) 是一个混合策略纳什均衡.


具体代码如下:

# https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linprog.html

import numpy
import scipy

M = numpy.array([
    [1, -2, 6, -4],
    [2, -7, 2, 4],
    [-3, 4, -4, -3],
    [-8, 3, -2, 3],
])

problem_p_v = {
    'c': numpy.concatenate((numpy.zeros((1, 4)), numpy.array([[-1]])), axis=1),
    'A_ub': numpy.concatenate(
        (
            numpy.concatenate((-M.T, numpy.ones((4, 1))), axis=1),
            numpy.concatenate((-numpy.identity(4), numpy.zeros((4, 1))), axis=1),
        ),
        axis=0,
    ),
    'b_ub': numpy.zeros((8, 1)),
    'A_eq': numpy.concatenate((numpy.ones((1, 4)), numpy.array([[0]])), axis=1),
    'b_eq': numpy.array([[1]]),
}
result_p_v = scipy.optimize.linprog(**problem_p_v, bounds=(None, None))
print(result_p_v)

problem_q_u = {
    'c': numpy.concatenate((numpy.zeros((1, 4)), numpy.array([[1]])), axis=1),
    'A_ub': numpy.concatenate(
        (
            numpy.concatenate((M, -numpy.ones((4, 1))), axis=1),
            numpy.concatenate((-numpy.identity(4), numpy.zeros((4, 1))), axis=1),
        ),
        axis=0,
    ),
    'b_ub': numpy.zeros((8, 1)),
    'A_eq': numpy.concatenate((numpy.ones((1, 4)), numpy.array([[0]])), axis=1),
    'b_eq': numpy.array([[1]]),
}
result_q_u = scipy.optimize.linprog(**problem_q_u, bounds=(None, None))
print(result_q_u)
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值