AI Fundamentals: Adversarial Search II

AND-OR search trees and optimizing the complexity of game decisions

Non-deterministic Transitions

AND-OR Search Trees

• In deterministic environments, branching occurs only due to the agent's choices (OR nodes)
• In non-deterministic environments, the environment's choices must also be taken into account (AND nodes)
• Solution is a subtree of the AND-OR tree that:
— Has a goal node at every leaf
— Specifies an action at each OR node
— Includes every outcome branch of its AND nodes
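The definition above can be sketched as the standard AND-OR graph search: OR nodes try the agent's actions one at a time, while AND nodes must find a sub-plan for every outcome the environment might produce. The callbacks `goal_test`, `actions`, and `results` are assumed to be supplied by the problem definition; this is a minimal sketch, not a full implementation.

```python
def and_or_search(state, goal_test, actions, results):
    """Return a conditional plan for a non-deterministic problem, or None.

    actions(s)    -> the agent's choices at s (OR node)
    results(s, a) -> every possible outcome state of action a (AND node)
    """
    def or_search(state, path):
        if goal_test(state):
            return []                      # empty plan: already at a goal
        if state in path:                  # cycle on this branch: fail
            return None
        for action in actions(state):
            plan = and_search(results(state, action), path + [state])
            if plan is not None:
                return [action, plan]      # first action that works
        return None

    def and_search(states, path):
        # Must handle EVERY outcome of the environment's choice.
        plans = {}
        for s in states:
            plan = or_search(s, path)
            if plan is None:
                return None                # one unsolvable outcome kills the plan
            plans[s] = plan
        return plans

    return or_search(state, [])
```

A returned plan like `['go', {'B': [], 'C': []}]` reads as a contingency plan: take action `go`, then act according to whichever outcome state actually occurs.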

AND-OR Graph Search

Adversarial Optimal Decisions

• Time complexity: O(b^m)
• Space complexity: O(bm)
• Chess, on average: b = 30, m = 40

Reducing Complexity

• Reducing the O(b^m) complexity:
— Reduce the branching factor (b)?
— Reduce the maximum search depth (m)?
— Search in a graph rather than a tree? In a tree, states are connected in strict layers; in a graph, states may be connected in arbitrary ways, so the same state can be reached along different paths.

Reducing Branching Factor

• Alpha-Beta Pruning
— Evaluate which nodes/branches would not affect MIN/MAX’s decision
— Based on keeping track of two parameters:
◦ α - value of the best (highest) choice we have in MAX’s path
◦ β - value of the best (lowest) choice we have in MIN’s path
• These values are updated as the search proceeds down the tree
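The bookkeeping described above can be sketched as follows. The game is abstracted behind two placeholder callbacks, `children(state)` and `evaluate(state)`, which a concrete game implementation would supply; the pruning tests on α and β are the standard ones.

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax with alpha-beta pruning (a sketch).

    children(state) -> list of successor states (empty at terminals)
    evaluate(state) -> heuristic value of a leaf/cutoff state
    """
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1,
                                         alpha, beta, False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:     # beta cutoff: MIN would never allow this branch
                break
        return value
    else:
        value = math.inf
        for child in kids:
            value = min(value, alphabeta(child, depth - 1,
                                         alpha, beta, True, children, evaluate))
            beta = min(beta, value)
            if beta <= alpha:     # alpha cutoff: MAX already has something better
                break
        return value
```

For example, encoding a two-ply tree as nested lists, `alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]], 2, -math.inf, math.inf, True, ...)` returns 3, pruning some leaves of the later MIN nodes along the way.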

Move Ordering

• Pruning is strongly affected by the ordering of the moves in the tree
— A good ordering would enable us to prune many nodes
• Move ordering is often game-dependent knowledge (heuristic)
• Dynamic move ordering (the killer-move heuristic) can reuse information about moves that have already caused effective cutoffs elsewhere in the search tree

Reducing Depth - Killer Move

• Dynamic heuristic to determine a “good” ordering
• Search two plies ahead until Max (alt. Min) causes a beta (alt. alpha) cutoff
• The move that caused the cutoff is the killer move

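A minimal sketch of the killer-move bookkeeping: a table maps search depth to the moves that most recently caused cutoffs there, and the move generator tries those moves first. The table structure and the two-slot replacement scheme are common conventions, assumed here for illustration.

```python
def record_killer(killer_table, depth, move, keep=2):
    """On a beta cutoff, remember the move that caused it at this depth."""
    slot = killer_table.setdefault(depth, [])
    if move not in slot:
        slot.insert(0, move)   # most recent killer first
        del slot[keep:]        # keep only the last `keep` killers

def order_moves(moves, killer_table, depth):
    """Try killer moves recorded at this depth before all other moves."""
    killers = killer_table.get(depth, [])
    # sorted() is stable, so non-killer moves keep their original order
    return sorted(moves, key=lambda m: 0 if m in killers else 1)
```

Usage: after a cutoff at depth d, call `record_killer(table, d, move)`; when expanding a sibling node at the same depth, call `order_moves(moves, table, d)` so the likely cutoff move is searched first.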

Reducing m - Evaluation Functions

Weighted linear function over features of a state

Example: chess. Current state: pieces and their positions (structure).

Example: Magic: The Gathering (card game). Current state: life totals, cards in play, and cards in hand.
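A weighted linear evaluation function has the form Eval(s) = Σᵢ wᵢ·fᵢ(s). The chess-like material features and weight values below are illustrative assumptions, not from the slides:

```python
def linear_eval(state, weights, features):
    """Weighted linear evaluation: Eval(s) = sum_i w_i * f_i(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))

# Illustrative material features (state encoding is an assumption):
features = [
    lambda s: s["my_queens"] - s["their_queens"],
    lambda s: s["my_pawns"] - s["their_pawns"],
]
weights = [9.0, 1.0]  # conventional rough material values: queen = 9, pawn = 1

state = {"my_queens": 1, "their_queens": 0, "my_pawns": 6, "their_pawns": 8}
score = linear_eval(state, weights, features)  # 9*1 + 1*(-2) = 7.0
```

Searching to a shallow depth and applying such a function at the cutoff replaces the exact game-theoretic value with a cheap estimate, which is what lets m be reduced.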

Graph Search

• As in non-adversarial search, many states will be revisited, since different paths can lead to the same state
• However, recording only the visited states is not enough (since MIN can deviate in the future)
• Need to store the actual loop paths (memory intensive)
— Requires a “caching” strategy
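The simplest form of such caching is a transposition table: memoize the value of each position already searched. Real engines key entries on a position hash plus depth and value bounds; the simplified key `(state, depth, maximizing)` below is an assumption for this sketch.

```python
def minimax_tt(state, depth, maximizing, children, evaluate, table=None):
    """Minimax over a graph, caching values of revisited states (a sketch)."""
    if table is None:
        table = {}
    key = (state, depth, maximizing)
    if key in table:
        return table[key]               # state already searched: reuse its value
    kids = children(state)
    if depth == 0 or not kids:
        value = evaluate(state)
    else:
        values = (minimax_tt(c, depth - 1, not maximizing,
                             children, evaluate, table) for c in kids)
        value = max(values) if maximizing else min(values)
    table[key] = value
    return value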

Stochastic Games

• Outcome of agent choices is not deterministic
— Games must take into account multiple outcomes for the player
• Solution: weight outcomes by their probability
— Expected value

Expectiminimax

浙江大学人工智能课程课件,内容有: Introduction Problem-solving by search( 4 weeks) Uninformed Search and Informed (Heuristic) Search (1 week) Adversarial Search: Minimax Search, Evaluation Functions, Alpha-Beta Search, Stochastic Search Adversarial Search: Multi-armed bandits, Upper Confidence Bound (UCB),Upper Confidence Bounds on Trees, Monte-Carlo Tree Search(MCTS) Statistical learning and modeling (5 weeks) Probability Theory, Model selection, The curse of Dimensionality, Decision Theory, Information Theory Probability distribution: The Gaussian Distribution, Conditional Gaussian distributions, Marginal Gaussian distributions, Bayes’ theorem for Gaussian variables, Maximum likelihood for the Gaussian, Mixtures of Gaussians, Nonparametric Methods Linear model for regression: Linear basis function models; The Bias-Variance Decomposition Linear model for classification : Basic Concepts; Discriminant Functions (nonprobabilistic methods); Probabilistic Generative Models; Probabilistic Discriminative Models K-means Clustering and GMM & Expectation–Maximization (EM) algorithm, BoostingThe Course Syllabus Deep Learning (4 weeks) Stochastic Gradient Descent, Backpropagation Feedforward Neural Network Convolutional Neural Networks Recurrent Neural Network (LSTM, GRU) Generative adversarial network (GAN) Deep learning in NLP (word2vec), CV (localization) and VQA(cross-media) Reinforcement learning (1 weeks) Reinforcement learning: introduction
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值