Categorical.log_prob()
log_prob returns the natural logarithm of the probability that the distribution assigns to the given action(s). Example:
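import torch
import torch.nn.functional as F
from torch.distributions import Categorical

# Random scores for 5 actions, normalized into probabilities with softmax
action_logits = torch.rand(5)
action_probs = F.softmax(action_logits, dim=-1)
print(action_probs)

# Build the distribution and sample an action index
dist = Categorical(action_probs)
action = dist.sample()
print(action)

# log_prob(action) matches the log of that action's probability
print(dist.log_prob(action), torch.log(action_probs[action]))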
Output:
tensor([0.1419, 0.3035, 0.1763, 0.1427, 0.2355])
tensor(2)
tensor(-1.7358) tensor(-1.7358)
That is, $\log_e(0.1763) \approx -1.7358$, where 0.1763 is the probability of the sampled action (index 2).
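
The same check can be run on a batch. The following is a minimal sketch, not part of the original example: the batch shape (3 rows of 5 scores), the fixed seed, and the variable names batch_logits, actions, and manual are assumptions made for illustration. It passes raw scores through the logits= argument of Categorical, so no explicit softmax is needed, and compares log_prob against a manual log_softmax lookup.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Categorical

torch.manual_seed(0)  # fixed seed (an assumption) so repeated runs sample the same actions

# Hypothetical batch: 3 states, each with 5 unnormalized action scores.
batch_logits = torch.randn(3, 5)

# With logits=, Categorical normalizes the scores itself.
dist = Categorical(logits=batch_logits)

actions = dist.sample()             # shape (3,): one action index per row
log_probs = dist.log_prob(actions)  # shape (3,): log-probability of each sampled action

# Manual equivalent: log_softmax over each row, then pick the entry of the sampled action.
manual = F.log_softmax(batch_logits, dim=-1).gather(1, actions.unsqueeze(1)).squeeze(1)

print(log_probs)
print(manual)
print(torch.allclose(log_probs, manual))
```

The two printed tensors should agree up to floating-point tolerance, since log_prob on a batch simply looks up, row by row, the normalized log-probability of the chosen index.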

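This article introduces the log_prob() method of the Categorical distribution in PyTorch, uses an example to show how it returns the log of an action's probability, and verifies that the result is correct.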



