CUDA Programming for the expand, where, and softmax Operators

This article covers CUDA implementations of PyTorch's expand and where operators in both the 1D and higher-dimensional cases. It explains in detail how expand is applied to higher-dimensional arrays and how each output index is traced back to the corresponding input index. It also discusses the softmax operator on 1D arrays, analyzes the parallel speedup of different reduction strategies such as crossed-pair, interleaved-pair, and warp-shuffle reductions, and gives the corresponding CUDA code examples.


Introduction to expand and where

When people talk about the expand function in Torch, they are really referring to the expand method in PyTorch (the Python interface to Torch). Below is an introduction to the expand method and the where function, including their inputs and outputs:
The expand method:
torch.Tensor.expand() is a method of PyTorch's Tensor class that expands a tensor along its dimensions; only dimensions of size 1 can be expanded, and no data is copied (the result is a view of the original tensor).
Input: input is the tensor to be expanded, and size is a tuple specifying the target size of each dimension.
Output: a new tensor whose shape is the expanded shape of input.
The where function:
torch.where() is a PyTorch function that selects elements from two tensors according to a given condition.
Input: condition is a boolean tensor whose shape matches (or is broadcastable with) the shapes of x and y; x and y are two tensors of the same shape.
Output: a new tensor with the same shape as x and y, whose elements are taken from x where condition is true and from y where it is false.

Programming expand and where in the 1D case

Applying expand and where to a one-dimensional vector is very simple, so this section only shows the kind of kernels that ChatGPT gives for this case (a sketch follows below).
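
The original kernels are not reproduced in this excerpt, so the following is a minimal sketch of what such 1D kernels might look like. The kernel names (expand1d_kernel, where1d_kernel), the launch configuration, and the choice to implement 1D expand as broadcasting a single input value over the whole output are illustrative assumptions, not the article's original code.

```cuda
#include <cuda_runtime.h>

// Sketch of 1D expand: broadcast a length-1 input to an output of length n,
// i.e. every output element reads the same input value (assumed semantics).
__global__ void expand1d_kernel(const float* __restrict__ input,
                                float* __restrict__ output, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        output[i] = input[0];  // the size-1 dimension is expanded to size n
    }
}

// Sketch of 1D where: pick x[i] when cond[i] is true, otherwise y[i].
__global__ void where1d_kernel(const bool* __restrict__ cond,
                               const float* __restrict__ x,
                               const float* __restrict__ y,
                               float* __restrict__ output, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        output[i] = cond[i] ? x[i] : y[i];
    }
}
```

Both kernels use one thread per output element, so a launch such as expand1d_kernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n) covers the whole vector; the same grid/block configuration works for where1d_kernel.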
