Paper Notes: 3D Convolutional Neural Networks for Human Action Recognition

Based on the early paper 3D Convolutional Neural Networks for Human Action Recognition, these notes discuss the application of 3D convolutional neural networks (3D CNNs) to human action recognition. The paper's comparative experiments show that a 3D CNN outperforms a 2D CNN at recognizing actions such as answering a phone (CellToEar), putting down an object (ObjectPut), and pointing (Pointing).


1 Introduction

These notes are a translated summary of the paper 3D Convolutional Neural Networks for Human Action Recognition, from around 2009–2010.

It is probably one of the earliest papers to propose 3D CNNs. The human actions recognized are mainly three: CellToEar (answering a phone), ObjectPut, and Pointing.

2 3D Convolutional Neural Networks

2.1 2D CNN

(Figure: 2D convolution, applied within a single frame)

2.2 3D Convolution

(Figure: 3D convolution across stacked frames)

The figure below shows 3D convolution with weight sharing: the same 3D kernel is applied at every temporal position.

(Figure: 3D convolution with shared weights)

3D convolution without weight sharing: on the right, two different kernels produce two different feature maps. This is the variant used in the paper.

(Figure: 3D convolution without weight sharing)
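The non-weight-sharing setup above can be sketched directly: convolving the same frame stack with two different 3D kernels yields two different feature maps. Below is a minimal NumPy sketch; the 7 frames of 60 × 40 and the 7 × 7 × 3 kernels are meant to echo the paper's first layer, but treat the exact sizes as illustrative.

```python
import numpy as np

def conv3d(video, kernel):
    """Valid 3D convolution over a (time, height, width) volume."""
    T, H, W = video.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(video[i:i + t, j:j + h, k:k + w] * kernel)
    return out

video = np.random.rand(7, 60, 40)       # 7 stacked gray frames
kernel_a = np.random.rand(3, 7, 7)      # one 3D kernel (spans 3 frames)
kernel_b = np.random.rand(3, 7, 7)      # a second, unshared 3D kernel
fa = conv3d(video, kernel_a)            # first feature map volume
fb = conv3d(video, kernel_b)            # a different feature map volume
print(fa.shape)  # (5, 54, 34)
```

Because the kernel spans 3 frames, the temporal extent shrinks from 7 to 5, which is how 3D convolution mixes motion information across adjacent frames.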

2.3 3D CNN Architecture

The input consists of 7 consecutive frames.

A layer of hardwired kernels produces 5 different channels: gray, gradient-x, gradient-y, optflow-x, and optflow-y.
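The first three hardwired channels are simple fixed (non-learned) transforms of the input frames, as in this NumPy sketch; the 60 × 40 frame size is illustrative, and the two optical-flow channels are omitted because they require a flow estimator between consecutive frames.

```python
import numpy as np

# 7 input gray frames of size 60 x 40 (sizes illustrative)
frames = np.random.rand(7, 60, 40)

gray = frames                          # channel 1: raw gray values
grad_x = np.gradient(frames, axis=2)   # channel 2: horizontal gradient per frame
grad_y = np.gradient(frames, axis=1)   # channel 3: vertical gradient per frame
# Channels 4-5 (optflow-x, optflow-y) come from an optical-flow
# estimator applied to consecutive frame pairs; omitted here.
```

Since these kernels are fixed rather than learned, the network starts from known-useful appearance and motion cues instead of having to discover them.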

From H1 to C2, two different 3D convolutions are applied at each position (the non-weight-sharing 3D convolution described in the previous section), so the 23 feature maps are doubled to 23 × 2.

From S3 to C4, three different 3D convolutions are used, going from 23 × 2 to 13 × 6 feature maps.

(Figure: the 3D CNN architecture)

3 Experimental Results

The results show that the 3D CNN outperforms the 2D CNN.

(Figure: comparison of 3D CNN and 2D CNN results)

Skeleton-Based Action Recognition Research and Techniques

In skeleton-based action recognition, researchers interpret human actions from skeletal data using deep learning models that capture the spatial-temporal features of joint-position sequences.

One prominent technique uses recurrent neural networks (RNNs), particularly long short-term memory (LSTM) units or gated recurrent units (GRUs). These architectures handle sequential information well because they maintain a form of memory across timesteps, which makes them suitable for modeling the temporal dependencies in motion-capture datasets.

Convolutional approaches also play an essential role when applied to graphs representing skeletons, with joints as nodes and limb segments as edges. Graph Convolutional Networks (GCNs) extend traditional convolution operations to such non-Euclidean domains.

Some studies further combine RNN variants with GCN layers in hybrid frameworks that exploit local appearance cues alongside the global structural patterns of the full pose in each frame. The poses themselves are typically captured with depth sensors such as the Microsoft Kinect, which can track multiple people under varying lighting conditions without wearable markers.
```python
import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch_geometric.nn import GCNConv


class ST_GCN(torch.nn.Module):
    def __init__(self, num_features, hidden_channels, class_num):
        super(ST_GCN, self).__init__()
        # Graph convolution over the skeleton graph (joints as nodes)
        self.conv1 = GCNConv(num_features, hidden_channels)
        # Per-node classifier head
        self.fc1 = Linear(hidden_channels, class_num)

    def forward(self, x, edge_index):
        h = self.conv1(x, edge_index)
        h = F.relu(h)
        h = F.dropout(h, training=self.training)
        z = self.fc1(h)
        return F.log_softmax(z, dim=1)
```
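Independently of the PyTorch Geometric layer above, the core operation a GCN performs is a normalized-adjacency aggregation followed by a linear map. A minimal NumPy sketch of one such step on a toy 5-joint skeleton (all sizes, edges, and weights are illustrative, not from any dataset):

```python
import numpy as np

# Toy skeleton: 5 joints; edges are the bones connecting them
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
N = 5
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

A_hat = A + np.eye(N)                       # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetric normalization

X = np.random.rand(N, 3)                    # joint coordinates as node features
W = np.random.rand(3, 8)                    # weight matrix (random, stands in for learned weights)
H = np.maximum(A_norm @ X @ W, 0.0)         # one graph-convolution layer + ReLU
print(H.shape)  # (5, 8)
```

Each joint's output feature is thus a weighted average of its own and its neighbors' features, which is what lets the network reason over the skeleton's structure rather than treating joints independently.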