CNN: A Small Hand-Sign Recognition Project

This post shows how to build a convolutional neural network (CNN) in TensorFlow to recognize sign-language gestures. By creating placeholders, initializing parameters, implementing forward propagation, and computing the cost, it assembles a complete model, which is trained on the SIGNS dataset to classify hand signs.

This article is based on a project assignment from the Coursera Deep Learning Specialization; please do not use it for commercial purposes.

Convolutional Neural Networks: Application

Welcome to Course 4’s second assignment! In this notebook, you will:

  • Implement helper functions that you will use when implementing a TensorFlow model
  • Implement a fully functioning ConvNet using TensorFlow

After this assignment you will be able to:

  • Build and train a ConvNet in TensorFlow for a classification problem

We assume here that you are already familiar with TensorFlow. If you are not, please refer to the TensorFlow Tutorial of the third week of Course 2 (“Improving deep neural networks”).

1.0 - TensorFlow model

In the previous assignment, you built helper functions using numpy to understand the mechanics behind convolutional neural networks. Most practical applications of deep learning today are built using programming frameworks, which have many built-in functions you can simply call.

As usual, we will start by loading in the packages.

import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import scipy
from PIL import Image
from scipy import ndimage
import tensorflow as tf
from tensorflow.python.framework import ops
from cnn_utils import *

%matplotlib inline
np.random.seed(1)

Run the next cell to load the “SIGNS” dataset you are going to use.

# Loading the data (signs)
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()
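The load_dataset helper comes from cnn_utils, which ships with the assignment. If you do not have it, the sketch below shows a minimal version; the file paths and HDF5 keys (datasets/train_signs.h5, train_set_x, and so on) are assumptions based on the standard course files, so adjust them to match your copy.

def load_dataset():
    # Assumed paths and keys for the standard course HDF5 files.
    train_dataset = h5py.File('datasets/train_signs.h5', "r")
    X_train_orig = np.array(train_dataset["train_set_x"][:])  # (1080, 64, 64, 3) images
    Y_train_orig = np.array(train_dataset["train_set_y"][:])  # (1080,) integer labels
    test_dataset = h5py.File('datasets/test_signs.h5', "r")
    X_test_orig = np.array(test_dataset["test_set_x"][:])     # (120, 64, 64, 3) images
    Y_test_orig = np.array(test_dataset["test_set_y"][:])     # (120,) integer labels
    classes = np.array(test_dataset["list_classes"][:])       # the 6 class labels
    # Reshape the labels to (1, m) so they can be indexed as Y[:, index] below.
    Y_train_orig = Y_train_orig.reshape((1, Y_train_orig.shape[0]))
    Y_test_orig = Y_test_orig.reshape((1, Y_test_orig.shape[0]))
    return X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes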

As a reminder, the SIGNS dataset is a collection of 6 signs representing numbers from 0 to 5.
[Figure: the six hand signs, representing the numbers 0 through 5]
The next cell will show you an example of a labelled image in the dataset. Feel free to change the value of index below and re-run to see different examples.

# Example of a picture
index = 6
plt.imshow(X_train_orig[index])
print ("y = " + str(np.squeeze(Y_train_orig[:, index])))
y = 2

[Figure: the training image at index 6, labeled y = 2]

In Course 2, you built a fully connected network for this dataset. But since this is an image dataset, it is more natural to apply a ConvNet to it.

To get started, let’s examine the shapes of your data.

X_train = X_train_orig/255.
X_test = X_test_orig/255.
Y_train = convert_to_one_hot(Y_train_orig, 6).T
Y_test = convert_to_one_hot(Y_test_orig, 6).T
print ("number of training examples = " + str(X_train.shape[0]))
print ("number of test examples = " + str(X_test.shape[0]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))
conv_layers = {}
number of training examples = 1080
number of test examples = 120
X_train shape: (1080, 64, 64, 3)
Y_train shape: (1080, 6)
X_test shape: (120, 64, 64, 3)
Y_test shape: (120, 6)
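convert_to_one_hot is also provided by cnn_utils. Here is a minimal sketch consistent with how it is used above (it returns a (classes, m) matrix, which the cell then transposes to (m, classes)):

def convert_to_one_hot(Y, C):
    # Y is a (1, m) array of integer labels and C is the number of classes.
    # Indexing the C x C identity matrix by the flattened labels picks out
    # one one-hot row per example; transposing gives shape (C, m).
    return np.eye(C)[Y.reshape(-1)].T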

1.1 - Create placeholders

TensorFlow requires that you create placeholders for the input data that will be fed into the model when running the session.

Exercise: Implement the function below to create placeholders for the input image X and the output Y. You should not define the number of training examples for the moment; to do so, use “None” as the batch size, which will give you the flexibility to choose it later. Hence X should be of dimension [None, n_H0, n_W0, n_C0] and Y should be of dimension [None, n_y].

# GRADED FUNCTION: create_placeholders

def create_placeholders(n_H0, n_W0, n_C0, n_y):
    """
    Creates the placeholders for the tensorflow session.
    
    Arguments:
    n_H0 -- scalar, height of an input image
    n_W0 -- scalar, width of an input image
    n_C0 -- scalar, number of channels of the input
    n_y -- scalar, number of classes
        
    Returns:
    X -- placeholder for the data input, of shape [None, n_H0, n_W0, n_C0] and dtype "float"
    Y -- placeholder for the input labels, of shape [None, n_y] and dtype "float"
    """

    ### START CODE HERE ### (≈2 lines)
    X = tf.placeholder(tf.float32, shape=(None, n_H0, n_W0, n_C0))
    Y = tf.placeholder(tf.float32, shape=(None, n_y))
    ### END CODE HERE ###
    
    return X, Y
X, Y = create_placeholders(64, 64, 3, 6)
print ("X = " + str(X))
print ("Y = " + str(Y))
X = Tensor("