Deep Learning for Image and Video Processing
Computer vision is concerned with understanding visual data. Because a video is just a sequence of images, the deep learning techniques we use for image processing carry over to video processing. This article covers object detection with TFHub, real-time facial emotion detection, and action recognition in videos.
1. Object Detection with TFHub
TFHub offers powerful pre-trained models that make out-of-the-box object detection easy. Most of these models would be challenging to implement and train from scratch, but the ones on TFHub were trained on the large-scale COCO image dataset and are well suited to object detection and image segmentation. Note that they cannot be retrained, so they work best on images containing objects that appear in COCO. If you need a custom object detector, consider other strategies.
The full list of object detectors available on TFHub is here: https://tfhub.dev/tensorflow/collections/object_detection/1
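As an illustration (not part of the original recipe), the sketch below shows how one of these COCO-trained detectors might be loaded and applied to a single image. The model handle is one example from the collection, and the image filename is a placeholder.
import tensorflow as tf
import tensorflow_hub as tfhub

# Example handle from the TFHub object detection collection
# (SSD MobileNet V2 trained on COCO); other detectors in the
# collection expose the same output dictionary.
detector = tfhub.load('https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2')

# These detectors expect a batch of uint8 images with shape [1, H, W, 3].
image = tf.io.decode_jpeg(tf.io.read_file('street.jpg'),  # placeholder file
                          channels=3)
results = detector(image[tf.newaxis, ...])

boxes = results['detection_boxes'][0]      # normalized [ymin, xmin, ymax, xmax]
scores = results['detection_scores'][0]    # confidence per detection
classes = results['detection_classes'][0]  # COCO class ids

# Keep only reasonably confident detections.
confident = scores.numpy() >= 0.5
print(f'{confident.sum()} objects detected with score >= 0.5')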
2. Detecting Facial Emotions in Real Time
Because a video is essentially a sequence of images, we can apply what we know about image classification to build a deep-learning-powered video-processing pipeline. In this section we build an algorithm that detects emotions in real time (from a webcam stream) or from a video file.
2.1 Getting Ready
- Install the external libraries: run the following command to install OpenCV and imutils.
$> pip install opencv-contrib-python imutils
- Download the dataset: get the data for the Kaggle competition "Challenges in Representation Learning: Facial Expression Recognition Challenge" from https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data . Place the download wherever you prefer (here we assume the ~/.keras/datasets folder), decompress it as emotion_recognition, and then extract the fer2013.tar.gz file inside it (see the sketch below).
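A minimal sketch of the extraction step, assuming the archive expands to a fer2013 folder containing fer2013.csv (matching the paths used later in the recipe):
import pathlib
import tarfile

base = pathlib.Path.home() / '.keras' / 'datasets' / 'emotion_recognition'

# Extract fer2013.tar.gz next to itself; the recipe later reads
# ~/.keras/datasets/emotion_recognition/fer2013/fer2013.csv
with tarfile.open(base / 'fer2013.tar.gz') as archive:
    archive.extractall(base)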
2.2 How to Do It
- Import the dependencies:
import csv
import glob
import pathlib
import cv2
import imutils
import numpy as np
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import *
from tensorflow.keras.utils import to_categorical
- Define the list of emotions and the color used to display each one:
EMOTIONS = ['angry', 'scared', 'happy', 'sad',
            'surprised', 'neutral']
COLORS = {'angry': (0, 0, 255),
          'scared': (0, 128, 255),
          'happy': (0, 255, 255),
          'sad': (255, 0, 0),
          'surprised': (178, 255, 102),
          'neutral': (160, 160, 160)}
- Build the emotion classifier architecture:
def build_network(input_shape, classes):
    input = Input(shape=input_shape)
    x = Conv2D(filters=32,
               kernel_size=(3, 3),
               padding='same',
               kernel_initializer='he_normal')(input)
    x = ELU()(x)
    x = BatchNormalization(axis=-1)(x)
    x = Conv2D(filters=32,
               kernel_size=(3, 3),
               kernel_initializer='he_normal',
               padding='same')(x)
    x = ELU()(x)
    x = BatchNormalization(axis=-1)(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Dropout(rate=0.25)(x)
    # Remaining layers omitted; see the original text
    return Model(input, output)
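The omitted middle of build_network must eventually define the output tensor returned on the last line. The fragment below is only one plausible continuation (layer counts and sizes are assumptions, not the book's exact architecture): deeper convolutional blocks followed by dense layers and a softmax over the emotion classes.
# Hypothetical continuation inside build_network, after the dropout above:
# another convolutional block with more filters...
x = Conv2D(filters=64,
           kernel_size=(3, 3),
           padding='same',
           kernel_initializer='he_normal')(x)
x = ELU()(x)
x = BatchNormalization(axis=-1)(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Dropout(rate=0.25)(x)

# ...followed by fully connected layers and the classification head.
x = Flatten()(x)
x = Dense(units=64, kernel_initializer='he_normal')(x)
x = ELU()(x)
x = BatchNormalization(axis=-1)(x)
x = Dropout(rate=0.5)(x)

output = Dense(units=classes,
               kernel_initializer='he_normal',
               activation='softmax')(x)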
- Load the dataset:
def load_dataset(dataset_path, classes):
    train_images = []
    train_labels = []
    val_images = []
    val_labels = []
    test_images = []
    test_labels = []
    # Remaining code omitted; see the original text
    return (train_images, train_labels), \
           (val_images, val_labels), \
           (test_images, test_labels)
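The omitted body of load_dataset parses fer2013.csv, whose rows contain an emotion label (0-6), 2304 space-separated pixel values for a 48x48 grayscale image, and a Usage column (Training, PublicTest, or PrivateTest). The sketch below is an illustrative implementation matching the signature above, not the book's exact code; it assumes the 'disgust' class is folded into 'angry', which would explain why EMOTIONS lists six classes instead of FER 2013's seven.
# Illustrative parser for fer2013.csv (assumptions noted above).
def load_dataset(dataset_path, classes):
    splits = {'Training': ([], []),
              'PublicTest': ([], []),
              'PrivateTest': ([], [])}

    with open(dataset_path, 'r') as f:
        for row in csv.DictReader(f):
            label = int(row['emotion'])
            label = 0 if label <= 1 else label - 1  # fold 'disgust' into 'angry'

            pixels = np.array(row['pixels'].split(' '), dtype='uint8')
            image = pixels.reshape((48, 48, 1))

            images, labels = splits[row['Usage']]
            images.append(image)
            labels.append(label)

    # Returns ((train_images, train_labels), (val_images, val_labels),
    #          (test_images, test_labels)) with one-hot encoded labels.
    return [(np.array(images), to_categorical(np.array(labels), classes))
            for images, labels in splits.values()]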
- Define the helper functions:
def rectangle_area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def plot_emotion(emotions_plot, emotion, probability, index):
    # Code omitted; see the original text
    return emotions_plot

def plot_face(image, emotion, detection):
    # Code omitted; see the original text
    return image

def predict_emotion(model, roi):
    # Code omitted; see the original text
    return predictions
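For reference, here is an illustrative predict_emotion; it assumes the classifier was trained on 48x48 grayscale crops scaled to [0, 1], matching the dataset preparation above. (The two plotting helpers simply draw probability bars and the face rectangle with OpenCV and are still omitted.)
# Illustrative implementation; the preprocessing must match training.
def predict_emotion(model, roi):
    roi = cv2.resize(roi, (48, 48))          # face crop -> network input size
    roi = roi.astype('float') / 255.0        # same rescaling as the generators
    roi = img_to_array(roi)                  # shape (48, 48, 1)
    roi = np.expand_dims(roi, axis=0)        # add the batch dimension
    predictions = model.predict(roi)[0]      # probabilities over EMOTIONS
    return predictions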
- Load the model if a checkpoint exists; otherwise train it:
checkpoints = sorted(list(glob.glob('./*.h5')), reverse=True)
if len(checkpoints) > 0:
    model = load_model(checkpoints[0])
else:
    base_path = (pathlib.Path.home() / '.keras' /
                 'datasets' /
                 'emotion_recognition' / 'fer2013')
    input_path = str(base_path / 'fer2013.csv')

    classes = len(EMOTIONS)
    (train_images, train_labels), \
    (val_images, val_labels), \
    (test_images, test_labels) = load_dataset(input_path, classes)

    model = build_network((48, 48, 1), classes)
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(learning_rate=0.003),
                  metrics=['accuracy'])

    checkpoint_pattern = ('model-ep{epoch:03d}-'
                          'loss{loss:.3f}'
                          '-val_loss{val_loss:.3f}.h5')
    checkpoint = ModelCheckpoint(checkpoint_pattern,
                                 monitor='val_loss',
                                 verbose=1,
                                 save_best_only=True,
                                 mode='min')

    BATCH_SIZE = 128
    train_augmenter = ImageDataGenerator(rotation_range=10,
                                         zoom_range=0.1,
                                         horizontal_flip=True,
                                         rescale=1. / 255.,
                                         fill_mode='nearest')
    train_gen = train_augmenter.flow(train_images,
                                     train_labels,
                                     batch_size=BATCH_SIZE)
    train_steps = len(train_images) // BATCH_SIZE

    val_augmenter = ImageDataGenerator(rescale=1. / 255.)
    val_gen = val_augmenter.flow(val_images,
                                 val_labels,
                                 batch_size=BATCH_SIZE)

    EPOCHS = 300
    model.fit(train_gen,
              steps_per_epoch=train_steps,
              validation_data=val_gen,
              epochs=EPOCHS,
              verbose=1,
              callbacks=[checkpoint])

    test_augmenter = ImageDataGenerator(rescale=1. / 255.)
    test_gen = test_augmenter.flow(test_images,
                                   test_labels,
                                   batch_size=BATCH_SIZE)
    test_steps = len(test_images) // BATCH_SIZE

    _, accuracy = model.evaluate(test_gen, steps=test_steps)
    print(f'Accuracy: {accuracy * 100}%')
- Detect emotions:
video_path = 'emotions.mp4'
camera = cv2.VideoCapture(video_path)  # Pass 0 to use the webcam instead

cascade_file = 'resources/haarcascade_frontalface_default.xml'
det = cv2.CascadeClassifier(cascade_file)

while True:
    frame_exists, frame = camera.read()
    if not frame_exists:
        break

    frame = imutils.resize(frame, width=380)
    emotions_plot = np.zeros_like(frame, dtype='uint8')
    copy = frame.copy()

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    detections = det.detectMultiScale(gray,
                                      scaleFactor=1.1,
                                      minNeighbors=5,
                                      minSize=(35, 35),
                                      flags=cv2.CASCADE_SCALE_IMAGE)

    if len(detections) > 0:
        # Keep only the largest detected face.
        detections = sorted(detections, key=rectangle_area)
        best_detection = detections[-1]

        (frame_x, frame_y, frame_width, frame_height) = best_detection
        roi = gray[frame_y:frame_y + frame_height,
                   frame_x:frame_x + frame_width]

        predictions = predict_emotion(model, roi)
        label = EMOTIONS[predictions.argmax()]

        for i, (emotion, probability) in enumerate(zip(EMOTIONS, predictions)):
            emotions_plot = plot_emotion(emotions_plot, emotion,
                                         probability, i)

        copy = plot_face(copy, label, best_detection)

    cv2.imshow('Face & emotions', np.hstack([copy, emotions_plot]))

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

camera.release()
cv2.destroyAllWindows()
After 300 training epochs, the test accuracy reached 65.74%.
2.3 How It Works
In this section we implemented an emotion detector for video streams (either a built-in webcam or a stored video file). We first parsed the FER 2013 data (stored in CSV format) and trained an emotion classifier on its images, reaching a respectable accuracy on the test set. Keep in mind that facial expressions are hard to interpret even for humans, and many expressions share similar features. Finally, each frame of the input stream is passed to a Haar Cascade face detector, and the trained classifier extracts the emotion from the detected face region. This approach, however, treats every frame as independent; when processing real video, taking the temporal dimension into account yields more stable and better results (see the sketch after the link below).
To learn more about Haar Cascade classifiers, see: https://docs.opencv.org/3.4/db/d28/tutorial_cascade_classifier.html
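One simple way to exploit the temporal dimension (a suggestion, not part of the recipe) is to average the predicted probabilities over the last few frames, so a single noisy frame cannot flip the displayed label:
from collections import deque

import numpy as np

# Rolling window of the last 10 probability vectors (window size is arbitrary).
history = deque(maxlen=10)

def smoothed_label(predictions):
    history.append(predictions)
    mean_probabilities = np.mean(history, axis=0)
    return EMOTIONS[int(np.argmax(mean_probabilities))]

# Inside the detection loop, replace the per-frame label with:
# label = smoothed_label(predictions)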
3. Flowchart
graph TD
A[Getting ready] --> B[Import dependencies]
B --> C[Define emotions and colors]
C --> D[Build the emotion classifier]
D --> E[Load the dataset]
E --> F[Define helper functions]
F --> G[Load or train the model]
G --> H[Detect emotions]
In summary, by combining TFHub's powerful pre-trained models with deep learning techniques, we can perform object detection, real-time facial emotion detection, and video action recognition. These techniques have broad application prospects in computer vision.
4. Action Recognition with TensorFlow Hub
Action recognition is an interesting application of deep learning to video. It faces not only the usual challenges of image classification but also a temporal dimension. The Inflated 3D Convnet (I3D) architecture is well suited to this kind of problem, and in this section we use a pre-trained version from TFHub to recognize actions in a variety of videos.
4.1 Getting Ready
We need a few complementary libraries, namely OpenCV, TFHub, and imageio. Install them with the following command:
$> pip install opencv-contrib-python tensorflow-hub imageio
4.2 How to Do It
- Import the dependencies:
import os
import random
import re
import ssl
import tempfile
from urllib import request
import cv2
import imageio
import numpy as np
import tensorflow as tf
import tensorflow_hub as tfhub
from tensorflow_docs.vis import embed
- Define the dataset URLs:
UCF_ROOT = 'https://www.crcv.ucf.edu/THUMOS14/UCF101/UCF101/'
KINETICS_URL = ('https://raw.githubusercontent.com/deepmind/kinetics-i3d/master/data/label_map.txt')
- Create a temporary cache directory and an unverified SSL context (needed to download from the UCF101 site):
CACHE_DIR = tempfile.mkdtemp()
UNVERIFIED_CONTEXT = ssl._create_unverified_context()
- Define the helper functions:
def fetch_ucf_videos():
    # Scrape the list of .avi files from the UCF101 index page.
    index = (request.urlopen(UCF_ROOT,
                             context=UNVERIFIED_CONTEXT)
             .read()
             .decode('utf-8'))
    videos = re.findall(r'(v_[\w]+\.avi)', index)
    return sorted(set(videos))

def fetch_kinetics_labels():
    with request.urlopen(KINETICS_URL) as f:
        labels = [line.decode('utf-8').strip()
                  for line in f.readlines()]
    return labels

def fetch_random_video(videos_list):
    # Download a random video to the cache directory (once).
    video_name = random.choice(videos_list)
    cache_path = os.path.join(CACHE_DIR, video_name)

    if not os.path.exists(cache_path):
        url = request.urljoin(UCF_ROOT, video_name)
        response = (request.urlopen(url,
                                    context=UNVERIFIED_CONTEXT)
                    .read())
        with open(cache_path, 'wb') as f:
            f.write(response)

    return cache_path

def crop_center(frame):
    # Crop the largest centered square from the frame.
    height, width = frame.shape[:2]
    smallest_dimension = min(width, height)

    x_start = (width // 2) - (smallest_dimension // 2)
    x_end = x_start + smallest_dimension
    y_start = (height // 2) - (smallest_dimension // 2)
    y_end = y_start + smallest_dimension

    roi = frame[y_start:y_end, x_start:x_end]
    return roi

def read_video(path, max_frames=32, resize=(224, 224)):
    capture = cv2.VideoCapture(path)

    frames = []
    while len(frames) <= max_frames:
        frame_read, frame = capture.read()
        if not frame_read:
            break

        frame = crop_center(frame)
        frame = cv2.resize(frame, resize)
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frames.append(frame)

    capture.release()

    frames = np.array(frames)
    return frames / 255.

def predict(model, labels, sample_video):
    model_input = tf.constant(sample_video, dtype=tf.float32)
    model_input = model_input[tf.newaxis, ...]

    logits = model(model_input)['default'][0]
    probabilities = tf.nn.softmax(logits)

    print('Top 5 actions:')
    for i in np.argsort(probabilities)[::-1][:5]:
        print(f'{labels[i]}: {probabilities[i] * 100:5.2f}%')

def save_as_gif(images, video_name):
    converted_images = np.clip(images * 255, 0, 255)
    converted_images = converted_images.astype(np.uint8)
    imageio.mimsave(f'./{video_name}.gif', converted_images, fps=25)
- Fetch the video list and the labels:
VIDEO_LIST = fetch_ucf_videos()
LABELS = fetch_kinetics_labels()
- Fetch a random video and read its frames:
video_path = fetch_random_video(VIDEO_LIST)
sample_video = read_video(video_path)
- Load the model and make a prediction:
model_path = 'https://tfhub.dev/deepmind/i3d-kinetics-400/1'
model = tfhub.load(model_path)
model = model.signatures['default']
predict(model, LABELS, sample_video)
video_name = video_path.rsplit('/', maxsplit=1)[1][:-4]
save_as_gif(sample_video, video_name)
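The embed module imported at the top comes from tensorflow_docs and is handy in a notebook for displaying the saved GIF inline; outside a notebook this step can simply be skipped:
# Display the generated GIF inline (Jupyter/Colab only).
embed.embed_file(f'./{video_name}.gif')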
4.3 How It Works
In this section we used the I3D model from TFHub to recognize actions in videos. We first fetch the list of test videos from the UCF101 dataset and the labels from the Kinetics dataset. We then pick a random video, read its frames, and preprocess them (center crop, resize to 224x224, convert BGR to RGB, scale to [0, 1]). Next, we load the I3D model, feed it the video frames, and print the five most likely actions together with their probabilities. Finally, the frames are saved as a GIF file.
5. Summary
This article covered object detection with TFHub, real-time facial emotion detection, and action recognition with TFHub. These techniques demonstrate the power of deep learning for image and video processing and provide solid support for computer vision applications. The table below compares them:
| Application | Main technique | Dataset | Strengths | Caveats |
| ---- | ---- | ---- | ---- | ---- |
| Object detection | TFHub pre-trained models | COCO | Works out of the box with good results | Cannot be retrained; best suited to objects present in COCO |
| Real-time emotion detection | Custom convolutional neural network | FER 2013 | Runs in real time with respectable accuracy | Treats frames as independent; ignores the temporal dimension |
| Action recognition | I3D model (TFHub) | UCF101, Kinetics | Designed for the temporal dimension of actions | Requires several extra libraries |
6. Flowchart
graph TD
A[Getting ready] --> B[Import dependencies]
B --> C[Define dataset URLs]
C --> D[Create cache directory and SSL context]
D --> E[Define helper functions]
E --> F[Fetch videos and labels]
F --> G[Fetch a random video and read its frames]
G --> H[Load the model and predict]
With these techniques we can obtain good results across a range of video-processing tasks. A promising next step is to incorporate temporal information more explicitly to improve model performance and stability.