Deepfake-FFDV Audio-Video Track: Study Notes (Datawhale AI Summer Camp)

Tutorial and Baseline Links

From Zero to Multimodal Competitions (Deepfake Attack & Defense) - Feishu Docs (feishu.cn)

Deepfake-FFDV audio-video track baseline (kaggle.com)

Competition Task

In this track, the task is to determine whether a face image is a Deepfake and to output a probability score that it is fake. Participants need to develop and optimize detection models that can handle diverse Deepfake generation techniques and complex application scenarios, improving the accuracy and robustness of Deepfake detection.
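Since the scoring expects one Deepfake probability per sample, a model's raw output has to be squashed into [0, 1]. Below is a minimal sketch, assuming a binary classifier that emits one logit per sample; the model, the batch, and the commented submission column names are illustrative assumptions, not the official format.

import torch

def fake_probability(model, batch):
    # Map one logit per sample to a Deepfake probability in [0, 1].
    model.eval()
    with torch.no_grad():
        logits = model(batch).squeeze(-1)
    return torch.sigmoid(logits)

# Hypothetical submission step (column names are placeholders; follow the
# competition's sample submission file):
# pd.DataFrame({"video_name": names, "y_pred": probs.cpu().numpy()}) \
#     .to_csv("submission.csv", index=False)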

This learning session is mainly divided into two parts:

01 Audio/Video Data Processing Methods

1. Extract the audio from a video and analyze it as a MEL spectrogram

import moviepy.editor as mp
import librosa
import numpy as np
import cv2
import matplotlib.pyplot as plt

def generate_mel_spectrogram(video_path, n_mels=128, fmax=8000, target_size=(256, 256)):
    # Extract the audio track from the video
    audio_path = 'extracted_audio.wav'
    video = mp.VideoFileClip(video_path)
    video.audio.write_audiofile(audio_path, verbose=False, logger=None)

    # Load the extracted audio file
    y, sr = librosa.load(audio_path)

    # Compute the MEL spectrogram
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, fmax=fmax)

    # Convert the power spectrogram to dB scale
    S_dB = librosa.power_to_db(S, ref=np.max)

    # Normalize to the 0-255 range
    S_dB_normalized = cv2.normalize(S_dB, None, 0, 255, cv2.NORM_MINMAX)

    # Convert floats to unsigned 8-bit integers
    S_dB_normalized = S_dB_normalized.astype(np.uint8)

    # Resize to the target size
    img_resized = cv2.resize(S_dB_normalized, target_size, interpolation=cv2.INTER_LINEAR)

    return img_resized

# Usage example
video_path = '/kaggle/input/ffdv-phase1-sample-10k/ffdv_phase1_sample-0708/trainset/00154f42886002f8a2a6e40343617510.mp4'  # replace with your own video path
# video_path = '/kaggle/input/ffdv-phase1-sample-10k/ffdv_phase1_sample-0708/trainset/00a118276083ab446a1723389d43ef53.mp4'
mel_spectrogram_image = generate_mel_spectrogram(video_path)
print(type(mel_spectrogram_image))
plt.figure(figsize=(10, 4))
plt.imshow(mel_spectrogram_image)
plt.show()
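To reuse these spectrograms as inputs to an image classifier, it is convenient to convert every training video once and cache the result as an image file. Below is a minimal batch-conversion sketch that builds on generate_mel_spectrogram; the directory paths and the *.mp4 glob pattern are assumptions based on the sample paths above, not a confirmed dataset layout.

import glob
import os

def convert_videos_to_spectrograms(video_dir, out_dir):
    # Save one MEL spectrogram PNG per video clip found in video_dir.
    os.makedirs(out_dir, exist_ok=True)
    for video_path in glob.glob(os.path.join(video_dir, '*.mp4')):
        name = os.path.splitext(os.path.basename(video_path))[0]
        img = generate_mel_spectrogram(video_path)  # 256x256 uint8 array
        cv2.imwrite(os.path.join(out_dir, name + '.png'), img)

# Example call (assumed paths; adjust to the actual dataset layout):
# convert_videos_to_spectrograms(
#     '/kaggle/input/ffdv-phase1-sample-10k/ffdv_phase1_sample-0708/trainset',
#     '/kaggle/working/trainset_mel')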

2. Processing images with torchvision Transforms (v2)

Open and display an image
from PIL import Image
from pathlib import Path
import matplotlib.pyplot as plt

import torch
from torchvision.transforms import v2

plt.rcParams["savefig.bbox"] = 'tight'
!wget https://mirror.coggle.club/image/tyler-swift.jpg
orig_img = Image.open('/kaggle/working/tyler-swift.jpg')
plt.axis('off')
plt.imshow(orig_img)
plt.show()
Wrap the plotting logic in a plot() helper
import matplotlib.pyplot as plt
import torch
from torchvision.utils import draw_bounding_boxes, draw_segmentation_masks
from torchvision import tv_tensors
from torchvision.transforms.v2 import functional as F

def plot(imgs, row_title=None, **imshow_kwargs):
    if not isinstance(imgs[0], list):
        # Make a 2d grid even if there's just 1 row
        imgs = [imgs]

    num_rows = len(imgs)
    num_cols = len(imgs[0])
    _, axs = plt.subplots(nrows=num_rows, ncols=num_cols, squeeze=False)
    for row_idx, row in enumerate(imgs):
        for col_idx, img in enumerate(row):
            boxes = None
            masks = None
            if isinstance(img, tuple):
                img, target = img
                if isinstance(target, dict):
                    boxes = target.get("boxes")
                    masks = target.get("masks")
                elif isinstance(target, tv_tensors.BoundingBoxes):
                    boxes = target
                else:
                    raise ValueError(f"Unexpected target type: {
     
     type(target)}")
            img = F.to_image(img)
            if img.dtype.is_floating_point and img.min() < 0:
                # Poor man's re-normalization for the colors to be OK-ish. This
                # is useful for images coming out of Normalize()
                img -= img.min()
                img /= img.max()

            img = F.to_dtype(img, torch.uint8, scale=True)
            if boxes is not None:
                img = draw_bounding_boxes(img, boxes, colors="yellow", width=3)
            if masks is not None:
                img = draw_segmentation_masks(img, masks.to(torch.bool), colors=["green"] * masks.shape[0], alpha=.65)

            ax = axs[row_idx, col_idx]
            ax.imshow(img.permute(1, 2, 0).numpy(), **imshow_kwargs)
            ax.set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])

    if row_title is not None:
        for row_idx in range(num_rows):
            axs[row_idx, 0].set(ylabel=row_title[row_idx])

    plt.tight_layout()
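With the helper defined, a few v2 transforms can be previewed side by side on the sample image. The transform choices and parameters below are illustrative only, not part of the baseline:

# Preview some torchvision.transforms.v2 augmentations on the sample image.
augmentations = [
    v2.RandomResizedCrop(size=(224, 224), antialias=True),
    v2.RandomHorizontalFlip(p=1.0),
    v2.ColorJitter(brightness=0.5, hue=0.3),
]
plot([orig_img] + [aug(orig_img) for aug in augmentations])
plt.show()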