Notes on "Detecting Photoshopped Faces by Scripting Photoshop"

This post summarizes a method for detecting the popular face-warping manipulation in Photoshop using a deep learning model trained entirely on automatically generated fake images. The model can predict the specific location of edits and, in some cases, recover the original, unedited image.


Abstract

We present a method for detecting one very popular Photoshop manipulation – image warping applied to human faces – using a model trained entirely using fake images that were automatically generated by scripting Photoshop itself.

We show that our model outperforms humans at the task of recognizing manipulated images, can predict the specific location of edits, and in some cases can be used to “undo” a manipulation to reconstruct the original, unedited image. We demonstrate that the system can be successfully applied to real, artist-created image manipulations.

1.Introduction

In this work, we focus on one specific type of Photoshop manipulation – image warping applied to faces.

Our proposed approach is but one tool in a larger toolbox of techniques that together, could be used to help combat the spread of misinformation, and its effects.

Our approach consists of a CNN carefully trained to detect facial warping modifications in images.

Since there are no large-scale datasets of manually created visual fakes, in this work we solve this problem by using Photoshop itself to automatically generate realistic-looking fake training data.

1.We first collect a large dataset of real face images, from different internet sources (Figure 2a).

2.We then directly script the Face-Aware Liquify tool in Photoshop, which abstracts facial manipulations into high level semantic operations, such as “increase nose width” and “decrease eye distance”.

3.By randomly sampling manipulations in this space (Figure 2b), we obtain a training set consisting of pairs of source images and realistic-looking warped modifications (a parameter-sampling sketch follows below).
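As a rough illustration of the random-sampling step, the sketch below draws random Face-Aware Liquify slider settings in Python and writes them to a JSON file. The slider names, value ranges, and file format are assumptions for illustration only; in the paper the FAL tool is driven directly inside Photoshop through its JavaScript (ExtendScript) scripting interface.

```python
import json
import random

# Hypothetical Face-Aware Liquify slider names; the real FAL parameter set
# lives inside Photoshop and is driven from an ExtendScript (.jsx) script.
FAL_PARAMS = [
    "eyeSize", "eyeDistance", "noseWidth", "noseHeight",
    "mouthWidth", "smile", "faceWidth", "jawShape", "foreheadHeight",
]

def sample_manipulation(num_params=4, strength=50):
    """Randomly pick a few sliders and assign values in [-strength, strength]."""
    chosen = random.sample(FAL_PARAMS, k=num_params)
    return {name: random.randint(-strength, strength) for name in chosen}

if __name__ == "__main__":
    # Six random edits per source image; a Photoshop-side script would read
    # this file and apply each edit to the corresponding face photo.
    edits = [sample_manipulation() for _ in range(6)]
    with open("fal_edits.json", "w") as f:
        json.dump(edits, f, indent=2)
    print(edits[0])
```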

We train both global classification and local warping field prediction networks on this dataset.

In particular, our local prediction method uses a combination of loss functions including flow warping prediction, relative warp preservation, and a pixel-wise reconstruction loss.

2.Related work

Image forensics, or forgery detection is an increasingly important area of research in computer vision.

In this section, we focus on works that are either trained from large amounts of data, or directly address the face domain.

Face manipulation

  • Researchers have proposed forensics methods to detect a variety of face manipulations.
  • Zhou et al. [42] and Roessler et al. [30, 31] propose neural network models to detect face swapping and face reenactment.
  • Other work investigates detecting morphed (interpolated) faces [29] and inconsistencies in lighting from specular highlights on the eye [16].
  • In contrast, we consider facial warps which undergo subtle geometric deformations, rather than a complete replacement of the face, or the synthesis of new details.

Learning photo forensics

  • Our approach belongs to the class of “self-supervised” image forensics approaches that are trained on automatically generated fake images.
  • Chen et al. [11] use a convolutional network to detect median filtering.
  • Zhou et al. [43] propose an object detection model, specifically using steganalysis features to reduce the influence of semantics. The model is pretrained on automatically created synthetic fakes using object segmentations, and subsequently fine-tuned on actual fake images.
  • A complementary approach is exploring unsupervised forensics models that learn only from real images, without explicitly modeling the fake image creation process.
  • For example, several models have been proposed to detect spliced images by identifying patches which come from different camera models [9, 24], by using EXIF metadata [15], or by identifying physical inconsistencies.

Hand-defined manipulation cues

  • Other image forensics work has proposed to detect fake images using hand-defined cues [14].
  • Early work detected resampling artifacts [28, 20] by finding periodic correlations between nearby pixels.
  • There has also been work that detects inconsistent quantization [4], double-JPEG artifacts [8, 5], and geometric inconsistencies [26].
  • However, the operations performed by interactive image editing tools are often complex, and can be difficult to model.
  • Our approach, by contrast, learns features appropriate for its task from a large dataset of manipulated images.

3.Datasets

We obtain a large dataset of real face images from the Open Images dataset [21] and Flickr, and create two datasets of fakes: (1) a large, automatically generated set of manipulated images for training a forensics model, and (2) a smaller set of actual manipulations done by an artist for evaluation.

Generating manipulated face images

  • We script the Face-Aware Liquify (FAL) tool [1] in Adobe Photoshop to generate a variety of face manipulations, using built-in support for JavaScript execution.
  • We randomly modify each image from our real face dataset 6 times.
  • In all, the data we used for training is 1.295M faces – 185K unmodified, and 1.1M modified. Additionally, we hold out 5K real faces each from Open Images and Flickr, leaving half of the images unmodified and the rest modified in the same way as the training data.

Test Set: Artist-created face manipulations

  • We test the generalization ability to “real” manipulations by contracting a professional artist to manipulate 50 real photographs.
  • Half are manipulated with the intent of “beautifying”, or increasing attractiveness, and the other half to change facial expression, positively or negatively. This covers two important use cases.

4.Methods

Our goal is to train a system to detect facial manipulations.
We present two models:

  1. A global classification model, tasked with predicting whether a face has been warped.
  2. A local warp predictor, which can be used to identify where manipulations occur, and reverse them.

4.1 Real-or-fake classification

We first address the question “has this image been manipulated?”

We train a binary classifier using a Dilated Residual Network variant (DRN-C-26) [39].
We investigate the effect of resolution by training low and high-resolution models.
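A minimal sketch of such a binary real-or-fake classifier in PyTorch is shown below. torchvision does not ship DRN-C-26, so an ImageNet-pretrained ResNet-18 is used here as a stand-in backbone; the optimizer and learning rate are assumptions, not the paper's exact settings.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Stand-in backbone: the paper uses DRN-C-26 pretrained on ImageNet, which is
# not available in torchvision, so ResNet-18 is substituted here.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Linear(backbone.fc.in_features, 2)  # 2-way: real / warped

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

def train_step(images, labels):
    """images: (B, 3, H, W) float tensor; labels: 0 = real, 1 = warped."""
    logits = backbone(images)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```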

  • High-resolution models enable preservation of low-level details, potentially useful for identifying fakes.
  • A lower-resolution model may still contain sufficient detail to identify fakes and can be trained more efficiently.
  • During training, images are randomly flipped left-right and cropped to 384 pixels (low-res) and 640 pixels (high-res), respectively.
  • Real-world use cases may involve unexpected post-processing, and forensics algorithms are often sensitive to such operations [28]. To increase robustness, we use more aggressive data augmentation, including different resizing methods (bicubic and bilinear), JPEG compression, and brightness, contrast, and saturation changes (see the augmentation sketch after this list).
  • We experimentally find that this increases robustness to perturbations at test time, even if they are not in the augmentation set.
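A minimal sketch of this augmentation pipeline using torchvision and PIL, for the low-res (384 px) model. The specific parameter ranges (JPEG quality, jitter strength, resize size) are assumptions; the paper only states which types of augmentation are used.

```python
import io
import random
from PIL import Image
from torchvision import transforms

class RandomJPEG:
    """Re-encode a PIL image as JPEG at a random quality level."""
    def __init__(self, quality_range=(30, 95), p=0.5):
        self.quality_range, self.p = quality_range, p

    def __call__(self, img):
        if random.random() > self.p:
            return img
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=random.randint(*self.quality_range))
        buf.seek(0)
        return Image.open(buf).convert("RGB")

# Flips, resizing with different interpolation, crops, color jitter, and JPEG
# re-compression, roughly in the spirit of the paper's robustness training.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomChoice([
        transforms.Resize(400, interpolation=transforms.InterpolationMode.BILINEAR),
        transforms.Resize(400, interpolation=transforms.InterpolationMode.BICUBIC),
    ]),
    transforms.RandomCrop(384),  # 384 px crops for the low-res model
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    RandomJPEG(),
    transforms.ToTensor(),
])
```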

4.2 Predicting what moved where

Upon detecting whether a face has been modified, a natural question for a viewer is how the image was edited:

  • To do this, we predict an optical flow field $\hat{U} \in \mathbb{R}^{H \times W \times 2}$ from the original image $X_{\text{orig}} \in \mathbb{R}^{H \times W \times 3}$ to the warped image $X$, which we then use to try to “reverse” the manipulation and recover the original image.
  • We train a flow prediction model F to predict the per-pixel warping field, measuring its distance to an approximate “ground-truth” flow field U for each training example.
  • To remove erroneous flow values, we discard pixels that fail a forward-backward consistency test, resulting in a binary mask $M \in \mathbb{R}^{H \times W \times 1}$ (see the consistency-check sketch below).
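A minimal NumPy sketch of the forward-backward consistency test used to build the mask M. The threshold value, the nearest-neighbour sampling, and the (x, y) flow channel ordering are assumptions.

```python
import numpy as np

def fb_consistency_mask(flow_fwd, flow_bwd, thresh=1.0):
    """
    flow_fwd, flow_bwd: (H, W, 2) forward and backward flow fields.
    A pixel is kept if warping forward and then backward returns (close to)
    its starting position; otherwise the "ground-truth" flow is unreliable.
    Returns a binary mask of shape (H, W).
    """
    H, W, _ = flow_fwd.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Where each pixel lands under the forward flow
    x2 = np.clip(xs + flow_fwd[..., 0], 0, W - 1)
    y2 = np.clip(ys + flow_fwd[..., 1], 0, H - 1)
    # Sample the backward flow at that location (nearest neighbour for simplicity)
    bwd = flow_bwd[y2.round().astype(int), x2.round().astype(int)]
    # Round-trip displacement should be ~0 for consistent pixels
    err = np.linalg.norm(flow_fwd + bwd, axis=-1)
    return (err < thresh).astype(np.float32)
```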

Undoing a warp

  • Given the correct flow field from the original image to the modified image, one can retrieve the original image by inverse warping (see the sketch below). This leads to a natural reconstruction loss.
  • Applying only the reconstruction loss leads to ambiguities in low-texture regions, which often results in undesirable artifacts. Instead, we jointly train with all three losses.
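A minimal PyTorch sketch of the inverse-warping step using grid_sample. The flow sign convention (whether the predicted field maps original pixels to their source locations in the warped image, or the reverse) is an assumption here and must match how the flow was defined during training.

```python
import torch
import torch.nn.functional as F

def unwarp(warped, flow):
    """
    warped: (B, 3, H, W) manipulated image.
    flow:   (B, 2, H, W) per-pixel displacement, assumed to map each output
            (original) pixel to its source location in the warped image.
    Returns an estimate of the original, unedited image.
    """
    B, _, H, W = warped.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=warped.dtype, device=warped.device),
        torch.arange(W, dtype=warped.dtype, device=warped.device),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]   # x sampling coordinates
    grid_y = ys.unsqueeze(0) + flow[:, 1]   # y sampling coordinates
    # Normalize coordinates to [-1, 1] as expected by grid_sample
    grid = torch.stack(
        [2.0 * grid_x / (W - 1) - 1.0, 2.0 * grid_y / (H - 1) - 1.0], dim=-1
    )
    return F.grid_sample(warped, grid, align_corners=True)
```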

Architecture

  • We use a Dilated Residual Network variant (DRN-C-26) [39], pretrained on the ImageNet [32] dataset, as our base network for local prediction.
  • The DRN architecture was designed originally for semantic segmentation, and we found it to work well for the warp prediction task.
  • We found that directly training the flow regression network performed poorly, so we train in two stages (see the sketch below):
    1. Multinomial classification. We first recast the problem as multinomial classification, a formulation commonly used for regression problems (e.g., colorization [22, 40], surface normal prediction [36], and generative modeling [27]).
    2. Regression loss. We then fine-tune with a regression loss. We computed ground-truth flow fields using PWC-Net [33].
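A minimal sketch of the two-stage objective, with the flow quantized into discrete bins for the classification stage and a masked endpoint-error term for the regression stage. The bin count and flow range are assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

NUM_BINS, MAX_FLOW = 21, 10.0  # assumed quantization of each flow component
bin_centers = torch.linspace(-MAX_FLOW, MAX_FLOW, NUM_BINS)

def flow_to_bins(flow_gt):
    """flow_gt: (B, 2, H, W) -> integer bin indices of the same shape."""
    clamped = flow_gt.clamp(-MAX_FLOW, MAX_FLOW)
    return torch.bucketize(clamped, bin_centers).clamp(max=NUM_BINS - 1)

def classification_loss(logits, flow_gt):
    """Stage 1: logits (B, 2*NUM_BINS, H, W), one softmax per flow component."""
    B, _, H, W = logits.shape
    logits = logits.view(B, 2, NUM_BINS, H, W)
    target = flow_to_bins(flow_gt)
    return nn.functional.cross_entropy(
        logits.permute(0, 1, 3, 4, 2).reshape(-1, NUM_BINS),
        target.reshape(-1),
    )

def regression_loss(flow_pred, flow_gt, mask):
    """Stage 2: masked endpoint error between predicted and ground-truth flow."""
    epe = torch.norm(flow_pred - flow_gt, dim=1)  # (B, H, W)
    return (epe * mask).sum() / mask.sum().clamp(min=1)
```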

5.Experiments

We evaluate our ability to detect and undo image manipulations, using both automatic and artist-created images.

5.1 Real-or-fake classification

We first investigate whether manipulated images can be detected by our global classifier on our validation set. We test the robustness of the classifier by perturbing the images, and measure its generalization ability to manipulations by a professional artist.

We evaluate several variants: (1) low-res with augmentation, (2) low-res without augmentation, and (3) high-res with augmentation.

Baselines

  • FaceForensics++
  • Self-consistency

Evaluations

  • Evaluate our model’s raw accuracy
  • Ranking-based scores
  • Average Precision (AP) / Two-Alternative Forced Choice (2AFC) (see the metric sketch below)
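A minimal sketch of these ranking-based metrics. AP pools real and fake scores, while 2AFC here compares each modified image against its own unmodified counterpart; the exact pairing protocol is an assumption about the paper's setup.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def average_precision(scores_real, scores_fake):
    """Ranking-based AP over pooled real (label 0) and fake (label 1) scores."""
    y_true = np.concatenate([np.zeros(len(scores_real)), np.ones(len(scores_fake))])
    y_score = np.concatenate([scores_real, scores_fake])
    return average_precision_score(y_true, y_score)

def paired_2afc(scores_orig, scores_modified):
    """2AFC accuracy: for each (original, modified) pair, the model 'chooses'
    the image with the higher fake score; count how often that is the fake."""
    scores_orig = np.asarray(scores_orig)
    scores_modified = np.asarray(scores_modified)
    return float(np.mean(scores_modified > scores_orig))
```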

Evaluation on auto-generated fakes

Artist test set
We collect data from a professional artist, who either makes the subject more attractive or changes the subject's expression. Since the edits here are made to be more noticeable, study participants were able to identify the modified image with 71.1% accuracy.

Baseline
Neither of these methods is designed for our application.

  • FaceForensics++ is split into three manipulation types: face swapping, “deepfakes” face replacement, and Face2Face reenactment.

  • Self-consistency, on the other hand, is designed to detect low-level differences in image characteristics.

  • Generalizing these methods to facial warping manipulations is challenging.

5.2 Localizing and undoing manipulations

Model variations
We ablate the loss functions:
(1) Our full method: trained with endpoint error (EPE) (Eqn. 1), multiscale gradient (Eqn. 2), and reconstruction (Eqn. 3) losses.

Evaluations
(1) End Point Error (EPE)
(2) Intersection Over Union (IOU-τ)
(3) Delta Peak Signal-to-Noise Ratio (∆PSNR) (see the metric sketch below)
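A minimal NumPy sketch of these three metrics. The IOU threshold τ on flow magnitude and the 8-bit PSNR convention are assumptions about the paper's exact definitions.

```python
import numpy as np

def epe(flow_pred, flow_gt):
    """Mean endpoint error between predicted and ground-truth flow, (H, W, 2)."""
    return float(np.linalg.norm(flow_pred - flow_gt, axis=-1).mean())

def iou_at_tau(flow_pred, flow_gt, tau=1.0):
    """IOU between regions whose flow magnitude exceeds tau pixels."""
    p = np.linalg.norm(flow_pred, axis=-1) > tau
    g = np.linalg.norm(flow_gt, axis=-1) > tau
    union = np.logical_or(p, g).sum()
    return float(np.logical_and(p, g).sum() / union) if union else 1.0

def delta_psnr(original, warped, unwarped):
    """PSNR gain of the un-warped image over the warped one, w.r.t. the original."""
    def psnr(a, b):
        mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
        return 10 * np.log10(255.0 ** 2 / mse)
    return psnr(original, unwarped) - psnr(original, warped)
```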

Analysis
We found that directly optimizing the reconstruction loss led to better image reconstructions.

5.3. Out-of-distribution manipulations
While our model is trained to detect face warping manipulations made by Photoshop, we also evaluate its ability to detect other kinds of image editing, and discuss its limitations.

We apply our manipulation detection model to this video data and find that it is still able to make reasonable predictions.

We observe that our low-res model with augmentation produces more stable predictions over time than the one trained without augmentation.

Moreover, the high-res model does not generalize to detecting such manipulations. We note that PSNR comparisons on this data are not possible, due to the addition of non-warping image details.

Social media post-processing pipeline
We evaluate post-processing operations performed by Facebook (e.g., extra JPEG compression). The high-res model does not generalize to this scenario, while both the global and local models trained with augmentation perform better.

Other image editing tools
We also tested our local detection model on facial warping by Facetune [2] and Snapchat Lens Studio [3]. Our model is able to perform reasonable recovery of the edits even though it was not trained on these tools.

Generic Liquify filter
Warping edits that fall outside this scope, such as warping applied to hair or the body, cannot be detected by our method.
Despite this, our method can still predict with success well above chance (64.0% accuracy, 85.6 AP).
