CRoSS: Diffusion Model Makes Controllable, Robust and Secure Image Steganography
1 Peking University Shenzhen Graduate School 2 Peng Cheng Laboratory
Abstract
Current image steganography techniques are mainly focused on cover-based methods, which commonly have the risk of leaking secret images and poor robustness against degraded container images. Inspired by recent developments in diffusion models, we discovered that two properties of diffusion models, the ability to achieve translation between two images without training, and robustness to noisy data, can be used to improve security and natural robustness in image steganography tasks. For the choice of diffusion model, we selected Stable Diffusion, a type of conditional diffusion model, and fully utilized the latest tools from open-source communities, such as LoRAs and ControlNets, to improve the controllability and diversity of container images. In summary, we propose a novel image steganography framework, named Controllable, Robust and Secure Image Steganography (CRoSS), which has significant advantages in controllability, robustness, and security compared to cover-based image steganography methods. These benefits are obtained without additional training. To our knowledge, this is the first work to introduce diffusion models to the field of image steganography. In the experimental section, we conducted detailed experiments to demonstrate the advantages of our proposed CRoSS framework in controllability, robustness, and security.
The GitHub link is https://github.com/vvictoryuki/CRoSS.
1 Introduction
With the explosive development of digital communication and AIGC (AI-generated content), the privacy, security, and protection of data have aroused significant concerns. As a widely studied technique, steganography [10] aims to hide messages like audio, image, and text into the container image in an undetected manner. In its reveal process, it is only possible for the receivers with pre-defined revealing operations to reconstruct secret information from the container image. It has a wide range of applications, such as copyright protection [4], digital watermarking [13], e-commerce [11], anti-visual detection [32], and cloud computing [74].
For image steganography, traditional methods tend to transform the secret messages in the spatial or adaptive domains [25], for example by hiding them in less significant bits [9] or in indistinguishable parts of the image. With the development of deep neural networks, researchers began to use auto-encoder networks [5, 6] or invertible neural networks (INN) [33, 24] to hide data, an approach known as deep steganography.
The essential targets of image steganography are security, reconstruction quality, and robustness [9, 43, 75]. Since most previous methods use cover images to hide secret images, they tend to explicitly retain some secret information as artifacts or local details in the container image, which poses a risk of information leakage and reduces the security of transmission. Meanwhile, although previous works can maintain good reconstruction fidelity of the revealed images, they tend to train models in a noise-free simulation environment and cannot withstand noise, compression artifacts, and non-linear transformations in practice, which severely hampers their practical value and robustness [28, 42, 23].
To address security and robustness concerns, researchers have shifted their focus toward coverless steganography. This approach aims to create a container image that bears no relation to the secret information, thereby enhancing its security. Current coverless steganography methods frequently employ frameworks such as CycleGAN [76] and encoder-decoder models [74], leveraging the principle of cycle consistency. However, the controllability of the container images generated by existing coverless methods remains limited. Their container images are only randomly sampled from the generative model and cannot be determined by the user. Moreover, existing approaches [45] tend to only involve hiding bits into container images, ignoring the more complex hiding of secret images. Overall, current methods, whether cover-based or coverless, have not been able to achieve good unity in terms of security, controllability, and robustness. Thus, our focus is to propose a new framework that can simultaneously improve existing methods in these three aspects.
Recently, research on diffusion-based generative models [20, 53, 54] has been very popular, with various unique properties such as the ability to perform many tasks in a zero-shot manner [34, 26, 62, 61, 70, 35, 18], strong control over the generation process [14, 47, 72, 38, 17, 48], natural robustness to noise in images [62, 26, 12, 63], and the ability to achieve image-to-image translation [73, 8, 18, 37, 55, 12, 27, 35]. We were pleasantly surprised to find that these properties perfectly match the goals we mentioned above for image steganography: (1) Security: By utilizing the DDIM Inversion technique [52] for diffusion-based image translation, we ensure the invertibility of the translation process. This invertible translation process enables a coverless steganography framework, ensuring the security of the hidden image. (2) Controllability: The powerful control capabilities of conditional diffusion models make the container image highly controllable, and its visual quality is guaranteed by the generative prior of the diffusion model; (3) Robustness: Diffusion models are essentially Gaussian denoisers and have natural robustness to noise and perturbations. Even if the container image is degraded during transmission, we can still reveal the main content of the secret image.
We believe that the fusion of diffusion models and image steganography is not simply a matter of mechanically combining them, but rather an elegant and instructive integration that takes into account the real concerns of image steganography. Based on these ideas, we propose the Controllable, Robust and Secure Image Steganography (CRoSS) framework, a new image steganography framework that aims to simultaneously achieve gains in security, controllability, and robustness.
Our contributions can be summarized as follows:
- We identify the limitations of existing image steganography methods and propose a unified goal of achieving security, controllability, and robustness. We also demonstrate that the diffusion model can seamlessly integrate with image steganography to achieve these goals using a diffusion-based invertible image translation technique, without requiring any additional training.
- We propose a new image steganography framework: Controllable, Robust and Secure Image Steganography (CRoSS). To the best of our knowledge, this is the first attempt to apply the diffusion model to the field of image steganography and gain better performance.
- We leverage the progress of the rapidly growing Stable Diffusion community to propose variants of CRoSS using prompts, LoRAs, and ControlNets, enhancing its controllability and diversity.
- We conduct comprehensive experiments focusing on the three targets of security, controllability, and robustness, demonstrating the advantages of CRoSS compared to existing methods.
2 Related Work
2.1 Steganography Methods
Cover-based Image Steganography. Unlike cryptography, steganography aims to hide secret data in a host to produce an information container. For image steganography, a cover image is required to hide the secret image in it [5]. Traditionally, spatial-based [22, 39, 41, 44] methods utilize the Least Significant Bits (LSB), pixel value differencing (PVD) [41], histogram shifting [58], multiple bit-planes [39] and palettes [22, 40] to hide images, which may arouse statistical suspicion and are vulnerable to steganalysis methods. Adaptive methods [43, 29] decompose steganography into embedding distortion minimization and data coding, which is indistinguishable by appearance but limited in capacity. Various transform-based schemes [10, 25], including JSteg [44] and DCT steganography [19], also fail to offer high payload capacity. Recently, various deep learning-based schemes have been proposed to solve image steganography. Baluja [5] proposed the first deep-learning method to hide a full-size image in another image. Generative adversarial networks (GANs) [51] are introduced to synthesize container images. Probability map methods focus on generating various cost functions satisfying minimal-distortion embedding [43, 57]. [67] proposes a generator based on U-Net. [56] presents an adversarial scheme under distortion minimization. Three-player game methods like SteganoGAN [71] and HiDDeN [75] learn information embedding and recovery with an auto-encoder architecture to adversarially resist steganalysis. Recent attempts [64] to introduce invertible neural networks (INN) into low-level inverse problems like denoising, rescaling, and colorization show impressive potential over auto-encoder, GAN [3], and other learning-based architectures. Recently, [33, 24] proposed designing the steganography model as an invertible neural network (INN) [15, 16] to perform image hiding and recovery with a single INN model.
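For readers unfamiliar with the spatial-domain baseline mentioned above, the following is a minimal sketch of LSB embedding on a flat list of 8-bit pixel values. The function names `lsb_embed` and `lsb_extract` are illustrative and are not taken from any of the cited works.

```python
def lsb_embed(cover_pixels, bits):
    """Hide one payload bit in the least significant bit of each 8-bit pixel."""
    if len(bits) > len(cover_pixels):
        raise ValueError("payload is longer than the cover")
    stego = list(cover_pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then write the payload bit into it.
        stego[i] = (stego[i] & 0xFE) | (bit & 1)
    return stego


def lsb_extract(stego_pixels, n_bits):
    """Read the payload back from the lowest bit of the first n_bits pixels."""
    return [p & 1 for p in stego_pixels[:n_bits]]


cover = [200, 13, 77, 154, 91, 60, 255, 0]
secret_bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego = lsb_embed(cover, secret_bits)
recovered = lsb_extract(stego, len(secret_bits))
max_change = max(abs(a - b) for a, b in zip(cover, stego))
```

Each pixel changes by at most one intensity level, which is why the result is visually indistinguishable, yet the altered low-bit statistics are exactly what steganalysis tools look for.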
Coverless Steganography. Coverless steganography is an emerging technique in the field of information hiding, which aims to embed secret information within a medium without modifying the cover object [45]. Unlike traditional steganography methods that require a cover medium (e.g., an image or audio file) to be altered for hiding information, coverless steganography seeks to achieve secure communication without introducing any changes to the cover object [31]. This makes it more challenging for adversaries to detect the presence of hidden data, as there are no observable changes in the medium’s properties [36]. To the best of our knowledge, existing coverless steganography methods [32] still focus on hiding bits into container images, and few explorations involve hiding images without resorting to cover images.
2.2 Diffusion Models
Diffusion models [20, 53, 54] are a type of generative model that is trained to learn the target image distribution from a noise distribution. Recently, due to their powerful generative capabilities, diffusion models have been widely used in various image applications, including image generation [14, 46, 49, 47], restoration [50, 26, 62], translation [12, 27, 35, 73], and more. Large-scale diffusion model communities have also emerged on the Internet, with the aim of promoting the development of AIGC-related fields by applying the latest advanced techniques.
Among these communities, the Stable Diffusion [47] community is currently one of the most popular and thriving, with a large number of open-source tools available for free, including model checkpoints fine-tuned on various specialized datasets. Additionally, various LoRAs [21] and ControlNets [72] are available in these communities for efficient control over the results generated by Stable Diffusion. LoRAs achieve control by efficiently modifying some network parameters in a low-rank way, while ControlNets introduce an additional network to modify the intermediate features of Stable Diffusion for control. These recent developments have enhanced our CRoSS framework.
3 Method

Figure 1: Illustration of the definition of image steganography.
3.1 Definition of Image Steganography
Before introducing our specific method, we first define the image steganography task as consisting of three images and two processes (as shown in Fig. 1): the three images are the secret image $\mathbf{x}_{sec}$, the container image $\mathbf{x}_{cont}$, and the revealed image $\mathbf{x}_{rev}$, while the two processes are the hide process and the reveal process. The secret image $\mathbf{x}_{sec}$ is the image we want to hide, and it is hidden in the container image $\mathbf{x}_{cont}$ through the hide process. After transmission over the Internet, the container image $\mathbf{x}_{cont}$ may become degraded, resulting in a degraded container image $\mathbf{x}'_{cont}$, from which we extract the revealed image $\mathbf{x}_{rev}$ through the reveal process. Our goal is for the proposed framework to have the following properties: (1) Security: even if the container image $\mathbf{x}_{cont}$ is intercepted by other receivers, the hidden secret image $\mathbf{x}_{sec}$ cannot be leaked. (2) Controllability: the content in the container image $\mathbf{x}_{cont}$ can be controlled by the user, and its visual quality is high. (3) Robustness: the reveal process can still generate semantically consistent results ($\mathbf{x}_{rev}\approx\mathbf{x}_{sec}$) even if $\mathbf{x}'_{cont}$ deviates from $\mathbf{x}_{cont}$ (here $\mathbf{x}'_{cont}=d(\mathbf{x}_{cont})$, where $d(\cdot)$ denotes the degradation process). According to the above definition, we can consider the hide process as a translation between the secret image $\mathbf{x}_{sec}$ and the container image $\mathbf{x}_{cont}$, and the reveal process as the inverse of the hide process. In Sec. 3.2, we introduce how to use diffusion models to implement these ideas, and in Sec. 3.3, we provide a detailed description of our proposed framework CRoSS for coverless image steganography.
3.2 Invertible Image Translation using Diffusion Model
Diffusion Model Defined by DDIM.
A complete diffusion model process consists of two stages: the forward phase adds noise to a clean image, while the backward sampling phase denoises it step by step. In DDIM [52], the formula for the forward process is given by:
$$\mathbf{x}_t=\sqrt{\alpha_t}\,\mathbf{x}_{t-1}+\sqrt{1-\alpha_t}\,\boldsymbol{\epsilon},\quad \boldsymbol{\epsilon}\sim\mathcal{N}(\mathbf{0},\mathbf{I}), \tag{1}$$
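where $\mathbf{x}_t$ denotes the noisy image at the $t$-th step, $\boldsymbol{\epsilon}$ denotes randomly sampled Gaussian noise, $\alpha_t$ is a predefined parameter, and the time step $t$ ranges over $[1,T]$. The DDIM formula for the backward sampling process is given by:
$$\mathbf{x}_s=\sqrt{\bar{\alpha}_s}\,\mathbf{f}_{\boldsymbol{\theta}}(\mathbf{x}_t,t)+\sqrt{1-\bar{\alpha}_s-\sigma_s^2}\,\boldsymbol{\epsilon}_{\boldsymbol{\theta}}(\mathbf{x}_t,t)+\sigma_s\boldsymbol{\epsilon},\quad \mathbf{f}_{\boldsymbol{\theta}}(\mathbf{x}_t,t)=\frac{\mathbf{x}_t-\sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}_{\boldsymbol{\theta}}(\mathbf{x}_t,t)}{\sqrt{\bar{\alpha}_t}}, \tag{2}$$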
where $\boldsymbol{\epsilon}\sim\mathcal{N}(\mathbf{0},\mathbf{I})$ is randomly sampled Gaussian noise with $\sigma_s^2$ as the noise variance, $\mathbf{f}_{\boldsymbol{\theta}}(\cdot,t)$ is a denoising function based on the pre-trained noise estimator $\boldsymbol{\epsilon}_{\boldsymbol{\theta}}(\cdot,t)$, and $\bar{\alpha}_t=\prod_{i=1}^{t}\alpha_i$. DDIM does not require the two steps in its sampling formula to be adjacent (i.e., $t=s+1$). Therefore, $s$ and $t$ can be any two steps that satisfy $s<t$. This makes DDIM a popular algorithm for accelerating sampling. Furthermore, if $\sigma_s$ in Eq. 2 is set to $0$, the DDIM sampling process becomes deterministic. In this case, the sampling result is solely determined by the initial value $\mathbf{x}_T$, which can be considered as a latent code. The sampling process can also be equivalently described as solving an Ordinary Differential Equation (ODE) using an ODE solver [52]. In our work, we choose deterministic DDIM to implement the diffusion model and use the following formula:
$$\mathbf{x}_0=\mathrm{ODESolve}(\mathbf{x}_T;\boldsymbol{\epsilon}_{\boldsymbol{\theta}},T,0) \tag{3}$$
to represent the process of sampling from $\mathbf{x}_T$ to $\mathbf{x}_0$ using a pretrained noise estimator $\boldsymbol{\epsilon}_{\boldsymbol{\theta}}$.
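To make Eqs. 2 and 3 concrete, here is a minimal sketch of deterministic DDIM sampling ($\sigma_s=0$) in plain Python. Everything outside the update rule is an assumption made for illustration: the cosine-style schedule `alpha_bar`, its endpoint `THETA_MAX`, and a closed-form `eps_theta` that is exact when clean data follow $\mathcal{N}(0,1)$ and stands in for the pre-trained network.

```python
import math

T = 50            # total number of time steps
THETA_MAX = 1.5   # hypothetical schedule endpoint (radians), not from the paper


def alpha_bar(t):
    """Toy cumulative schedule: alpha_bar(0) = 1 (clean), small at t = T."""
    return math.cos(THETA_MAX * t / T) ** 2


def eps_theta(x, t):
    """Stand-in noise estimator, exact for clean data drawn from N(0, 1)."""
    return math.sqrt(max(0.0, 1.0 - alpha_bar(t))) * x


def ddim_step(x, t, s):
    """Eq. 2 with sigma_s = 0: move a sample from step t to step s."""
    a_t, a_s = alpha_bar(t), alpha_bar(s)
    eps = eps_theta(x, t)
    # f_theta: the clean sample predicted from the current noisy sample.
    f = (x - math.sqrt(max(0.0, 1.0 - a_t)) * eps) / math.sqrt(a_t)
    return math.sqrt(a_s) * f + math.sqrt(max(0.0, 1.0 - a_s)) * eps


def ode_solve(x_T, num_steps):
    """Eq. 3: sample from step T down to step 0 in num_steps jumps.

    num_steps must divide T; steps need not be adjacent.
    """
    stride = T // num_steps
    x = list(x_T)
    for t in range(T, 0, -stride):
        x = [ddim_step(v, t, max(t - stride, 0)) for v in x]
    return x


latent = [0.8, -0.5, 0.1]          # plays the role of x_T
x0_full = ode_solve(latent, 50)    # adjacent steps
x0_fast = ode_solve(latent, 10)    # strided steps, s = t - 5
```

Because no fresh noise is injected, repeated calls with the same latent return the same output, and the strided run lands close to the 50-step result, which is the acceleration property described above.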

Figure 2: In part (a), conditional diffusion models are used with different conditions to perform image translation. In this example, we use two different prompts ("cat" and "tiger") to translate a cat image into a tiger image. However, a critical challenge for coverless image steganography is whether we can reveal the original image from the translated image. The answer is yes: we can use DDIM Inversion (shown in part (b)) to achieve dual-direction translation between the image distribution and the noise distribution, allowing for invertible image translation.
Image Translation using Diffusion Model.
A large number of image translation methods [73, 8, 18, 37, 55, 12, 27, 35] based on diffusion models have been proposed. In our method, we adopt a simple approach. First, we assume that the diffusion models used in our work are all conditional diffusion models that support a condition $\mathbf{c}$ as input to control the generated results. Taking the example shown in Fig. 2 (a), suppose we want to transform an image of a cat into an image of a tiger. We add noise to the cat image using the forward process (Eq. 1) to obtain the intermediate noise, and then control the backward sampling process (Eq. 2) from this noise by inputting a condition (prompt = "tiger"), resulting in a new tiger image. In general, if the sampling condition is set to $\mathbf{c}$, our conditional sampling process can be expressed based on Eq. 3 as follows:
$$\mathbf{x}_0=\mathrm{ODESolve}(\mathbf{x}_T;\boldsymbol{\epsilon}_{\boldsymbol{\theta}},\mathbf{c},T,0). \tag{4}$$
For image translation, there are two properties that need to be considered: the structural consistency of the two images before and after the translation, and whether the translation process is invertible. Structural consistency is crucial for most applications related to image translation, but for coverless image steganography, ensuring the invertibility of the translation process is the more important goal. To achieve invertible image translation, we utilize DDIM Inversion based on deterministic DDIM.
DDIM Inversion Makes an Invertible Image Translation.
DDIM Inversion (shown in Fig. 2 (b)), as the name implies, refers to the process of using DDIM to convert an image to a latent noise and back to the original image. The idea is based on the approximation of forward and backward differentials in solving ordinary differential equations [52, 27]. Intuitively, in the case of deterministic DDIM, it allows $s$ and $t$ in Eq. 2 to be any two steps (i.e., allowing both $s<t$ and $s>t$). When $s<t$, Eq. 2 performs the backward process, and when $s>t$, Eq. 2 performs the forward process. As the trajectories of the forward and backward processes are similar, the input and output images are very close, and the intermediate noise $\mathbf{x}_T$ can be considered as the latent variable of the inversion. In our work, we use the following formulas:
$$\mathbf{x}_T=\mathrm{ODESolve}(\mathbf{x}_0;\boldsymbol{\epsilon}_{\boldsymbol{\theta}},\mathbf{c},0,T),\quad \mathbf{x}'_0=\mathrm{ODESolve}(\mathbf{x}_T;\boldsymbol{\epsilon}_{\boldsymbol{\theta}},\mathbf{c},T,0), \tag{5}$$
to represent the DDIM Inversion process from the original image $\mathbf{x}_0$ to the latent code $\mathbf{x}_T$ and from the latent code $\mathbf{x}_T$ back to the original image $\mathbf{x}_0$ (the output image is denoted as $\mathbf{x}'_0$, with $\mathbf{x}'_0\approx\mathbf{x}_0$). Based on DDIM Inversion, we achieve an invertible relationship between images and latent noises. As long as we use deterministic DDIM to construct the image translation framework, the entire framework can achieve invertibility with two DDIM Inversion loops. This is the basis of our coverless image steganography framework, which is described in detail in the next subsection.
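The round trip in Eq. 5 can be sketched with the same deterministic update run in both directions. This is a toy illustration under stated assumptions, not the Stable Diffusion pipeline: the condition `c` is a scalar mean, `eps_theta` is the closed-form estimator for unit-variance Gaussian data centred at `c`, and the schedule `alpha_bar` with endpoint `THETA_MAX` is hypothetical.

```python
import math

T = 50
THETA_MAX = 1.5   # hypothetical schedule endpoint (radians)


def alpha_bar(t):
    """Toy cumulative schedule with alpha_bar(0) = 1."""
    return math.cos(THETA_MAX * t / T) ** 2


def eps_theta(x, t, c):
    """Stand-in conditional noise estimator: exact for data drawn from N(c, 1)."""
    a = alpha_bar(t)
    return math.sqrt(max(0.0, 1.0 - a)) * (x - math.sqrt(a) * c)


def ddim_step(x, t, s, c):
    """Deterministic Eq. 2; s < t denoises, s > t adds noise back."""
    a_t, a_s = alpha_bar(t), alpha_bar(s)
    eps = eps_theta(x, t, c)
    f = (x - math.sqrt(max(0.0, 1.0 - a_t)) * eps) / math.sqrt(a_t)
    return math.sqrt(a_s) * f + math.sqrt(max(0.0, 1.0 - a_s)) * eps


def ode_solve(x, c, t_start, t_end):
    """ODESolve(x; eps_theta, c, t_start, t_end) with unit-sized steps."""
    step = 1 if t_end > t_start else -1
    out = list(x)
    for t in range(t_start, t_end, step):
        out = [ddim_step(v, t, t + step, c) for v in out]
    return out


x0 = [0.3, -0.7, 1.1]
cond = 0.5
x_T = ode_solve(x0, cond, 0, T)       # image -> latent noise
x0_rec = ode_solve(x_T, cond, T, 0)   # latent noise -> image
round_trip_error = max(abs(a - b) for a, b in zip(x0, x0_rec))
```

The reconstruction is close but not exact, which matches the $\mathbf{x}'_0\approx\mathbf{x}_0$ statement: each direction evaluates the noise estimator at a slightly different point, so a small discretization error remains and shrinks as the number of steps grows.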

Figure 3: Our coverless image steganography framework CRoSS. The diffusion model we choose is a conditional diffusion model, which supports conditional inputs to control the generation results. We choose deterministic DDIM as the sampling strategy and use the two different conditions ($\mathbf{k}_{pri}$ and $\mathbf{k}_{pub}$) given to the model as the private key and the public key.
3.3 The Coverless Image Steganography Framework CRoSS
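Algorithm 1 The Hide Process of CRoSS.
Input: The secret image $\mathbf{x}_{sec}$ to be hidden, a pre-trained conditional diffusion model with noise estimator $\boldsymbol{\epsilon}_{\boldsymbol{\theta}}$, the number $T$ of sampling time steps, and two different conditions $\mathbf{k}_{pri}$ and $\mathbf{k}_{pub}$, which serve as the private key and the public key.
Output: The container image $\mathbf{x}_{cont}$ used to hide the secret image $\mathbf{x}_{sec}$.
return $\mathbf{x}_{cont}$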
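Algorithm 2 The Reveal Process of CRoSS.
Input: The container image $\mathbf{x}'_{cont}$ that has been transmitted over the Internet (possibly degraded from $\mathbf{x}_{cont}$), the pre-trained conditional diffusion model with noise estimator $\boldsymbol{\epsilon}_{\boldsymbol{\theta}}$, the number $T$ of sampling time steps, the private key $\mathbf{k}_{pri}$, and the public key $\mathbf{k}_{pub}$.
Output: The revealed image $\mathbf{x}_{rev}$.
return $\mathbf{x}_{rev}$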
Our basic framework CRoSS is based on a conditional diffusion model, whose noise estimator is represented by $\boldsymbol{\epsilon}_{\boldsymbol{\theta}}$, and two different conditions that serve as inputs to the diffusion model. In our work, these two conditions serve as the private key and the public key (denoted as $\mathbf{k}_{pri}$ and $\mathbf{k}_{pub}$), as shown in Fig. 3, with the detailed workflow described in Algo. 1 and Algo. 2. We introduce the entire CRoSS framework in two parts: the hide process and the reveal process.
The Process of Hide Stage.
In the hide stage, we perform translation between the secret image $\mathbf{x}_{sec}$ and the container image $\mathbf{x}_{cont}$ using the forward and backward processes of deterministic DDIM. To make the images before and after the translation different, we use the pre-trained conditional diffusion model with different conditions in the forward and backward processes respectively. These two conditions also serve as the private and public keys in the CRoSS framework. Specifically, the private key $\mathbf{k}_{pri}$ is used for the forward process, while the public key $\mathbf{k}_{pub}$ is used for the backward process. Once obtained, the container image $\mathbf{x}_{cont}$ is transmitted over the Internet and is publicly accessible to all potential receivers.
The Roles of the Private and Public Keys in Our CRoSS Framework.
In CRoSS, we found that these given conditions can act as keys in practical use. The private key is used to describe the content in the secret image, while the public key is used to control the content in the container image. For the public key, it is associated with the content in the container image, so even if it is not manually transmitted over the network, the receiver can guess it based on the received container image (described in Scenario#2 of Fig. 4). For the private key, it determines whether the receiver can successfully reveal the original image, so it cannot be transmitted over public channels.
The Process of Reveal Stage.
In the reveal stage, assuming that the container image has been transmitted over the Internet and may have been degraded into $\mathbf{x}'_{cont}$, the receiver reveals the secret image through the same forward and backward processes, using the same conditional diffusion model with the corresponding keys: the public key $\mathbf{k}_{pub}$ for the forward process and the private key $\mathbf{k}_{pri}$ for the backward process. Throughout the entire coverless image steganography process, we do not train or fine-tune the diffusion models specifically for image steganography tasks but rely on the inherent invertible image translation guaranteed by DDIM Inversion.
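The hide and reveal stages described above can be sketched as two calls to an ODE solver each, with the key order swapped between stages. This is a toy sketch under stated assumptions rather than the authors' implementation: keys are scalar means of a unit-variance Gaussian data model, `eps_theta` is the corresponding closed-form noise estimator, the schedule is hypothetical, and rounding to one decimal stands in for the degradation $d(\cdot)$.

```python
import math

T = 50
THETA_MAX = 1.5   # hypothetical schedule endpoint (radians)


def alpha_bar(t):
    return math.cos(THETA_MAX * t / T) ** 2


def eps_theta(x, t, key):
    """Stand-in conditional noise estimator: exact for data drawn from N(key, 1)."""
    a = alpha_bar(t)
    return math.sqrt(max(0.0, 1.0 - a)) * (x - math.sqrt(a) * key)


def ddim_step(x, t, s, key):
    a_t, a_s = alpha_bar(t), alpha_bar(s)
    eps = eps_theta(x, t, key)
    f = (x - math.sqrt(max(0.0, 1.0 - a_t)) * eps) / math.sqrt(a_t)
    return math.sqrt(a_s) * f + math.sqrt(max(0.0, 1.0 - a_s)) * eps


def ode_solve(x, key, t_start, t_end):
    step = 1 if t_end > t_start else -1
    out = list(x)
    for t in range(t_start, t_end, step):
        out = [ddim_step(v, t, t + step, key) for v in out]
    return out


def hide(x_sec, k_pri, k_pub):
    """Forward with the private key, then backward with the public key."""
    x_noise = ode_solve(x_sec, k_pri, 0, T)
    return ode_solve(x_noise, k_pub, T, 0)


def reveal(x_cont, k_pri, k_pub):
    """Forward with the public key, then backward with the private key."""
    x_noise = ode_solve(x_cont, k_pub, 0, T)
    return ode_solve(x_noise, k_pri, T, 0)


K_PRI, K_PUB = -2.0, 2.0                      # toy "prompts"
x_sec = [-1.3, -2.4, -1.9, -2.6]
x_cont = hide(x_sec, K_PRI, K_PUB)
x_cont_degraded = [round(v, 1) for v in x_cont]   # stand-in for d(x_cont)

x_rev = reveal(x_cont, K_PRI, K_PUB)
x_rev_degraded = reveal(x_cont_degraded, K_PRI, K_PUB)
x_rev_wrong = reveal(x_cont, 0.5, K_PUB)      # receiver guesses the private key
```

In this toy setting the container values sit on the public-key side, the correct keys recover the secret approximately even from the quantized container, and a wrong private key yields a different output. It only illustrates the mechanics and says nothing about the security of the real system.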

Figure 4: Further explanation of the CRoSS framework. We simulate the possible problems that a receiver may encounter in three different scenarios during the reveal process.
The Security Guaranteed by CRoSS.
Some questions about security may be raised, such as: What if the private key is guessed by the receivers? Does the container image hint at the possible hidden secret image? We clarify these questions from two aspects: (1) Since the revealed image is generated by the diffusion model, the visual quality of the revealed image is relatively high regardless of whether the input private key is correct or not. The receiver may try to guess the private key by exhaustive search, but it is impossible to judge which revealed image is the true secret image from a pile of candidate revealed images (described in Scenario#1 of Fig. 4). (2) Since the container image is also generated by the diffusion model, its visual quality is guaranteed by the generative prior of the diffusion model. Moreover, unlike cover-based methods that explicitly store clues in the container image, the container image in CRoSS does not contain any clues that can be detected or used to extract the secret image. Therefore, it is hard for the receiver to discover that the container image hides other images or to reveal the secret image using some detection method (described in Scenario#3 of Fig. 4).
Various Variants for Public and Private Keys.
Our proposed CRoSS relies on pre-trained conditional diffusion models with different conditions $\mathbf{k}_{pub}$ and $\mathbf{k}_{pri}$, and these conditions serve as keys in the CRoSS framework. In practical applications, we can distinguish different types of conditions of diffusion models in various ways. Here are some examples: (1) Prompts: using the same checkpoint of text-to-image diffusion models like Stable Diffusion [47] but different prompts as input conditions; (2) LoRAs [21]: using the same checkpoint initialization, but loading different LoRAs; (3) ControlNets [72]: loading the same checkpoint but using ControlNet with different conditions.
4 Experiment
4.1 Implementation Details
Experimental Settings. In our experiment, we chose Stable Diffusion [47] v1.5 as the conditional diffusion model, and we used the deterministic DDIM [52] sampling algorithm. Both the forward and backward processes consisted of 50 steps. To achieve invertible image translation, we set the guidance scale of Stable Diffusion to 1. For the given conditions, which serve as the private and public keys, we had three options: prompts, conditions for ControlNets [72] (depth maps, scribbles, segmentation maps), and LoRAs [21]. All experiments were conducted on a GeForce RTX 3090 GPU card, and our method did not require any additional training or fine-tuning for the diffusion model. The methods we compared include RIIS [66], HiNet [24], Baluja [6], and ISN [33].
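One way to read the guidance-scale choice is through the usual classifier-free guidance combination, written below under the assumption of the common Stable Diffusion convention in which a scale $w=1$ means no extra guidance ($\varnothing$ denotes the empty condition; this notation is ours, not the paper's). With $w=1$ the guided estimate collapses to the plain conditional estimator, so the hide and reveal processes evaluate the same $\boldsymbol{\epsilon}_{\boldsymbol{\theta}}(\cdot,t,\mathbf{c})$ that appears in Eq. 5.

```latex
\hat{\boldsymbol{\epsilon}}_{\boldsymbol{\theta}}(\mathbf{x}_t,t,\mathbf{c})
  = \boldsymbol{\epsilon}_{\boldsymbol{\theta}}(\mathbf{x}_t,t,\varnothing)
  + w\,\bigl(\boldsymbol{\epsilon}_{\boldsymbol{\theta}}(\mathbf{x}_t,t,\mathbf{c})
           - \boldsymbol{\epsilon}_{\boldsymbol{\theta}}(\mathbf{x}_t,t,\varnothing)\bigr),
\qquad
w = 1 \;\Rightarrow\;
\hat{\boldsymbol{\epsilon}}_{\boldsymbol{\theta}}(\mathbf{x}_t,t,\mathbf{c})
  = \boldsymbol{\epsilon}_{\boldsymbol{\theta}}(\mathbf{x}_t,t,\mathbf{c}).
```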
Data Preparation. To perform a quantitative and qualitative analysis of our method, we collect a benchmark of 260 images, dubbed Stego260, and generate corresponding prompt keys specifically tailored for coverless image steganography. We categorize the dataset into three classes, namely humans, animals, and general objects (such as architecture, plants, food, furniture, etc.). The images in the dataset are sourced from publicly available datasets [1, 2] and Google search. To generate prompt keys, we use BLIP [30] to produce the private keys, and then apply semantic modifications with ChatGPT or manual adjustment to produce the public keys in batches. More details about the dataset can be found in the supplementary material.
4.2 Property Study #1: Security


Figure 5: Deep steganalysis results by the latest SID [59]. As the number of leaked samples increases, methods whose detection accuracy curves grow more slowly and stay closer to 50% exhibit higher security. The right panel shows the recall curves of different methods under the StegExpose [7] detector. The closer the area enclosed by the curve and the coordinate axis is to 0.5, the closer the method is to ideal evasion of the detector.
| Methods | NIQE ↓ | XuNet [65], \|Detection Accuracy - 50\| ↓ | YedroudjNet [68], \|Detection Accuracy - 50\| ↓ | KeNet [69], \|Detection Accuracy - 50\| ↓ |
| Baluja [6] | 3.43±0.08 | 45.18±1.69 | 43.12±2.18 | 46.88±2.37 |
| ISN [33] | 2.87±0.02 | 5.14±0.44 | 3.01±0.29 | 8.62±1.19 |
| HiNet [24] | 2.94±0.02 | 5.29±0.44 | 3.12±0.36 | 8.33±1.22 |
| RIIS [66] | 3.13±0.05 | 0.73±0.13 | 0.24±0.08 | 4.88±1.15 |
| CRoSS (ours) | 3.04 | 1.32 | 0.18 | 2.11 |
Table 1: Security analysis. NIQE indicates the visual quality of container images; lower is better. The closer a method's detection accuracy is to 50%, the more secure the method is considered, since the steganalyzer then does no better than random guessing. The best results are red and the second-best results are blue.
In Fig. 5, the recent learning-based steganalysis method Size-Independent-Detector (SID) [59] is retrained with leaked samples from testing results of various methods on Stego260. The detection accuracy of CRoSS increases more gradually as the number of leaked samples rises, compared to other methods. The recall curves on the right also reveal the lower detection accuracy of our CRoSS, indicating superior anti-steganalysis performance.
We evaluate security from two aspects: imperceptibility in visual quality against human suspicion, and resilience against steganalysis attacks. NIQE is a no-reference image quality assessment (IQA) model that measures naturalness and visual security without any reference image or human feedback. In Tab. 1, the lower the NIQE score, the less likely it is for the human eye to identify the image as a potentially generated container for hiding secret information. Our NIQE is close to those of other methods, as well as to that of the original input images (2.85), so the container images are unlikely to arouse human suspicion. Anti-analysis security is evaluated with three steganalysis models, XuNet [65], YedroudjNet [68], and KeNet [69], for which a detection accuracy closer to 50% denotes higher security. Our CRoSS demonstrates the highest or near-highest resistance against the various steganalysis methods.
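The steganalysis columns of Tab. 1 report how far a detector's accuracy is from chance level. A minimal sketch of that metric follows, assuming a balanced test set of cover images (label 0) and container images (label 1); the function names and the example predictions are hypothetical and not taken from the paper.

```python
def detection_accuracy(predictions, labels):
    """Percentage of images the steganalyzer classifies correctly."""
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def detection_gap(accuracy_percent):
    """Absolute deviation from the 50% chance level; lower means more secure."""
    return abs(accuracy_percent - 50.0)


# Hypothetical example: 4 cover images (0) followed by 4 container images (1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
predictions = [0, 1, 0, 1, 1, 0, 1, 0]
gap = detection_gap(detection_accuracy(predictions, labels))
```

A detector that always answers correctly and one that always answers incorrectly both get a gap of 50, which is why the deviation from 50% is reported rather than the raw accuracy.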

Figure 6: Visual results of the proposed CRoSS controlled by different prompts. The container images are realistic, and the revealed images are semantically consistent with the secret images.

Figure 7: Visual results of our CRoSS controlled by different ControlNets and LoRAs. Depth maps, scribbles, and segmentation maps are presented in the lower right corner of the images.

Figure 8: Visual comparisons of our CRoSS and other methods [66, 24] under two real-world degradations, namely “WeChat” and “Shoot”. Our method can reconstruct the content of the secret images, while the other methods exhibit significant color distortion or fail completely.
4.3 Property Study #2: Controllability
To verify the controllability and flexibility of the proposed CRoSS, various types of private and public keys, such as prompts, ControlNets, and LoRAs, are incorporated in our framework (the images in the last row of Fig. 7 are generated via LoRAs downloaded from https://civitai.com/). As illustrated in Fig. 6, our framework is capable of effectively hiding the secret images in the container images based on the user-provided “Prompt2” without noticeable artifacts or unrealistic image details. The container image allows for seamless modification of a person’s identity information and facial attributes, as well as the species of animals. The concepts of the two prompts can also differ significantly, such as the Eiffel Tower and a tree, thereby enhancing the concealment capability and stealthiness of the container images. Meanwhile, the revealed image extracted with “Prompt1” exhibits good fidelity by accurately preserving the semantic information of the secret image. Besides prompts, our CRoSS also supports various other control conditions as keys, such as depth maps, scribbles, and segmentation maps. As depicted in Fig. 7, our method can effectively hide and reveal the semantic information of the secret image without significantly compromising the overall visual quality or arousing suspicion. Our CRoSS can also adopt different LoRAs as keys, which is conducive to personalized image steganography.
| Methods | clean | Gaussian noise, σ = 10 | Gaussian noise, σ = 20 | Gaussian noise, σ = 30 | Gaussian denoiser [60], σ = 10 | Gaussian denoiser [60], σ = 20 | Gaussian denoiser [60], σ = 30 | JPEG compression, Q = 20 | JPEG compression, Q = 40 | JPEG compression, Q = 80 | JPEG enhancer [60], Q = 20 | JPEG enhancer [60], Q = 40 | JPEG enhancer [60], Q = 80 |
| Baluja [6] | 34.24 | 10.30 | 7.54 | 6.92 | 7.97 | 6.10 | 5.49 | 6.59 | 8.33 | 11.92 | 5.21 | 6.98 | 9.88 |
| ISN [33] | 41.83 | 12.75 | 10.98 | 9.93 | 11.94 | 9.44 | 6.65 | 7.15 | 9.69 | 13.44 | 5.88 | 8.08 | 11.63 |
| HiNet [24] | 42.98 | 12.91 | 11.54 | 10.23 | 11.87 | 9.32 | 6.87 | 7.03 | 9.78 | 13.23 | 5.59 | 8.21 | 11.88 |
| RIIS [66] | 43.78 | 26.03 | 18.89 | 15.85 | 20.89 | 15.97 | 13.92 | 22.03 | 25.41 | 27.02 | 13.88 | 16.74 | 20.13 |
| CRoSS (ours) | 23.79 | 21.89 | 20.19 | 18.77 | 21.39 | 21.24 | 21.02 | 21.74 | 22.74 | 23.51 | 20.60 | 21.22 | 21.19 |
Table 2: PSNR (dB) results of the proposed CRoSS and other methods under different levels of degradation. The proposed CRoSS can achieve superior data fidelity in most settings. The best results are red and the second-best results are blue.
4.4 Property Study #3: Robustness
Simulated Degradation. To validate the robustness of our method, we conduct experiments on simulated degradations such as Gaussian noise and JPEG compression. As reported in Tab. 2, our CRoSS adapts well to various levels of degradation with minimal performance decrease, while other methods suffer significant drops in fidelity (over 20 dB in PSNR). Our method also achieves the best PSNR under Gaussian noise at σ=20 and σ=30. Furthermore, when we perform nonlinear image enhancement [60] on the degraded container images, all other methods deteriorate, whereas our CRoSS maintains good performance and even improves under Gaussian noise at σ=20 and σ=30. Note that RIIS [66] is trained exclusively on degraded data, whereas our CRoSS is naturally resistant to various degradations in a zero-shot manner and outperforms RIIS in most scenarios.
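The simulated Gaussian noise degradation and the PSNR metric can be sketched as follows. This is an illustrative sketch under assumptions: σ is taken to be on the 0 to 255 intensity scale, images are 8-bit arrays, and the helper names are hypothetical. In Tab. 2 the PSNR is presumably measured between the revealed image and the secret image after the container has been degraded; here the two helpers are simply applied to a flat test image to show the arithmetic.

```python
import numpy as np


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise (sigma on the 0-255 scale) to an 8-bit image."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


rng = np.random.default_rng(0)
image = np.full((128, 128), 128, dtype=np.uint8)  # flat mid-gray test image
psnr_10 = psnr(image, add_gaussian_noise(image, 10, rng))
psnr_30 = psnr(image, add_gaussian_noise(image, 30, rng))
```

For noise that is not clipped, the mean squared error is close to σ², so the PSNR of the noisy image itself is close to 20·log10(255/σ), about 28.1 dB for σ = 10. This gives a sense of scale for the values in Tab. 2.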
Real-World Degradation. We further choose two real-world degradations, “WeChat” and “Shoot”. For “WeChat”, we send and receive container images through WeChat to simulate network transmission. For “Shoot”, we use a mobile phone to photograph the container images displayed on a screen and then simply crop and warp the photos. As shown in Fig. 8, all other methods fail completely or present severe color distortion under these two extremely complex degradations, whereas our method can still reveal the approximate content of the secret images and maintain good semantic consistency, which demonstrates the robustness advantage of our method.