Learning Large-Factor EM Image Super-Resolution with Generative Priors


Original paper: Shou_Learning_Large-Factor_EM_Image_Super-Resolution_with_Generative_Priors_CVPR_2024_paper.pdf

Learning Large-Factor EM Image Super-Resolution with Generative Priors


Jiateng Shou¹   Zeyu Xiao¹   Shiyu Deng¹   Wei Huang¹   Peiyao Shi³

Ruobing Zhang³,²   Zhiwei Xiong¹,²   Feng Wu¹,²,†

¹MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China


²Institute of Artificial Intelligence, Hefei Comprehensive National Science Center


³Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences
shoujt@mail.ustc.edu.cn  {zwxiong,fengwu}@ustc.edu.cn


Abstract


As the mainstream technique for capturing images of biological specimens at nanometer resolution, electron microscopy (EM) is extremely time-consuming for scanning wide field-of-view (FOV) specimens. In this paper, we investigate a challenging task of large-factor EM image super-resolution (EMSR), which holds great promise for reducing scanning time, relaxing acquisition conditions, and expanding imaging FOV. By exploiting the repetitive structures and volumetric coherence of EM images, we propose the first generative learning-based framework for large-factor EMSR. Specifically, motivated by the predictability of repetitive structures and textures in EM images, we first learn a discrete codebook in the latent space to represent high-resolution (HR) cell-specific priors and a latent vector indexer to map low-resolution (LR) EM images to their corresponding latent vectors in a generative manner. By incorporating the generative cell-specific priors from HR EM images through a multi-scale prior fusion module, we then deploy multi-image feature alignment and fusion to further exploit the inter-section coherence in the volumetric EM data. Extensive experiments demonstrate that our proposed framework outperforms advanced single-image and video super-resolution methods for 8× and 16× EMSR (i.e., with 64 times and 256 times less data acquired, respectively), achieving superior visual reconstruction quality and downstream segmentation accuracy on benchmark EM datasets. Code is available at https://github.com/jtshou/GPEMSR.


1. Introduction


Electron microscopy (EM) is a commonly used imaging technique in life sciences to investigate the ultrastructure of cells, tissues, organelles, and macromolecular complexes, capturing images of biological specimens at nanometer resolution. However, high-quality EM image acquisition typically requires a strict and time-consuming process, involving careful adjustments of beam current, aperture size, and detector settings. This process may take years when scanning wide field-of-view (FOV) specimens. For example, Zheng et al. [57] spent approximately 16 months acquiring a ~106 TB whole-brain dataset of an adult Drosophila melanogaster. The long acquisition time greatly limits the application of EM imaging in analyzing complete biological structures in large specimens, such as neuron connections in mammalian brains.


Image super-resolution (SR), which is capable of restoring high-resolution (HR) images from their corresponding low-resolution (LR) observations, has the potential to revolutionize EM imaging by allowing for faster and less restrictive data acquisition, while also providing high-quality images with a wide field of view. By applying SR to EM images (shortened to EMSR hereafter), the capturing time can be significantly reduced, and the strict capturing conditions can be relaxed. By deploying a simple ResNet-based UNet model, Fang et al. [13] have demonstrated the promising performance of EMSR for 4× magnification (i.e., with 16 times less data acquired). However, achieving even larger-factor EMSR to further reduce capturing time remains challenging. This is in accordance with existing methods [5, 18, 30, 35, 50, 58] for natural images, which can achieve satisfactory results for up to 4× magnification, but fail to meet the demand for larger factors.
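Because the SR factor applies to both lateral dimensions, the amount of data that must be acquired drops quadratically with the factor. The short calculation below simply restates the figures quoted above, with H × W denoting the HR image size:

```latex
% Data reduction for an r-times lateral SR factor:
% an LR acquisition of the same FOV needs only (H/r) x (W/r) pixels.
\[
  \frac{\text{HR pixels}}{\text{LR pixels}}
    = \frac{H \cdot W}{(H/r)\,(W/r)} = r^{2},
  \qquad
  r = 4 \;\Rightarrow\; 16\times,\quad
  r = 8 \;\Rightarrow\; 64\times,\quad
  r = 16 \;\Rightarrow\; 256\times \;\text{less data}.
\]
```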


On the other hand, recent advances in generative models, such as ChatGPT and diffusion-based models [9, 16, 21, 43], reveal powerful capabilities in automatic content generation, including natural languages and images. This motivates us to consider the EMSR task from a generative perspective. In particular, compared with natural images that possess diverse structures and textures, EM images often exhibit repetitive structures and textures due to the predictability of imaging specimens, making them more suitable for leveraging generative learning for accurate reconstruction. In this paper, we propose a novel deep learning-based framework tailored to the challenging task of large-factor EMSR, by 1) exploiting the repetitive structures and textures in EM images with generative cell-specific priors learned from HR EM images, and 2) exploiting the inter-section coherence in the volumetric EM data by aggregating features learned from multiple consecutive images.



†Corresponding author.



Specifically, our framework explores cell-specific priors using a VQGAN-Indexer network, consisting of VQGAN [12] and our proposed latent vector indexer. We first learn a discrete codebook to represent the distribution of HR EM images in the latent space. The codebook captures both structure and texture information, while the decoder establishes relationships between latent vectors and image patches. We then train a latent vector indexer to acquire the corresponding latent vectors and integrate the indexer with the codebook and the decoder for generating HR EM images. By treating the generation process as an indexing task, we can match LR EM images with their corresponding HR feature representations from the latent space, thereby obtaining priors solely derived from HR EM images.
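To make the codebook and indexer concrete, the following is a minimal sketch of the nearest-neighbour lookup behind a discrete (VQGAN-style) codebook: continuous encoder features are replaced by their closest codebook entries, and the resulting indices are exactly the targets a latent vector indexer would learn to predict from an LR input. All shapes, sizes, and names here are illustrative assumptions rather than the paper's actual implementation.

```python
import torch

def quantize(z_e: torch.Tensor, codebook: torch.Tensor):
    """Nearest-neighbour codebook lookup (VQGAN-style quantization).

    z_e:      (B, C, H, W) continuous encoder features (illustrative shape)
    codebook: (K, C)       learned discrete codebook with K entries
    Returns the quantized features and the per-position indices that a
    latent vector indexer would be trained to predict from an LR image.
    """
    B, C, H, W = z_e.shape
    flat = z_e.permute(0, 2, 3, 1).reshape(-1, C)          # (B*H*W, C)
    # Squared Euclidean distance to every codebook entry.
    dist = (flat.pow(2).sum(1, keepdim=True)
            - 2 * flat @ codebook.t()
            + codebook.pow(2).sum(1))                       # (B*H*W, K)
    indices = dist.argmin(dim=1)                            # (B*H*W,)
    z_q = codebook[indices].reshape(B, H, W, C).permute(0, 3, 1, 2)
    return z_q, indices.reshape(B, H, W)

# Toy usage: a 1024-entry codebook of 256-dimensional latent vectors.
codebook = torch.randn(1024, 256)
z_e = torch.randn(2, 256, 16, 16)
z_q, idx = quantize(z_e, codebook)  # z_q feeds the decoder; idx is the indexer's target
```

In the full framework, the decoder that maps the quantized latents back to image space is what ties each index to an HR image patch.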


To maintain reconstruction quality while prioritizing downstream segmentation accuracy, we propose a Multi-Scale Prior Fusion (MPF) module for incorporating the above learned cell-specific priors in EMSR. We use the VQGAN-Indexer output as reference images and learn a mask for fusing reference features based on the patch-level cosine similarity between LR EM images and corresponding reference images. To fully utilize the latent vectors and relationships learned by the decoder, we use multi-scale reference features from different layers of the decoder with varying resolutions. Following the MPF module, our framework includes two key steps for exploiting inter-section coherence in the volumetric EM data: multi-image feature alignment (along the axial direction) and multi-image feature fusion. To this end, we introduce a Pyramid Optical-flow-based Deformable convolution alignment (POD) module and a 3D Spatial-Attention fusion (3DA) module. The former leverages a pre-trained optical-flow network SPyNet [42] and deformable convolutions [6, 60], while the latter leverages the spatial attention mechanism and 3D convolutions. Both improve reconstruction quality and downstream segmentation accuracy for large-factor EMSR.
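The heart of the MPF fusion step is a confidence mask computed from patch-level cosine similarity between the (upsampled) LR input and the generated reference: where the reference agrees with the observation, more of its features are passed through. Below is a minimal sketch of that idea under assumed shapes and a simple hand-crafted soft mask; the actual module learns the mask and operates on multi-scale decoder features.

```python
import torch
import torch.nn.functional as F

def similarity_mask_fusion(lr_feat, ref_feat, patch_size=4):
    """Fuse reference features weighted by patch-level cosine similarity.

    lr_feat, ref_feat: (B, C, H, W) features of the (upsampled) LR image and of the
    VQGAN-Indexer reference, assumed spatially aligned; H and W divisible by patch_size.
    """
    # Cut both feature maps into non-overlapping patches and compare patch by patch.
    lr_patches = F.unfold(lr_feat, kernel_size=patch_size, stride=patch_size)    # (B, C*p*p, N)
    ref_patches = F.unfold(ref_feat, kernel_size=patch_size, stride=patch_size)  # (B, C*p*p, N)
    sim = F.cosine_similarity(lr_patches, ref_patches, dim=1)                    # (B, N) in [-1, 1]

    # Turn similarities into a per-patch confidence mask in [0, 1]
    # and broadcast it back to the feature resolution.
    B, _, H, W = lr_feat.shape
    mask = sim.clamp(min=0).reshape(B, 1, H // patch_size, W // patch_size)
    mask = F.interpolate(mask, size=(H, W), mode='nearest')

    # Keep the LR-derived features everywhere; add reference detail only
    # where the reference is consistent with the observation.
    return lr_feat + mask * ref_feat
```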


In summary, this paper offers the following contributions. 1) We present the first generative learning-based framework for the challenging task of large-factor EMSR. 2) We introduce the VQGAN-Indexer network to explore generative cell-specific prior information from HR EM images. 3) We propose the MPF module to effectively utilize the generative priors while preserving image fidelity with LR observations, followed by the POD and 3DA modules for multi-image feature alignment and fusion. 4) Extensive experiments demonstrate the superiority of our framework in terms of both reconstruction quality and downstream segmentation accuracy for 8× and 16× EMSR.


2. Related Work


Electron microscopy image super-resolution. Existing EMSR methods can be categorized into two types: restoring isotropic volumes from anisotropic ones, i.e., SR along the axial dimension [8, 20], and reconstructing HR images from corresponding LR observations in the lateral dimensions [7, 13, 40, 46, 53]. We focus on the latter task in this paper, while our proposed framework may also apply to the former task. As a pioneering work in the field of EMSR, Sreehari et al. [46] introduce a Bayesian framework and utilize a library-based non-local means (LB-NLM) algorithm to achieve up to 16× EMSR without requiring a training process. However, this non-learning-based method limits performance and is not specifically designed for large-factor EMSR. Along the deep learning line, Nehme et al. [40] train a fully convolutional encoder-decoder network on simulated data to reconstruct super-resolved images. Hann et al. [7] train a GAN model using pairs of test specimens captured from the same region of interest. Xie et al. [53] leverage the attention mechanism to capture inter-section dependencies and shared features among adjacent images. Compared to previous EMSR methods, our framework not only utilizes adjacent EM images but also explores and integrates generative cell-specific priors to tackle the challenging task of large-factor EMSR.


Video super-resolution. Video super-resolution (VSR) aims to restore HR frames by leveraging adjacent temporal information in multiple LR frames. To align temporal features, optical flow [3, 5, 26, 44, 49, 52, 54] and deformable convolution [47, 50] have been widely adopted. Recently, transformer-based approaches [4, 36] yield remarkable advancements in VSR, owing to the utilization of diverse attention mechanisms. Inspired by these VSR methods, to exploit the inter-section coherence in the volumetric EM data, we utilize optical-flow networks and deformable convolutions for multi-image feature alignment, and spatial attention mechanisms for multi-image feature fusion.
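For reference, the snippet below sketches the standard backward-warping operation that flow-based alignment relies on (and that the POD module's optical-flow branch builds upon): a neighbouring section's features are resampled according to a flow field pointing from the center section to the neighbour. The (dx, dy) channel convention and the flow source are assumptions; in practice the flow would come from a pre-trained network such as SPyNet.

```python
import torch
import torch.nn.functional as F

def flow_warp(feat, flow):
    """Backward-warp features of a neighbouring section with an optical-flow field.

    feat: (B, C, H, W) features of the neighbouring section
    flow: (B, 2, H, W) flow from the center section to the neighbour, in pixels,
          with channel 0 = horizontal (x) and channel 1 = vertical (y) displacement.
    """
    B, _, H, W = feat.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    grid = torch.stack((xs, ys), dim=0).float().to(feat.device)   # (2, H, W)
    coords = grid.unsqueeze(0) + flow                              # (B, 2, H, W)

    # Normalize coordinates to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / max(W - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(H - 1, 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=3)           # (B, H, W, 2)

    return F.grid_sample(feat, grid_norm, mode='bilinear',
                         padding_mode='border', align_corners=True)
```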


Generative priors in image restoration. Generative image restoration methods [11, 31-33] employ the priors from pre-trained generative adversarial networks (GANs), such as StyleGAN [24] and BigGAN [2], to approximate the natural image manifold and synthesize high-quality images. Given the superior performance of discrete codebook-based generative methods in semantic image synthesis, structure-to-image, and stochastic super-resolution tasks [12, 48], recent methods explore codebook-based facial priors [17, 59] by leveraging VQGAN [12] for training. Different from these methods, we propose a latent vector indexer to exploit the information contained within the input LR images, and the MPF module to fuse generative priors.


3. Method


3.1. Overview


As illustrated in Figure 1, the goal of large-factor EMSR is to obtain the super-resolved image $I_{SR}^{0} \in \mathbb{R}^{rH \times rW \times 1}$, where $r$ is the upscaling factor and $H \times W$ is the size of each LR EM image, from a set of consecutive LR EM images.
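To fix notation, the toy example below illustrates the problem setup for an 8× factor: a short stack of consecutive LR sections is mapped to the HR center section, whose lateral dimensions are scaled by r. The stack length, section size, and the bicubic stand-in for the actual network are illustrative assumptions only.

```python
import torch
import torch.nn as nn

r = 8                      # upscaling factor (the paper studies r = 8 and r = 16)
H, W = 64, 64              # lateral size of one LR EM section (illustrative)
num_sections = 5           # consecutive sections fed to the network (illustrative)

lr_stack = torch.randn(1, num_sections, 1, H, W)   # (batch, sections, channels, H, W)

# A large-factor EMSR model maps the LR stack to the HR center section I_SR^0
# of lateral size (rH, rW); a naive bicubic upsampler stands in for it here.
upsample = nn.Upsample(scale_factor=r, mode='bicubic', align_corners=False)
sr_center = upsample(lr_stack[:, num_sections // 2])  # (1, 1, r*H, r*W)
print(sr_center.shape)                                 # torch.Size([1, 1, 512, 512])
```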
