Text-Driven Image Editing: Learnable Regions



Learnable_Regions: official implementation of the work "Text-Driven Image Editing via Learnable Regions" (CVPR 2024). Project address: https://gitcode.com/gh_mirrors/le/Learnable_Regions

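1. Project Overview

This project is a text-driven image editing method that uses learnable regions, so editing requires no user-provided mask or sketch. It builds on existing pretrained text-to-image models and introduces a bounding box generator that finds the edit region aligned with the text prompt. The approach is simple and effective, is compatible with current image generation models, and can handle complex prompts that contain multiple objects, complex sentences, or long paragraphs. An extensive user study shows that the method is competitive at producing edits with high fidelity and realism.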
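To make the role of the bounding box concrete, here is a minimal sketch of how a box could be turned into the kind of binary mask an inpainting model consumes. It is not the project's implementation: the name `box_to_mask` and its parameters are hypothetical, and the learned box generator itself is not reproduced here.

```python
def box_to_mask(height, width, center, box_h, box_w):
    """Return a height x width grid with 1 inside the box and 0 elsewhere.

    Hypothetical helper for illustration; the box is clipped to the image.
    """
    cy, cx = center
    # Box corners before clipping (bottom and right are exclusive).
    top, left = cy - box_h // 2, cx - box_w // 2
    bottom, right = top + box_h, left + box_w
    # Clip the box so that it never extends past the image border.
    top, left = max(0, top), max(0, left)
    bottom, right = min(height, bottom), min(width, right)
    return [
        [1 if top <= y < bottom and left <= x < right else 0 for x in range(width)]
        for y in range(height)
    ]


mask = box_to_mask(height=6, width=8, center=(3, 4), box_h=2, box_w=4)
for row in mask:
    print("".join(str(v) for v in row))
```

Pixels marked 1 are the ones an inpainting model would be allowed to repaint; everything marked 0 is left as in the input image.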

2. Quick Start

First, clone the project repository and set up the Python environment.

git clone https://github.com/yuanze-lin/Learnable_Regions.git
cd Learnable_Regions
conda create -n LearnableRegion python==3.9 -y
source activate LearnableRegion
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
conda env update --file environment.yaml
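After the environment is created, it can help to confirm that the pinned packages were actually installed. The following is a small sketch using only the standard library; the helper names `installed_version` and `check_pins` are hypothetical, and the pins are copied from the pip command above.

```python
from importlib import metadata

# Version pins copied from the pip command above.
PINS = {"torch": "2.0.1", "torchvision": "0.15.2", "torchaudio": "2.0.2"}


def installed_version(package):
    """Return the installed version without a local suffix such as '+cu118', or None."""
    try:
        return metadata.version(package).split("+")[0]
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each mismatching package to an (expected, found) pair."""
    mismatches = {}
    for package, expected in pins.items():
        found = installed_version(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in check_pins(PINS).items():
        print(f"{package}: expected {expected}, found {found}")
```

Run it inside the activated LearnableRegion environment; no output means every pin matched.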

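Single-image editing

Use the following command to edit a single image. If you have not previously downloaded the runwayml/stable-diffusion-inpainting model, set --diffusion_model_path explicitly; the command below sets it to 'stabilityai/stable-diffusion-2-inpainting'.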

torchrun --nnodes=1 --nproc_per_node=1 train.py \
--image_file_path images/1.png \
--image_caption 'trees' \
--editing_prompt 'a tree in full bloom in the center' \
--diffusion_model_path 'stabilityai/stable-diffusion-2-inpainting' \
--output_dir output/ \
--draw_box \
--lr 5e-3 \
--max_window_size 15 \
--per_image_iteration 10 \
--epochs 1 \
--num_workers 8 \
--seed 42 \
--pin_mem \
--point_number 9 \
--batch_size 1 \
--save_path checkpoints/
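When the same run has to be repeated with different captions or prompts, the long command line can be assembled from Python instead of being edited by hand. This is a sketch rather than part of the project: `build_command` is a hypothetical helper, the option names and values are taken from the command above, and the editing prompt is written in English.

```python
import shlex


def build_command(options, flags, nproc_per_node=1):
    """Assemble the torchrun argument list; options carry values, flags are bare switches."""
    cmd = ["torchrun", "--nnodes=1", f"--nproc_per_node={nproc_per_node}", "train.py"]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    cmd += [f"--{name}" for name in flags]
    return cmd


options = {
    "image_file_path": "images/1.png",
    "image_caption": "trees",
    "editing_prompt": "a tree in full bloom in the center",
    "diffusion_model_path": "stabilityai/stable-diffusion-2-inpainting",
    "output_dir": "output/",
    "lr": "5e-3",
    "max_window_size": "15",
    "per_image_iteration": "10",
    "epochs": "1",
    "num_workers": "8",
    "seed": "42",
    "point_number": "9",
    "batch_size": "1",
    "save_path": "checkpoints/",
}
flags = ["draw_box", "pin_mem"]

cmd = build_command(options, flags)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

To launch the run directly, pass the list to `subprocess.run(cmd, check=True)` from the repository root; passing a list avoids shell quoting problems in prompts that contain spaces or quotes.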

Multi-image editing

Use the following command to edit multiple images at once.

torchrun --nnodes=1 --nproc_per_node=2 train.py \
--image_dir_path images/ \
--output_dir output/ \
--json_file images.json \
--diffusion_model_path 'stabilityai/stable-diffusion-2-inpainting' \
--draw_box \
--lr 5e-3 \
--max_window_size 15 \
--per_image_iteration 10 \
--epochs 1 \
--num_workers 8 \
--seed 42 \
--pin_mem \
--point_number 9 \
--batch_size 1 \
--save_path checkpoints/
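Because a multi-image run takes time, a quick check of the inputs before launching can save a failed start. The sketch below only verifies that the image directory contains images and that the JSON file parses; it makes no assumption about the schema of images.json, which is not described here. The helper name `preflight` and the list of image suffixes are hypothetical.

```python
import json
from pathlib import Path

# Assumption: adjust this set to the image formats you actually use.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def preflight(image_dir, json_file):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    image_dir = Path(image_dir)
    if not image_dir.is_dir():
        problems.append(f"image directory not found: {image_dir}")
    elif not any(p.suffix.lower() in IMAGE_SUFFIXES for p in image_dir.iterdir()):
        problems.append(f"no image files found in: {image_dir}")

    json_path = Path(json_file)
    if not json_path.is_file():
        problems.append(f"JSON file not found: {json_path}")
    else:
        try:
            json.loads(json_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"invalid JSON in {json_path}: {exc}")
    return problems


# Paths as used in the command above, relative to the repository root.
for problem in preflight("images/", "images.json"):
    print(problem)
```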