CLIP Interrogator Technical Overview: An AI Tool for Generating High-Quality Text Prompts from Images

姬如雅Brina

Published 2025-06-19 09:25:46

Licensed under CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/gitblog_00096/article/details/148758211

clip-interrogator: Image to prompt with BLIP and CLIP. Project repository: https://gitcode.com/gh_mirrors/cl/clip-interrogator

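Project Overview

CLIP Interrogator is a tool built on multimodal AI models. It analyzes an input image and generates a text prompt suited to text-to-image models such as Stable Diffusion. The project was developed by pharmapsychotic, has reached version 2.4, and is optimized specifically for Stable Diffusion.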

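Core Principles

CLIP Interrogator is built on the following key components:

CLIP model: a multimodal model, originally developed by OpenAI, that captures the relationship between images and text
BLIP/GIT model: generates the initial caption for the image
Feature matching system: matches image features against predefined categories such as artist styles and art movements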
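The feature matching step can be illustrated with a small, dependency-free sketch. It assumes that matching means ranking candidate labels by the cosine similarity between the image embedding and each label's text embedding. The function names and the 3-dimensional vectors below are made up for illustration; they are not the library's own code or data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_labels(image_features, label_features, top_count=1):
    """Return the names of the top_count labels most similar to the image."""
    scored = sorted(
        label_features.items(),
        key=lambda item: cosine_similarity(image_features, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:top_count]]


# Toy embeddings (hypothetical values; real CLIP embeddings are much larger)
image_features = [0.9, 0.1, 0.0]
label_features = {
    "oil painting": [1.0, 0.0, 0.0],
    "pencil sketch": [0.0, 1.0, 0.0],
    "3d render": [0.0, 0.0, 1.0],
}

top_two = rank_labels(image_features, label_features, top_count=2)
print(top_two)  # ['oil painting', 'pencil sketch']
```

In the real tool the vectors would come from the CLIP image and text encoders, and the label lists would be the predefined artists, mediums, movements and so on.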

Environment Setup and Installation

Using CLIP Interrogator requires a Python environment with the necessary dependencies installed:

pip install gradio open_clip_torch clip-interrogator

The project supports two main CLIP model configurations:

ViT-L-14/openai: for the Stable Diffusion 1.X series
ViT-H-14/laion2b_s32b_b79k: for Stable Diffusion 2.0 and later
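As a minimal sketch of how this choice might be wired up, the helper below maps a Stable Diffusion version string to one of the two model names listed above and passes it to the library's `Config`. The names `pick_clip_model` and `build_interrogator` are hypothetical helpers introduced here, and the version parsing is an assumption about how a caller might express the version.

```python
# Model names taken from the list above, keyed by Stable Diffusion major version
CLIP_MODEL_FOR_SD = {
    1: "ViT-L-14/openai",
    2: "ViT-H-14/laion2b_s32b_b79k",
}


def pick_clip_model(sd_version: str) -> str:
    """Map a version string such as '1.5' or '2.1' to a CLIP model name."""
    major = int(sd_version.split(".")[0])
    if major < 1:
        raise ValueError(f"unsupported Stable Diffusion version: {sd_version!r}")
    # Following the article, 2.0 and later all use the ViT-H model
    return CLIP_MODEL_FOR_SD[min(major, 2)]


def build_interrogator(sd_version: str):
    """Create an Interrogator for the given Stable Diffusion version."""
    # Imported lazily so pick_clip_model works without the package installed
    from clip_interrogator import Config, Interrogator

    return Interrogator(Config(clip_model_name=pick_clip_model(sd_version)))
```

Calling `build_interrogator("1.5")` would load the models on first use, which can take a while and needs a network connection to download the weights.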

Main Functional Modules

1. Single-Image Analysis

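from clip_interrogator import Config, Interrogator

# Shared Interrogator instance; the model name here is one of the two
# configurations listed earlier and can be swapped as needed
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

def image_analysis(image):
    # Convert the image into a CLIP feature vector
    image_features = ci.image_to_features(image)

    # Rank the top 5 labels in each category against the image features
    top_mediums = ci.mediums.rank(image_features, 5)
    top_artists = ci.artists.rank(image_features, 5)
    top_movements = ci.movements.rank(image_features, 5)
    top_trendings = ci.trendings.rank(image_features, 5)
    top_flavors = ci.flavors.rank(image_features, 5)

    # Rankings for mediums, artists, movements, trending tags, and flavors
    return top_mediums, top_artists, top_movements, top_trendings, top_flavors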

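2. Prompt Generation

Four generation modes are available:

best mode: generates the most comprehensive prompt
fast mode: quickly generates a basic prompt
classic mode: generates a prompt with the classic algorithm
negative mode: generates a negative prompt (used to exclude unwanted elements)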

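# `ci` is the shared Interrogator instance created during setup
def image_to_prompt(image, mode):
    # Dispatch to the interrogation method that matches the selected mode
    if mode == 'best':
        return ci.interrogate(image)
    elif mode == 'classic':
        return ci.interrogate_classic(image)
    elif mode == 'fast':
        return ci.interrogate_fast(image)
    elif mode == 'negative':
        return ci.interrogate_negative(image)
    # Fail loudly instead of silently returning None for an unknown mode
    raise ValueError(f"unknown mode: {mode!r}")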

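3. Batch Processing

A folder containing many images can be processed in one batch, with two output options:

Write the prompts to a desc.csv file
Rename each file so that its name contains the prompt text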

def sanitize_for_filename(prompt: str, max_len: int) -> str:
    # Keep only characters that are safe in a file name
    name = "".join(c for c in prompt if (c.isalnum() or c in ",._-! "))
    # Reserve 4 characters for the file extension (e.g. ".png")
    return name.strip()[:(max_len-4)]
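The two batch output options can be sketched as a single function. This is a minimal sketch and not the project's own batch code: `batch_describe`, its parameters, the `IMAGE_EXTENSIONS` tuple, and the `image`/`prompt` column names in desc.csv are all assumptions made for illustration. The prompt generator is passed in as a callable (`make_prompt`), so any of the interrogation modes could be plugged in.

```python
import csv
import os
from typing import Callable, List

# Assumed set of image extensions to process
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".webp")


def sanitize_for_filename(prompt: str, max_len: int) -> str:
    # Same cleaning rule as the function shown above
    name = "".join(c for c in prompt if (c.isalnum() or c in ",._-! "))
    return name.strip()[:(max_len - 4)]


def batch_describe(
    folder: str,
    make_prompt: Callable[[str], str],
    rename: bool = False,
    max_filename_len: int = 128,
) -> List[str]:
    """Generate a prompt for every image in folder and save the results."""
    files = sorted(
        f for f in os.listdir(folder) if f.lower().endswith(IMAGE_EXTENSIONS)
    )
    prompts = [make_prompt(os.path.join(folder, f)) for f in files]

    if rename:
        # Option 2: put the sanitized prompt into the file name
        for f, prompt in zip(files, prompts):
            ext = os.path.splitext(f)[1]
            new_name = sanitize_for_filename(prompt, max_filename_len) + ext
            os.rename(os.path.join(folder, f), os.path.join(folder, new_name))
    else:
        # Option 1: write one row per image to desc.csv
        csv_path = os.path.join(folder, "desc.csv")
        with open(csv_path, "w", encoding="utf-8", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["image", "prompt"])
            writer.writerows(zip(files, prompts))
    return prompts
```

With a real interrogator, `make_prompt` could be something like `lambda path: ci.interrogate_fast(Image.open(path).convert("RGB"))`. The rename branch does not handle two images that produce the same prompt, so a production version would need a collision check.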