OmniFusion Project Setup and Configuration Tutorial

Article link: https://blog.youkuaiyun.com/gitblog_00263/article/details/148200451

OmniFusion: a multimodal model to communicate using text and images. Project repository: https://gitcode.com/gh_mirrors/om/OmniFusion

1. Project Directory Structure and Overview

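OmniFusion is an advanced multimodal AI model that extends the capabilities of traditional language processing systems by integrating multiple data modalities, such as images and text. The project directory is structured as follows: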

OmniFusion/
├── content/
│   └── README.md
├── docs/
├── LICENSE
└── README.md

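content/: contains a README document for the project.
docs/: contains the project's official documentation.
LICENSE: contains the project's license information.
README.md: contains the general introduction to the project.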
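As a quick sanity check after cloning, the layout listed above can be verified with a few lines of Python. This is a minimal sketch: the helper name `missing_entries` and the clone path `OmniFusion` are assumptions made for illustration, not part of the project.

```python
from pathlib import Path

# Entries taken from the directory tree above, relative to the repository root.
EXPECTED_ENTRIES = ["content/README.md", "docs", "LICENSE", "README.md"]


def missing_entries(repo_root: str) -> list[str]:
    """Return the expected entries that do not exist under repo_root."""
    root = Path(repo_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    # "OmniFusion" is an assumed local clone path; adjust it to your checkout.
    missing = missing_entries("OmniFusion")
    print("missing entries:", missing or "none")
```

An empty result means every entry from the listing was found.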

2. Project Startup File

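The project's entry point is README.md, which is located in both the project root and the content/ directory. README.md covers the project overview, features, architecture, training process, usage instructions, and future plans.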

3. Project Configuration Files

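The project has no dedicated configuration file, because most settings are defined directly in code. However, README.md includes example code showing how to use the model, and the constants and arguments in that example effectively serve as the configuration.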

import torch
from PIL import Image
from transformers import AutoTokenizer, AutoModelForCausalLM
from urllib.request import urlopen
import torch.nn as nn
from huggingface_hub import hf_hub_download

# Download models.py, the source file that defines CLIPVisionTower
hf_hub_download(repo_id="AIRI-Institute/OmniFusion", filename="models.py", local_dir='./')
from models import CLIPVisionTower
DEVICE = "cuda:0"
PROMPT = "This is a dialog with AI assistant.\n"

tokenizer = AutoTokenizer.from_pretrained("AIRI-Institute/OmniFusion", subfolder="OmniMistral-v1_1/tokenizer", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("AIRI-Institute/OmniFusion", subfolder="OmniMistral-v1_1/tuned-model", torch_dtype=torch.bfloat16, device_map=DEVICE)
hf_hub_download(repo_id="AIRI-Institute/OmniFusion", filename="OmniMistral-v1_1/projection.pt", local_dir='./')
hf_hub_download(repo_id="AIRI-Institute/OmniFusion", filename="OmniMistral-v1_1/special_embeddings.pt", local_dir='./')

projection = torch.load("OmniMistral-v1_1/projection.pt", map_location=DEVICE)
special_embs = torch.load("OmniMistral-v1_1/special_embeddings.pt", map_location=DEVICE)
clip = CLIPVisionTower("openai/clip-vit-large-patch14-336")
clip.load_model()
clip = clip.to(device=DEVICE, dtype=torch.bfloat16)

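def gen_answer(model, tokenizer, clip, projection, query, special_embs, image=None):
    # The function body is omitted here; see the actual code in the project.
    raise NotImplementedError("Copy the gen_answer body from the project's code.")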

img_url = "https://i.pinimg.com/originals/32/c7/81/32c78115cb47fd4825e6907a83b7afff.jpg"
question = "What is the sky color on this image?"
img = Image.open(urlopen(img_url))
answer = gen_answer(model, tokenizer, clip, projection, query=question, special_embs=special_embs, image=img)
img.show()
print(question)
print(answer)

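This code demonstrates how to use the OmniFusion model to answer a question about a given image. In practice, you may need to adjust the parameters and settings in the code to fit your needs.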
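Because the settings live in code, one way to make them easier to adjust is to gather the hard-coded values from the example into a single object. The sketch below is only an illustration: the class `OmniFusionSettings` and its field names are hypothetical and do not exist in the project, while the default values are copied from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OmniFusionSettings:
    """Hypothetical container for the values hard-coded in the example above."""

    repo_id: str = "AIRI-Institute/OmniFusion"
    model_version: str = "OmniMistral-v1_1"
    clip_name: str = "openai/clip-vit-large-patch14-336"
    device: str = "cuda:0"
    prompt: str = "This is a dialog with AI assistant.\n"

    @property
    def tokenizer_subfolder(self) -> str:
        return f"{self.model_version}/tokenizer"

    @property
    def model_subfolder(self) -> str:
        return f"{self.model_version}/tuned-model"

    @property
    def projection_file(self) -> str:
        return f"{self.model_version}/projection.pt"

    @property
    def special_embeddings_file(self) -> str:
        return f"{self.model_version}/special_embeddings.pt"


# Default settings mirror the example; individual fields can be overridden.
settings = OmniFusionSettings()
second_gpu = OmniFusionSettings(device="cuda:1")  # assumed: a machine with two GPUs
print(settings.model_subfolder)
```

With this in place, the calls in the example could read their arguments from one place, for instance `AutoTokenizer.from_pretrained(settings.repo_id, subfolder=settings.tokenizer_subfolder, use_fast=False)`.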

Authoring statement: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.