Problems Running Janus-Pro Locally

Latest recommended article published on 2025-05-23 09:16:47

marconiho



Tags: python pytorch ai

Article link: https://blog.youkuaiyun.com/weixin_41645458/article/details/146056673


Problem 1: Installing the GPU build of torch

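Installing torch directly with pip is too slow.
Instead, download the torch wheel that matches your CUDA version from the Alibaba Cloud mirror site.
Then install the locally downloaded wheel: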

pip install C:\Users\xxx\Downloads\torch-2.2.2+cu118-cp310-cp310-win_amd64.whl
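Before installing, it can help to confirm that the downloaded wheel matches your interpreter and CUDA version. The sketch below splits the wheel filename from the command above into its tags using only the standard library. The helper names (`WheelTags`, `parse_wheel_name`, `current_python_tag`) are hypothetical and not part of pip or torch, and the parser assumes the common five-field layout with no optional build tag.

```python
import sys
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str       # public version, e.g. "2.2.2"
    local: str         # local version label, e.g. "cu118" ("" if absent)
    python_tag: str    # e.g. "cp310"
    abi_tag: str       # e.g. "cp310"
    platform_tag: str  # e.g. "win_amd64"


def parse_wheel_name(filename: str) -> WheelTags:
    """Split a wheel filename of the form name-version-python-abi-platform.whl.

    Assumption: no optional build tag, so exactly five dash-separated fields.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel name layout: {filename}")
    name, full_version, python_tag, abi_tag, platform_tag = parts
    # The part after "+" is the local label that carries the CUDA variant.
    version, _, local = full_version.partition("+")
    return WheelTags(name, version, local, python_tag, abi_tag, platform_tag)


def current_python_tag() -> str:
    """Return the CPython tag of the running interpreter, e.g. 'cp310'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


tags = parse_wheel_name("torch-2.2.2+cu118-cp310-cp310-win_amd64.whl")
print(tags)
print("matches this interpreter:", tags.python_tag == current_python_tag())
```

If the last line prints `False`, the wheel was built for a different Python version than the one you are running, and pip will most likely refuse to install it.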

Problem 2: RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'

The official requirements pin this version:

torch==2.0.1

Replace it with:

torch==2.2.2

Then download and install the matching GPU build of torch, and the error goes away.
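As a rough illustration, the check below classifies a torch version string against the two versions mentioned in this post. It is only a sketch: `release_tuple` and `classify` are hypothetical helper names, and versions other than 2.0.1 and 2.2.2 are reported as untested because this post does not cover them.

```python
import re

KNOWN_FAILING = (2, 0, 1)  # pinned in the official requirements, fails per this post
KNOWN_WORKING = (2, 2, 2)  # replacement version that works per this post


def release_tuple(version: str) -> tuple:
    """Extract (major, minor, patch) from strings such as '2.2.2+cu118'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version}")
    return tuple(int(group) for group in match.groups())


def classify(version: str) -> str:
    """Label a torch version relative to the two versions tried in this post."""
    release = release_tuple(version)
    if release == KNOWN_FAILING:
        return "known-failing"
    if release == KNOWN_WORKING:
        return "known-working"
    return "untested"


for candidate in ("2.0.1", "2.2.2+cu118", "2.1.0"):
    print(candidate, "->", classify(candidate))
```

With torch installed, passing `torch.__version__` to `classify` tells you whether the environment matches the working setup described here.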

Running the example

import torch
from transformers import AutoModelForCausalLM
from janus.models import MultiModalityCausalLM, VLChatProcessor
from janus.utils.io import load_pil_images


if __name__ == '__main__':
    # Path to the local model directory
    model_path = "../Janus-Pro-1B"
    # Load the VLChatProcessor
    vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
    # Get the tokenizer from the processor
    tokenizer = vl_chat_processor.tokenizer
    # Load the vl_gpt model
    vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(
        model_path, trust_remote_code=True
    )
    vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()
    image = "./pic.png"
    # question = "explain this meme"
    question = "What is in this image?"
    conversation = [
        {
            "role": "<|User|>",
            "content": f"<image_placeholder>\n{question}",
            "images": [image],
        },
        {"role": "<|Assistant|>", "content": ""},
    ]
    pil_images = load_pil_images(conversation)
    prepare_inputs = vl_chat_processor(
        conversations=conversation, images=pil_images, force_batchify=True
    ).to(vl_gpt.device)
    # Run the image encoder to get the image embeddings
    inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)
    print(inputs_embeds)
    # Run the model to get the response
    outputs = vl_gpt.language_model.generate(
        inputs_embeds=inputs_embeds,
        attention_mask=prepare_inputs.attention_mask,
        pad_token_id=tokenizer.eos_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        max_new_tokens=512,
        do_sample=False,
        use_cache=True,
    )
    answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
    print(f"{prepare_inputs['sft_format'][0]}", answer)
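To ask several questions about different images, the conversation structure used in the script above can be factored into a small helper. This is a minimal sketch: `build_conversation` is a hypothetical name, not part of the `janus` package, and it only reproduces the dictionary layout from the example.

```python
def build_conversation(question: str, image_path: str) -> list:
    """Build the two-turn conversation layout used in the example script."""
    return [
        {
            "role": "<|User|>",
            # The placeholder marks where the image embedding is inserted.
            "content": f"<image_placeholder>\n{question}",
            "images": [image_path],
        },
        # An empty assistant turn leaves room for the model's answer.
        {"role": "<|Assistant|>", "content": ""},
    ]


conversation = build_conversation("explain this meme", "./pic.png")
print(conversation[0]["content"])
```

The returned list can be passed to `load_pil_images` and `vl_chat_processor` in the same way as the hand-written `conversation` in the script.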