Setting up GroundingDINO on a Linux server, step by step (py310, cuda118, pytorch2.1), running with an offline copy of bert-base-uncased (June 2024)

Original post · last modified 2024-06-24 17:13:01


License: CC 4.0 BY-SA

Tags: #server #linux #dino #object-detection #object-recognition

First published 2024-05-31 22:02:57

Setting up GroundingDINO on a Linux server, step by step

1. Reference posts
2. Setup procedure
Running DINO

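1. Reference posts

I have this running end to end and have already hit the pitfalls along the way, so if you follow the steps below your setup will most likely work. (Written on 2024-05-31.)
Grounding DINO usage guide, part 1
GitHub: IDEA-Research / GroundingDINO (public repository)
[AI] Installing GroundingDINO on Windows
Grounding DINO for personal use / deployment (notebook code)
The GitHub repository is the main reference.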

2. Setup procedure

Environment: py310, cuda118, pytorch2.1

2.1 Set the environment variables

# First, check whether CUDA_HOME is already set
echo $CUDA_HOME
# Locate nvcc on this machine
which nvcc
# Prints /usr/local/cuda/bin/nvcc
# Set the environment variable
export CUDA_HOME=/usr/local/cuda
# Check again:
echo $CUDA_HOME
# Prints /usr/local/cuda
# To set CUDA_HOME permanently (recommended), append it to ~/.bashrc:
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
source ~/.bashrc
echo $CUDA_HOME
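
The same check can be scripted. Here is a minimal Python sketch, assuming nvcc sits at <CUDA_HOME>/bin/nvcc as in the output above. The helper names cuda_home_from_nvcc and detect_cuda_home are made up for illustration and are not part of CUDA or GroundingDINO.

```python
import os
import shutil
from pathlib import PurePosixPath
from typing import Optional


def cuda_home_from_nvcc(nvcc_path: str) -> str:
    """Derive CUDA_HOME from an nvcc path, assuming the layout <CUDA_HOME>/bin/nvcc."""
    return str(PurePosixPath(nvcc_path).parent.parent)


def detect_cuda_home() -> Optional[str]:
    """Return CUDA_HOME from the environment, else infer it from nvcc on PATH."""
    env_value = os.environ.get("CUDA_HOME")
    if env_value:
        return env_value
    nvcc = shutil.which("nvcc")  # same lookup as `which nvcc`
    return cuda_home_from_nvcc(nvcc) if nvcc else None


if __name__ == "__main__":
    print("CUDA_HOME:", detect_cuda_home() or "not found")
```

If this prints "not found", neither CUDA_HOME nor nvcc is visible to the current shell, and the export step above still needs to be done.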

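2.2 Set up conda

Download Anaconda

Go to the Tsinghua University open source mirror site, open its archive page, and download the Anaconda3-2024.02-1-Linux-x86_64.sh installer. (The Tsinghua mirror seemed to be unreachable at night for me and may only be reachable during the day; if so, download the installer from another site.)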

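# I ran this on an AutoDL cloud server, where files are normally kept under autodl-tmp; cd into whatever directory you use
cd autodl-tmp
# Make Anaconda3-2024.02-1-Linux-x86_64.sh executable
chmod +x Anaconda3-2024.02-1-Linux-x86_64.sh
# Start the installation
./Anaconda3-2024.02-1-Linux-x86_64.sh

# Next, go to root's home directory and list its files
cd ~
ls -a
# The listing includes .bashrc; open it:
vim .bashrc
# Add this line (my conda was installed to /root/anaconda3 by default):
export PATH=/root/anaconda3/bin:$PATH
# Save the file, then
source ~/.bashrc
# From now on the default conda is the one we just installed. (The system had two condas; I pointed PATH at my own install so the setup is easier to migrate to other environments later.)
# Run:
conda info
# If the installer uploaded intact and the installation finished without errors, this prints the details of the new conda install.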
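
Because the system has two condas, it can help to confirm which one wins the PATH lookup. Below is a minimal Python sketch of that lookup. The helper name which_on_path is hypothetical; shutil.which is the standard-library equivalent of the shell's which.

```python
import os
import shutil
from typing import Optional


def which_on_path(program: str, path: str) -> Optional[str]:
    """Return the first executable named `program` on the given PATH string, or None."""
    return shutil.which(program, path=path)


if __name__ == "__main__":
    # With /root/anaconda3/bin prepended to PATH, this should point at the new install.
    print("conda resolves to:", which_on_path("conda", os.environ.get("PATH", "")))
```

Directories earlier in PATH take precedence, which is why the export line above prepends /root/anaconda3/bin instead of appending it.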

Set up the environment

1. First use Anaconda to create a Python 3.10 environment for DINO:

# Create and activate the conda environment
source activate base
conda create -n dino python=3.10
conda init bash && source /root/.bashrc
conda activate dino

To avoid CUDA version mismatches, I also installed CUDA through conda here. (If your server already has CUDA 11.8 and cuDNN, you can skip this step.)

conda search cudatoolkit
conda search cudnn
# You can also run conda search cudatoolkit --info, download the package, and install it locally, e.g. conda install ./cudatoolkit-11.3.1-h2bc3f7f_2.conda
# If you don't mind it being slow, you can also search another channel:
# conda search cudatoolkit -c conda-forge
# conda search cudnn -c conda-forge
conda install cudatoolkit==11.8.0
conda install cudnn==8.9.7.29
# I downloaded the package and installed it locally:
# conda search cudatoolkit --info -c conda-forge
# conda install ./cudnn-8.9.7.29-hcdd5f01_2.conda
# Then check the CUDA and cuDNN packages that conda installed
conda list | grep cudatoolkit
conda list | grep cudnn

Now install PyTorch 2.1 built for CUDA 11.8:

conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
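
Before building GroundingDINO, it is worth confirming that the PyTorch build matches CUDA 11.8. Here is a minimal sketch, run inside the dino environment. The function name describe_torch is made up for illustration; the torch attributes it reads (torch.__version__, torch.version.cuda, torch.cuda.is_available) are standard.

```python
import importlib.util


def describe_torch() -> dict:
    """Collect PyTorch version details, or report that torch is not importable."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

With the install command above, built_for_cuda would be expected to read 11.8. If cuda_available is False, the driver or the CUDA setup is probably the thing to check before continuing.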

Next, install GroundingDINO:

# Clone the GroundingDINO repository
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
# Then install GroundingDINO in editable mode
pip install -e .

# This time pip install -e . ran without errors, because PyTorch was already installed in the conda environment in use

# The remaining steps follow this post: https://zhuanlan.zhihu.com/p/635346878
# Create the weights directory
mkdir weights
cd weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
cd ..

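Next, download bert-base-uncased to a local directory. Move bert-base-uncased.tar into autodl-tmp/GroundingDINO, then extract it.
(
PS: I have already packaged bert-base-uncased.tar, so you can also download it from my Baidu Netdisk share:
Link: https://pan.baidu.com/s/18XNPQweebr6J4gBdb_zf-w
Extraction code: 8x8d
)

# Run from autodl-tmp; skip the cd if you are already inside GroundingDINO
cd GroundingDINO
tar -xvf bert-base-uncased.tar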
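
After extracting, a quick check that the directory looks loadable can save a confusing traceback later. This sketch assumes the archive unpacks to a bert-base-uncased directory containing config.json, vocab.txt, and a weights file (pytorch_model.bin or model.safetensors), which is the usual Hugging Face layout. The exact contents of the shared tar are an assumption, and missing_bert_files is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

REQUIRED_FILES = ("config.json", "vocab.txt")
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")  # either one is enough


def missing_bert_files(model_dir: str) -> List[str]:
    """List the expected files that are absent from a local bert-base-uncased directory."""
    directory = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (directory / name).is_file()]
    if not any((directory / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing


if __name__ == "__main__":
    # Path used in this post; adjust it to your own layout.
    problems = missing_bert_files("/root/autodl-tmp/GroundingDINO/bert-base-uncased")
    print("missing:", problems if problems else "nothing")
```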

Running DINO

First, modify the code in get_tokenlizer.py.

The original code in GroundingDINO/groundingdino/util/get_tokenlizer.py is:

from transformers import AutoTokenizer, BertModel, BertTokenizer, RobertaModel, RobertaTokenizerFast
import os

def get_tokenlizer(text_encoder_type):
    if not isinstance(text_encoder_type, str):
        # print("text_encoder_type is not a str")
        if hasattr(text_encoder_type, "text_encoder_type"):
            text_encoder_type = text_encoder_type.text_encoder_type
        elif text_encoder_type.get("text_encoder_type", False):
            text_encoder_type = text_encoder_type.get("text_encoder_type")
        elif os.path.isdir(text_encoder_type) and os.path.exists(text_encoder_type):
            pass
        else:
            raise ValueError(
                "Unknown type of text_encoder_type: {}".format(type(text_encoder_type))
            )
    print("final text_encoder_type: {}".format(text_encoder_type))

    tokenizer = AutoTokenizer.from_pretrained(text_encoder_type)
    return tokenizer


def get_pretrained_language_model(text_encoder_type):
    if text_encoder_type == "bert-base-uncased" or (os.path.isdir(text_encoder_type) and os.path.exists(text_encoder_type)):
        return BertModel.from_pretrained(text_encoder_type)
    if text_encoder_type == "roberta-base":
        return RobertaModel.from_pretrained(text_encoder_type)

    raise ValueError("Unknown text_encoder_type {}".format(text_encoder_type))

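Change the file to the version below, which loads bert-base-uncased from the local directory.
(While making the change I consulted these posts:
Question: for running offline, which parts of the config need changing, and which files need to be downloaded?
Huggingface: Download files from the Hub
Unable to download online, already consulted #75
google-bert/bert-base-uncased
The code changes suggested there did not work for me, so I also consulted these posts:
Downloading a BERT model locally and loading it
Slow model downloads with pytorch-pretrained-bert)
(These posts also describe downloading BERT through a mirror, which needs little or no change to the functions.)
In the end, the error turned out to come from not using an absolute path:
Error when loading a local model with from_pretrained(): huggingface_hub.utils._validators.HFValidationError
I have fixed all of the errors above, so you can use my code below as is (remember to put bert-base-uncased in the right location).
I plan to write a separate post soon on how to download the bert-base-uncased model locally.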

from transformers import AutoTokenizer, BertModel, BertTokenizer, RobertaModel, RobertaTokenizerFast
import os

def get_tokenlizer(text_encoder_type):
    # import ipdb;ipdb.set_trace();
    if not isinstance(text_encoder_type, str):
        # print("text_encoder_type is not a str")
        if hasattr(text_encoder_type, "text_encoder_type"):
            text_encoder_type = text_encoder_type.text_encoder_type
        elif text_encoder_type.get("text_encoder_type", False):
            text_encoder_type = text_encoder_type.get("text_encoder_type")
        elif os.path.isdir(text_encoder_type) and os.path.exists(text_encoder_type):
            pass
        else:
            raise ValueError(
                "Unknown type of text_encoder_type: {}".format(type(text_encoder_type))
            )
    print("final text_encoder_type: {}".format(text_encoder_type))
    
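    # Added: load the tokenizer from the local directory.
    # An absolute path is required here; the relative path used in the referenced post can raise an error.
    tokenizer_path = "/root/autodl-tmp/GroundingDINO/bert-base-uncased"
    tokenizer = BertTokenizer.from_pretrained(tokenizer_path, use_fast=False)
    return tokenizer

    # Original code, kept for reference:
    # tokenizer = AutoTokenizer.from_pretrained(text_encoder_type)
    # return tokenizer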
    

def get_pretrained_language_model(text_encoder_type):
    # import ipdb;ipdb.set_trace();
    if text_encoder_type == "bert-base-uncased" or (os.path.isdir(text_encoder_type) and os.path.exists(text_encoder_type)):
        # Added: load the model from the local directory (absolute path)
        model_path = "/root/autodl-tmp/GroundingDINO/bert-base-uncased"
        return BertModel.from_pretrained(model_path)
        # Original: return BertModel.from_pretrained(text_encoder_type)
    if text_encoder_type == "roberta-base":
        return RobertaModel.from_pretrained(text_encoder_type)

    raise ValueError("Unknown text_encoder_type {}".format(text_encoder_type))
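
The hardcoded path works on the machine used in this post but has to be edited on every other server. One possible variation, sketched below, reads the directory from an environment variable and always returns an absolute path. The variable name GROUNDINGDINO_BERT_PATH and the helper resolve_bert_path are hypothetical; neither GroundingDINO nor transformers defines them.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "GROUNDINGDINO_BERT_PATH"  # hypothetical name, not defined by GroundingDINO
DEFAULT_DIR = "/root/autodl-tmp/GroundingDINO/bert-base-uncased"  # path used in this post


def resolve_bert_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Return an absolute path to the local bert-base-uncased directory."""
    env = os.environ if env is None else env
    path = os.path.abspath(os.path.expanduser(env.get(ENV_VAR, DEFAULT_DIR)))
    if not os.path.isdir(path):
        # Fail early with a clear message instead of a validation error from from_pretrained()
        raise FileNotFoundError(f"bert-base-uncased directory not found: {path}")
    return path
```

In get_tokenlizer.py, the two string literals could then be replaced with resolve_bert_path(), so that both the tokenizer and the model load from the same checked directory.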

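Run the program

Create a test.py file:
In the GroundingDINO directory, create a new file test.py, with code based on this post:
https://zhuanlan.zhihu.com/p/670262724
That author's code is written in notebook (ipynb) style, so I adapted it into code that runs as a .py file: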

import os
import torch
import requests
from groundingdino.util.inference import load_model, load_image, predict, annotate
import supervision as sv

# settings
DEVICE = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

HOME = os.getcwd()
CONFIG_PATH = os.path.join(HOME, "groundingdino/config/GroundingDINO_SwinT_OGC.py")
WEIGHTS_NAME = "groundingdino_swint_ogc.pth"
WEIGHTS_PATH = os.path.join(HOME, "weights", WEIGHTS_NAME)

model = load_model(CONFIG_PATH, WEIGHTS_PATH)

# Download the sample images
data_dir = os.path.join(HOME, 'data')
os.makedirs(data_dir, exist_ok=True)
os.chdir(data_dir)

image_urls = [
    "https://media.roboflow.com/notebooks/examples/dog.jpeg",
    "https://media.roboflow.com/notebooks/examples/dog-2.jpeg",
    "https://media.roboflow.com/notebooks/examples/dog-3.jpeg",
    "https://media.roboflow.com/notebooks/examples/dog-4.jpeg",
]

for url in image_urls:
    image_name = url.split('/')[-1]
    response = requests.get(url)
    with open(image_name, 'wb') as f:
        f.write(response.content)

IMAGE_NAME = "dog-2.jpeg"
IMAGE_PATH = os.path.join(HOME, "data", IMAGE_NAME)

TEXT_PROMPT = "straw" # change to "dog" to detect only the dog
BOX_TRESHOLD = 0.35
TEXT_TRESHOLD = 0.25

image_source, image = load_image(IMAGE_PATH)

boxes, logits, phrases = predict(
    model=model,  
    image=image,  
    caption=TEXT_PROMPT,  
    box_threshold=BOX_TRESHOLD,  
    text_threshold=TEXT_TRESHOLD
)

annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)

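# sv.plot_image displays the annotated image; it does not save it.
# To write the result to disk instead, use another library such as OpenCV or PIL, depending on your needs.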
sv.plot_image(annotated_frame, (16, 16))

Then run it with python test.py.

Alternatively, run this second program:
Create a test2.py file:

from groundingdino.util.inference import load_model, load_image, predict, annotate
import cv2

model = load_model("groundingdino/config/GroundingDINO_SwinT_OGC.py", "weights/groundingdino_swint_ogc.pth")
IMAGE_PATH = "weights/dog-3.jpeg"
TEXT_PROMPT = "chair . person . dog ."
BOX_TRESHOLD = 0.35
TEXT_TRESHOLD = 0.25

image_source, image = load_image(IMAGE_PATH)

boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=TEXT_PROMPT,
    box_threshold=BOX_TRESHOLD,
    text_threshold=TEXT_TRESHOLD
)

annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
cv2.imwrite("annotated_image.jpg", annotated_frame)

python test2.py

From here, adapt the code to your own needs.

My own modification of the code:

Detect three objects (chair, person, dog) in dog-3.jpeg:

from groundingdino.util.inference import load_model, load_image, predict, annotate
import cv2

model = load_model("groundingdino/config/GroundingDINO_SwinT_OGC.py", "weights/groundingdino_swint_ogc.pth")
IMAGE_PATH = "weights/dog-3.jpeg"
TEXT_PROMPT = "chair . person . dog ."
BOX_TRESHOLD = 0.35
TEXT_TRESHOLD = 0.25

image_source, image = load_image(IMAGE_PATH)

boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=TEXT_PROMPT,
    box_threshold=BOX_TRESHOLD,
    text_threshold=TEXT_TRESHOLD
)
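# Pause in the debugger to inspect boxes, logits, and phrases (requires ipdb to be installed)
import ipdb;ipdb.set_trace()
annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)    # phrases can be used directly, and can be paired with boxes.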
cv2.imwrite("annotated_image.jpg", annotated_frame)
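
To use phrases together with boxes outside of annotate, the boxes usually need converting to pixel coordinates first. The sketch below assumes predict returns boxes as normalized (cx, cy, w, h) values in the range 0 to 1, which is what annotate in the upstream repository appears to expect. The helper names to_pixel_xyxy and pair_detections are made up for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def to_pixel_xyxy(box: Sequence[float], width: int, height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    )


def pair_detections(
    boxes: Sequence[Sequence[float]], phrases: Sequence[str], width: int, height: int
) -> List[Tuple[str, Box]]:
    """Pair each detected phrase with its box in pixel coordinates."""
    return [(phrase, to_pixel_xyxy(box, width, height)) for box, phrase in zip(boxes, phrases)]
```

In the script above this could be called as follows, assuming image_source is the height x width x 3 array returned by load_image: height, width = image_source.shape[:2], then pair_detections(boxes.tolist(), phrases, width, height).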