3DGS Semantic Segmentation with LangSplat

Original post · last modified 2024-09-02 10:05:06

License: CC 4.0 BY-SA

Tags: #ArtificialIntelligence #DeepLearning #AIGC #ComputerVision

First published 2024-06-03 22:00:00

LangSplat is a CVPR 2024 paper that implements semantic segmentation for 3DGS, with semantics that can be retrieved by text.
GitHub: https://github.com/minghanqin/LangSplat?tab=readme-ov-file

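The main idea is to attach dimensionality-reduced CLIP semantic features to the 3DGS representation, so that a target can be retrieved with a text query and then segmented.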
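
To make the text-retrieval idea concrete, here is a minimal sketch. It is not LangSplat's actual implementation: it assumes every pixel already carries a feature vector in the same space as the text embedding, and it uses plain cosine similarity with a hypothetical threshold. The names `cosine` and `segment` and the toy vectors are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def segment(feature_map, text_embedding, threshold=0.8):
    """Return a binary mask: 1 where a pixel's feature matches the text query.

    feature_map is a 2D grid (list of rows) of per-pixel feature vectors.
    threshold is a hypothetical value chosen for this toy example.
    """
    return [[1 if cosine(f, text_embedding) >= threshold else 0 for f in row]
            for row in feature_map]

# Toy 2x2 "rendered" feature map with 3-dimensional features.
features = [[[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
            [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]]
query = [1.0, 0.0, 0.0]  # stand-in for an embedded text prompt such as "apple"
mask = segment(features, query)
print(mask)
```

In this toy case only the top row is close to the query direction, so only those two pixels end up in the mask.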

Environment setup:
Work through environment.yml step by step.

conda create -n langsplat python=3.7.13
conda activate langsplat

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install numpy
pip install tqdm
pip install matplotlib
pip install submodules/langsplat-rasterization
pip install submodules/simple-knn
pip install open-clip-torch
pip install mediapy
pip install tensorboard
pip install opencv-python

pip install submodules/segment-anything-langsplat

The authors provide trained checkpoints and outputs. I downloaded them and followed the steps described in this issue:
https://github.com/minghanqin/LangSplat/issues/18

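It ran successfully, but the results on my machine were very poor, as if the model had never been trained and was predicting from random values.
IoU = 0.02, localization accuracy = 0.1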

Result images:

[Figures: RGB image; apple ground-truth mask; predicted apple mask]

The other targets were equally poor.

So I decided to retrain everything from scratch.

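First, prepare the dataset:
download lerf_ovs.

Initially each scene contains the two folders that are needed: images and sparse.

Then, following the GitHub instructions, run 3DGS to obtain the weights and a point cloud file,
that is, the entire output folder.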

Go to the 3DGS GitHub repository: https://github.com/graphdeco-inria/gaussian-splatting
and follow its instructions.

# HTTPS
git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive

python train.py -s ~/dataset/lerf_ovs/teatime

This module has to be installed, and it must be reinstalled every time it is modified:

pip install submodules/diff-gaussian-rasterization

Training finished, but no .pth file was saved.
To save a .pth checkpoint, train like this:

python train.py -s ~/dataset/lerf_ovs/teatime --checkpoint_iterations 30000

Then copy the entire output folder into ~/dataset/lerf_ovs/teatime.

Next,
extract the language features:

python preprocess.py --dataset_path ~/dataset/lerf_ovs/teatime

Train the autoencoder,
using train.py in the autoencoder folder:

cd autoencoder
python train.py --dataset_path ~/dataset/lerf_ovs/teatime --dataset_name teatime --encoder_dims 256 128 64 32 3 --decoder_dims 16 32 64 128 256 256 512 --lr 0.0007
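
The two dimension lists in the command above can be read as successive layer widths. The sketch below only pairs them up into (in, out) layer shapes. It assumes the encoder input is a 512-dimensional CLIP feature (the decoder's last width is 512, but the input size is not stated in the command) and that the decoder starts from the 3-dimensional latent. `layer_shapes` is a hypothetical helper, not part of the repo.

```python
def layer_shapes(input_dim, dims):
    """Pair consecutive widths into (in_features, out_features) tuples."""
    widths = [input_dim] + list(dims)
    return list(zip(widths[:-1], widths[1:]))

CLIP_DIM = 512  # assumption: dimensionality of the CLIP features
encoder_dims = [256, 128, 64, 32, 3]
decoder_dims = [16, 32, 64, 128, 256, 256, 512]

encoder = layer_shapes(CLIP_DIM, encoder_dims)
# The decoder consumes the encoder's final (3-dimensional) output.
decoder = layer_shapes(encoder_dims[-1], decoder_dims)

print("encoder:", encoder)
print("decoder:", decoder)
```

Read this way, the encoder compresses each feature down to 3 numbers per pixel, and the decoder expands them back to the original width.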

Generate the 3-dimensional language features of the scene,
using test.py in the autoencoder folder:

python test.py --dataset_path ~/dataset/lerf_ovs/teatime --dataset_name teatime

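Next, train a LangSplat model for each feature_level.
--start_checkpoint is the 3DGS model trained above.
This step uses train.py in the LangSplat root folder;
to avoid confusing it with the other train.py scripts, you can rename it to train_model.py (and adjust the command accordingly).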

python train.py -s ~/dataset/lerf_ovs/teatime -m output/teatime --start_checkpoint ~/dataset/lerf_ovs/teatime/output/teatime/chkpnt30000.pth --feature_level 1
# Train level 2 and level 3 the same way
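
Since the same command is repeated for levels 1 to 3, a small driver script can build all three. This is only a sketch that reuses the paths from this post; `train_command` and the `RUN` switch are hypothetical names, and nothing is launched unless `RUN` is set to `True`.

```python
import os
import subprocess

def train_command(scene_dir, level, model_prefix="output/teatime"):
    """Build the train.py argument list for one feature level."""
    checkpoint = os.path.join(scene_dir, "output", "teatime", "chkpnt30000.pth")
    return ["python", "train.py",
            "-s", scene_dir,
            "-m", model_prefix,
            "--start_checkpoint", checkpoint,
            "--feature_level", str(level)]

scene = os.path.expanduser("~/dataset/lerf_ovs/teatime")
commands = [train_command(scene, level) for level in (1, 2, 3)]

RUN = False  # set to True to actually launch training from the LangSplat root
for cmd in commands:
    print(" ".join(cmd))
    if RUN:
        subprocess.run(cmd, check=True)  # stop at the first failing level
```

With `RUN = False` the script just prints the three commands, which is a cheap way to check the paths before starting a long training run.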

Render:

python render.py -s ~/dataset/lerf_ovs/teatime -m output/teatime_1 --include_feature
# Render level 2 and level 3 the same way

Finally, run the evaluation. The ground-truth path in eval.sh needs to be changed first.

cd eval
sh eval.sh

The results are much better now.
IoU = 0.5514, localization accuracy = 0.7966
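
For reference, IoU between a predicted mask and a ground-truth mask is the size of their intersection divided by the size of their union. Below is a minimal sketch of that standard definition on toy masks; it is not the repo's evaluation script, which may aggregate over prompts and frames differently, and the `iou` helper is a name I made up.

```python
def iou(pred, gt):
    """Intersection over union of two same-sized binary masks (lists of rows)."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter / union if union else 0.0

# Toy 2x3 masks: 2 pixels overlap, 4 pixels are covered by either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
score = iou(pred, gt)
print(score)
```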

[Figure: the apple mask produced now]

render.py does not produce a ply file. It produces npy and png files, one of each per image.

output/teatime_1/train/ours_None/renders contains the png files,
which are the semantic images rendered by 3DGS from the corresponding camera poses.
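
A quick way to check what render.py wrote is to count the png and npy files per subfolder. The sketch below makes no assumption about the subfolder names beyond the path mentioned in this post; `count_outputs` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path

def count_outputs(root):
    """Count .png and .npy files under root, grouped by parent folder name."""
    root = Path(root)
    if not root.is_dir():
        return {}  # nothing rendered yet, or the path is wrong
    counts = Counter()
    for path in root.rglob("*"):
        if path.suffix in (".png", ".npy"):
            counts[(path.parent.name, path.suffix)] += 1
    return dict(counts)

# Path taken from this post; adjust it to your own output directory.
print(count_outputs("output/teatime_1/train/ours_None"))
```

If the counts per folder match the number of training images, the render step most likely completed for that feature level.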

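So the result that LangSplat produces is really a 3DGS model,
namely chkpnt30000.pth under output/teatime,
which stores the semantic information; rendering it yields semantic images from different viewpoints.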

12 comments

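程序莫 2025.09.15
Why are my rendered results still ordinary photos rather than semantic maps like the ones in the post?
- 2301_79712261 replying to 程序莫 2025.09.17
  So is the accuracy high?
- 程序莫 replying to 程序莫 2025.09.17
  Passing the parameter when rendering produces the semantic feature maps.
- 程序莫 replying to 2301_79712261 2025.09.17
  I only used the public dataset. I do not know yet how to build my own.
- 2301_79712261 replying to 程序莫 2025.09.17
  Hello, may I ask whether you used your own dataset or the official data? Is the accuracy high?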

不要辣椒油丶丶 2024.07.27
Could you give me some guidance on a CUDA problem? At "visibility_filter" : radii > 0, I get RuntimeError: CUDA error: an illegal memory access was encountered Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. I have read many blog posts and none of them solved it.
- 知了0313 replying to 不要辣椒油丶丶 2025.09.19
  I ran into this problem too. Did you manage to solve it?
- 涐昰﹃錁溏ゝ replying to 不要辣椒油丶丶 2024.11.28
  Has this problem been solved?

不要辣椒油丶丶 2024.07.20
Very thorough post, nicely done.
- weixin_47229965 replying to C₃H₅N₃O₉227 2025.11.02
  Did you solve it? It seems only lerf has labels. For example, does 3D-ovs not have them?
- asfgjkas replying to C₃H₅N₃O₉227 2025.09.13
  https://github.com/minghanqin/LangSplat/issues/62 It is probably a path problem.
- C₃H₅N₃O₉227 replying to 不要辣椒油丶丶 2024.12.18
  Hello, running sh eval.sh fails with: File "evaluate_iou_loc.py", line 89, in eval_gt_lerfdata return gt_ann, (h, w), img_paths UnboundLocalError: local variable 'h' referenced before assignment. Have you run into this?