Error message
While running stable-diffusion-webui-docker, the error "Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!" still occurred. The full traceback:
auto-1 | File "/stable-diffusion-webui/modules/sd_hijack.py", line 348, in forward
auto-1 | inputs_embeds = self.wrapped(input_ids)
auto-1 | File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
auto-1 | return self._call_impl(*args, **kwargs)
auto-1 | File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
auto-1 | return forward_call(*args, **kwargs)
auto-1 | File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 163, in forward
auto-1 | return F.embedding(
auto-1 | File "/opt/conda/lib/python3.10/site-packages/torch/nn/functional.py", line 2264, in embedding
auto-1 | return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
auto-1 | RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
Cause
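I had hit this error before when running stable-diffusion-webui. It went away after I added --device-id 0 to the launch arguments in docker-compose.yaml. However, after assigning ollama to GPU device 1 and running both ollama and stable-diffusion-webui, the error came back. The stable-diffusion-webui launch arguments do specify the device ID: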
services:
  download:
    build: ./services/download/
    profiles: ["download"]
    volumes:
      - *v1
  auto: &automatic
    <<: *base_service
    profiles: ["auto"]
    build: ./services/AUTOMATIC1111
    image: sd-auto:78
    environment:
      - CLI_ARGS=--allow-code --medvram --xformers --enable-insecure-extension-access --api --device-id 0
Meanwhile, the shared &base_service anchor only reserves device_ids: ['0']:
x-base_service: &base_service
  ports:
    - "${WEBUI_PORT:-7860}:7860"
  volumes:
    - &v1 /mnt/data/dify-models/plugins/stable-diffusion/data:/data
    - &v2 /mnt/data/dify-models/plugins/stable-diffusion/output:/output
  stop_signal: SIGKILL
  tty: true
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['0']
            capabilities: [compute, utility]
Solution
After changing it to device_ids: ['0', '1'], everything works normally.
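For reference, this is a sketch of the x-base_service anchor after that change. It assumes the other keys (ports, volumes, stop_signal, tty) stay exactly as they were in the original docker-compose.yaml, so only device_ids differs:

```yaml
x-base_service: &base_service
  ports:
    - "${WEBUI_PORT:-7860}:7860"
  volumes:
    - &v1 /mnt/data/dify-models/plugins/stable-diffusion/data:/data
    - &v2 /mnt/data/dify-models/plugins/stable-diffusion/output:/output
  stop_signal: SIGKILL
  tty: true
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            # reserve both host GPUs for the container (previously only ['0'])
            device_ids: ['0', '1']
            capabilities: [compute, utility]
```

Because the `utility` capability is requested, nvidia-smi should be available inside the container. Running a command such as `docker compose --profile auto exec auto nvidia-smi` is one way to check which GPUs the container actually sees after recreating it.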