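I. Model quantization issues with diffusers >= 0.32.0
1. bitsandbytes: runs correctly, but sampling becomes slower
The matching PEFT branch has to be installed first: pip install git+https://github.com/BenjaminBossan/peft@fix-low-cpu-mem-usage-bnb-8bit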
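import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from diffusers import FluxPipeline, FluxTransformer2DModel
from huggingface_hub import hf_hub_download

# Load only the transformer in 8-bit via bitsandbytes.
transformer_8bit = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=DiffusersBitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.bfloat16,
)
# Build the pipeline around the quantized transformer.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer_8bit,
    torch_dtype=torch.bfloat16,
).to("cuda")
# Attach the Hyper-SD LoRA and set its weight.
pipe.load_lora_weights(
    hf_hub_download("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors"),
    adapter_name="hyper-sd",
)
pipe.set_adapters("hyper-sd", adapter_weights=0.125)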
2. optimum: loading a LoRA fails; not yet resolved
Traceback (most recent call last):
File "/home/iv/anaconda3/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/iv/anaconda3/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/iv/Algo_new/LouHaijie/IVAlgoHTTP/IVAlgoHTTP/app.py", line 112, in process_data
IVmodelHandle.loadModel()
File "/home/iv/Algo_new/LouHaijie/IVAlgoHTTP/IVAlgoHTTP/algos/BaseModel.py", line 28, in loadModel
self.model = GlobalConfig.get_model()
File "/home/iv/Algo_new/LouHaijie/IVAlgoHTTP/IVAlgoHTTP/algos/GlobalConfig.py", line 76, in get_model
modelHandler = model()
File "/home/iv/Algo_new/LouHaijie/IVAlgoHTTP/IVAlgoHTTP/algos/FluxTxt2Img/Flux_txt2img.py", line 145, in __init__
self.pipe.load_lora_weights(hf_hub_download("Shakker-Labs/FLUX.1-dev-LoRA-Logo-Design", "FLUX-dev-lora-Logo-Design.safetensors"), adapter_name="logo")
File "/home/iv/Algo_new/LouHaijie/IVAlgoHTTP/66v/lib/python3.10/site-packages/diffusers/loaders/lora_pipeline.py", line 1559, in load_lora_weights
transformer_lora_state_dict = self._maybe_expand_lora_state_dict(
File "/home/iv/Algo_new/LouHaijie/IVAlgoHTTP/66v/lib/python3.10/site-packages/diffusers/loaders/lora_pipeline.py", line 2085, in _maybe_expand_lora_state_dict
base_weight_param = transformer_state_dict[base_param_name]
KeyError: 'single_transformer_blocks.0.attn.to_k.weight._data'
II. Some LoRAs fail to load
Solution: the installed diffusers version is too old; upgrade diffusers.
Traceback (most recent call last):
File "/home/iv/Algo_new/LouHaijie/IVAlgoHTTP/IVAlgoHTTP-test/algos/AnystyleImg2Img/newfilter.py", line 1042, in <module>
model = AnyStyleImg2ImgModel()
File "/home/iv/Algo_new/LouHaijie/IVAlgoHTTP/IVAlgoHTTP-test/algos/AnystyleImg2Img/newfilter.py", line 725, in __init__
self.model.pipline.pipe.load_lora_weights(load_file(os.path.join(self.model_root, self.config['SDXL'][self.default_style]['lora_repo'][0]), device=self.device), adapter_name="mix_lora")
File "/home/iv/Algo_new/LouHaijie/IVAlgoHTTP/venv310/lib/python3.10/site-packages/diffusers/loaders/lora.py", line 1255, in load_lora_weights
self.load_lora_into_text_encoder(
File "/home/iv/Algo_new/LouHaijie/IVAlgoHTTP/venv310/lib/python3.10/site-packages/diffusers/loaders/lora.py", line 555, in load_lora_into_text_encoder
rank[rank_key] = text_encoder_lora_state_dict[rank_key].shape[1]
KeyError: 'text_model.encoder.layers.0.self_attn.out_proj.lora_B.weight'
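Before upgrading, it can help to confirm which diffusers version the active environment actually has. The following is a minimal sketch using only the standard library; the helper names (parse_version, installed_version, needs_upgrade) and the minimum version passed in are illustrative assumptions, since the note above does not say which release fixes the error.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.32.0' or '0.32.0.dev0' into a tuple of ints such as (0, 32, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric component (e.g. 'dev0').
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package="diffusers"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def needs_upgrade(installed, minimum):
    """True when the package is missing or older than `minimum` (a caller-chosen value)."""
    return installed is None or parse_version(installed) < parse_version(minimum)


if __name__ == "__main__":
    print("diffusers:", installed_version())
```

If the printed version is older than expected, running pip install -U diffusers inside the same environment is the usual way to upgrade.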
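III. How to use flux-redux with diffusers 0.31.0
1. Install diffusers 0.31.0; the base model's pipeline classes and methods are all called from this library.
2. Install the diffusers 0.32.0-release version from GitHub; the flux-redux classes and methods are all called from this library.
3. Modify the diffusers 0.32.0-release library:
diffusers/pipelines/pipeline_utils.py: change "from diffusers import" to "from [current library name] import".
4. Modify the diffusers 0.31.0 library, in diffusers/models/attention_processor.py: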
hidden_states = F.scaled_dot_product_attention(query, key, value, dropout_p=0.0, is_causal=False)
change to:
hidden_states = F.scaled_dot_product_attention(
    query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False)
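The import change in step 3 can also be applied with a small script instead of by hand. This is only a sketch under assumptions: the package name diffusers_redux is a hypothetical stand-in for "[current library name]", and the helper retarget_imports is not part of diffusers. It also rewrites "from diffusers.<submodule> import" lines, which goes slightly beyond what the note describes.

```python
import re

# Matches "from diffusers import ..." and "from diffusers.<sub> import ..."
# at the start of a line (optionally indented), but not e.g. "diffusers_extra".
_FROM_DIFFUSERS = re.compile(r"^([ \t]*from[ \t]+)diffusers(?=[ \t.])", re.MULTILINE)


def retarget_imports(source, new_package):
    """Return `source` with absolute `from diffusers` imports pointed at `new_package`."""
    return _FROM_DIFFUSERS.sub(lambda m: m.group(1) + new_package, source)


if __name__ == "__main__":
    sample = "from diffusers import pipelines\nimport torch\n"
    # "diffusers_redux" is a hypothetical name for the renamed 0.32.0 copy.
    print(retarget_imports(sample, "diffusers_redux"))
```

To use it on the real file, read pipeline_utils.py into a string, pass it through retarget_imports, and write the result back; keeping a backup of the original file is advisable.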
IV. With diffusers 0.31.0, quantizing with optimum produces pure-noise images
Modify optimum/quanto/tensor/weights/qbytes.py. Change
and torch.cuda.get_device_capability(data.device)[0] >= 8
to
and torch.cuda.get_device_capability(data.device)[0] >= 20
After the change:
if (
    qtype == qtypes["qfloat8_e4m3fn"]
    and activation_qtype is None
    and scale.dtype in [torch.float16, torch.bfloat16]
    and len(size) == 2
    and (data.device.type == "cuda" and torch.version.cuda)
    and axis == 0
    and torch.cuda.get_device_capability(data.device)[0] >= 20
):
    out_features, in_features = size
    if (
        in_features >= 64
        and out_features >= 64
        and (
            (in_features % 64 == 0 and out_features % 128 == 0)
            or (in_features % 128 == 0 and out_features % 64 == 0)
        )
    ):
        return MarlinF8QBytesTensor(qtype, axis, size, stride, data, scale, requires_grad)
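To see why raising the threshold changes which path is taken, the capability and shape tests in the condition above can be modelled in isolation. This is a simplified sketch, not the library's code: the function name takes_marlin_fp8_path and its parameters are assumptions, and the qtype, activation, scale dtype, axis and CUDA checks are left out on purpose.

```python
def takes_marlin_fp8_path(cc_major, out_features, in_features, min_major=8):
    """Simplified model of the capability and shape tests from the snippet above.

    cc_major is the first element of torch.cuda.get_device_capability();
    min_major is the threshold being edited (8 originally, 20 after the change).
    """
    shapes_ok = (
        in_features >= 64
        and out_features >= 64
        and (
            (in_features % 64 == 0 and out_features % 128 == 0)
            or (in_features % 128 == 0 and out_features % 64 == 0)
        )
    )
    return cc_major >= min_major and shapes_ok


if __name__ == "__main__":
    # A 3072 x 3072 weight on a device with major compute capability 8.
    print(takes_marlin_fp8_path(8, 3072, 3072))                # original threshold
    print(takes_marlin_fp8_path(8, 3072, 3072, min_major=20))  # edited threshold
```

With the threshold at 20 the test fails for a device with major capability 8 or 9, so MarlinF8QBytesTensor is never returned by this branch on such devices and the code after it runs instead.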
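V. When two LoRA weights are set at the same time in diffusers, one of them has no effect
The weights were probably set the wrong way: each set_adapters call appears to replace the set of active adapters, so a second call overrides the first.
Wrong: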
self.pipe.set_adapters([LoraA], adapter_weights=[A])
self.pipe.set_adapters([LoraB], adapter_weights=[B])
Right:
self.pipe.set_adapters([LoraA, LoraB], adapter_weights=[A, B])
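One way to avoid the two-call mistake is to keep the weights in a single dict and hand them over in one call. The sketch below assumes a pipeline object that exposes set_adapters(names, adapter_weights=...) as used in these notes; apply_lora_weights and RecordingPipe are hypothetical helpers, and RecordingPipe is only a stand-in that assumes each call replaces the active set.

```python
def apply_lora_weights(pipe, weights):
    """Activate every adapter in `weights` (name -> weight) with one set_adapters call."""
    names = list(weights)
    pipe.set_adapters(names, adapter_weights=[weights[n] for n in names])
    return names


class RecordingPipe:
    """Stand-in for a pipeline; it remembers only the most recent set_adapters call."""

    def __init__(self):
        self.active = {}

    def set_adapters(self, adapter_names, adapter_weights=None):
        # Assumption: a new call replaces the previously active adapters.
        self.active = dict(zip(adapter_names, adapter_weights))


if __name__ == "__main__":
    pipe = RecordingPipe()
    # "style" and "detail" are hypothetical adapter names.
    apply_lora_weights(pipe, {"style": 0.8, "detail": 0.5})
    print(pipe.active)
```

With a real pipeline, the adapter names must match the adapter_name values given to load_lora_weights.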