Using Python's tril, pad, and block_diag for LLM inputs

农民小飞侠

Last modified 2023-07-26 08:38:46

Licensed under CC 4.0 BY-SA

First published 2023-07-26 08:35:04

Article link: https://blog.youkuaiyun.com/w5688414/article/details/131930661

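This article shows how to concatenate and pad matrices with numpy, including concatenating lower-triangular matrices, padding two-dimensional position_ids, and building an attention_mask. It also covers the specific way the attention_mask is constructed in particular settings such as chatglm.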

I recently had to rework how position_ids and attention_mask are built, which requires a few numpy operations. Here are some examples.

Concatenating several lower-triangular matrices:

import numpy as np
from scipy.linalg import block_diag

A = np.ones((2, 2))
B = np.ones((3, 3))

# block_diag places A and B on the diagonal; tril then zeroes everything above it
blocks = [A, B]
print(np.tril(block_diag(*blocks)))

[[1. 0. 0. 0. 0.]
 [1. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 1. 1. 0.]
 [0. 0. 1. 1. 1.]]
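
The same pattern can be generalized to any number of packed sequences. Below is a minimal numpy-only sketch, for cases where scipy is not available. The function name `packed_causal_mask` and the `lengths` parameter are my own illustrative choices, not part of any library.

```python
import numpy as np

def packed_causal_mask(lengths):
    """Block-diagonal lower-triangular mask for sequences packed back to back.

    Hypothetical helper: `lengths` is a list of per-sequence lengths.
    """
    total = sum(lengths)
    mask = np.zeros((total, total), dtype=np.int64)
    start = 0
    for n in lengths:
        # Each sequence gets its own causal (lower-triangular) block.
        mask[start:start + n, start:start + n] = np.tril(
            np.ones((n, n), dtype=np.int64)
        )
        start += n
    return mask

mask = packed_causal_mask([2, 3])
print(mask)
```

For lengths [2, 3] this should produce the same pattern as the tril/block_diag example above, only with an integer dtype.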

Padding two-dimensional position_ids:

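encoded_inputs = {}
encoded_inputs["position_ids"] = np.array([[1, 2, 3, 4], [4, 5, 6, 7]])
difference = 4  # number of padding positions to add
# Pad only the last axis, on the left; the default mode fills with zeros
encoded_inputs["position_ids"] = np.pad(
    encoded_inputs["position_ids"], pad_width=[(0, 0), (difference, 0)]
)

print(encoded_inputs)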

{'position_ids': array([[0, 0, 0, 0, 1, 2, 3, 4],
       [0, 0, 0, 0, 4, 5, 6, 7]])}
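
In practice the padding amount is usually derived from a target length instead of being hard-coded. The sketch below wraps the call above in a small helper. The name `left_pad_position_ids` and the parameters `max_length` and `pad_value` are assumptions made for illustration.

```python
import numpy as np

def left_pad_position_ids(position_ids, max_length, pad_value=0):
    """Left-pad a (batch, seq_len) array to (batch, max_length).

    Hypothetical helper, not taken from any tokenizer library.
    """
    position_ids = np.asarray(position_ids)
    difference = max_length - position_ids.shape[-1]
    if difference < 0:
        raise ValueError("max_length is shorter than the current sequence length")
    # (0, 0): leave the batch axis alone; (difference, 0): pad only on the left.
    return np.pad(
        position_ids,
        pad_width=[(0, 0), (difference, 0)],
        mode="constant",
        constant_values=pad_value,
    )

padded = left_pad_position_ids([[1, 2, 3, 4], [4, 5, 6, 7]], max_length=8)
print(padded.shape)  # (2, 8)
```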

Padding the attention_mask:

# A (1, 3, 3) mask of zeros, padded with ones on the top and left of the last two axes
encoded_inputs["attention_mask"] = np.zeros((1, 3, 3))
encoded_inputs["attention_mask"] = np.pad(
    encoded_inputs["attention_mask"],
    pad_width=[(0, 0), (difference, 0), (difference, 0)],
    mode="constant",
    constant_values=1,
)
print(encoded_inputs)

{'position_ids': array([[0, 0, 0, 0, 1, 2, 3, 4],
       [0, 0, 0, 0, 4, 5, 6, 7]]), 'attention_mask': array([[[1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 0., 0., 0.],
        [1., 1., 1., 1., 0., 0., 0.],
        [1., 1., 1., 1., 0., 0., 0.]]])}
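
Each tuple in pad_width is (before, after) for one axis, so the same call can pad on either side. Here is a small sketch on a tiny mask that contrasts left/top padding, as used above, with right/bottom padding. The sizes are chosen only to keep the output readable.

```python
import numpy as np

mask = np.zeros((1, 2, 2), dtype=np.int64)
difference = 1

# Left/top padding: new rows and columns are added before the existing ones.
left = np.pad(
    mask,
    pad_width=[(0, 0), (difference, 0), (difference, 0)],
    mode="constant",
    constant_values=1,
)
# Right/bottom padding: new rows and columns are added after the existing ones.
right = np.pad(
    mask,
    pad_width=[(0, 0), (0, difference), (0, difference)],
    mode="constant",
    constant_values=1,
)

print(left[0])
# [[1 1 1]
#  [1 0 0]
#  [1 0 0]]
print(right[0])
# [[0 0 1]
#  [0 0 1]
#  [1 1 1]]
```

Which side to pad on depends on the model's padding convention, so it is worth checking the tokenizer's padding side before picking one.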

Concatenating the lists inside a list:

inputs = [[1, 2], [0, 1]]
# inputs = [[[1, 2], [0, 1]], [[1, 2, 3, 4], [4, 5, 6, 7]]]

# Pure-Python alternative for flat lists: out = sum(inputs, [])
out = np.concatenate(inputs, axis=-1)
print(out)

[1 2 0 1]
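
The commented-out nested input is the more interesting case: with axis=-1, the inner arrays are joined along their last axis, so arrays with different widths but the same number of rows can be merged. A short sketch of that case:

```python
import numpy as np

# Two 2-row arrays with different widths (2 and 4 columns).
inputs = [[[1, 2], [0, 1]], [[1, 2, 3, 4], [4, 5, 6, 7]]]

# Joining along the last axis keeps the rows and appends the columns.
out = np.concatenate(inputs, axis=-1)
print(out)
# [[1 2 1 2 3 4]
#  [0 1 4 5 6 7]]
```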

Building the attention_mask as done in chatglm:

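seq_length = 4
context_length = 2
attention_mask = np.ones((seq_length, seq_length))
attention_mask = np.tril(attention_mask)  # causal (lower-triangular) visibility
attention_mask[:, :context_length] = 1  # every position can see the whole context
attention_mask = (attention_mask < 0.5).astype("int64")  # invert: 1 now marks masked positions
print(attention_mask)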

[[0 0 1 1]
 [0 0 1 1]
 [0 0 0 1]
 [0 0 0 0]]
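
The steps above can be wrapped in a reusable function. This is only a sketch based on the snippet above, not the actual chatglm source. The name `prefix_lm_mask` is hypothetical, and the convention that 1 means "blocked" follows the printed output above.

```python
import numpy as np

def prefix_lm_mask(seq_length, context_length):
    """Return a (seq_length, seq_length) int64 mask where 1 means "blocked".

    Hypothetical helper: the first `context_length` positions are visible
    to every position, the rest follow a causal pattern.
    """
    visible = np.tril(np.ones((seq_length, seq_length), dtype=bool))
    # Every position may attend to the whole context (prompt) part.
    visible[:, :context_length] = True
    # Invert so that 1 marks positions that must not be attended to.
    return (~visible).astype("int64")

mask = prefix_lm_mask(4, 2)
print(mask)
```

With context_length=0 this reduces to a plain causal mask, which is a quick way to sanity-check the function.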

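I found that, among the inputs to LLMs, attention_mask and position_ids are built differently from model to model, while the other inputs are largely the same. That is why I am sharing these snippets here, in the hope that they help others too.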