Embeddings are stored as float32 by default, i.e. 4 bytes per dimension. When the text corpus is very large, both the storage footprint and the compute cost become substantial. Quantization converts the float32 values to a lower-precision representation such as int8 or binary.
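As a rough sense of scale, consider a hypothetical corpus of one million 384-dimensional embeddings (384 is the output dimension of all-MiniLM-L6-v2; the corpus size is an assumption for illustration):

n, dims = 1_000_000, 384
print(f"float32: {n * dims * 4 / 1e9:.2f} GB")  # 4 bytes per value -> ~1.54 GB
print(f"int8:    {n * dims * 1 / 1e9:.2f} GB")  # 1 byte per value  -> ~0.38 GB
print(f"binary:  {n * dims / 8 / 1e9:.3f} GB")  # 1 bit per value   -> ~0.05 GB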
Data Preparation
Before quantizing, let's look at the size of the original embeddings, including their shape and storage footprint.
import csv
import numpy as np
from typing import Literal
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

def calculate_data_storage_size(var):
    # Total bytes = bytes per element * number of elements (equivalent to var.nbytes)
    return var.dtype.itemsize * np.prod(var.shape)

model = SentenceTransformer("../../../DataCollection/officials/all-MiniLM-L6-v2",
                            device='cuda:7')
dataset_path = "../../../DataCollection/embedding_data/quora_duplicate_questions.tsv"
max_corpus_size = 500  # We limit our corpus to only the first 500 unique questions

corpus_sentences = set()
with open(dataset_path, encoding="utf8") as fIn:
    reader = csv.DictReader(fIn, delimiter="\t", quoting=csv.QUOTE_MINIMAL)
    for row in reader:
        corpus_sentences.add(row["question1"])
        corpus_sentences.add(row["question2"])
        if len(corpus_sentences) >= max_corpus_size:
            break
corpus_sentences = list(corpus_sentences)
# corpus_sentences = ["I am driving to the lake.", "It is a beautiful day."]

normal_embeddings = model.encode(corpus_sentences)
print(f'normal_embeddings shape: {normal_embeddings.shape}')
print(f'normal_embeddings size: {calculate_data_storage_size(normal_embeddings)}')
The output is shown below; the float32 embeddings occupy 500 × 384 × 4 B = 768,000 B:
normal_embeddings shape: (500, 384)
normal_embeddings size: 768000
Binary Quantization
Binary quantization converts each float32 value in an embedding to a single bit, reducing memory and storage usage by a factor of 32.
Sentence Transformers provides an official function for this:
embeddings_binary = quantize_embeddings(normal_embeddings, precision="binary")
print(f'embeddings_binary shape: {embeddings_binary.shape}')
print(f'embeddings_binary size: {calculate_data_storage_size(embeddings_binary)}')
embeddings_binary shape: (500, 48)
embeddings_binary size: 24000
The shape (500, 48) comes from packing each 384-dimensional vector into 384 / 8 = 48 bytes, so the total is 24,000 B, exactly 1/32 of the float32 size.
Below is a hand-rolled equivalent: it thresholds each value at 0 to produce a bit, packs every 8 bits into a uint8, and subtracts 128 when int8 ("binary") output is requested.
from typing import Literal
def to_binary(embeddings, type: Literal["binary", "ubinary"] = "binary"):
    binary_mat = np.packbits(embeddings > 0, axis=-1)  # threshold at 0, pack 8 bits per uint8
    if type == "binary":
        return (binary_mat.astype(np.int16) - 128).astype(np.int8)  # shift [0, 255] to [-128, 127]
    return binary_mat
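As a quick sanity check (this comparison assumes the library's "binary" precision also thresholds at 0 and shifts by 128, which is consistent with the sizes above):

embeddings_manual = to_binary(normal_embeddings, type="binary")
print(embeddings_manual.shape)                               # (500, 48)
print(np.array_equal(embeddings_manual, embeddings_binary))  # True if the thresholds match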