ms-swift official Chinese documentation
https://swift.readthedocs.io/zh-cn/latest/BestPractices/Reranker.html
Original text
By default, MAX_POSITIVE_SAMPLES positive samples and MAX_NEGATIVE_SAMPLES negative samples are taken from each data record. Each positive sample is grouped with MAX_NEGATIVE_SAMPLES negative samples, so each record expands into MAX_POSITIVE_SAMPLES x (1 + MAX_NEGATIVE_SAMPLES) records. If a record has too few positives/negatives, all of them are used; if it has more than MAX_POSITIVE_SAMPLES positives or MAX_NEGATIVE_SAMPLES negatives, they are sampled randomly. IMPORTANT: the expanded data is placed in the same batch, so the effective batch size on each device will be per_device_train_batch_size × MAX_POSITIVE_SAMPLES × (1 + MAX_NEGATIVE_SAMPLES). Adjust per_device_train_batch_size accordingly to avoid running out of GPU memory.
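For instance, with per_device_train_batch_size = 2, MAX_POSITIVE_SAMPLES = 1 and MAX_NEGATIVE_SAMPLES = 7 (illustrative numbers, not the defaults), each device actually processes 2 × 1 × (1 + 7) = 16 query-doc pairs per step.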
MAX_POSITIVE_SAMPLES x (1 + MAX_NEGATIVE_SAMPLES): why the "1 +" here, instead of just MAX_POSITIVE_SAMPLES x MAX_NEGATIVE_SAMPLES?
This comes down to the training paradigm: the reranker is point2point (pointwise) at its core; only the loss function differs.
What is point2point? Just show the prompts.
For example:
You are an excellent data expert. Please select from refer_doc the data most relevant to the user's query.
<query>
In what year did Ultraman Tiga air?
</query>
<refer_doc>
a. Ultraman Tiga premiered in Japan in 1996
b. Ultraman Tiga's human host is Daigo
c. Ultraman Gaia is the Destroyer of the Earth
</refer_doc>
point2point:
You are an excellent data expert. Please select from refer_doc the data most relevant to the user's query.
<query>
In what year did Ultraman Tiga air?
</query>
<refer_doc>
a. Ultraman Tiga premiered in Japan in 1996
</refer_doc>
You are an excellent data expert. Please select from refer_doc the data most relevant to the user's query.
<query>
In what year did Ultraman Tiga air?
</query>
<refer_doc>
a. Ultraman Tiga's human host is Daigo
</refer_doc>
You are an excellent data expert. Please select from refer_doc the data most relevant to the user's query.
<query>
In what year did Ultraman Tiga air?
</query>
<refer_doc>
a. Ultraman Gaia is the Destroyer of the Earth
</refer_doc>
MAX_POSITIVE_SAMPLES x (1 + MAX_NEGATIVE_SAMPLES)
Why 1 + MAX_NEGATIVE_SAMPLES?
Example:
MAX_POSITIVE_SAMPLES = 1
MAX_NEGATIVE_SAMPLES = 2
{
query: a
pos: [A]
neg: [B, C]
}
a-A a-B a-C
1 x (1 + 2) = 3 // plug into the official formula, it matches
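A minimal Python sketch of that expansion, assuming a record shaped like the one above (my own reconstruction for illustration, not ms-swift's actual implementation; max_pos and max_neg stand in for the two environment variables):

import random

def expand(record, max_pos, max_neg):
    # Keep all positives/negatives if there are too few; sample randomly
    # if there are too many (as the docs describe).
    pos = record["pos"] if len(record["pos"]) <= max_pos else random.sample(record["pos"], max_pos)
    neg = record["neg"] if len(record["neg"]) <= max_neg else random.sample(record["neg"], max_neg)
    pairs = []
    for p in pos:                              # one group per positive
        pairs.append((record["query"], p, 1))  # the "1 +" is the positive itself
        for n in neg:
            pairs.append((record["query"], n, 0))
    return pairs                               # len(pos) * (1 + len(neg)) pairs

print(expand({"query": "a", "pos": ["A"], "neg": ["B", "C"]}, 1, 2))
# [('a', 'A', 1), ('a', 'B', 0), ('a', 'C', 0)]  ->  1 x (1 + 2) = 3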
Because in the end it is all point2point.
https://huggingface.co/Qwen/Qwen3-Reranker-0.6B
You can see that even when you send one query with multiple docs to the reranker,
it is still solved by running inference on each query-doc pair.
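A rough sketch of what that pointwise inference looks like with transformers. The model card describes scoring a pair by comparing the logits of the "yes" and "no" tokens; the prompt below is a simplified stand-in for the official template, so treat the details as assumptions:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Qwen/Qwen3-Reranker-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def score(query, doc):
    # Simplified prompt for illustration; the official template in the
    # model card is longer (system prompt, instruction, chat markers).
    prompt = (f"<Query>: {query}\n<Document>: {doc}\n"
              "Does the document answer the query? Answer yes or no.")
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    yes_no = torch.stack([logits[tokenizer.convert_tokens_to_ids("no")],
                          logits[tokenizer.convert_tokens_to_ids("yes")]])
    return torch.softmax(yes_no, dim=0)[1].item()  # P("yes") as relevance

docs = [
    "Ultraman Tiga premiered in Japan in 1996",
    "Ultraman Tiga's human host is Daigo",
    "Ultraman Gaia is the Destroyer of the Earth",
]
# One query + three docs still means three independent query-doc passes:
scores = [score("In what year did Ultraman Tiga air?", d) for d in docs]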
As for listwise, it just takes the point2point losses and arranges them; the whole question is how to arrange them so that point2point performs better.
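For example, one common way to "arrange" them is a softmax cross-entropy over each group of one positive plus k negatives (a sketch of the general idea, not necessarily the exact loss ms-swift uses):

import torch
import torch.nn.functional as F

# Scores for one expanded group, in the order [pos, neg1, neg2, ...];
# each score came from an independent point2point forward pass.
scores = torch.tensor([[2.1, 0.3, -0.5]])               # shape (num_groups, 1 + k)
labels = torch.zeros(scores.size(0), dtype=torch.long)  # the positive sits at index 0
# Softmax over each group: a listwise objective built from pointwise scores.
loss = F.cross_entropy(scores, labels)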