10. The DPO loss always starts at about 0.7, and the KTO loss always starts at 0.5
dpo_loss = -F.logsigmoid(self.args.dpo_beta * (pi_logratios - ref_logratios))
kto_loss = 1 - F.sigmoid(self.args.kto_beta * (chosen_logratios - KL))
- At the start of training ref_model = model, so the argument of the sigmoid is 0 in both losses, which gives:
- $\text{dpo\_loss} = -\log\sigma(\beta \cdot 0) = \log 2 \approx 0.6931$
- $\text{kto\_loss} = 1 - \sigma(\beta \cdot 0) = 0.5$
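
The two starting values can be checked numerically. The snippet below is a minimal sketch that uses plain Python floats instead of tensors; the helper functions `sigmoid`, `dpo_loss` and `kto_loss` are hypothetical scalar stand-ins for the two PyTorch lines above, not the trainer's actual code. It assumes that the policy and the reference model are identical at step 0, so every log-ratio difference (and the KL term) is 0.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, scalar stand-in for F.sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def dpo_loss(beta: float, pi_logratios: float, ref_logratios: float) -> float:
    """Scalar version of -F.logsigmoid(beta * (pi_logratios - ref_logratios))."""
    return -math.log(sigmoid(beta * (pi_logratios - ref_logratios)))


def kto_loss(beta: float, chosen_logratios: float, kl: float) -> float:
    """Scalar version of 1 - F.sigmoid(beta * (chosen_logratios - KL))."""
    return 1.0 - sigmoid(beta * (chosen_logratios - kl))


# Assumption: at step 0 the policy equals the reference model, so
# pi_logratios == ref_logratios, chosen_logratios == 0 and KL == 0.
for beta in (0.01, 0.1, 0.5):
    print(beta, dpo_loss(beta, -3.2, -3.2), kto_loss(beta, 0.0, 0.0))
```

Since the sigmoid argument is exactly 0, the result does not depend on beta: both starting values are the same for any choice of `dpo_beta` or `kto_beta`.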