beam_search.py

  1. transformer.py: class Transformer(CaptioningModel)
  2. captioning_model.py:
    class CaptioningModel(Module):
        def beam_search(self, visual: utils.TensorOrSequence, max_len: int, eos_idx: int, beam_size: int, out_size=1,
                        return_probs=False, **kwargs):
            bs = BeamSearch(self, max_len, eos_idx, beam_size)
            return bs.apply(visual, out_size, return_probs, **kwargs)
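For orientation, here is a hedged usage sketch of this entry point; 'model', 'visual', and every index/size value below are illustrative assumptions, not values from the repository:

    # Hypothetical call, assuming `model` is a trained Transformer and `visual`
    # holds precomputed image region features; max_len, eos_idx and beam_size
    # are illustrative choices. Assuming apply() returns the decoded token
    # indices together with their log-probabilities.
    out, log_probs = model.beam_search(visual, max_len=20, eos_idx=3,
                                       beam_size=5, out_size=1)
    # out: (batch_size, max_len) token indices of the best caption per image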
  3. beam_search.py:
    def apply(self, visual: utils.TensorOrSequence, out_size=1, return_probs=False, **kwargs):
        # (excerpt) the main decoding loop: one call to iter() per time step,
        # with the model's internal state kept alive by statefulness();
        # `outputs` is initialized earlier in the full method
        with self.model.statefulness(self.b_s):
            for t in range(self.max_len):
                visual, outputs = self.iter(t, visual, outputs, return_probs, **kwargs)

In containers.py:

    @contextmanager
    def statefulness(self, batch_size: int):
        self.enable_statefulness(batch_size)
        try:
            yield
        finally:
            self.disable_statefulness()
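The point of wrapping enable/disable in a contextmanager is that disable_statefulness() is guaranteed to run even if decoding raises an exception; that is exactly what the try/finally above encodes. A minimal self-contained sketch of the same pattern (TinyStatefulModule is an invented stand-in, not the real containers.Module):

    from contextlib import contextmanager

    class TinyStatefulModule:
        def __init__(self):
            self._state = None

        def enable_statefulness(self, batch_size):
            self._state = [None] * batch_size  # allocate per-sample state

        def disable_statefulness(self):
            self._state = None  # always cleared, even on error

        @contextmanager
        def statefulness(self, batch_size):
            self.enable_statefulness(batch_size)
            try:
                yield
            finally:
                self.disable_statefulness()

    m = TinyStatefulModule()
    with m.statefulness(batch_size=4):
        assert m._state is not None  # state is live inside the block
    assert m._state is None          # and released afterwards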
Back in beam_search.py, iter() performs one decoding step:

    def iter(self, t: int, visual: utils.TensorOrSequence, outputs, return_probs, **kwargs):
        cur_beam_size = 1 if t == 0 else self.beam_size
        # decoder output: ((batch_size * cur_beam_size), cur_len, vocab_size) log-probabilities
        word_logprob = self.model.step(t, self.selected_words, visual, None, mode='feedback', **kwargs)
        # reshape to (batch_size, cur_beam_size, vocab_size): the conditional
        # log-probability of each vocabulary word at the current time step
        word_logprob = word_logprob.view(self.b_s, cur_beam_size, -1)
        # running sequence log-probability: because we work in log space, the
        # product of conditional probabilities becomes a sum, so we simply add
        candidate_logprob = self.seq_logprob + word_logprob  # (batch_size, cur_beam_size, vocab_size)
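A toy numeric sketch of this broadcast addition, and of the top-k selection that beam search performs on candidate_logprob afterwards (all shapes and values here are invented for illustration):

    import torch

    # batch_size=1, cur_beam_size=2, vocab_size=3 (toy sizes)
    seq_logprob = torch.tensor([[[-1.0], [-2.0]]])               # (1, 2, 1)
    word_logprob = torch.log(torch.tensor([[[0.7, 0.2, 0.1],
                                            [0.5, 0.4, 0.1]]]))  # (1, 2, 3)

    # broadcasting adds each beam's running score to every vocabulary word
    candidate_logprob = seq_logprob + word_logprob               # (1, 2, 3)

    # keep the best beam_size=2 candidates over all beams and words
    selected_logprob, selected_idx = candidate_logprob.view(1, -1).topk(2, -1)
    beam_idx = selected_idx // 3  # which beam each winner extends
    word_idx = selected_idx % 3   # which word each winner appends
    print(selected_logprob)       # tensor([[-1.3567, -2.6094]])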

't' denotes the time step. At time step 0 every sentence begins with the special begin-of-sentence token 'bos', so cur_beam_size = 1. The one-liner cur_beam_size = 1 if t == 0 else self.beam_size is equivalent to:

if t == 0:
    cur_beam_size = 1
else:
    cur_beam_size = self.beam_size


    def step(self, t, prev_output, visual, seq, mode='teacher_forcing', **kwargs):
        it = None
        if mode == 'teacher_forcing':
            raise NotImplementedError
        elif mode == 'feedback':
            if t == 0:
                # first step: encode the visual features once and feed a
                # 'bos' token for every sample in the batch
                self.enc_output, self.mask_enc = self.encoder(visual)
                if isinstance(visual, torch.Tensor):
                    it = visual.data.new_full((visual.shape[0], 1), self.bos_idx).long()
                else:
                    it = visual[0].data.new_full((visual[0].shape[0], 1), self.bos_idx).long()
            else:
                # later steps: feed the words selected by beam search
                it = prev_output

        return self.decoder(it, self.enc_output, self.mask_enc)
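A small sketch of how the 'bos' tensor is built at t == 0: new_full() creates a tensor with the same device and dtype as visual, filled with bos_idx (the feature shape below is a hypothetical example):

    import torch

    bos_idx = 2                        # the post states bos_idx is 2 in this model
    visual = torch.randn(4, 50, 2048)  # hypothetical batch: 4 images, 50 regions
    it = visual.data.new_full((visual.shape[0], 1), bos_idx).long()
    print(it.shape)  # torch.Size([4, 1]): one 'bos' token per image
    print(it[0])     # tensor([2])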

If t == 0, every value in 'it' is self.bos_idx, which is 2 in this model. At later time steps, 'it' holds the words previously selected by the beam search.

C5W3L04 Refining Beam Search https://www.youtube.com/watch?v=gb__z7LlN_4

Why do we use log probabilities?

  1. The individual word probabilities are all numbers much less than 1, and multiplying many numbers less than 1 yields a tiny result that can cause numerical underflow: the value becomes too small for the computer's floating-point representation to store accurately. So in practice, instead of maximizing the product, we take logs, which is more numerically stable: the log of a product becomes a sum of logs. Maximizing this sum of log probabilities gives the same result in terms of selecting the most likely sentence, because the log function is strictly monotonically increasing (see the sketch below).
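A quick demonstration of the underflow argument (the probabilities are invented):

    import math

    probs = [0.1] * 400  # 400 words, each with probability 0.1

    product = 1.0
    for p in probs:
        product *= p
    print(product)       # 0.0 -- underflows in float64 (true value is 1e-400)

    log_sum = sum(math.log(p) for p in probs)
    print(log_sum)       # about -921.03, perfectly representable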

Why do we use length normalization?

  1. If you have a very long sentence, its probability is going to be low, because you multiply together many terms that are each less than 1. So the objective (in both its product and log form) has an undesirable effect: it unnaturally tends to prefer very short sentences, because the probability of a short sentence is the product of fewer factors less than 1, and so the product is not quite as small. The same is true in log space: since the log of a probability is always less than or equal to 0, the more terms you sum, the more negative the result. One change to the algorithm makes it work better: normalize the objective by the number of words in the output sentence, so that it becomes the average log probability per word. This significantly reduces the penalty for outputting longer sentences.
    In practice, instead of dividing by T_y, the number of words in the output sentence, we use a softer approach: divide by T_y to the power of alpha, where 0 <= alpha <= 1. If alpha = 1, you normalize fully by length; if alpha = 0, T_y to the power of 0 is 1, so you do not normalize at all; intermediate values of alpha sit somewhere between full normalization and no normalization (a worked example follows this list).
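A small worked example of the softer T_y ** alpha normalization (the log-probabilities are invented):

    def normalized_score(log_probs, alpha):
        # sum of log-probabilities divided by T_y ** alpha
        T_y = len(log_probs)
        return sum(log_probs) / (T_y ** alpha)

    short = [-1.0, -1.2, -0.8]                      # 3 words, total -3.0
    long_ = [-0.5, -0.6, -0.4, -0.5, -0.55, -0.45]  # 6 words, total -3.0

    for alpha in (0.0, 0.7, 1.0):
        print(alpha, normalized_score(short, alpha), normalized_score(long_, alpha))
    # alpha = 0: the two candidates tie (no normalization); as alpha grows
    # toward 1, the longer sentence with the same total log-probability
    # gets the better (less negative) score.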

在这里插入图片描述
在这里插入图片描述
C5W3L05 Error Analysis of Beam Search https://www.youtube.com/watch?v=ZGUZwk7xIwk
Error analysis lets you figure out whether it is the beam search algorithm that is causing problems (and worth spending time on, e.g. by increasing the beam width) or whether it is your RNN model that is at fault.
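The decision rule from the video, as a hedged sketch: score the human reference y* and the beam output y_hat under the same model and compare (sequence_logprob is a hypothetical helper that returns log P(y | x) under the model; it is not part of this repository):

    def attribute_error(sequence_logprob, x, y_star, y_hat):
        p_star = sequence_logprob(x, y_star)  # log P(y* | x)
        p_hat = sequence_logprob(x, y_hat)    # log P(y_hat | x)
        if p_star > p_hat:
            # the model prefers the reference but beam search missed it:
            # beam search is at fault (e.g. try a larger beam width)
            return 'beam search'
        # the model scores the worse output at least as high: the model is at fault
        return 'model'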
