One day

This post introduces formatted input and output in C/C++, such as tips for using printf() and scanf(); gives an overview of the variable types commonly used in ACM contests and when each applies; and lists some programming precautions, such as how to avoid common compile-time and run-time errors.

1. Formatted I/O in C:
  printf() and scanf() handle formatted input and output nicely. Examples:
  printf("%3d",3); //prints __3 (_ stands for a space)
  printf("%03d%02d",5,3); //prints 00503 (each field is zero-padded to its width)
  printf("%.2lf",3.14156);//prints 3.14
  scanf("%2d",&a);//with input 123, a=12
  scanf("[%2d]",&a);//with input [123456][123], a=12
  scanf("%d-%d-%d",&a,&b,&c);//with input 123-1-32, a=123, b=1, c=32
  There are more advanced features too, such as restricting the accepted characters with a scanset; look them up if you are interested. A compilable sketch of these snippets follows.
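For reference, a minimal compilable sketch that strings the snippets above together; the input lines assumed in the comments ("2014-10-14" and a run of digits) are purely illustrative:

#include <stdio.h>
int main() {
    printf("%3d\n", 3);            /* prints "  3": field width 3, padded with spaces */
    printf("%03d%02d\n", 5, 3);    /* prints "00503": each field zero-padded to its width */
    printf("%.2lf\n", 3.14156);    /* prints "3.14": two digits after the decimal point */

    int y, m, d;
    /* assuming an input line such as "2014-10-14" */
    if (scanf("%d-%d-%d", &y, &m, &d) == 3)
        printf("y=%d m=%d d=%d\n", y, m, d);

    char digits[32];
    /* scanset: the leading space skips whitespace, %[0-9] then accepts digit characters only */
    if (scanf(" %31[0-9]", digits) == 1)
        printf("digits=%s\n", digits);
    return 0;
}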
2. Variable types commonly used in ACM:
  int 32-bit integer, representable range -2^31 ~ 2^31-1
  char 8-bit integer, range -128~127 when signed or 0~255 when unsigned (whether plain char is signed is compiler-dependent); it can hold a small integer or a character, characters being stored as their ASCII codes, and a char array can hold a string
  double 64-bit floating-point type, for floating-point values
  float 32-bit floating-point type, for floating-point values. !!! Note: avoid it in general; its lower precision occasionally causes wrong answers, and double can replace it completely
  string the C++ string class, with no fixed length limit; it cannot be read with scanf, only with cin; because it comes with many member functions it can stand in for a char array when needed (see the library-function section)
  If I tell you that under (32-bit) gcc long has the same range as int, don't be sad; there is simply no reason to use long, and when you need more range, use long long.
  long long 64-bit integer that makes up for int's limited range; it is supported by the gcc compiler (under Linux) and covers -2^63 ~ 2^63-1. Declare it as long long a;, read it with cin>>a; or scanf("%lld",&a);, and output it the same way (see the sketch after this list).
  __int64 the same as long long, except that Windows compilers such as VC6.0 do not support long long and use __int64 instead, read with scanf("%I64d",&a);  (the SZU OJ uses the gcc compiler)
  Pointer types and user-defined class types: look them up on your own if you are interested.
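Here is the sketch referred to in the long long entry: declaring and reading a 64-bit integer under gcc. The commented-out lines show the equivalent that old Windows compilers expect; the multiplier 1000000 is only there to illustrate that intermediate results must also stay within 64 bits.

#include <stdio.h>
#include <iostream>
using namespace std;

int main() {
    long long a;
    scanf("%lld", &a);               // gcc (Linux) format specifier for long long
    // cin >> a;                     // reading with iostream works as well
    long long b = a * 1000000LL;     // intermediate results must also fit in 64 bits
    printf("%lld\n", b);
    // On VC6.0-style Windows compilers you would write instead:
    //   __int64 c; scanf("%I64d", &c); printf("%I64d\n", c);
    return 0;
}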
3. Programming notes
1) Lessons learned the hard way: remember int main() and return 0;
2) Do not use the old .h forms of the C++ standard headers (iostream.h and the like)
3) Know your data ranges: if int overflows, use long long; if long long overflows, rethink the algorithm; as a last resort, try big-integer (arbitrary-precision) arithmetic
4) Code::Blocks ships with gcc, the same compiler family used on Linux, whereas VC6.0 and the various Visual Studio versions use the Windows (Microsoft) compiler; keep the difference in mind. Our OJ, like most OJs, judges with gcc under Linux, so Code::Blocks is strongly recommended
5) printf and scanf are faster than cin and cout. When there is a lot of input (or output), use scanf (printf), or you risk Time Limit Exceeded; using C-style I/O everywhere is a perfectly good habit (see the sketch after this list)
6) The memory limit on an OJ is usually 65536KB (64MB), which roughly means arrays of at most about 8,000,000 elements. Be sensible when sizing arrays, but do not be stingy either: declaring an array a few dozen elements larger than strictly needed is common practice.
  Try to avoid dynamic allocation (new, malloc, calloc, etc.). It is not forbidden, but we have already been burned by it (all kinds of Runtime Errors)
  Declare large arrays as global variables: an array defined inside a function lives on the stack and can never reach sizes like 8,000,000, usually only on the order of a hundred thousand elements (see the sketch after this list)
7) Do not bring code that VC6.0 tolerates onto the OJ — all you will get is Compile Error — and try not to define variables inside loop bodies. The pattern in question is this:
   for(int i=0;;) do something;
   for(i=0;;) do something;//as you should realize, i is undeclared on this line (VC6.0 accepts it, standard C++ does not)
8) The time limit on an OJ is usually 1s~10s, which means the program can run on the order of tens of millions of loop iterations; for example, four nested for loops of 100 iterations each give 100*100*100*100 = 10^8 steps, which will get Time Limit Exceeded.
9) Use pointers sparingly
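And the sketch referred to in 5) and 6): a large global array combined with scanf/printf for bulk I/O. The array size 1000005 and the summing task are only illustrative.

#include <stdio.h>

int a[1000005];                  // large array as a global: it is not placed on the
                                 // small per-function stack and is zero-initialized

int main() {
    int n;
    scanf("%d", &n);             // scanf/printf: faster than cin/cout on large inputs
    long long sum = 0;           // long long in case the sum overflows int
    for (int i = 0; i < n; i++) {
        scanf("%d", &a[i]);
        sum += a[i];
    }
    printf("%lld\n", sum);
    return 0;
}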
4. Common library functions (a usage sketch follows this list)
math.h
  sqrt(n) square root of a floating-point number
  pow(x,y) x raised to the power y
  sin() cos() tan() asin() acos() atan() trigonometric functions for floating-point numbers; acos(-1.0) gives the floating-point value of PI
  log(x) natural logarithm ln(x)
  exp(x) e raised to the power x
  log10(x) base-10 logarithm lg(x)
  fabs(x) absolute value of the floating-point number x
stdlib.h
  abs(x) absolute value of the integer x
string.h
  strlen(s) length of the string stored in the char array s[]
  strcpy(x,y) copy string y into string x
  strcmp(x,y) compare x and y lexicographically; returns a negative value when x is less than y, 0 when they are equal, and a positive value when x is greater than y
algorithm
  abs(x) absolute value of x (strictly speaking the overloads live in stdlib.h/math.h)
  sort(a,a+n) sort array a into ascending order; n is the number of elements
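The usage sketch mentioned at the start of this section; the sample values are arbitrary:

#include <stdio.h>
#include <math.h>
#include <string.h>
#include <stdlib.h>
#include <algorithm>
using namespace std;

int main() {
    printf("%.5lf\n", sqrt(2.0));        // 1.41421
    printf("%.0lf\n", pow(2.0, 10.0));   // 1024
    printf("%.5lf\n", acos(-1.0));       // PI = 3.14159...
    printf("%d\n", abs(-7));             // 7

    char s[] = "acm";
    printf("%d\n", (int)strlen(s));      // 3
    printf("%d\n", strcmp("abc", "abd") < 0);  // 1, because "abc" compares less than "abd"

    int a[5] = {3, 1, 4, 1, 5};
    sort(a, a + 5);                      // ascending order: 1 1 3 4 5
    for (int i = 0; i < 5; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}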
That is all for now; the rest can wait until you need it.

from turtledemo.penrose import start
import numpy as np
import torch
import time
import math

torch.set_printoptions(8)


def gelu(x):
    """
    Task: Use the torch API to implement the approximate calculation formula of the `GELU` activation function.
          The formula is as follows (you need to paste it into the latex online conversion website)
          Website: https://www.latexlive.com/
    Input: Tensor
    Output: Tensor
    """
    return 0.5*x*(1+torch.tanh(math.sqrt(2/math.pi)*(x+0.044715*torch.pow(x,3))))


def softmax(x, dim=-1):
    """
    Task: Use torch API to implement `softmax` function, search the specific formula by yourself
    Input: Tensor
    Output: Tensor
    """
    x_max = torch.max(x, dim=dim, keepdim=True).values
    x_stable = x - x_max
    exp_x = torch.exp(x_stable)
    return exp_x / torch.sum(exp_x, dim=dim, keepdim=True)


def layer_norm(x, g_b, eps: float = 1e-5):
    """
    Task: Use torch API to implement `layernorm` function, search `layernorm` by yourself
    Input:
        x: Tensor
        g_b: dictionary that load from gpt2 weight. g-gamma and b-bias are the keys
    Output: Tensor
    """
    """
    if torch.isnan(x).any():
        print("Nan\n")
        assert(0)"""
    g, b = torch.Tensor(g_b['g']), torch.Tensor(g_b['b'])
    x = x.clone().detach().to(torch.float32)
    g = g.to(x.device)
    b = b.to(x.device)
    normalized_shape = g.shape
    dims = list(range(-len(normalized_shape), 0))  # warning
    # if not isinstance(x, torch.Tensor):
    #     x = torch.tensor(x, dtype=torch.float32)
    x = x.float()
    # print(x,"/n")
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True)
    x_ = (x - mean) / torch.sqrt(var + eps)
    # print(x_*g+b,"/n")
    return x_ * g + b


def linear(x, w_b):  # [m, in], [in, out], [out] -> [m, out]
    """
    Task: implement linear layer
    Input:
        x: Tensor
        w_b: dictionary that load from gpt2 weight. w-weight and b-bias are the keys
    Output: Tensor
    """
    w, b = torch.Tensor(w_b['w']), torch.Tensor(w_b['b'])
    w, b = w.to(x.device), b.to(x.device)
    # print(torch.matmul(x,w)+b)
    return torch.matmul(x, w) + b  # warning


def ffn(x, mlp):  # [n_seq, n_embd] -> [n_seq, n_embd]
    """
    Task: use `gelu` `linear` to implement ffn
    Notes: x --linear--> --gelu--> --linear--> output
    Input:
        x: Tensor
        mlp: dictionary that load from gpt2 weight. w_b1 and w_b2 are the params of two linear layer
    Output: Tensor
    """
    w_b1, w_b2 = mlp['c_fc'], mlp['c_proj']
    # print(x,"\n")
    x = linear(x, w_b1)
    x = gelu(x)
    x = linear(x, w_b2)
    return x


def attention(q, k, v, mask, past_kv=None):  # [n_q, d_k], [n_k, d_k], [n_k, d_v], [n_q, n_k] -> [n_q, d_v]
    """
    Task: use torch API to implement attention computation according to formula(1) of the following paper,
          where d_k account for the last dimension of `k`
    Paper: https://arxiv.org/abs/1706.03762
    Input:
        q: Tensor
        k: Tensor
        v: Tensor
        mask: Tensor
    Output: Tensor
    """
    if past_kv is not None:
        past_key, past_value = past_kv
        k = torch.cat([past_key, k], dim=0)
        v = torch.cat([past_value, v], dim=0)
    current_k = (k, v)
    atten_score = torch.matmul(q, k.transpose(-2, -1))
    d_k = k.size(-1)
    atten_score = atten_score / torch.sqrt(torch.tensor(d_k, dtype=atten_score.dtype))
    if mask is not None:
        if past_kv is not None:
            seq_len = k.size(0)  # warning
            causal_mask = torch.triu(torch.ones(seq_len, seq_len) * -1e9, diagonal=1)
            causal_mask = causal_mask[-q.size(0):]
        else:
            causal_mask = mask
        atten_score = atten_score.masked_fill_(causal_mask == 0, -1e9)
    atten_weights = softmax(atten_score)
    output = torch.matmul(atten_weights, v)
    return output, current_k


def mha(x, attn, n_head, past_kv=None):  # [n_seq, n_embd] -> [n_seq, n_embd]
    """
    Task: Complete the code of the multi-head attention
    Input:
        x: Tensor
        attn: dictionary that load from gpt2 weight. c_attn and c_proj are the params of two linear layer
        n_head: number of head
    Output: Tensor after applying multi-head attention and linear transformation, shape [n_seq, n_embd].
    """
    c_attn, c_proj = attn['c_attn'], attn['c_proj']

    # qkv projection
    # print(x,"/n")
    x = linear(x, c_attn)  # [n_seq, n_embd] -> [n_seq, 3*n_embd]
    # print(x,"/n")

    # Split into qkv
    """
    Task: Split the q,k,v matrix from the tensor x
    Notes: [n_seq, 3*n_embd] -> 3 * [n_seq, n_embd]
    """
    n_seq, n_embd = x.shape
    n_embd_total = n_embd // 3
    q, k, v = torch.split(x, n_embd_total, dim=-1)
    qkv = [q, k, v]  # need to modify  # warning4

    # Split into heads
    qkv_heads = [qkv_part.chunk(n_head, dim=-1) for qkv_part in qkv]  # 3 * [n_seq, n_embd] -> 3 * n_head * [n_seq, n_embd/n_head]
    qkv_heads = list(zip(*qkv_heads))  # [3, n_head, n_seq, n_embd/n_head]

    # Causal mask to hide future inputs from being attended to
    """
    Task: Construct mask matrix
    Notes:
        |  0  -inf -inf ... -inf |
        |  0    0  -inf ... -inf |
        |  0    0    0  ... -inf |
        | ...  ...  ... ...  ... |
        |  0    0    0  ...   0  |
        Mask is a tensor whose dimension is [n_seq, n_seq]
    """
    if past_kv is None:
        past_kv_per_head = [None] * n_head
    else:
        past_kv_per_head = past_kv
    if past_kv is None:
        causal_mask = torch.triu(torch.ones(n_seq, n_seq) * -1e9, diagonal=1)
    else:
        causal_mask = None
    causal_mask = torch.triu(torch.ones(n_seq, n_seq) * -1e9, diagonal=1)  # warning3  # need to modify

    out_heads = []
    new_kv_per_head = []
    for i, (q, k, v) in enumerate(qkv_heads):  # warning
        out_head, new_kv = attention(q, k, v, causal_mask, past_kv_per_head[i])
        out_heads.append(out_head)
        new_kv_per_head.append(new_kv)
    # Perform attention over each head

    # Merge heads
    """
    Task: merge multi-heads results
    Notes: n_head * [n_seq, n_embd/n_head] --> [n_seq, n_embd]
    """
    # print(x,"/n")
    x = torch.cat(out_heads, dim=-1)  # need to modify

    # Out projection
    x = linear(x, c_proj)  # [n_seq, n_embd] -> [n_seq, n_embd]
    # print(x,"/n")
    return x, new_kv_per_head


def transformer_block(x, block, n_head, past_kv=None):  # [n_seq, n_embd] -> [n_seq, n_embd]
    mlp, attn, ln_1, ln_2 = block['mlp'], block['attn'], block['ln_1'], block['ln_2']
    # print(x,"/n")

    # multi-head causal self attention
    # print(x,"/n")
    attn_out, new_kv = mha(layer_norm(x, ln_1), attn, n_head, past_kv)
    x = x + attn_out  # [n_seq, n_embd] -> [n_seq, n_embd]  # problem A!!!!
    # print(x,"/n")

    # position-wise feed forward network
    x = x + ffn(layer_norm(x, ln_2), mlp)  # [n_seq, n_embd] -> [n_seq, n_embd]
    return x, new_kv


def gpt2(inputs, params, n_head, past_kvs=None):  # [n_seq] -> [n_seq, n_vocab]
    wte, wpe, blocks, ln_f = params['wte'], params['wpe'], params['blocks'], params['ln_f']

    # token + positional embeddings
    wte = torch.Tensor(wte)
    wpe = torch.Tensor(wpe)
    if past_kvs is None:
        x = wte[inputs] + wpe[range(len(inputs))]  # [n_seq] -> [n_seq, n_embd]
        start_pos = 0
    else:
        x = wte[inputs[-1:]] + wpe[[len(inputs) - 1]]
        start_pos = len(inputs) - 1
    # print(x.shape,params,n_head,"1/n")
    # x = transformer_block(x, blocks, n_head=n_head)
    # print(x.shape,"2/n")
    # print(x,"/n")

    # forward pass through n_layer transformer blocks
    x = torch.Tensor(x)
    new_past_kvs = []
    for i, block in enumerate(blocks):
        past_kv = past_kvs[i] if past_kvs is not None else None  # warning
        x, new_kv = transformer_block(x, block, n_head=n_head, past_kv=past_kv)  # [n_seq, n_embd] -> [n_seq, n_embd]
        new_past_kvs.append(new_kv)

    # projection to vocab
    # print(x,"/n")
    x = layer_norm(x, ln_f)  # [n_seq, n_embd] -> [n_seq, n_embd]
    return x @ wte.T, new_past_kvs  # [n_seq, n_embd] -> [n_seq, n_vocab]


def apply_repetition_penalty(logits, generated_tokens, penalty=1.2):
    for token in set(generated_tokens[-20:]):
        if token < len(logits):
            logits[token] = logits[token] / penalty
    return logits


def generate(inputs, params, n_head, n_tokens_to_generate, temperature=0.8, repetition_penalty=1.2):
    from tqdm import tqdm

    past_kvs = None
    generated = inputs.copy()
    for _ in tqdm(range(n_tokens_to_generate), "generating"):  # auto-regressive decode loop
        logits, past_kvs = gpt2(generated, params, n_head, past_kvs)  # model forward pass
        # next_id = np.argmax(logits[-1])  # greedy sampling  # warning
        # print(logits,"/n")
        # inputs.append(int(next_id))  # append prediction to input
        last_logits = logits[-1]
        last_logits = apply_repetition_penalty(last_logits, generated, repetition_penalty)
        if temperature > 0:
            last_logits = last_logits / temperature
            probs = softmax(last_logits, dim=-1)
            next_id = torch.multinomial(probs, num_samples=1).item()
        else:
            next_id = torch.argmax(last_logits).item()
        generated.append(next_id)

    return generated[len(inputs):]  # only return generated ids


def greedy_speculative_generate(inputs, draft_params, target_params, hparams_draft, hparams_target, n_tokens_to_generate, K):
    """
    Task: Load 124M and 1558M models at the same time, use greedy sampling, and complete speculative decoding
    Inputs:
        inputs (list): The initial list of token IDs from the prompt.
        draft_params, target_params: Model weights for the draft and target models.
        hparams_draft, hparams_target: Hyperparameters for both models.
        n_tokens_to_generate (int): The number of new tokens to generate.
        K (int): The number of tokens the draft model speculates at each step (e.g., 4).
    Returns:
        list: A list of newly generated token IDs.
    """
    draft_past_kvs = None
    target_past_kvs = None
    generated_ids = []
    current_inputs = list(inputs)
    while len(generated_ids) < n_tokens_to_generate:
        draft_tokens = []
        draft_inputs = list(current_inputs)
        draft_current_past_kvs = draft_past_kvs
        for _ in range(K):
            if len(generated_ids) + len(draft_tokens) >= n_tokens_to_generate:
                break
            logits = gpt2(draft_inputs, draft_params, hparams_draft['n_head'], past_kvs=draft_current_past_kvs)  # warning5
            next_id = np.argmax(logits[-1])
            draft_tokens.append(next_id)
            draft_inputs.append(next_id)
        if not draft_tokens:
            break  # warning6
        target_input = current_inputs + draft_tokens
        target_current_past_kvs = target_past_kvs
        target_logits, target_current_past_kvs = gpt2(target_input, target_params, hparams_target['n_head'], target_current_past_kvs)  # warning 7
        accepted_token = []
        for i, draft_token in enumerate(draft_tokens):
            target_position = len(current_inputs) + i
            target_token = np.argmax(target_logits[target_position].detach().numpy())
            if draft_token == target_token:
                accepted_token.append(draft_token)
            else:
                accepted_token.append(target_token)
                break
        # print(accepted_token,"/n")
        if len(accepted_token) == len(draft_tokens):
            draft_past_kvs = draft_current_past_kvs
            target_past_kvs = target_current_past_kvs
        else:
            draft_past_kvs = None
        generated_ids.extend(accepted_token)
        current_inputs.extend(accepted_token)
    return generated_ids


def main(prompt: str, n_tokens_to_generate: int = 5, model_size: str = "124M", models_dir: str = "models"):
    from utils import load_encoder_hparams_and_params

    # load encoder, hparams, and params from the released open-ai gpt-2 files
    encoder, hparams, params = load_encoder_hparams_and_params(model_size, models_dir)

    # encode the input string using the BPE tokenizer
    input_ids = encoder.encode(prompt)

    # make sure we are not surpassing the max sequence length of our model
    assert len(input_ids) + n_tokens_to_generate < hparams["n_ctx"]

    # generate output ids
    start = time.time()
    output_ids = generate(input_ids, params, hparams["n_head"], n_tokens_to_generate, temperature=0.7, repetition_penalty=1.2)
    end = time.time()
    print(f"Time taken to generate {n_tokens_to_generate} tokens: {end - start:.2f}s")
    # print("/n output_ids/n",output_ids)

    # decode the ids back into a string
    output_text = encoder.decode(output_ids)
    # print(output_text)
    return output_text


if __name__ == "__main__":
    import fire
    fire.Fire(main)

The output of this model is "that become become become one day become one day one day become one day one one day one day one day one one one one one one one one one one one one one one one one one one one", while the expected output is "the most powerful machine on the planet". How should the code be modified to get the correct result?