The append_features method of Module

This article shows how, in Ruby, overriding a module's append_features method can add singleton methods to the class that includes the module. A concrete code example demonstrates the process and highlights the importance of calling super.
Just as a class can have singleton methods, a module can have singleton methods too. But if a class includes a module that contains singleton methods, do those methods become singleton methods of the class? The answer is no. There is, however, a hook called append_features that we can override. It is called with a single parameter: the destination class or module doing the including. For an example of its use, see the following code:
Ruby code:
 
  module MyMod
    def MyMod.append_features(someClass)
      def someClass.modmeth
        puts "Module (class) method"
      end
      super
    end
  end

  class AppendFeatures
    include MyMod

    def initialize
    end
  end

  AppendFeatures.modmeth
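Running this script prints "Module (class) method", which confirms that modmeth was installed as a singleton (class-level) method on AppendFeatures at the moment the module was included.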

Note:
You should call super whenever you override append_features. The default implementation of append_features is what actually mixes the module's instance methods into the including class, so leaving out super silently disables the normal effect of include.

It's an interesting technique. The code is adapted from the book The Ruby Way.
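
To illustrate why the call to super matters, here is a minimal sketch (the module and class names BrokenMod and Broken are invented for this example): when append_features is overridden without super, the singleton method is still defined, but the module's ordinary instance methods are never mixed into the class.

  module BrokenMod
    def BrokenMod.append_features(someClass)
      def someClass.modmeth
        puts "Module (class) method"
      end
      # note: no call to super here
    end

    def helper
      puts "instance method from BrokenMod"
    end
  end

  class Broken
    include BrokenMod
  end

  Broken.modmeth                     # still prints "Module (class) method"
  puts Broken.include?(BrokenMod)    # prints "false" -- the mix-in never happened

  begin
    Broken.new.helper                # helper was never added to Broken
  rescue NoMethodError => e
    puts e.message
  end

With the call to super restored, Broken.include?(BrokenMod) would return true and helper would be callable on instances of Broken.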