【性能优化关键一步】:利用str_replace计数参数精准控制替换行为

第一章:str_replace计数参数的核心作用

在PHP字符串处理中,`str_replace` 函数不仅用于替换指定子串,其可选的第四个参数——计数(count)——还提供了关键的调试与逻辑控制能力。该参数以引用方式传递,函数执行后会将实际发生替换的总次数写入该变量,帮助开发者精准掌握操作结果。

计数参数的工作机制

当调用 `str_replace` 时,若提供第四个参数,PHP会将替换发生的总次数写入该变量。这一特性在需要条件判断或日志记录的场景中尤为有用。

// 示例:使用计数参数检测替换行为
$original = "Hello world, welcome to the world of PHP.";
$search   = "world";
$replace  = "universe";
$count    = 0;

$result = str_replace($search, $replace, $original, $count);

echo "修改后的文本: $result\n"; // 输出替换结果
echo "替换次数: $count\n";       // 输出:替换次数: 2

// 可基于$count进行逻辑控制
if ($count > 0) {
    echo "警告:原始文本中存在需替换的敏感词。\n";
}

典型应用场景

  • 监控模板引擎中占位符的替换次数,确保所有变量被正确注入(示例见下)
  • 在数据清洗流程中统计非法字符的出现频率
  • 验证输入过滤规则是否生效,防止绕过安全机制
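针对第一种场景,下面给出一个最小示意(模板内容与占位符名称均为假设):利用计数参数确认每个占位符确实被注入。

// 校验模板占位符是否全部被替换
$template = "尊敬的 {name},您的订单 {order_id} 已发货。";
$values   = ["{name}" => "张三", "{order_id}" => "A1001"];

$count  = 0;
$output = str_replace(array_keys($values), array_values($values), $template, $count);

// 每个占位符各出现一次时,$count 应等于占位符个数
if ($count < count($values)) {
    echo "警告:存在未被注入的占位符,请检查模板。\n";
}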
参数行为对比表
| 参数位置 | 名称 | 是否必需 | 作用 |
| --- | --- | --- | --- |
| 1 | $search | 是 | 要查找的值 |
| 2 | $replace | 是 | 用于替换的新值 |
| 3 | $subject | 是 | 被操作的字符串或数组 |
| 4 | $count | 否 | 以引用方式返回替换发生的次数 |

第二章:计数参数的理论基础与工作机制

2.1 理解str_replace函数的基本语法结构

基本语法与参数说明
在PHP中,str_replace用于执行字符串替换操作,其基本语法如下:

str_replace(mixed $search, mixed $replace, mixed $subject, int &$count = null)
  • $search:要查找的值(可为字符串或数组)
  • $replace:替换为的值(可为字符串或数组)
  • $subject:被搜索和替换的原始字符串或数组
  • $count(可选):引用参数,用于记录替换发生的次数
执行逻辑与返回值
该函数遍历$subject,将所有匹配$search的子串替换为$replace,并返回新字符串。若$subject为数组,则对每个元素执行替换。
  • 匹配区分大小写(如需忽略大小写可改用 str_ireplace)
  • 允许批量替换(通过数组参数)
  • 不修改原变量,返回新结果
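下面用一个简短的 PHP 片段验证上述几条行为(示例数据为假设):当 $subject 为数组时,函数对每个元素分别替换并返回新数组,$count 累计所有元素中的替换总次数,原数组保持不变。

$subjects = ["foo bar", "bar baz", "no match"];
$count    = 0;
$results  = str_replace("bar", "qux", $subjects, $count);

print_r($results); // ["foo qux", "qux baz", "no match"]
echo $count;       // 2:两处 "bar" 均被替换,$subjects 本身未被修改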

2.2 计数参数在替换流程中的角色解析

计数参数的核心作用
需要注意,不同语言中“计数参数”的语义并不相同:PHP 的 str_replace 第四个参数是输出型引用,仅用于报告替换次数;而 Python 的 str.replace 第三个参数是输入型上限,控制最多替换前 N 次匹配,适用于只需部分更新的场景。下面以 Python 为例演示后一种语义。
代码示例与分析
# Python 示例:第三个参数限制最多替换次数
text = "apple banana apple cherry apple"
result = text.replace("apple", "fruit", 2)
print(result)
上述 Python 代码中,第三个参数 2 表示仅替换前两次匹配的 "apple"。输出结果为:fruit banana fruit cherry apple,可见第三次匹配未被替换。
参数行为对比表
| 计数参数值(Python) | 替换行为 |
| --- | --- |
| 0 | 不进行任何替换 |
| 1 | 仅替换第一次匹配 |
| 2 | 替换前两次匹配 |
| -1 或省略 | 全局替换所有匹配项 |
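PHP 的 str_replace 并没有这种“最多替换 N 次”的输入参数;若需要与 Python 等价的有限次替换,可改用 preg_replace 的 $limit 参数,并利用其第五个引用参数取得实际替换次数。下面是一个等价写法的示意:

$text   = "apple banana apple cherry apple";
$count  = 0;
// $limit = 2:仅替换前两次匹配;$count 以引用方式返回实际替换次数
$result = preg_replace('/apple/', 'fruit', $text, 2, $count);

echo $result; // fruit banana fruit cherry apple
echo $count;  // 2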

2.3 引用传递与变量更新的底层机制

在许多编程语言中,引用(或指针)传递不复制值本身,而是传递数据所在的内存地址。这使得函数内部对参数的修改能影响外部变量。
数据同步机制
当变量通过引用传入函数时,栈中存储的是指向堆内存的指针。多个引用可指向同一对象,任一引用的修改都会反映在共享数据上。
func updateValue(data *int) {
    *data = 42
}
// 调用:x := 10; updateValue(&x) — x 变为 42
该代码中,*data 解引用操作修改堆内存中的原始值,实现跨作用域更新。
  • 引用传递减少大对象复制开销
  • 需警惕意外的数据污染
  • 部分运行时(如 PHP、CPython)的垃圾回收依赖引用计数追踪对象生命周期;Go 等语言则采用跟踪式 GC
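回到 PHP,str_replace 的 $count 正是依靠引用传递实现“函数内部写、调用方读”。下面的极简示意(自定义函数,仅为演示)重现了同一机制:

// &$n 声明按引用接收参数,函数内的赋值对调用方可见
function writeCount(&$n) {
    $n = 42;
}

$x = 10;
writeCount($x);
echo $x; // 42:与 str_replace 向 $count 写入替换次数的方式一致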

2.4 性能影响:有限次替换 vs 全量替换

在缓存更新策略中,有限次替换与全量替换对系统性能有显著差异。
有限次替换机制
该策略仅更新受影响的少量数据项,降低I/O开销。适用于局部变更场景,减少锁竞争。
// 有限次替换示例:仅更新指定key
func updateCache(keys []string, data map[string]interface{}) {
    for _, k := range keys {
        if val, exists := data[k]; exists {
            cache.Set(k, val, ttl)
        }
    }
}
上述代码仅遍历传入键进行更新,时间复杂度为O(n),n为变更键数量,资源消耗可控。
全量替换机制
全量替换会清空并重建整个缓存,带来高延迟和瞬时CPU峰值,常见于配置全局刷新(对比表后附一个简单示意)。
  • 优点:保证数据一致性
  • 缺点:内存波动大,GC压力增加
| 策略 | 响应时间 | 吞吐量影响 |
| --- | --- | --- |
| 有限次替换 | 低 | 小 |
| 全量替换 | 高(瞬时峰值) | 明显下降 |
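作为对照,下面给出一个全量替换的极简 PHP 示意(用数组模拟缓存,函数名为假设):旧缓存整体丢弃后按数据源重建,开销与数据源总量成正比。

// 全量替换:丢弃旧缓存,按数据源整体重建,复杂度 O(N)(N 为数据源总量)
function rebuildCache(array $source): array {
    $cache = []; // 清空旧数据
    foreach ($source as $key => $value) {
        $cache[$key] = $value;
    }
    return $cache;
}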

2.5 边界情况分析:零替换与负数行为

在数值处理中,边界情况常引发意外行为。零值替换可能导致除零异常或逻辑短路,需特别校验。
常见边界场景
  • 输入为0时是否触发默认替换逻辑
  • 负数参与运算时符号传播问题
  • 浮点数精度丢失对比较的影响
代码示例与分析
func safeDivide(a, b float64) (float64, bool) {
    if b == 0 {
        return 0, false // 避免除零
    }
    result := a / b
    return result, true
}
该函数显式处理除数为零的情况,返回安全默认值并附带状态标识。参数 b 为零时直接拒绝运算,避免崩溃;负数输入则正常传递符号,符合数学预期。
边界输入对照表
| 输入 a | 输入 b | 输出值 | 成功? |
| --- | --- | --- | --- |
| 5 | 0 | 0 | 否 |
| -6 | 2 | -3 | 是 |
| 10 | -5 | -2 | 是 |
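对应到 PHP 的有限次替换,preg_replace 的 $limit 参数同样存在边界取值:0 表示不执行任何替换,-1(默认值)表示替换全部匹配。下面的小脚本可用于验证这一行为:

$s = "aaa";
foreach ([0, 1, -1] as $limit) {
    $count  = 0;
    $result = preg_replace('/a/', 'b', $s, $limit, $count);
    echo "limit=$limit → $result (count=$count)\n";
}
// 预期输出:limit=0 → aaa (0);limit=1 → baa (1);limit=-1 → bbb (3)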

第三章:典型应用场景实践

3.1 日志脱敏处理中的精准替换

在日志系统中,敏感信息如身份证号、手机号需在存储前进行脱敏。为确保数据安全与合规,精准替换策略尤为重要。
正则匹配与动态掩码
通过正则表达式识别敏感字段,并采用动态掩码替换。例如,使用 Go 实现手机号脱敏:

func DesensitizePhone(text string) string {
    re := regexp.MustCompile(`(\d{3})\d{4}(\d{4})`)
    return re.ReplaceAllString(text, "${1}****${2}")
}
该函数匹配中国大陆手机号格式,保留前三位与后四位,中间四位以 `*` 替代,确保可读性与隐私平衡。
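若在 PHP 日志管道中做同样的脱敏,可结合 preg_replace 的计数参数统计命中条数,便于审计(正则与上面的 Go 版本一致,日志内容为假设):

$log    = "用户 13812345678 下单;客服回拨 13998765432。";
$count  = 0;
$masked = preg_replace('/(\d{3})\d{4}(\d{4})/', '$1****$2', $log, -1, $count);

echo $masked;                      // 用户 138****5678 下单;客服回拨 139****5432。
echo "本批日志共脱敏 {$count} 处手机号\n";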
多类型敏感数据映射表
维护一个敏感数据类型与正则规则的映射,便于统一管理:
| 数据类型 | 正则模式 | 替换格式 |
| --- | --- | --- |
| 身份证 | \d{6}[Xx\d]\d{6}\d{3}[\dXx] | ******XXXXXX***X |
| 银行卡 | \d{6}\d{8}\d{4} | **** **** **** XXXX |

3.2 模板引擎中占位符的可控填充

在现代模板引擎中,占位符的可控填充是实现动态内容渲染的核心机制。通过预定义变量语法,开发者可在模板中声明待替换字段,并在运行时注入上下文数据。
占位符语法与解析流程
常见的占位符形式为 {{variable}},模板引擎在解析阶段会遍历模板文本,识别此类模式并映射上下文中的对应值。
func render(template string, data map[string]string) string {
    result := template
    for key, value := range data {
        placeholder := "{{" + key + "}}"
        result = strings.ReplaceAll(result, placeholder, value)
    }
    return result
}
上述 Go 示例展示了简单的字符串替换逻辑。函数接收模板和键值对数据,逐项替换占位符。虽然基础,但体现了填充机制的本质:**模式匹配 + 上下文绑定**。
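同样的填充逻辑改用 PHP 实现时,str_replace 的计数参数可以顺带验证每个占位符是否真的出现过,这正是第一章“监控占位符替换次数”场景的落地(函数名为示意):

function render(string $template, array $data): string {
    foreach ($data as $key => $value) {
        $count    = 0;
        $template = str_replace('{{' . $key . '}}', $value, $template, $count);
        if ($count === 0) {
            // 计数为 0 说明该占位符未出现,便于及时发现漏注入
            error_log('占位符 {{' . $key . '}} 未在模板中出现');
        }
    }
    return $template;
}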
安全与转义控制
为防止 XSS 攻击,多数引擎默认对输出进行 HTML 转义。可通过特殊语法(如 {{{raw}}} 或 {{!unescaped}})控制是否跳过转义,实现精细化输出管理。

3.3 防止过度替换导致的数据污染

在数据处理流程中,频繁或无条件的字段替换可能引入错误值或丢失原始信息,造成数据污染。为避免此类问题,需建立替换规则的判定机制。
条件化替换策略
采用条件判断控制替换行为,确保仅在满足特定条件下执行更新操作:
def safe_replace(data, key, new_value, condition_func):
    # 仅在条件函数返回True且原键存在时替换
    if key in data and condition_func(new_value):
        data[key] = new_value
    return data
上述函数通过传入的 condition_func 验证新值合法性,防止非法数据写入。例如可限制数值范围、格式匹配等。
常见防护措施
  • 使用正则表达式校验字符串格式
  • 设置默认值兜底机制
  • 记录替换日志用于审计追溯
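在 PHP 的字符串层面,“过度替换”常表现为子串误伤(例如把 class 中的 "as" 一并替换)。可用带单词边界的 preg_replace,并通过计数参数确认替换范围符合预期(示例数据为假设):

$sql   = "select a as b, class from t";
$count = 0;
// \b 限定整词匹配,避免 class、as_of 等标识符中的 "as" 被误伤
$safe  = preg_replace('/\bas\b/', 'AS', $sql, -1, $count);

echo $safe;  // select a AS b, class from t
echo $count; // 1:只有独立的 as 被替换,class 未被污染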

第四章:性能优化实战案例

4.1 批量文本处理时的资源消耗对比

在处理大规模文本数据时,不同处理方式对系统资源的占用差异显著。流式处理与批处理在内存和CPU使用上呈现明显区别。
内存占用对比
| 处理方式 | 平均内存占用 | 峰值内存 |
| --- | --- | --- |
| 批量加载 | 1.8 GB | 2.4 GB |
| 流式读取 | 0.3 GB | 0.6 GB |
代码实现示例

# 批量加载:一次性读入全部文本
with open("large_file.txt", "r") as f:
    texts = f.readlines()  # 占用大量内存
processed = [clean(text) for text in texts]  # clean() 为自定义的文本清洗函数
该方式将整个文件载入内存,适用于小规模数据。当文件超过数百MB时,易引发内存溢出。
优化方案
  • 采用逐行读取避免内存堆积
  • 结合生成器减少中间对象创建
  • 使用内存映射文件(mmap)提升大文件访问效率
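按“逐行读取”的思路,PHP 中可用 fgets 流式处理大文件,并顺带累计替换次数,内存占用与文件大小基本无关(文件名与敏感词均为假设):

$in    = fopen('large_file.txt', 'r');
$out   = fopen('cleaned.txt', 'w');
$total = 0;

while (($line = fgets($in)) !== false) { // 每次仅在内存中保留一行
    $count = 0;
    fwrite($out, str_replace('敏感词', '***', $line, $count));
    $total += $count;
}
fclose($in);
fclose($out);
echo "共替换 {$total} 处\n";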

4.2 结合正则表达式实现高效混合替换

在处理复杂文本替换任务时,单纯的字符串匹配已无法满足需求。结合正则表达式可实现模式化替换,大幅提升处理效率。
基础语法与捕获组应用
通过捕获组提取关键信息并动态重构内容,是混合替换的核心机制。

const text = "用户ID: 10086, 订单号: ORD-2023-9527";
const result = text.replace(/ORD-(\d{4})-(\d+)/, "REF-$2-$1");
// 输出:用户ID: 10086, 订单号: REF-9527-2023
该示例中,(\d{4}) 与 (\d+) 分别捕获年份与序列号,替换时通过 $2 与 $1 调整顺序,实现结构重组。
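同样的结构重组在 PHP 中可由 preg_replace 完成,捕获组引用写作 $1、$2,并可通过计数参数确认命中次数(与上面的 JS 示例等价):

$text   = "用户ID: 10086, 订单号: ORD-2023-9527";
$count  = 0;
$result = preg_replace('/ORD-(\d{4})-(\d+)/', 'REF-$2-$1', $text, -1, $count);

echo $result; // 用户ID: 10086, 订单号: REF-9527-2023
echo $count;  // 1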
多规则替换策略
  • 使用修饰符 g 实现全局替换
  • 结合 ^$ 控制匹配边界
  • 利用非捕获组 (?:...) 提升性能

4.3 在高并发服务中减少不必要的字符串操作

在高并发场景下,频繁的字符串拼接与转换会显著增加内存分配和GC压力,影响服务吞吐量。应优先使用高效的数据结构和预分配策略来降低开销。
避免隐式字符串转换
在日志记录或错误构造中,避免直接拼接复杂对象。应延迟字符串化操作至必要时刻。

// 将整型 ID 列表拼接为逗号分隔字符串
func joinIDs(ids []int) string {
    var buf strings.Builder
    buf.Grow(128) // 预分配缓冲区,减少扩容拷贝
    for i := 0; i < len(ids); i++ {
        buf.WriteString(strconv.Itoa(ids[i]))
        if i < len(ids)-1 {
            buf.WriteByte(',')
        }
    }
    return buf.String()
}
该代码通过预分配缓冲区并使用 strings.Builder 减少内存拷贝。相比 += 拼接,性能提升可达数倍,尤其在循环中效果显著。
使用字节切片替代字符串操作
对于大量原始数据处理,直接操作 []byte 可避免多次编码转换。
  • 使用 bytes.Buffer 替代字符串拼接
  • 通过 sync.Pool 缓存临时缓冲区
  • 避免在循环中调用 fmt.Sprintf
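减少重复扫描的思路同样适用于 PHP:str_replace 接受数组形式的搜索/替换对,一次调用即可应用多条规则,并只产生一个累计计数,避免在循环中反复调用:

$search  = ['&', '<', '>'];
$replace = ['&amp;', '&lt;', '&gt;'];
$count   = 0;

// 单次调用按数组顺序依次应用全部规则;$count 为三类字符替换次数之和
$escaped = str_replace($search, $replace, '<a href="?a=1&b=2">', $count);
echo $escaped; // &lt;a href="?a=1&amp;b=2"&gt;
echo $count;   // 3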

4.4 基于计数反馈的动态替换策略设计

在缓存系统中,静态替换策略难以适应动态变化的访问模式。基于计数反馈的动态替换策略通过实时统计页面访问频率,调整替换优先级,提升命中率。
核心机制
每个缓存项维护一个访问计数器,并定期衰减以反映近期活跃度;计数越高的条目在淘汰时获得越高的保留权重,从而优先保留近期热点。
算法实现示例

type CacheEntry struct {
    Key    string
    Value  interface{}
    Count  int // 访问计数
    Age    int // 存活周期
}

// 假设 c.items 为 map[string]*CacheEntry,存指针才能原地更新字段
func (c *Cache) Update(key string) {
    if entry, exists := c.items[key]; exists {
        entry.Count++
        entry.Age = 0
    }
}
该结构体记录关键元数据,Update 方法在每次命中时递增计数并重置年龄,用于后续淘汰决策。
淘汰策略对比
| 策略 | 命中率 | 适应性 |
| --- | --- | --- |
| LRU | 78% | 低 |
| Count-based | 89% | 高 |
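把“计数反馈”思想放回字符串替换本身,也能得到一个简单的动态策略:先尝试廉价的 str_replace,计数为 0 时再回退到更昂贵的正则匹配(函数名与回退条件均为示意):

function smartReplace(string $text, string $plain, string $pattern, string $to): string {
    $count  = 0;
    $result = str_replace($plain, $to, $text, $count);
    if ($count > 0) {
        return $result; // 快路径命中,无需正则
    }
    // 慢路径:字面量未命中时才启用正则(如大小写不敏感的宽松匹配)
    return preg_replace($pattern, $to, $text);
}

echo smartReplace("Hello WORLD", "world", '/world/i', "universe"); // Hello universe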

第五章:总结与最佳实践建议

构建高可用微服务架构的运维策略
在生产环境中维护微服务系统时,应优先实现自动化的健康检查与熔断机制。以下是一个基于 Go 的简单健康检查中间件示例:

func HealthCheckMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if r.URL.Path == "/health" {
            w.WriteHeader(http.StatusOK)
            w.Write([]byte("OK"))
            return
        }
        next.ServeHTTP(w, r)
    })
}
配置管理的最佳实践
集中式配置管理可显著降低部署复杂度。推荐使用如下结构组织配置项:
  • 将环境相关参数(如数据库连接、API 密钥)外部化
  • 使用版本控制管理配置模板,但禁止提交敏感信息
  • 在 Kubernetes 中通过 ConfigMap 与 Secret 实现解耦
  • 定期轮换凭据并启用配置变更审计日志
性能监控与告警设置
有效的监控体系应覆盖多个维度。以下是关键指标的采集建议:
| 监控维度 | 推荐工具 | 采样频率 |
| --- | --- | --- |
| 请求延迟 | Prometheus + Grafana | 每10秒 |
| 错误率 | Datadog APM | 实时流处理 |
| JVM 堆内存 | VisualVM + JMX Exporter | 每30秒 |
部署流程图:
代码提交 → CI 构建镜像 → 安全扫描 → 推送至私有仓库 → Helm 更新发布 → 流量灰度导入
len(line_indent)) new_line = f"\n{line_indent}{prefix_padding}{formatted}," new_body += new_line last_line = new_line.strip() new_enum_code = f"{header}{new_body}\n}};" replacements.append((match.start(), match.end(), new_enum_code)) self.logger.debug(f"计划更新 enum: 添加 {len(new_enums)} 项") # === Step 3: 更新 DATA 数组 === if new_data: data_pattern = rf'(static const unsigned char {re.escape(self.data_array_name)}\[\]\s*=\s*\{{)([^}}]*)(\}}\s*;)' match = re.search(data_pattern, block_content, re.DOTALL) if not match: raise ValueError(f"未找到 data 数组: {self.data_array_name}") header = match.group(1) body_content = match.group(2).strip() footer = match.group(3) lines = body_content.splitlines() last_line = lines[-1] if lines else "" indent_match = re.match(r'^(\s*)', last_line) line_indent = indent_match.group(1) if indent_match else " " new_body = body_content.rstrip() if not new_body.endswith(','): new_body += ',' for i in range(0, len(new_data), self.MAX_DATA_ITEMS_PER_LINE): chunk = new_data[i:i + self.MAX_DATA_ITEMS_PER_LINE] line = "\n" + line_indent + ", ".join(chunk) + "," new_body += line new_data_code = f"{header}{new_body}\n{footer}" replacements.append((match.start(), match.end(), new_data_code)) self.logger.debug(f"计划更新 data 数组: 添加 {len(new_data)} 个元素") # === Step 4: 更新 INDEX 数组 === if new_indices: index_pattern = rf'(static const unsigned short {re.escape(self.index_array_name)}\[\]\s*=\s*\{{)([^}}]*)(\}}\s*;)' match = re.search(index_pattern, block_content, re.DOTALL) if not match: raise ValueError(f"未找到 index 数组: {self.index_array_name}") header = match.group(1) body_content = match.group(2).strip() footer = match.group(3) lines = body_content.splitlines() last_line = lines[-1] if lines else "" indent_match = re.match(r'^(\s*)', last_line) line_indent = indent_match.group(1) if indent_match else " " new_body = body_content.rstrip() if not new_body.endswith(','): new_body += ',' str_indices = [str(x) for x in new_indices] chunk_size = self.MAX_INDEX_ITEMS_PER_LINE for i in range(0, len(str_indices), chunk_size): chunk = str_indices[i:i + chunk_size] line = "\n" + line_indent + ", ".join(chunk) + "," new_body += line new_index_code = f"{header}{new_body}\n{footer}" replacements.append((match.start(), match.end(), new_index_code)) self.logger.debug(f"计划更新 index 数组: 添加 {len(new_indices)} 个索引") # === Step 5: 倒序应用所有替换到 block_content === if not replacements: self.logger.info("无任何变更需要写入") return full_content # 倒序避免偏移错乱 for start, end, r in sorted(replacements, key=lambda x: x[0], reverse=True): block_content = block_content[:start] + r + block_content[end:] # === Step 6: 拼接回完整文件 === final_content = ( full_content[:inner_start] + block_content + full_content[end_pos:] ) self.logger.info(f"成功构建新内容,总长度变化: {len(full_content)} → {len(final_content)}") return final_content def run(self): self.logger.info("开始同步 RATE_SET 数据...") try: changed = self.inject_new_data() if changed: print("✅ 同步完成") else: print("✅ 无新数据,无需更新") return { "success": True, "changed": changed, "file": str(self.c_file_path), "backup": f"{self.c_file_path.stem}_{datetime.now().strftime('%Y%m%d_%H%M%S')}.c.bak" if changed and not self.dry_run else None } except Exception as e: self.logger.error(f"同步失败: {e}", exc_info=True) print("❌ 同步失败,详见日志。") return {"success": False, "error": str(e)} def main(): logging.basicConfig( level=logging.INFO, format='%(asctime)s [%(levelname)s] %(name)s: %(message)s', handlers=[ logging.FileHandler(LOG_FILE, encoding='utf-8'), logging.StreamHandler(sys.stdout) ], force=True ) dry_run = False # 设置为 True 
可进行试运行 try: sync = RateSetSynchronizer(dry_run=dry_run) sync.run() print("同步完成!") except FileNotFoundError as e: logging.error(f"文件未找到: {e}") print("❌ 文件错误,请检查路径。") sys.exit(1) except PermissionError as e: logging.error(f"权限错误: {e}") print("❌ 权限不足,请关闭编辑器或以管理员运行。") sys.exit(1) except Exception as e: logging.error(f"程序异常退出: {e}", exc_info=True) print("❌ 同步失败,详见日志。") sys.exit(1) if __name__ == '__main__': main() 如何让self.logger.warning(f"❌ 处理文件失败 [{file_path.name}]: {e}", exc_info=True)之后不会报错
10-26
# rate_set/rate_sync.py import json import os import re import logging import sys from pathlib import Path from utils import resource_path from datetime import datetime from typing import Dict, List, Tuple, Any # ------------------------------- # 日志配置 # ------------------------------- PROJECT_ROOT = Path(__file__).parent.parent.resolve() LOG_DIR = PROJECT_ROOT / "output" / "log" LOG_DIR.mkdir(parents=True, exist_ok=True) LOG_FILE = LOG_DIR / f"rate_sync_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log" class RateSetSynchronizer: MAX_ENUM_PER_LINE = 4 # enum 每行最多几个宏 MAX_DATA_ITEMS_PER_LINE = 4 # data 数组每行最多几个值 MAX_INDEX_ITEMS_PER_LINE = 15 # index 数组每行最多几个值 def __init__(self, c_file_path=None, dry_run=False, config_path="config/config.json"): self.logger = logging.getLogger(f"{__name__}.RateSetSynchronizer") # 加载配置 self.config_file_path = resource_path(config_path) if not os.path.exists(self.config_file_path): raise FileNotFoundError(f"配置文件不存在: {self.config_file_path}") with open(self.config_file_path, 'r', encoding='utf-8') as f: self.config = json.load(f) self.dry_run = dry_run # C 文件路径 if c_file_path is None: internal_c_path = self.config["target_c_file"] self.c_file_path = resource_path(internal_c_path) else: self.c_file_path = Path(c_file_path) if not self.c_file_path.exists(): raise FileNotFoundError(f"找不到 C 源文件: {self.c_file_path}") # === 单一锚点标记 === self.block_start = self.config["STR_RATE_SET_DATA"] self.block_end = self.config["END_RATE_SET_DATA"] # 数组与枚举名 self.data_array_name = "rate_sets_2g_20m" self.index_array_name = "rate_sets_index_2g_20m" self.enum_name = "rate_set_2g_20m" # 扫描所有子目录中的 .c 文件(排除自身) self.rate_set_dir = Path(__file__).parent self.rate_files = [ f for f in self.rate_set_dir.rglob("*.c") # 递归匹配所有 .c 文件 if f.is_file() and f.name != "rate_sync.py" ] # 加载文件名和结构映射 self.target_map = self.config.get("rate_set_map") if not isinstance(self.target_map, dict): raise ValueError("config.json 中缺少 'rate_set_map' 字段或格式错误") self._validate_target_map() # ← 添加一致性校验 def _validate_target_map(self): """验证 rate_set_map 是否一致,防止多个 full_key 映射到同一数组""" seen_data = {} seen_index = {} seen_enum = {} for key, cfg in self.target_map.items(): d = cfg["data"] i = cfg["index"] e = cfg["enum"] if d in seen_data: raise ValueError(f"data 数组冲突: '{d}' 被 '{seen_data[d]}' 和 '{key}' 同时使用") if i in seen_index: raise ValueError(f"index 数组冲突: '{i}' 被 '{seen_index[i]}' 和 '{key}' 同时使用") if e in seen_enum: raise ValueError(f"enum 名称冲突: '{e}' 被 '{seen_enum[e]}' 和 '{key}' 同时使用") seen_data[d] = key seen_index[i] = key seen_enum[e] = key def parse_filename(self, filename: str) -> str: """ 从文件名提取 band_bw_ext 类型键,用于查找 rate_set_map 示例: 2G_20M_rate_set.c → 2G_20M_BASE 2G_20M_EXT_rate_set.c → 2G_20M_EXT 5G_80M_EXT4_rate_set.c → 5G_80M_EXT4 """ match = re.match(r'^([A-Z0-9]+)_([0-9]+M)(?:_(EXT\d*))?_rate_set\.c$', filename, re.I) if not match: raise ValueError(f"无法识别的文件名格式: {filename}") band, bw, ext = match.groups() ext_type = ext.upper() if ext else "BASE" return f"{band.upper()}_{bw.upper()}_{ext_type}" def extract_sub_rate_sets(self, content: str) -> List[Dict[str, Any]]: """ 提取 /*NAME*/ N, 后续多行 WL_RATE_xxx 列表 支持跨行、缩进、逗号、空行、注释干扰等 使用“按行扫描 + 状态机”方式,避免正则越界 """ self.logger.info("开始提取速率集...") self.logger.info("...") sub_sets = [] lines = [line.rstrip('\r\n') for line in content.splitlines()] # 保留原始行尾 i = 0 # 匹配 /*NAME*/ N, 的开头 header_pattern = re.compile(r'/\*\s*([A-Za-z0-9_]+)\s*\*/\s*(\d+)\s*,?') while i < len(lines): line = lines[i] stripped = line.strip() # 跳过空行和纯注释 if not stripped or stripped.startswith("//"): i += 1 
continue # 查找头: /*NAME*/ N, match = header_pattern.search(stripped) if not match: i += 1 continue name = match.group(1) try: count = int(match.group(2)) except ValueError: self.logger.warning(f"⚠️ 计数无效,跳过: {name} = '{match.group(2)}'") i += 1 continue self.logger.info(f"🔍 发现块: {name}, 预期数量={count}") # 开始收集 body 内容(保留原始带缩进的行) body_lines = [] j = i + 1 max_lines_to_read = 200 while j < len(lines) and len(body_lines) < max_lines_to_read: ln = lines[j].strip() # 终止条件:遇到新 block / 结构结束 if ln.startswith("/*") or ln.startswith("}") or ln.startswith("enum"): break if ln and not ln.startswith("//"): body_lines.append(lines[j]) # ← 原样保存(含缩进) else: body_lines.append(lines[j]) # 也保留注释或空行(保持格式一致) j += 1 # 提取宏名用于校验(但不再用于生成数据) body_text = "\n".join(body_lines) all_macros = re.findall(r'WL_RATE_[A-Za-z0-9_]+', body_text) rate_items = all_macros[:count] if len(rate_items) < count: self.logger.warning(f"[{name}] 条目不足: 需要 {count}, 实际 {len(rate_items)}") # 构建结果:增加 raw_header 和 raw_body(关键改动) sub_sets.append({ "name": name, "count": count, "rates": rate_items, "raw_header": line, # ← 原始头行(如 /*...*/ 4,) "raw_body": body_lines, # ← 原始 body 行列表 "start_line": i, "end_line": j - 1 }) self.logger.debug(f"✅ 提取成功: {name} → {len(rate_items)} 个速率") i = j # 跳到下一个 block self.logger.info(f" 共提取 {len(sub_sets)} 个有效子集") return sub_sets def parse_all_structures(self, full_content: str) -> Dict: """ 直接从完整 C 文件中解析 enum/data/index 结构 """ self.logger.info("开始解析所有结构...") self.logger.info("...") result = { 'existing_enum': {}, 'data_entries': [], 'index_values': [], 'data_len': 0 } # === 解析 enum === enum_pattern = rf'enum\s+{re.escape(self.enum_name)}\s*\{{([^}}]+)\}};' enum_match = re.search(enum_pattern, full_content, re.DOTALL) if enum_match: body = enum_match.group(1) entries = re.findall(r'(RATE_SET_[^=,\s]+)\s*=\s*(\d+)', body) result['existing_enum'] = {k: int(v) for k, v in entries} self.logger.info(f"解析出 {len(entries)} 个已有枚举项") else: self.logger.warning(f"未找到 enum 定义: {self.enum_name}") # === 解析 data 数组 === data_pattern = rf'static const unsigned char {re.escape(self.data_array_name)}\[\] = \{{([^}}]+)\}};' data_match = re.search(data_pattern, full_content, re.DOTALL) if not data_match: raise ValueError(f"未找到 data 数组: {self.data_array_name}") data_code = data_match.group(1) result['data_entries'] = [item.strip() for item in re.split(r'[,\n]+', data_code) if item.strip()] result['data_len'] = len(result['data_entries']) # === 解析 index 数组 === index_pattern = rf'static const unsigned short {re.escape(self.index_array_name)}\[\] = \{{([^}}]+)\}};' index_match = re.search(index_pattern, full_content, re.DOTALL) if not index_match: raise ValueError(f"未找到 index 数组: {self.index_array_name}") index_code = index_match.group(1) result['index_values'] = [int(x.strip()) for x in re.split(r'[,\n]+', index_code) if x.strip()] return result def build_injection_with_format(self, new_subsets: List[Dict], existing_enum: Dict[str, int]) -> List[Dict]: """ 返回需要注入的原始块列表(包含 raw_header + raw_body) 不再返回 new_data/new_indices/new_enums """ valid_blocks = [] next_enum_value = max(existing_enum.values(), default=-1) + 1 self.logger.info(f"开始构建注入内容,当前最大枚举值 = {next_enum_value}") for subset in new_subsets: enum_name = subset["name"] if enum_name in existing_enum: self.logger.info(f"跳过已存在的枚举项: {enum_name} = {existing_enum[enum_name]}") continue # 只保存必要信息,不计算偏移 valid_blocks.append({ "enum_name": enum_name, "raw_header": subset["raw_header"], "raw_body": subset["raw_body"], "count": subset["count"], # 用于计算 data 占用空间 "enum_value": next_enum_value, # ✅ 必须存在! 
}) self.logger.debug(f" 准备注入: {enum_name}") self.logger.info(f"新增条目: {enum_name} enum={next_enum_value}") next_enum_value += 1 self.logger.info(f"构建完成:共 {len(valid_blocks)} 个新条目(保留原始格式)") return valid_blocks def format_list(self, items: List[str], indent: str = " ", width: int = 8) -> str: """格式化数组为多行字符串""" lines = [] for i in range(0, len(items), width): chunk = items[i:i + width] lines.append(indent + ", ".join(chunk) + ",") return "\n".join(lines).rstrip(",") def _safe_write_back(self, old_content: str, new_content: str) -> bool: """安全写回文件,带备份""" if old_content == new_content: self.logger.info("主文件内容无变化,无需写入") return False if self.dry_run: self.logger.info("DRY-RUN 模式启用,跳过实际写入") print("[DRY RUN] 差异预览(前 20 行):") diff = new_content.splitlines()[:20] for line in diff: print(f" {line}") return True # 创建备份 timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") backup = self.c_file_path.with_name(f"{self.c_file_path.stem}_{timestamp}.c.bak") try: self.c_file_path.rename(backup) self.logger.info(f"原文件已备份为: {backup.name}") except Exception as e: self.logger.error(f"备份失败: {e}") raise # 写入新内容 try: self.c_file_path.write_text(new_content, encoding='utf-8') self.logger.info(f"✅ 成功写入更新后的文件: {self.c_file_path.name}") return True except Exception as e: self.logger.error(f"写入失败: {e}", exc_info=True) raise def inject_new_data(self) -> bool: try: full_content = self.c_file_path.read_text(encoding='utf-8') except Exception as e: self.logger.error(f"读取主 C 文件失败: {e}") raise self.logger.info(f"正在处理 C 文件: {self.c_file_path.name}") start_pos = full_content.find(self.block_start) end_pos = full_content.find(self.block_end) if start_pos == -1: raise ValueError(f"未找到起始锚点: {self.block_start}") if end_pos == -1: raise ValueError(f"未找到结束锚点: {self.block_end}") if end_pos <= start_pos: raise ValueError("结束锚点位于起始锚点之前") inner_start = start_pos + len(self.block_start) block_content = full_content[inner_start:end_pos].strip() all_changes_made = False # === 遍历每一个 rate set 子文件 === for file_path in self.rate_files: try: self.logger.info(f"→ 处理子文件: {file_path.name}") # --- 1. 解析文件名得到 full_key --- try: full_key = self.parse_filename(file_path.name) self.logger.debug(f" ├─ 解析出 key: {full_key}") except ValueError as ve: self.logger.warning(f" └─ 跳过无效文件名: {ve}") continue # --- 2. 查找 rate_set_map 映射 --- target = self.target_map.get(full_key) if not target: self.logger.warning(f" └─ 未在 config.json 中定义映射关系: {full_key},跳过") continue # --- 3. 动态设置当前注入目标 --- self.data_array_name = target["data"] self.index_array_name = target["index"] self.enum_name = target["enum"] self.logger.debug(f" ├─ 绑定目标:") self.logger.debug(f" data: {self.data_array_name}") self.logger.debug(f" index: {self.index_array_name}") self.logger.debug(f" enum: {self.enum_name}") # --- 4. 解析主文件中的当前结构 --- try: parsed = self.parse_all_structures(full_content) except Exception as e: self.logger.error(f" └─ 解析主文件结构失败: {e}") continue # --- 5. 提取该子文件中的 rate sets --- file_content = file_path.read_text(encoding='utf-8') subsets = self.extract_sub_rate_sets(file_content) if not subsets: self.logger.info(f" └─ 无有效子集数据") continue # --- 6. 构建要注入的内容 --- valid_blocks = self.build_injection_with_format( subsets, existing_enum=parsed['existing_enum'] ) if not valid_blocks: self.logger.info(f" └─ 无需更新") continue # --- 7. 
写回新内容(精准插入)--- updated_content = self._write_back_in_blocks( full_content, parsed, valid_blocks ) if updated_content != full_content: all_changes_made = True full_content = updated_content # 更新内存内容供后续文件使用 self.logger.info(f"✅ 成功注入 {len(subsets)} 条目到 {self.enum_name}") except Exception as e: self.logger.warning(f"❌ 处理文件失败 [{file_path.name}]: {e}") if self.logger.isEnabledFor(logging.DEBUG): self.logger.debug("详细堆栈:", exc_info=True) continue # 最终写回磁盘 if all_changes_made: try: return self._safe_write_back(self.c_file_path.read_text(encoding='utf-8'), full_content) except Exception as e: self.logger.error(f"写入最终文件失败: {e}") raise else: self.logger.info("没有需要更新的内容") return False def _format_with_inline_fallback( self, lines: List[str], new_items: List[str], max_per_line: int, indent_marker: str = " ", item_separator: str = ", ", line_suffix: str = "", # 注意:现在我们不在这里加逗号! extract_func=None, align_eq_col: bool = False, detect_spacing_from_last_line: bool = True, ) -> str: if not lines: lines = [""] last_line = lines[-1].rstrip() indent_match = re.match(r'^(\s*)', last_line) line_indent = indent_match.group(1) if indent_match else indent_marker clean_last = re.sub(r'//.*|/\*.*?\*/', '', last_line).strip() existing_items = extract_func(clean_last) if extract_func else re.findall(r'\w+', clean_last) current_count = len(existing_items) space_left = max(0, max_per_line - current_count) to_append_inline = new_items[:space_left] to_append_newline = new_items[space_left:] # === 检测真实分隔符 === actual_sep = item_separator if detect_spacing_from_last_line and len(existing_items) >= 2: first = re.escape(existing_items[0]) second = re.escape(existing_items[1]) match = re.search(f"({first})(\\s+)({second})", last_line) if match: actual_sep = match.group(2) # === 对齐等号列:关键修复 → 对齐后再加逗号 === formatted_new_items = [] for item in new_items: raw_item = item.rstrip(',') # 去掉可能已有的逗号避免重复 if align_eq_col: m = re.match(r'(\w+)\s*=\s*(\d+)', raw_item) if m: name, val = m.groups() # 计算目标列位置 target_eq_col = None for i in range(len(lines) - 1, -1, -1): ln = lines[i] eq_match = re.search(r'=\s*\d+', ln) if eq_match: raw_before = ln[:eq_match.start()] expanded_before = raw_before.expandtabs(4) target_eq_col = len(expanded_before) break if target_eq_col is None: target_eq_col = 30 padding = max(1, target_eq_col - len(name.replace('\t', ' ').expandtabs(4))) spaces = ' ' * padding aligned_item = f"{name}{spaces}= {val}" formatted_new_items.append(aligned_item) else: formatted_new_items.append(raw_item) else: formatted_new_items.append(raw_item) # 现在统一加逗号:每个 item 都要加! # 注意:是否加逗号应该由调用者或此函数控制,不要混合 final_formatted_items = [f"{item}," for item in formatted_new_items] to_append_inline = final_formatted_items[:space_left] to_append_newline = final_formatted_items[space_left:] # === 构建结果 === result_lines = lines[:-1] # 保留前面所有行 final_main_line = lines[-1].rstrip() # 添加 inline 项 if to_append_inline: joined_inline = actual_sep.join(to_append_inline) if final_main_line.strip(): final_main_line += actual_sep + joined_inline else: final_main_line = joined_inline result_lines.append(final_main_line) # 添加新行(每行最多 max_per_line 个) if to_append_newline: for i in range(0, len(to_append_newline), max_per_line): chunk = to_append_newline[i:i + max_per_line] joined = actual_sep.join(chunk) result_lines.append(f"{line_indent}{joined}") return '\n'.join(result_lines) def _write_back_in_blocks(self, full_content: str, parsed: Dict, valid_blocks: List[Dict]) -> str: """ 使用局部块操作策略:只在 /* START */ ... 
/* END */ 范围内修改内容 关键改进:直接插入 raw_header + raw_body,保留原始格式 """ self.logger.info("开始执行局部块写入操作...") self.logger.info("...") # 在 _write_back_in_blocks 最上方添加: base_data_offset = parsed['data_len'] current_new_data_size = 0 # 动态记录已写入的新 data 大小 start_pos = full_content.find(self.block_start) end_pos = full_content.find(self.block_end) if start_pos == -1 or end_pos == -1: raise ValueError(f"未找到锚点标记: {self.block_start} 或 {self.block_end}") if end_pos <= start_pos: raise ValueError("结束锚点位于起始锚点之前") inner_start = start_pos + len(self.block_start) block_content = full_content[inner_start:end_pos] replacements = [] # (start_in_block, end_in_block, replacement) # === Step 2: 更新 ENUM === if valid_blocks: # 提取函数:从字符串中提取 RATE_SET_xxx extract_enum = lambda s: re.findall(r'RATE_SET_[A-Z0-9_]+', s) enum_pattern = rf'(enum\s+{re.escape(self.enum_name)}\s*\{{)([^}}]*)\}}\s*;' match = re.search(enum_pattern, block_content, re.DOTALL | re.IGNORECASE) if not match: raise ValueError(f"未找到枚举定义: {self.enum_name}") header = match.group(1) # "enum rate_set_2g_20m {" body_content = match.group(2) lines = [ln.rstrip() for ln in body_content.splitlines() if ln.strip()] # 计算新值 new_macros = [] for block in valid_blocks: name = block["enum_name"] value = block["enum_value"] # ✅ 来自 build_injection_with_format 的正确值 new_macros.append(f"{name} = {value}") # === 关键:获取标准缩进 === indent_match = re.match(r'^(\s*)', lines[0] if lines else "") standard_indent = indent_match.group(1) if indent_match else " " # 格式化新 body new_body = self._format_with_inline_fallback( lines=lines, new_items=new_macros, max_per_line=self.MAX_ENUM_PER_LINE, indent_marker=standard_indent, item_separator=" ", line_suffix="", extract_func=extract_enum, align_eq_col=True, detect_spacing_from_last_line=True, ) # 关键修复:确保每行都有缩进(包括第一行) formatted_lines = [] for line in new_body.splitlines(): stripped = line.strip() if stripped: formatted_lines.append(f"{standard_indent}{stripped}") else: formatted_lines.append(line) final_body = '\n'.join(formatted_lines) # 关键:header 单独占一行,新 body 换行开始 new_enum_code = f"{header}\n{final_body}\n}};" replacements.append((match.start(), match.end(), new_enum_code)) self.logger.debug(f"更新 enum: 添加 {len(valid_blocks)} 项") # === Step 3: 更新 DATA 数组 === if valid_blocks: data_pattern = rf'(static const unsigned char {re.escape(self.data_array_name)}\[\]\s*=\s*\{{)([^}}]*)(\}}\s*;)' match = re.search(data_pattern, block_content, re.DOTALL) if not match: raise ValueError(f"未找到 data 数组: {self.data_array_name}") header = match.group(1) body_content = match.group(2).strip() footer = match.group(3) lines = body_content.splitlines() last_line = lines[-1] if lines else "" indent_match = re.match(r'^(\s*)', last_line) line_indent = indent_match.group(1) if indent_match else " " new_body = body_content.rstrip() if not new_body.endswith(','): new_body += ',' for block in valid_blocks: # 插入头行(如 /*...*/ 4,) new_body += f"\n{line_indent}{block['raw_header'].strip()}" # 插入每行 body(保持原始缩进) for raw_line in block["raw_body"]: new_body += f"\n{line_indent}{raw_line}" new_data_code = f"{header}{new_body}\n{footer}" replacements.append((match.start(), match.end(), new_data_code)) self.logger.debug(f"计划更新 data 数组: 添加 {len(valid_blocks)} 个原始块") # === Step 2: 更新 INDEX 数组 === if valid_blocks: index_pattern = rf'(static const unsigned short {re.escape(self.index_array_name)}\[\]\s*=\s*\{{)([^}}]*)(\}}\s*;)' match = re.search(index_pattern, block_content, re.DOTALL) if not match: raise ValueError(f"未找到 index 数组: {self.index_array_name}") header = match.group(1) body_content 
= match.group(2) footer = match.group(3).strip() lines = [ln.rstrip() for ln in body_content.splitlines()] non_empty_lines = [ln for ln in lines if ln.strip()] # 获取标准缩进(与 enum 一致) if non_empty_lines: indent_match = re.match(r'^(\s*)', non_empty_lines[0]) standard_indent = indent_match.group(1) if indent_match else " " else: standard_indent = " " # 生成新索引值 # 正确计算 index values:基于 data 偏移 + 每个 block 的实际大小 current_offset = parsed['data_len'] # 初始偏移 = 原 data 长度 new_index_values = [] for block in valid_blocks: # 添加当前 block 的起始偏移 new_index_values.append(str(current_offset)) # 偏移 += 当前 block 的数据条数 current_offset += block["count"]+1 # ← 使用 block 自带的 count! self.logger.info(f"生成新的 index values: {new_index_values}") # 格式化 index body formatted_body = self._format_with_inline_fallback( lines=non_empty_lines, new_items=new_index_values, max_per_line=self.MAX_INDEX_ITEMS_PER_LINE, indent_marker=standard_indent, item_separator=" ", line_suffix="", extract_func=lambda s: re.findall(r'\d+', s), detect_spacing_from_last_line=True, align_eq_col=False, ) # 统一添加缩进 final_lines = [] for line in formatted_body.splitlines(): stripped = line.strip() if stripped: final_lines.append(f"{standard_indent}{stripped}") else: final_lines.append("") final_body = '\n'.join(final_lines) new_index_code = f"{header}\n{final_body}\n{footer}" replacements.append((match.start(), match.end(), new_index_code)) self.logger.debug(f"更新 index 数组: 添加 {len(valid_blocks)} 个索引") # === Step 5: 倒序应用所有替换 === if not replacements: self.logger.info("无任何变更需要写入") return full_content for start, end, r in sorted(replacements, key=lambda x: x[0], reverse=True): block_content = block_content[:start] + r + block_content[end:] # === Step 6: 拼接回完整文件 === final_content = ( full_content[:inner_start] + block_content + full_content[end_pos:] ) self.logger.info(f"成功构建新内容,总长度变化: {len(full_content)} → {len(final_content)}") return final_content def run(self): self.logger.info("开始同步 RATE_SET 数据...") try: changed = self.inject_new_data() if changed: print(" 同步完成") else: print(" 无新数据,无需更新") return { "success": True, "changed": changed, "file": str(self.c_file_path), "backup": f"{self.c_file_path.stem}_{datetime.now().strftime('%Y%m%d_%H%M%S')}.c.bak" if changed and not self.dry_run else None } except Exception as e: self.logger.error(f"同步失败: {e}", exc_info=True) print("❌ 同步失败,详见日志。") return {"success": False, "error": str(e)} def main(): logging.basicConfig( level=logging.INFO, format='%(asctime)s [%(levelname)s] %(name)s: %(message)s', handlers=[ logging.FileHandler(LOG_FILE, encoding='utf-8'), logging.StreamHandler(sys.stdout) ], force=True ) dry_run = False # 设置为 True 可进行试运行 try: sync = RateSetSynchronizer(dry_run=dry_run) sync.run() print("同步完成!") except FileNotFoundError as e: logging.error(f"文件未找到: {e}") print(" 文件错误,请检查路径。") sys.exit(1) except PermissionError as e: logging.error(f"权限错误: {e}") print(" 权限不足,请关闭编辑器或以管理员运行。") sys.exit(1) except Exception as e: logging.error(f"程序异常退出: {e}", exc_info=True) print(" 同步失败,详见日志。") sys.exit(1) if __name__ == '__main__': main() 加一个最终打印变化
10-30
基于数据驱动的 Koopman 算子的递归神经网络模型线性化,用于纳米定位系统的预测控制研究(Matlab代码实现)内容概要:本文围绕“基于数据驱动的Koopman算子的递归神经网络模型线性化”展开,旨在研究纳米定位系统的预测控制问题,并提供完整的Matlab代码实现。文章结合数据驱动方法与Koopman算子理论,利用递归神经网络(RNN)对非线性系统进行建模与线性化处理,从而提升纳米级定位系统的精度与动态响应性能。该方法通过提取系统隐含动态特征,构建近似线性模型,便于后续模型预测控制(MPC)的设计与优化,适用于高精度自动化控制场景。文中还展示了相关实验验证与仿真结果,证明了该方法的有效性和先进性。; 适合人群:具备一定控制理论基础和Matlab编程能力,从事精密控制、智能制造、自动化或相关领域研究的研究生、科研人员及工程技术人员。; 使用场景及目标:①应用于纳米级精密定位系统(如原子力显微镜、半导体制造设备)中的高性能控制设计;②为非线性系统建模与线性化提供一种结合深度学习与现代控制理论的新思路;③帮助读者掌握Koopman算子、RNN建模与模型预测控制的综合应用。; 阅读建议:建议读者结合提供的Matlab代码逐段理解算法实现流程,重点关注数据预处理、RNN结构设计、Koopman观测矩阵构建及MPC控制器集成等关键环节,并可通过更换实际系统数据进行迁移验证,深化对方法泛化能力的理解。
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符  | 博主筛选后可见
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值