228. Summary Ranges


Given a sorted integer array without duplicates, return the summary of its ranges.

For example, given [0,1,2,4,5,7], return ["0->2","4->5","7"].

Credits:
Special thanks to @jianchao.li.fighter for adding this problem and creating all test cases.


Approach

1. Declare an int counter count and check whether the current element of nums plus 1 equals the element at the next position. If it does, execute count++ and move on to the next position, incrementing the counter until the two values are no longer consecutive. When nums[i] + 1 no longer equals nums[i + 1], use the counter together with the current index i to format the required summary range, and remember to reset the counter to 0 before starting the next round.

2. At first I appended only a 0 to the end of nums as a sentinel. The code compares nums[i] with nums[i + 1], so without a sentinel the final iteration would read one element past the end of nums (when i is the last index, i + 1 is out of range), causing an out_of_range error. Hence a sentinel must be appended: 0 in the general case, or some other non-zero value when the last element is -1 (otherwise -1 + 1 == 0 would falsely match the sentinel). The appended value itself is never processed as data. A sentinel-free variant that avoids the issue altogether is sketched below.
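For comparison, here is a minimal sentinel-free sketch of the same run-detection idea (my rewording, not the original solution): the scan is bounded by nums.size(), so no out-of-range access can occur and no sentinel value is needed. The function name summaryRangesNoSentinel is hypothetical.

vector<string> summaryRangesNoSentinel(vector<int>& nums) {
	vector<string> result;
	int n = nums.size();
	for (int i = 0; i < n; ) {
		int start = i;
		// Extend the run while the next element is exactly one greater.
		while (i + 1 < n && nums[i] + 1 == nums[i + 1])
			++i;
		result.push_back(start == i
			? to_string(nums[i])
			: to_string(nums[start]) + "->" + to_string(nums[i]));
		++i;
	}
	return result;
}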

Notes:

1. The core ideas behind my design are: ① record the position of every element whose value plus 1 differs from the next element's value; ② append one of two sentinels (0 or a non-zero number) to simplify the boundary handling.

2. For a refresher on numeric-to-string conversion, see: C++ Standard Library, strings (a short example follows below).
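As a quick refresher (my own example, not taken from the referenced material): std::to_string, available in <string> since C++11, converts arithmetic values to std::string, which is all the range formatting here needs.

#include <iostream>
#include <string>

int main() {
	int start = 4, end = 5;
	// std::to_string converts an int to its decimal string form.
	std::string range = std::to_string(start) + "->" + std::to_string(end);
	std::cout << range << std::endl;   // prints 4->5
	return 0;
}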

vector<string> summaryRanges(vector<int>& nums) {
	int len = nums.size();   // length before the sentinel is appended
	int count = 0;           // length of the current consecutive run, minus one
	vector<string> result;
	string str;

	// Guard added after a runtime error: an empty input must return early.
	if (len == 0)
		return result;

	// Append a sentinel so nums[i + 1] is always valid inside the loop.
	// Originally only 0 was appended, but that fails when the last element
	// is -1 (since -1 + 1 == 0 would falsely extend the run), so 1 is used
	// in that case. The sentinel itself is never emitted.
	if (nums.back() == -1)
		nums.push_back(1);
	else
		nums.push_back(0);

	// len was captured before the push_back, so the loop never visits
	// the sentinel as a current element.
	for (int i = 0; i < len; ++i)
	{
		if (nums[i] + 1 == nums[i + 1])
		{
			// Still inside a consecutive run: keep counting.
			++count;
		}
		else
		{
			// The run ends at nums[i] and started at nums[i - count].
			// Note: LeetCode expects plain strings such as 0->2, without
			// literal quote characters around them.
			str = (count == 0)
				? to_string(nums[i])
				: to_string(nums[i - count]) + "->" + to_string(nums[i]);
			result.push_back(str);
			count = 0;   // reset for the next run
		}
	}
	return result;
}
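A small driver for local testing (a hypothetical harness, not part of the LeetCode submission). Note that summaryRanges appends the sentinel to the caller's vector, so pass a copy if the original contents must stay intact:

#include <iostream>
#include <string>
#include <vector>
using namespace std;

// summaryRanges as defined above

int main() {
	vector<int> nums = {0, 1, 2, 4, 5, 7};
	for (const string& s : summaryRanges(nums))
		cout << s << " ";   // expected output: 0->2 4->5 7
	cout << endl;
	return 0;
}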
