E - Frequent values

This is a range-query problem over sequences. The input contains multiple test cases, each consisting of an integer sequence in non-decreasing order and a series of queries. For each query, you must find the value that occurs most often within the given index range and print its number of occurrences.

Time Limit: 2000MS Memory Limit: 65536KB 64bit IO Format: %I64d & %I64u


Description

You are given a sequence of n integers a1, a2, ..., an in non-decreasing order. In addition to that, you are given several queries consisting of indices i and j (1 ≤ i ≤ j ≤ n). For each query, determine the most frequent value among the integers ai, ..., aj.

Input

The input consists of several test cases. Each test case starts with a line containing two integers n and q (1 ≤ n, q ≤ 100000). The next line contains n integers a1, ..., an (-100000 ≤ ai ≤ 100000, for each i ∈ {1, ..., n}) separated by spaces. You can assume that for each i ∈ {1, ..., n-1}: ai ≤ ai+1. The following q lines contain one query each, consisting of two integers i and j (1 ≤ i ≤ j ≤ n), which indicate the boundary indices for the query.

The last test case is followed by a line containing a single 0.

Output

For each query, print one line with one integer: The number of occurrences of the most frequent value within the given range.

Sample Input

10 3
-1 -1 1 1 1 1 3 10 10 10
2 3
1 10
5 10
0

Sample Output

1
4
3
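
Because the sequence is non-decreasing, equal values form contiguous runs, so a query [i, j] decomposes into three parts: the tail of the run containing i, the head of the run containing j, and the full runs strictly between them, whose best length is a range-maximum query over run lengths. In the sample, query "1 10" covers the runs (-1 ×2), (1 ×4), (3 ×1), (10 ×3), so the answer is 4. Below is a minimal C++ sketch of this standard run-length-encoding plus sparse-table approach; the variable names and I/O layout are my own illustration, not a reference solution for this judge.

#include <cstdio>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
    int n, q;
    while (scanf("%d", &n) == 1 && n != 0) {
        scanf("%d", &q);
        vector<int> a(n + 1);
        for (int i = 1; i <= n; i++) scanf("%d", &a[i]);

        // Run-length encode the sorted sequence: run[i] is the id of the
        // run containing position i; runStart[r] / runLen[r] describe run r.
        vector<int> run(n + 1), runStart, runLen;
        int r = -1;
        for (int i = 1; i <= n; i++) {
            if (i == 1 || a[i] != a[i - 1]) { r++; runStart.push_back(i); runLen.push_back(0); }
            run[i] = r;
            runLen[r]++;
        }
        int m = r + 1;

        // Sparse table for range-maximum queries over the run lengths.
        int LOG = 1;
        while ((1 << LOG) < m) LOG++;
        vector<vector<int>> st(LOG + 1, vector<int>(m));
        st[0] = runLen;
        for (int k = 1; (1 << k) <= m; k++)
            for (int s = 0; s + (1 << k) <= m; s++)
                st[k][s] = max(st[k - 1][s], st[k - 1][s + (1 << (k - 1))]);
        auto rmq = [&](int lo, int hi) {            // max runLen over run ids [lo, hi]
            int k = 31 - __builtin_clz(hi - lo + 1);
            return max(st[k][lo], st[k][hi - (1 << k) + 1]);
        };

        while (q--) {
            int i, j;
            scanf("%d %d", &i, &j);
            int ri = run[i], rj = run[j];
            if (ri == rj) { printf("%d\n", j - i + 1); continue; }
            int left = runStart[ri] + runLen[ri] - i;  // part of run ri inside [i, j]
            int right = j - runStart[rj] + 1;          // part of run rj inside [i, j]
            int best = max(left, right);
            if (ri + 1 <= rj - 1) best = max(best, rmq(ri + 1, rj - 1));
            printf("%d\n", best);
        }
    }
    return 0;
}

Preprocessing costs O(n log n) and each query answers in O(1), which is comfortable within the limits for n, q ≤ 100000; a segment tree over the run lengths works equally well at O(log n) per query.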