Website User Behavior Analysis Project, Session Cutting (Part 6) => Parameterized Configuration

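Hi everyone, I'm Shao Naiyi (邵奈一), a programmer who never sticks to his day job and a genuine slash youth.
1. People call me: a poet held back by code, a calligrapher with no talent, a tone-deaf singer, a professional bit-part actor, an unqualified athlete…
2. Over the past few years I have put together many IT tutorials, mainly on big data, which have helped many readers get started in the big data industry.
3. If you find this article useful, please bookmark it, share it, leave a comment, and follow me. Thank you!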


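0x00 Tutorial Contents

  1. Making the run mode configurable
  2. Making the paths configurable
  3. Making the output type configurable

Note: all of the code changes below are made in SessionCutETL.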

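0x01 Making the Run Mode Configurable

So far we have been running the job locally. To run it on a cluster we would have to change the code, and to test locally again we would have to put the old code back, which is very inconvenient. Instead, we can make the run mode adapt to whatever is passed in from outside.

(1) Add a condition around the code that sets the run mode:

conf.setMaster("local")
// change to:
if (!conf.contains("spark.master")) {
  // no master was supplied from outside, so fall back to local mode
  conf.setMaster("local")
}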
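To see the fallback in isolation, here is a minimal sketch. It assumes spark-core is on the classpath; the object name `MasterFallbackSketch` and the helper `withDefaultMaster` are made up for illustration and are not part of the project code.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper object, for illustration only.
object MasterFallbackSketch {

  // Set a local master only when no master has been supplied already.
  def withDefaultMaster(conf: SparkConf): SparkConf = {
    if (!conf.contains("spark.master")) {
      conf.setMaster("local")
    }
    conf
  }

  def main(args: Array[String]): Unit = {
    // loadDefaults = false ignores spark.* system properties,
    // so the demo does not depend on how it is launched.
    val bare = withDefaultMaster(new SparkConf(false))
    println(bare.get("spark.master")) // local

    // A master that is already set (as a cluster launch would do) is left alone.
    val preset = withDefaultMaster(new SparkConf(false).setMaster("yarn"))
    println(preset.get("spark.master")) // yarn
  }
}
```

With this check in place, the same jar can be tested locally and submitted to a cluster without editing the source.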

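0x02 Making the Paths Configurable

So far the paths are hard-coded. They should be parameterized: if a value is passed in, use it; otherwise, fall back to a default.

(1) Add the following four lines, and replace the hard-coded paths in the rest of the code with these variables:

// Read the configured input and output paths, falling back to defaults
val visitLogsInputPath = conf.get("spark.sessioncut.visitLogsInputPath", "data/rawdata/visit_log.txt")
val cookieLabelInputPath = conf.get("spark.sessioncut.cookieLabelInputPath", "data/cookie_label.txt")
val baseOutputPath = conf.get("spark.sessioncut.baseOutputPath", "data/output")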
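The lookup-with-default behavior can be checked with a small sketch like the one below. It assumes spark-core is on the classpath; `PathConfigSketch`, the `Paths` case class, and the `/tmp/sessioncut/output` value are hypothetical and exist only for this example. The `spark.` prefix on the keys matters when submitting with `spark-submit --conf key=value`, since spark-submit ignores properties that do not start with `spark.`.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper object, for illustration only.
object PathConfigSketch {

  // Simple holder for the three resolved paths.
  final case class Paths(visitLogs: String, cookieLabels: String, baseOutput: String)

  // Each path comes from the configuration when present, otherwise from the default.
  def resolve(conf: SparkConf): Paths = Paths(
    visitLogs = conf.get("spark.sessioncut.visitLogsInputPath", "data/rawdata/visit_log.txt"),
    cookieLabels = conf.get("spark.sessioncut.cookieLabelInputPath", "data/cookie_label.txt"),
    baseOutput = conf.get("spark.sessioncut.baseOutputPath", "data/output")
  )

  def main(args: Array[String]): Unit = {
    // Override only the output path; the value is a made-up example.
    val conf = new SparkConf(false)
      .set("spark.sessioncut.baseOutputPath", "/tmp/sessioncut/output")

    val paths = resolve(conf)
    println(paths.visitLogs)  // data/rawdata/visit_log.txt (default)
    println(paths.baseOutput) // /tmp/sessioncut/output (overridden)
  }
}
```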

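0x03 Making the Output Type Configurable

So far the output type is hard-coded as well, which needs improving.

(1) Add one line of code, and replace the hard-coded string in the output code:

val outputFileType = if (args.nonEmpty) args(0) else "text"
// change the output code to:
// supported types: text and parquet
OutputComponent.fromOutputFileType(outputFileType).writeOutputData(sc, baseOutputPath, parsedLogRDD, cookieLabeledSessionRDD)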
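As a rough sketch of how the first program argument could be turned into an output type, the snippet below uses plain Scala with no Spark dependency. The object `OutputTypeSketch` and the up-front validation are assumptions added for illustration; the project code simply hands the string to OutputComponent.fromOutputFileType.

```scala
// Hypothetical helper object, for illustration only.
object OutputTypeSketch {

  // The two types named in the tutorial.
  val Supported: Set[String] = Set("text", "parquet")

  // Use the first argument when given, otherwise default to "text".
  // Rejecting unknown values here is an assumption, not the author's code.
  def outputFileType(args: Array[String]): String = {
    val requested = args.headOption.getOrElse("text")
    require(Supported.contains(requested), s"Unsupported output type: $requested")
    requested
  }

  def main(args: Array[String]): Unit = {
    println(outputFileType(Array.empty[String])) // text
    println(outputFileType(Array("parquet")))    // parquet
  }
}
```

Failing fast on an unknown type keeps a typo on the command line from surfacing only after the whole job has run.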

(2) The complete code of SessionCutETL is as follows:

package com.shaonaiyi.session

import com.shaonaiyi.spark.session.{
   
   