哋它亢:Real World Implementation of LLM-based Log Anomaly Detection


Exploring the feasibility of training-free approaches

I, 哋它亢, recently read a paper on log analysis and would like to share it with you today.

Abstract

In this paper, the authors apply the RAPID method [6], short for “Training-free Retrieval-based Log Anomaly Detection with PLM (Pre-Trained Language Models) considering Token-level information”, to real-world log data.

Main research

  • Adapting the RAPID method to a log dataset provided by Ericsson.
  • Implementing a baseline method.
  • Exploring the value of model fine-tuning.
  • Developing and comparing multiple approaches.

Background

Challenges in Log Anomaly Detection

Log analysis faces a variety of challenges, which the authors summarize as follows:

  • **Data Representation:** Logs often contain a mixture of diverse event types, unstructured messages, and parameters. This complexity makes pre-processing logs difficult. Traditional methods rely heavily on manual feature extraction, which does not scale.
  • **Class Imbalance:** Anomalous events in log data occur far less frequently than normal ones. This imbalance can lead neural networks to prioritize learning the more frequent class, leaving them unable to detect the rarer anomalies.
  • **Label Availability:** In real-world applications, labeled datasets are extremely rare, especially ones large enough to successfully train a supervised machine learning model. For this reason, many approaches fall into the semi-supervised or unsupervised categories, which rely on the assumption that anomalies are rare and different from normal data.
  • **Stream Processing:** Logs are normally produced in a continuous stream, requiring anomaly detection models to have quick inference times and to process data in a single pass. Models need to balance accuracy with computational efficiency to be practical in real-time environments.
  • **Evolution of Logging Statements:** Since developers constantly modify the codebase, logging statements can change frequently, forcing anomaly detection techniques to be adaptable. This requires models that generalize well from past data and quickly adapt to new patterns.
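The Data Representation challenge is easiest to see with parameters: the same event can appear with different IP addresses, counters, or hex codes. Below is a minimal sketch of regex-based parameter masking, one common pre-processing step. The rules, the placeholder names (`<IP>`, `<HEX>`, `<NUM>`), and the `mask_parameters` helper are illustrative assumptions, not the pre-processing used in the paper.

```python
import re

# Illustrative masking rules (assumed, not the paper's): each regex is replaced
# by a placeholder token. IPs and hex values run before the generic number rule
# so their digits are not masked piecemeal.
MASK_RULES = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def mask_parameters(line: str) -> str:
    """Replace variable fields in a raw log line with placeholder tokens."""
    for pattern, placeholder in MASK_RULES:
        line = pattern.sub(placeholder, line)
    # Normalize whitespace so identical templates compare equal.
    return " ".join(line.split())


if __name__ == "__main__":
    raw = "Connection from 10.0.0.5:8080 failed after 3 retries, code 0x1F"
    print(mask_parameters(raw))
```

Running it prints `Connection from <IP> failed after <NUM> retries, code <HEX>`, so two lines that differ only in their parameters collapse to the same template.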

Related Techniques

  • Log Anomaly Detection

  • Machine Learning

    • Basic pipeline: feature extraction -> learning algorithm
    • Architectures: Transformer-based architectures
    • Paradigm: transfer learning
  • Knowledge Distillation

  • Evaluation Metrics: used to assess detection performance
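The post does not say which metrics the authors report. The following is a minimal sketch assuming the usual precision, recall, and F1, with anomalies (label 1) as the positive class; `precision_recall_f1` is a hypothetical helper name. Because of class imbalance, accuracy alone can mislead: a model that labels every log as normal on data that is 99% normal scores 99% accuracy yet catches no anomalies.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 with anomaly (label 1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(precision_recall_f1([0, 0, 1, 1, 0, 1], [0, 1, 1, 0, 0, 1]))
```

For `y_true = [0, 0, 1, 1, 0, 1]` and `y_pred = [0, 1, 1, 0, 0, 1]` there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all equal 2/3.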

Method

RAPID framework

  1. Database construction
  2. RAPID processing
  3. CoreSet creation
  4. Similarity Measures
  5. Threshold Function
  6. Final Prediction
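The six steps above can be illustrated with a toy retrieval-based scorer. This is a hedged sketch, not RAPID itself: it assumes steps 1 and 2 (building a database of normal logs and encoding them with a PLM) are already done, with tiny 2-d vectors standing in for real embeddings. It uses cosine distance, builds the core set from the k normal logs nearest by [CLS] vector, and scores a log by its worst-matched token against its closest core-set neighbour. The exact similarity measure and threshold function in RAPID [6] may differ, and all function and variable names are hypothetical.

```python
import math

# Toy 2-d vectors stand in for PLM outputs: "cls" is a log-level embedding,
# "tokens" holds one embedding per token. Real BERT vectors are 768-d.
DATABASE = [  # step 1: embeddings of known-normal logs
    {"cls": [1.0, 0.0], "tokens": [[1.0, 0.0], [0.9, 0.1]]},
    {"cls": [0.9, 0.1], "tokens": [[1.0, 0.1], [0.8, 0.2]]},
]
NORMAL_QUERY = {"cls": [1.0, 0.05], "tokens": [[1.0, 0.0], [0.9, 0.1]]}
ANOMALOUS_QUERY = {"cls": [0.0, 1.0], "tokens": [[0.0, 1.0], [0.1, 0.9]]}


def cosine_distance(a, b):
    """1 - cosine similarity; 0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def build_coreset(query_cls, database, k):
    """Step 3 (assumed form): the k normal logs closest by [CLS] vector."""
    return sorted(database, key=lambda e: cosine_distance(query_cls, e["cls"]))[:k]


def token_level_score(query_tokens, candidate_tokens):
    """Step 4 (assumed form): each query token's distance to its nearest
    candidate token; the worst-matched token determines the score."""
    return max(
        min(cosine_distance(q, c) for c in candidate_tokens) for q in query_tokens
    )


def anomaly_score(query, database, k=2):
    """Score a log against its best-matching core-set neighbour."""
    coreset = build_coreset(query["cls"], database, k)
    return min(token_level_score(query["tokens"], e["tokens"]) for e in coreset)


def predict(query, database, threshold, k=2):
    """Steps 5-6: anomalous if the score exceeds a threshold (assumed to be
    tuned on held-out normal logs)."""
    return anomaly_score(query, database, k) > threshold


if __name__ == "__main__":
    for name, q in [("normal", NORMAL_QUERY), ("anomalous", ANOMALOUS_QUERY)]:
        print(name, round(anomaly_score(q, DATABASE), 3), predict(q, DATABASE, 0.5))
```

No training happens anywhere in this sketch: detection reduces to nearest-neighbour retrieval over normal data plus a threshold, which is what makes the approach "training-free".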

Adaptation to the Ericsson Dataset

  1. Pre-processing
  2. MLM Fine-tuning
  3. Other Fine-tuning Approaches
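The post does not give the authors' MLM settings. As a point of reference, the sketch below follows the standard BERT masking recipe: select about 15% of token positions, replace 80% of those with `[MASK]`, 10% with a random vocabulary token, and leave 10% unchanged; the model is then trained to recover the original tokens at the selected positions. The `mask_for_mlm` name and the use of `None` for positions ignored by the loss are assumptions of this sketch.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_for_mlm(tokens, vocab, mask_prob=0.15, rng=None):
    """Build one MLM training example following the standard BERT recipe.

    Returns (inputs, labels): labels[i] is the original token at positions the
    model must reconstruct and None at positions the loss should ignore.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:  # position not selected for prediction
            continue
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_TOKEN  # 80%: hide the token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: swap in a random token
        # remaining 10%: keep the original token as-is
    return inputs, labels


if __name__ == "__main__":
    log_tokens = "Received block <NUM> of size <NUM> from <IP>".split()
    inputs, labels = mask_for_mlm(log_tokens, vocab=log_tokens, rng=random.Random(0))
    print(inputs)
    print(labels)
```

Keeping 10% of selected tokens unchanged means the model cannot assume that every unmasked token is correct, which helps fine-tuned representations transfer to unmasked log lines at inference time.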

Experiments and Results

Baseline

  • Classic Naive Bayes classification model: uses Naive Bayes to classify log data as normal or anomalous
  • BoW (Bag of Words): converts logs into a sparse-matrix representation
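A minimal pure-Python sketch of such a baseline, assuming the multinomial Naive Bayes variant with Laplace smoothing and whitespace tokenization (the post specifies neither). A `Counter` per log line stands in for one row of the sparse BoW matrix, the class name `NaiveBayesLogClassifier` is hypothetical, and the training lines and labels are made-up toy data. In practice, scikit-learn's `CountVectorizer` and `MultinomialNB` provide the same pipeline with real sparse matrices.

```python
import math
from collections import Counter


def bag_of_words(line):
    """BoW features for one log line: token -> count (one sparse-matrix row)."""
    return Counter(line.lower().split())


class NaiveBayesLogClassifier:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, lines, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for line, label in zip(lines, labels):
            bow = bag_of_words(line)
            self.word_counts[label].update(bow)
            self.vocab.update(bow)
        class_sizes = Counter(labels)
        self.log_prior = {c: math.log(class_sizes[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_posterior(self, bow, c):
        score = self.log_prior[c]
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        for word, count in bow.items():
            if word not in self.vocab:  # ignore words never seen in training
                continue
            score += count * math.log((self.word_counts[c][word] + self.alpha) / denom)
        return score

    def predict(self, line):
        bow = bag_of_words(line)
        return max(self.classes, key=lambda c: self._log_posterior(bow, c))


# Made-up toy training data: 0 = normal, 1 = anomalous.
TRAIN_LINES = [
    "received block blk_1 of size 512",
    "packet responder 1 for block terminating",
    "received block blk_2 of size 256",
    "exception in receive block java.io.IOException",
    "error writing block failed",
]
TRAIN_LABELS = [0, 0, 0, 1, 1]
model = NaiveBayesLogClassifier().fit(TRAIN_LINES, TRAIN_LABELS)

if __name__ == "__main__":
    print(model.predict("exception writing block failed"))
    print(model.predict("received block blk_3 of size 128"))
```

On this toy data the first query is classified as anomalous (1) and the second as normal (0). Note that this baseline needs labeled data, unlike the training-free RAPID approach.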

Experiment Setup

  • **Datasets:** the publicly available BGL dataset and proprietary log data from Ericsson
  • Experimental platform:
    • Nvidia A2
    • BERT and DistilBERT models
    • CUDA 12.2
    • 15 GB of memory
    • Parameters: <omitted>
  • Methods:
    • PLM (Pre-Trained Language Models)
    • MLM (Masked Language Modeling)
    • Baseline

Results

  • BERT vs DistilBERT

    (Figure: BERT vs DistilBERT)

  • Different approaches

    (Figure: Different approaches)

  • Human vs ML

    (Figure: Human vs ML)

Related work

The paper reviews LSTM-based methods (e.g., DeepLog and LogRobust) as well as Transformer- and BERT-based methods (e.g., LogSy, LogBERT, and LAnoBERT).

For more, visit my blog: 哋它亢: Emerging Technologies at the Intersection of AI and IoT
