Workflow Mining: A Survey of Issues and Approaches (4): Related Work

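This article reviews related work in process mining, covering process discovery methods based on neural networks, purely algorithmic techniques, and Markov models, and discusses their application in software engineering and workflow management. It also compares the characteristics and limitations of the individual approaches.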

2. Related work

The idea of process mining is not new [8,11,15–17,24–29,42–44,53–57,61–63]. Cook and Wolf have investigated similar issues in the context of software engineering processes. In [15] they describe three methods for process discovery: one using neural networks, one using a purely algorithmic approach, and one Markovian approach. The authors consider the latter two the most promising approaches. The purely algorithmic approach builds a finite state machine (FSM) where states are fused if their futures (in terms of possible behavior in the next k steps) are identical. The Markovian approach uses a mixture of algorithmic and statistical methods and is able to deal with noise. Note that the results presented in [6] are limited to sequential behavior. Cook and Wolf extend their work to concurrent processes in [16]. They propose specific metrics (entropy, event type counts, periodicity, and causality) and use these metrics to discover models out of event streams. However, they do not provide an approach to generate explicit process models. Recall that the final goal of the approach presented in this paper is to find explicit representations for a broad range of process models, i.e., we want to be able to generate a concrete Petri net rather than a set of dependency relations between events. In [17] Cook and Wolf provide a measure to quantify discrepancies between a process model and the actual behavior as registered using event-based data.

The idea of applying process mining in the context of workflow management was first introduced in [11]. This work is based on workflow graphs, which are inspired by workflow products such as IBM MQSeries workflow (formerly known as Flowmark) and InConcert. In [11], two problems are defined. The first problem is to find a workflow graph generating the events appearing in a given workflow log. The second problem is to find the definitions of edge conditions. A concrete algorithm is given for tackling the first problem.
The approach is quite different from other approaches: because of the nature of workflow graphs, there is no need to identify the nature (AND or OR) of joins and splits. As shown in [37], workflow graphs use true and false tokens, which do not allow for cyclic graphs. Nevertheless, [11] partially deals with iteration by enumerating all occurrences of a given task and then folding the graph. However, the resulting conformal graph is not a complete model. In [44], a tool based on these algorithms is presented.

Schimm [53,54,57] has developed a mining tool suitable for discovering hierarchically structured workflow processes. This requires all splits and joins to be balanced.

Herbst and Karagiannis also address the issue of process mining in the context of workflow management [24–29] using an inductive approach. The work presented in [27,29] is limited to sequential models. The approach described in [24–26,28] also allows for concurrency. It uses stochastic task graphs as an intermediate representation and generates a workflow model described in the ADONIS modeling language. In the induction step, task nodes are merged and split in order to discover the underlying process. A notable difference with other approaches is that the same task can appear multiple times in the workflow model. The graph generation technique is similar to the approach of [11,44]. The nature of splits and joins (i.e., AND or OR) is discovered in the transformation step, where the stochastic task graph is transformed into an ADONIS workflow model with block-structured splits and joins.

(W.M.P. van der Aalst et al. / Data & Knowledge Engineering 47 (2003) 237–267, p. 240)

In contrast to the previous papers, the work in [8,42,43,61,62] is characterized by the focus on workflow processes with concurrent behavior (rather than adding ad hoc mechanisms to capture parallelism).
In [61,62] a heuristic approach using rather simple metrics is used to construct so-called “dependency/frequency tables” and “dependency/frequency graphs”. In [42] another variant of this technique is presented using examples from the health-care domain. The preliminary results presented in [42,61,62] only provide heuristics and focus on issues such as noise. The approach described in [8] differs from these approaches in the sense that for the α algorithm it is proven that for certain subclasses it is possible to find the right workflow model. In [3] the α algorithm is extended to incorporate timing information.
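
To make the notion of a dependency/frequency table more concrete, the following is a minimal Python sketch. The text above does not spell out the metrics used in [61,62], so everything here is an assumption for illustration: the log is taken to be a list of traces (each a list of task names), the table records task frequencies and direct-succession counts, and `dependency_score` uses one common heuristic, (|a>b| − |b>a|) / (|a>b| + |b>a| + 1), which is not necessarily the measure defined in those papers. The function names are hypothetical.

```python
from collections import Counter


def dependency_frequency_table(log):
    """Build simple frequency counts from a workflow log.

    `log` is assumed to be a list of traces, each trace a list of task names.
    Returns (task_freq, follows), where follows[(a, b)] counts how often
    task b occurs directly after task a in some trace.
    """
    task_freq = Counter()
    follows = Counter()
    for trace in log:
        task_freq.update(trace)
        # Pair each task with its direct successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            follows[(a, b)] += 1
    return task_freq, follows


def dependency_score(follows, a, b):
    """Illustrative score in (-1, 1): values near 1 suggest that a tends to
    precede b; values near 0 suggest no clear order (e.g. parallel tasks)."""
    ab = follows[(a, b)]
    ba = follows[(b, a)]
    return (ab - ba) / (ab + ba + 1)


if __name__ == "__main__":
    # Hypothetical log: B and C occur in both orders between A and D.
    log = [
        ["A", "B", "C", "D"],
        ["A", "C", "B", "D"],
        ["A", "B", "C", "D"],
    ]
    task_freq, follows = dependency_frequency_table(log)
    for pair in [("A", "B"), ("B", "C"), ("C", "B")]:
        print(pair, follows[pair], round(dependency_score(follows, *pair), 2))
```

In this toy log, A is always followed (directly or not) by B and never the reverse, so the A→B score is clearly positive, while B and C appear in both orders and score close to zero. The `+ 1` in the denominator damps scores that rest on very few observations, which is one simple way of reducing sensitivity to noise.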

2. Related work (translated)

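The idea of process mining is not new (translator's note: related discussion can be found in references [8,11,15–17,24–29,42–44,53–57,61–63]). Cook and Wolf have studied similar problems in the context of software engineering processes. In [15] they describe three methods for process discovery: one based on neural networks, one purely algorithmic, and one Markovian. The authors consider the latter two the most promising. The purely algorithmic approach builds a finite state machine (FSM) in which states are merged if their futures (the behavior possible in the next k steps) are identical. The Markovian approach combines algorithmic and statistical methods and can deal (effectively) with noise. Note that the results presented in [6] are limited to sequential behavior. In [16] Cook and Wolf extend their work to concurrent processes. They propose specific metrics (entropy, event type counts, periodicity, and causality) and use them to discover models from event streams. However, they do not provide a way to generate explicit process models. Recall that the final goal of the approach presented in this paper is to find explicit representations for a broad range of process models; that is, we want to be able to generate a concrete Petri net rather than just a set of dependency relations between events. In [17] Cook and Wolf provide a measure that quantifies the discrepancies between a process model and the actual behavior recorded in event-based data. The idea of applying process mining to workflow management was first introduced in [11]. That work is based on workflow graphs, which are inspired by workflow products such as IBM MQSeries workflow (formerly known as Flowmark) and InConcert. It defines two problems: the first is to find a workflow graph that generates the events appearing in a given workflow log, and the second is to find the definitions of the edge conditions. The (workflow mining) approach differs from other approaches: because of the nature of workflow graphs, there is no need to identify the nature (AND or OR) of joins and splits. As shown in [37], workflow graphs use true and false tokens, which do not allow for cyclic graphs. Nevertheless, [11] partially deals with iteration (loops) by enumerating all occurrences of a given task and then folding the graph. However, the resulting graph is not a complete model. In [44] a tool based on these algorithms is presented. Schimm [53,54,57] has developed a mining tool suitable for discovering hierarchically structured workflow processes; this requires all splits and joins to be balanced. Herbst and Karagiannis [24–29] also address process mining in the context of workflow management, using an inductive approach. The work presented in [27,29] is limited to sequential models. The approach described in [24–26,28] also allows for concurrency.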