Generating Automata via Active Learning (A Quick Survey of Active Automata Learning) - wcventure


License: CC 4.0 BY-SA

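This article surveys approaches to active automata learning. It covers target models ranging from deterministic finite automata to hybrid automata, introduces core algorithms such as Angluin's L* algorithm, and discusses their applications in areas such as protocols, smart cards, and legacy-software testing.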

A Quick Survey of Active Automata Learning

Remark 1: For basic theoretical background on Angluin's L* algorithm, the interested reader can refer to this article.
Remark 2: This article can also be read on GitHub.

Introduction
Target Automata Types
- 2.1 Deterministic Finite Automata (DFAs)
- 2.2 Nondeterministic Finite Automata (NFAs)
- 2.3 Moore Machine
- 2.4 Mealy Machine
- 2.5 Register Automata (RAs)
- 2.6 Büchi Automata (BAs)
- 2.7 Nominal Automata
- 2.8 Timed Automata
- 2.9 Weighted Automata (WFAs)
- 2.10 Hybrid Automata
- 2.11 Symbolic Automata (Sigma3 TBD)
- 2.12 Others
Approach
- 3.1 Angluin’s L* Algorithm
- 3.2 RPNI Algorithm (NFAs)
- 3.3 Rivest & Schapire’s Algorithm
- 3.4 Kearns & Vazirani’s Algorithm
- 3.5 Lω Algorithm (ω-regular sets)
- 3.6 RPNI2 Algorithm
- 3.7 ID and IID Algorithm
- 3.8 DeLeTe2 Algorithm
- 3.9 Estimation-Exploration Algorithm
- 3.10 L* based Algorithm for Büchi automaton (Büchi automaton)
- 3.11 TL* Algorithm (DERAs)
- 3.12 LM* and LM+ Algorithm (Mealy Machines)
- 3.13 NL* Algorithm (NFAs)
- 3.14 IOA Algorithm (Mealy Machines for I/O Automata)
- 3.15 DHC and LM* Algorithm (Mealy Machines for realistic reactive systems)
- 3.16 RAL Algorithm (Register Automata)
- 3.17 Ll Algorithm (finite cover automata)
- 3.18 The TTT Algorithm
- 3.19 Heerdt’s Algorithm (Mealy Machines)
- 3.20 SL* Algorithm (EFSM, Register Automata)
- 3.21 FDFAs-based Algorithm (Regular ω-languages)
- 3.22 A Mapper-Based Algorithm (Register Automata)
- 3.23 LearnLTS Algorithm
- 3.24 Medhat’s Algorithm (Hybrid Automata)
- 3.25 Learning VPAs (Visibly Pushdown Automata)
- 3.26 Learning WFAs (Weighted Automata)
- 3.27 MooreMI algorithm (Moore Machines)
- 3.28 νL* and νNL* Algorithm (Nominal Automata)
- 3.29 Product L* Algorithm (Moore Machines)
- 3.30 A Tree-based Algorithm for FDFAs (Büchi automaton)
Tools
- 4.1 LearnLib
- 4.2 Next Generation LearnLib (NGLL)
- 4.3 Libalf
- 4.4 RALT
- 4.5 Tomte
- 4.6 RALib
- 4.7 ROLL
- 4.8 Scikit-SpLearn
Application
- 5.1 Protocol
- Special topic: State machine inference for security protocols
- 5.2 SmartCard
- 5.3 Legacy software
- 5.4 LBT & Testing Finite-State Machines
- 5.5 Conformance Testing
- 5.6 Model Checking & Black Box Checking
- 5.7 Improving Efficiency And Scalability of Verification
- 5.8 Testing IoT Communication
- 5.9 Inferring Interface Programs of Systems At Runtime
- 5.10 Program Structures And Interface Programs
- 5.11 Extracting Automata from Neural Networks
- 5.12 Active Automata Learning For Real-Life Applications
- 5.13 Learning Communicating Automata from MSCs
- 5.14 Automated Compositional Verification For Timed Systems
- 5.15 Learn stateful typestates
- 5.16 Fuzzy Learning Automata
Challenge And Discussion
- 6.1 Predicates and operations on data
- 6.2 Beyond Mealy machines
- 6.3 Quality of models
- 6.4 Opening the box
Reference

ARTICLE

1. Introduction

This is a survey on active automata learning.

Automata learning, or model learning, aims to construct black-box state diagram models of software and hardware systems by providing inputs and observing outputs. In this article, we focus on one specific type of model, namely automata, which are crucial for understanding the behavior of many software systems. Model inference techniques can be either white-box or black-box, depending on whether they need access to the code. In this article, we discuss black-box techniques. Their advantages are that they are relatively easy to use and can also be applied when we have no access to the code or to adequate white-box tools. There is a large body of research on learning automata and state machines, which can be divided into two broad categories: learning with queries and answers (active learning), and learning only from examples (passive learning). As a final restriction, we only consider techniques for active learning, that is, techniques that accomplish their task by actively performing experiments (tests) on the software. Related passive learning techniques are mentioned only occasionally.

2. Target Automata Types

The original active automata learning algorithm was presented for Deterministic Finite Automata (DFAs), but has since been adapted to Mealy machines, which are a better fit for learning actual reactive systems because they encode system output in a natural way. Major recent increases in expressiveness have been achieved with Register Automata (RAs) and Büchi Automata (BAs).

2.1 Deterministic Finite Automata (DFA)

A deterministic finite automaton M is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$, consisting of

a finite set of states $Q$
a finite set of input symbols called the alphabet $\Sigma$
a transition function $\delta : Q \times \Sigma \to Q$
an initial or start state $q_0 \in Q$
a set of accept states $F \subseteq Q$

Let $w = a_1 a_2 \dots a_n$ be a string over the alphabet $\Sigma$. The automaton M accepts the string w if a sequence of states $r_0, r_1, \dots, r_n$ exists in $Q$ with the following conditions:

$r_0 = q_0$
$r_{i+1} = \delta(r_i, a_{i+1})$, for $i = 0, \dots, n-1$
$r_n \in F$.

In words, the first condition says that the machine starts in the start state q0. The second condition says that given each character of string w, the machine will transition from state to state according to the transition function δ. The last condition says that the machine accepts w if the last input of w causes the machine to halt in one of the accepting states. Otherwise, it is said that the automaton rejects the string. The set of strings that M accepts is the language recognized by M and this language is denoted by L(M).
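The three acceptance conditions translate directly into a loop over the input. The following is a minimal Python sketch, assuming δ is stored as a total dictionary keyed by (state, symbol) pairs; the function name `dfa_accepts` and the example automaton `EVEN_A` are illustrative, not part of any library.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple

def dfa_accepts(delta: Dict[Tuple[Hashable, str], Hashable],
                q0: Hashable,
                accepting: FrozenSet[Hashable],
                word: Iterable[str]) -> bool:
    """Return True iff the DFA (Q, Σ, δ, q0, F) accepts `word`."""
    r = q0                    # condition 1: r_0 = q_0
    for a in word:
        r = delta[(r, a)]     # condition 2: r_{i+1} = δ(r_i, a_{i+1}); δ must be total
    return r in accepting     # condition 3: r_n ∈ F

# Illustrative DFA over {a, b}: accepts words with an even number of a's.
EVEN_A = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}

print(dfa_accepts(EVEN_A, "even", frozenset({"even"}), "abab"))  # True
print(dfa_accepts(EVEN_A, "even", frozenset({"even"}), "ab"))    # False
```

A missing (state, symbol) entry raises `KeyError`; a partial δ could instead be treated as a move to an implicit rejecting sink state.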

A deterministic finite automaton without accept states and without a starting state is known as a transition system or semiautomaton.

Related Approach

Authors	Year	Title
Angluin	1987	Learning regular sets from queries and counterexamples
Rivest and Schapire	1993	Inference of Finite Automata Using Homing Sequences
Kearns and Vazirani	1994	An introduction to computational learning theory
Parekh et al.	1997	A polynomial time incremental algorithm for regular grammar inference
Denis et al.	2001	Learning regular languages using RFSAs
Bongard et al.	2005	Active Coevolutionary Learning of Deterministic Finite Automata
Isberner et al.	2014	The TTT Algorithm: A Redundancy-Free Approach to Active Automata Learning
Volpato et al.	2015	Approximate Active Learning of Nondeterministic Input Output Transition Systems

2.2 Nondeterministic Finite Automata (NFA)

In automata theory, a finite state machine is called a deterministic finite automaton (DFA), if

each of its transitions is uniquely determined by its source state and input symbol, and
reading an input symbol is required for each state transition.

A nondeterministic finite automaton (NFA), or nondeterministic finite state machine, need not obey these restrictions. In particular, every DFA is also an NFA. The term NFA is sometimes used in a narrower sense, referring to an NFA that is not a DFA, but this article does not use it that way.
Using the subset construction algorithm, each NFA can be translated to an equivalent DFA, i.e. a DFA recognizing the same formal language. Like DFAs, NFAs only recognize regular languages.
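A possible Python sketch of the subset construction, assuming an NFA without ε-transitions whose transition relation is a dictionary from (state, symbol) to a set of successor states. `subset_construction` and the example NFA `SECOND_LAST_A` are hypothetical names used only for illustration.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

def subset_construction(nfa_delta: Dict[Tuple[Hashable, str], Set[Hashable]],
                        start: Hashable,
                        accepting: Set[Hashable],
                        alphabet: Iterable[str]):
    """Subset construction for an NFA without ε-transitions.

    Each DFA state is a frozenset of NFA states. Only subsets reachable
    from {start} are built, which is often far fewer than 2^|Q|.
    """
    alphabet = list(alphabet)
    d0 = frozenset({start})
    dfa_delta: Dict[Tuple[FrozenSet[Hashable], str], FrozenSet[Hashable]] = {}
    seen = {d0}
    queue = deque([d0])
    while queue:
        current = queue.popleft()
        for a in alphabet:
            # Union of the a-successors of every NFA state in `current`.
            target = frozenset(q2 for q in current for q2 in nfa_delta.get((q, a), ()))
            dfa_delta[(current, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A subset is accepting iff it contains at least one accepting NFA state.
    dfa_accepting = {S for S in seen if S & accepting}
    return dfa_delta, d0, dfa_accepting

# Illustrative NFA over {a, b}: the second-to-last symbol is 'a'.
SECOND_LAST_A = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
delta, d0, final = subset_construction(SECOND_LAST_A, 0, {2}, "ab")
```

For this NFA only 4 of the 2^3 = 8 possible subsets are reachable ({0}, {0,1}, {0,2}, {0,1,2}), which is why the construction explores subsets on demand instead of enumerating all of them.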

Related Approach

Authors	Year	Title
Oncina et al.	1992	Inferring Regular Languages in Polynomial Updated Time
Dupont et al.	1996	Incremental regular inference

2.3 Moore Machine

In the theory of computation, a Moore machine is a finite-state machine whose output values are determined only by its current state. This is in contrast to a Mealy machine, whose output values are determined both by its current state and by the values of its inputs.
A Moore machine can be defined as a 6-tuple (S, S_0, Σ, Λ, T, G) consisting of the following:

a finite set of states S
a start state (also called initial state) S_0 which is an element of S
a finite set called the input alphabet Σ
a finite set called the output alphabet Λ
a transition function T:S × Σ → S mapping a state and an input symbol to the next state
an output function G:S → Λ mapping each state to the output alphabet
A Moore machine can be regarded as a restricted type of finite-state transducer.
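To make "output determined only by the current state" concrete, here is a small Python sketch that runs a Moore machine. It assumes T and G are stored as dictionaries; the parity machine `T_PARITY`/`G_PARITY` is a made-up example.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

def moore_run(T: Dict[Tuple[Hashable, str], Hashable],
              G: Dict[Hashable, str],
              s0: Hashable,
              inputs: Iterable[str]) -> List[str]:
    """Outputs of a Moore machine (S, S_0, Σ, Λ, T, G) on an input sequence.

    Since G depends only on the state, G(s0) is emitted before any input
    is read, so n inputs produce n + 1 outputs in this convention.
    """
    s = s0
    outputs = [G[s]]
    for a in inputs:
        s = T[(s, a)]          # next state: T(s, a)
        outputs.append(G[s])   # output of the new state: G(T(s, a))
    return outputs

# Illustrative machine: outputs the parity of the number of 1s read so far.
T_PARITY = {("E", "0"): "E", ("E", "1"): "O", ("O", "0"): "O", ("O", "1"): "E"}
G_PARITY = {"E": "0", "O": "1"}

print(moore_run(T_PARITY, G_PARITY, "E", "110"))  # ['0', '1', '0', '0']
```

Some formulations drop the initial output G(S_0), in which case the first element of the list is simply discarded.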

Preliminaries

Moore E F. Gedanken-Experiments on Sequential Machines[M]// Automata Studies. 1956:129-153.

Related Approach

Authors	Year	Title
Giantamidis and Tripakis	2016	Learning Moore Machines from Input-Output Traces
Moerman et al.	2017	Learning Product Automata

2.4 Mealy Machine

In the theory of computation, a Mealy machine is a finite-state machine whose output values are determined both by its current state and the current inputs. (This is in contrast to a Moore machine, whose output values are determined solely by its current state.) A Mealy machine is a deterministic finite-state transducer: for each state and input, at most one transition is possible.
A Mealy machine is a 6-tuple (S, S_0, Σ, Λ, T, G) consisting of the following:

a finite set of states S
a start state (also called initial state) S_0 which is an element of S
a finite set called the input alphabet Σ
a finite set called the output alphabet Λ
a transition function T:S × Σ → S mapping pairs of a state and an input symbol to the corresponding next state.
an output function G:S × Σ → Λ mapping pairs of a state and an input symbol to the corresponding output symbol.
In some formulations, the transition and output functions are coalesced into a single function T:S × Σ → S × Λ.
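A sketch of a Mealy machine run in Python, using the coalesced form T: S × Σ → S × Λ. The dictionary encoding and the rising-edge detector `EDGE` are illustrative assumptions.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

def mealy_run(T: Dict[Tuple[Hashable, str], Tuple[Hashable, str]],
              s0: Hashable,
              inputs: Iterable[str]) -> List[str]:
    """Outputs of a Mealy machine given the coalesced map T: S × Σ → S × Λ.

    Each input produces exactly one output, which depends on both the
    current state and the input symbol.
    """
    s = s0
    outputs = []
    for a in inputs:
        s, out = T[(s, a)]     # at most one transition per (state, input)
        outputs.append(out)
    return outputs

# Illustrative rising-edge detector: outputs "1" exactly when the input goes from 0 to 1.
EDGE = {
    ("low", "0"): ("low", "0"), ("low", "1"): ("high", "1"),
    ("high", "1"): ("high", "0"), ("high", "0"): ("low", "0"),
}

print(mealy_run(EDGE, "low", "0110"))  # ['0', '1', '0', '0']
```

Here n inputs yield exactly n outputs, and input `1` produces `"1"` in state `low` but `"0"` in state `high`, so the output depends on both the state and the input.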

Preliminaries

Mealy G H. A method for synthesizing sequential circuits[J]. Bell System Technical Journal, 1955, 34(5):1045-1079.

Related Approach

Authors	Year	Title
Shahbaz et al.	2009	Inferring Mealy Machines
Aarts et al.	2010	Learning I/O Automata
Steffen et al.	2011	Introduction to Active Automata Learning from a Practical Perspective

2.5 Register Automata (RA)

Register Automata extend finite automata with data from infinite domains and are, e.g., well suited for describing communication protocols. Register Automata are defined as follows:

Definition 1. A symbolic input is a pair $(a, \bar{p})$ of a parameterized input $a$ of arity $k$ and a sequence of symbolic parameters $\bar{p} = \langle p_1, \dots, p_k \rangle$. Further, let $X = \langle x_1, \dots, x_m \rangle$ be a finite set of registers. A guard is a conjunction of equalities and negated equalities, e.g., $p_i \neq x_j$, over formal parameters and registers. An assignment is a partial mapping $\rho : X \to X \cup P$ for a set $P$ of formal parameters.

Definition 2. A Register Automaton (RA) is a tuple $\mathcal{A} = (A, L, l_0, X, \Gamma, \lambda)$, where

A is a finite set of actions.
L is a finite set of locations.
$l_0 \in L$ is the initial location.
X is a finite set of registers.
Γ is a finite set of transitions, each of the form $\langle l, (a, \bar{p}), g, \rho, l' \rangle$, where $l$ is the source location, $l'$ is the target location, $(a, \bar{p})$ is a parameterized action, $g$ is a guard, and $\rho$ is an assignment.
$\lambda : L \to \{+, -\}$ maps each location to either + (accept) or - (reject).

Let us define the semantics of an RA $\mathcal{A} = (A, L, l_0, X, \Gamma, \lambda)$. An X-valuation, denoted by $v$, is a (partial) mapping from $X$ to the data domain $D$. A state of $\mathcal{A}$ is a pair $\langle l, v \rangle$ where $l \in L$ and $v$ is an X-valuation. The initial state is $\langle l_0, v_0 \rangle$, i.e., the pair of the initial location and the empty valuation.

A step of $\mathcal{A}$, denoted by $\langle l, v \rangle \xrightarrow{(a, \bar{d})} \langle l', v' \rangle$, transfers $\mathcal{A}$ from $\langle l, v \rangle$ to $\langle l', v' \rangle$ on input $(a, \bar{d})$ if there is a transition $\langle l, (a, \bar{p}), g, \rho, l' \rangle \in \Gamma$ such that (1) $g$ is satisfied by $\bar{d}$ and $v$, i.e., it becomes true when each $p_i$ is replaced by $d_i$ and each $x_i$ by $v(x_i)$, and (2) $v'$ is the updated X-valuation, where $v'(x_i) = v(x_j)$ wherever $\rho(x_i) = x_j$, and $v'(x_i) = d_j$ wherever $\rho(x_i) = p_j$.
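The step relation can be prototyped directly. Below is a minimal Python sketch, assuming guards are encoded as lists of (in)equality atoms and assignments as dictionaries. The encoding, the treatment of unassigned registers (they evaluate to `None`) and of registers outside the domain of ρ (they become undefined), and the example `LOGIN` protocol are all assumptions made for illustration; the acceptance labeling λ is omitted.

```python
from typing import Any, Dict, List, NamedTuple, Optional, Tuple

# A term is ("p", i) for the i-th formal parameter (0-based) or ("x", r) for register r.
Term = Tuple[str, Any]
# A guard atom is ("eq" | "neq", t1, t2); a guard is a list of atoms read as a conjunction.
Atom = Tuple[str, Term, Term]

class Transition(NamedTuple):
    source: str
    action: str
    guard: List[Atom]
    assignment: Dict[str, Term]   # partial map rho: X -> X ∪ P
    target: str

State = Tuple[str, Dict[str, Any]]   # <location, X-valuation>

def _value(t: Term, data: tuple, v: Dict[str, Any]) -> Any:
    kind, ref = t
    # Unassigned registers evaluate to None (an assumption of this sketch).
    return data[ref] if kind == "p" else v.get(ref)

def guard_holds(guard: List[Atom], data: tuple, v: Dict[str, Any]) -> bool:
    """Substitute d_i for p_i and v(x_i) for x_i, then evaluate the conjunction."""
    return all((_value(t1, data, v) == _value(t2, data, v)) == (op == "eq")
               for op, t1, t2 in guard)

def step(gamma: List[Transition], state: State, action: str, data: tuple) -> Optional[State]:
    """One step <l, v> -(a, d)-> <l', v'>, or None if no transition is enabled."""
    l, v = state
    for t in gamma:
        if t.source == l and t.action == action and guard_holds(t.guard, data, v):
            # v'(x_i) = v(x_j) if rho(x_i) = x_j; v'(x_i) = d_j if rho(x_i) = p_j.
            # Registers outside dom(rho) are dropped (left undefined) here.
            return t.target, {r: _value(term, data, v) for r, term in t.assignment.items()}
    return None

# Hypothetical protocol: register(id) stores id; login(id) succeeds only with the stored id.
LOGIN = [
    Transition("l0", "register", [], {"x": ("p", 0)}, "l1"),
    Transition("l1", "login", [("eq", ("p", 0), ("x", "x"))], {"x": ("x", "x")}, "l2"),
    Transition("l1", "login", [("neq", ("p", 0), ("x", "x"))], {"x": ("x", "x")}, "l1"),
]
```

Because the two `login` transitions out of `l1` have mutually exclusive guards, at most one is enabled for any data value, so this automaton behaves deterministically even though the data domain is infinite.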

Related Approach

Authors	Year	Title
Howar et al.	2012	Inferring Canonical Register Automata
Cassel et al.	2014	Active learning for extended finite state machines
Aarts et al.	2015	Learning Register Automata with Fresh Value Generation

2.6 Büchi Automata

In computer science and automata theory, a Büchi automaton is a type of ω-automaton, which extends a finite automaton to infinite inputs. It accepts an infinite input sequence if there exists a run of the automaton that visits (at least) one of the final states infinitely often. Büchi automata recognize the omega-regular languages, the infinite word version of regular languages. It is named after the Swiss mathematician Julius Richard Büchi who invented this kind of automaton in 1962.
Büchi automata are often used in model checking as an automata-theoretic version of a formula in linear temporal logic.

Formally, a deterministic Büchi automaton is a tuple A = (Q, Σ, δ, q0, F) that consists of the following components:

Q is a finite set. The elements of Q are called the states of A.
Σ is a finite set called the alphabet of A.
δ: Q × Σ → Q is a function, called the transition function of A.
q0 is an element of Q, called the initial state of A.
F⊆Q is the set of accepting states. A accepts exactly those runs in which at least one of the infinitely often occurring states is in F.

In a non-deterministic Büchi automaton, the transition function δ is replaced with a transition relation Δ that returns a set of states, and the single initial state q0 is replaced by a set I of initial states. Generally, the term Büchi automaton without qualifier refers to non-deterministic Büchi automata.
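Work on learning ω-regular languages commonly represents infinite words as ultimately periodic words u·v^ω. For the deterministic case defined above, acceptance of such a word is decidable by simulation: since Q is finite, feeding v repeatedly must eventually revisit a state at the start of a v-block, after which the run is periodic, and the word is accepted iff that period visits F. A minimal Python sketch, assuming a total transition dictionary; the function name and the example `INF_A` (infinitely many a's) are illustrative.

```python
from typing import Dict, Hashable, Set, Tuple

def dba_accepts_lasso(delta: Dict[Tuple[Hashable, str], Hashable],
                      q0: Hashable,
                      accepting: Set[Hashable],
                      u: str, v: str) -> bool:
    """Decide whether a deterministic Büchi automaton accepts u·v^ω (v non-empty)."""
    if not v:
        raise ValueError("v must be non-empty")
    q = q0
    for a in u:
        q = delta[(q, a)]
    first_seen: Dict[Hashable, int] = {}   # state at the start of a v-block -> block index
    block_hits = []                        # did block i visit an accepting state?
    while q not in first_seen:
        first_seen[q] = len(block_hits)
        hit = False
        for a in v:
            q = delta[(q, a)]
            hit = hit or q in accepting
        block_hits.append(hit)
    # Blocks from first_seen[q] onward repeat forever; their states are visited infinitely often.
    return any(block_hits[first_seen[q]:])

# Illustrative DBA over {a, b}: state 1 means "last letter was a"; F = {1}.
INF_A = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}
```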

Preliminaries

Büchi J R. On a Decision Method in Restricted Second Order Arithmetic[M]// The Collected Works of J. Richard Büchi. Springer New York, 1990:511-8.
Calbrix H, Nivat M, Podelski A. Ultimately periodic words of rational ω-languages[J]. Comptes Rendus de l'Académie des Sciences - Series I - Mathematics, 1993, 802(5):554-566.
Farwer B. ω-Automata[M]// Automata, Logics, and Infinite Games. Springer Berlin Heidelberg, 2002:3-21.

Related Approach

Authors	Year	Title
Maler and Pnueli	1995	On the learnability of infinitary regular sets
Farzan et al.	2008	Extending Automated Compositional Verification to the Full Class of Omega-Regular Languages
Angluin et al.	2014	Learning Regular Omega Languages
Li et al.	2017	A Novel Learning Algorithm for Büchi Automata Based on Family of DFAs and Classification Trees

2.7 Nominal Automata

Nominal automata are automata over infinite alphabets that use the notion of nominal sets.
Consider the language L1 of two-letter words whose two letters are equal. Over an infinite alphabet A = {a, b, c, d, …}, L1 becomes {aa, bb, cc, dd, …}. Classical finite-automata theory does not apply to this kind of language, but one may draw an infinite deterministic automaton that recognizes L1. This automaton ostensibly has infinitely many states, but the set of states can be finitely presented in a way that is open to effective manipulation. More specifically, in a nominal automaton the set of states is subject to an action of permutations of a set of atoms, and it is finite up to that action.
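As an illustration of "finite up to permutations", the following hand-written Python sketch encodes a deterministic automaton for L1 over an unbounded set of atoms (any hashable values). It is not an implementation of νL* or of any nominal-sets library; the state names and the helper `orbit` are hypothetical. The transition function only tests atoms for equality, so renaming atoms commutes with taking a step, and the infinitely many states ("seen", a) collapse into a single orbit.

```python
from typing import Hashable, Iterable, Tuple, Union

# States: "init", "acc", "sink", and ("seen", a) for every atom a.
State = Union[str, Tuple[str, Hashable]]

def nominal_delta(q: State, atom: Hashable) -> State:
    """Equivariant transition function: atoms are only compared for equality."""
    if q == "init":
        return ("seen", atom)           # remember the first atom
    if isinstance(q, tuple):            # q = ("seen", a)
        return "acc" if atom == q[1] else "sink"
    return "sink"                       # from "acc" or "sink": any further letter rejects

def accepts(word: Iterable[Hashable]) -> bool:
    q: State = "init"
    for atom in word:
        q = nominal_delta(q, atom)
    return q == "acc"

def orbit(q: State) -> State:
    """Canonical representative of q's orbit under permutations of atoms."""
    return ("seen", "_") if isinstance(q, tuple) else q
```

Up to permutation there are only four states ("init", ("seen", _), "acc", "sink"), which is the finite presentation that nominal learning algorithms manipulate.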

Preliminaries

Bojańczyk M, Klin B, Lasota S. Automata theory in nominal sets[J]. Logical Methods in Computer Science, 2014, 10(3).

Related Approach

Authors	Year	Title
Moerman et al.	2017	Learning nominal automata

2.8 Timed Automata

In automata theory, a timed automaton is a finite automaton extended with a finite set of real-valued clocks. During a run of a timed automaton, all clock values increase at the same speed. Along the transitions of the automaton, clock values can be compared to integers. These comparisons form guards that may enable or disable transitions and thereby constrain the possible behaviors of the automaton. Further, clocks can be reset. Timed automata are a subclass of hybrid automata.

Formally, a timed automaton is a tuple A = (Q,Σ,C,E,q0) that consists of the following components:

Q is a finite set. The elements of Q are called the states of A.
Σ is a finite set called the alphabet or actions of A.
C is a finite set called the clocks of A.
E ⊆ Q × Σ × B(C) × P(C) × Q is a set of edges, called transitions of A, where
- B(C) is the set of boolean clock constraints involving clocks from C, and
- P(C) is the powerset of C.
q0 is an element of Q, called the initial state.
An edge (q,a,g,r,q’) from E is a transition from state q to q’ with action a, guard g and clock resets r.
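A minimal Python sketch of running a timed word (a sequence of delays and actions) through such edges. It assumes guards are Python predicates over clock valuations rather than a syntactic constraint language, takes the first enabled edge (so it is only faithful for deterministic automata), and ignores location invariants, which the tuple above does not include. The request/acknowledge example `REQ_ACK` is hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Valuation = Dict[str, float]
# Edge (q, a, g, r, q'): guard g is a predicate over clock valuations, r the clocks to reset.
Edge = Tuple[str, str, Callable[[Valuation], bool], FrozenSet[str], str]

def run_timed_word(edges: List[Edge], q0: str, clocks: List[str],
                   word: List[Tuple[float, str]]) -> Optional[str]:
    """Follow a timed word [(delay_1, a_1), ...] from (q0, all clocks = 0).

    Returns the final location, or None if at some point no edge is enabled.
    """
    q = q0
    nu = {c: 0.0 for c in clocks}
    for delay, a in word:
        nu = {c: t + delay for c, t in nu.items()}   # all clocks advance at the same rate
        for src, act, guard, resets, dst in edges:
            if src == q and act == a and guard(nu):
                q = dst
                nu = {c: (0.0 if c in resets else t) for c, t in nu.items()}
                break
        else:
            return None                              # no enabled edge for this action
    return q

# Hypothetical spec: every "req" must be acknowledged within 2 time units.
REQ_ACK: List[Edge] = [
    ("idle", "req", lambda nu: True, frozenset({"x"}), "wait"),
    ("wait", "ack", lambda nu: nu["x"] <= 2, frozenset(), "idle"),
]
```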

Preliminaries

Alur R, Dill D L. A theory of timed automata[M]. Elsevier Science Publishers Ltd. 1994.
Bengtsson J, Yi W. Timed Automata: Semantics, Algorithms and Tools[J]. Lectures on Concurrency & Petri Nets, 2004, 3098:87-124.

Related Approach

Authors	Year	Title
Maier et al.	2014	Online passive learning of timed automata for cyber-physical production systems

2.9 Weighted Automata

Weighted finite automata (WFA) are finite automata whose transitions and states are augmented with some weights, elements of a semiring. A WFA induces a function over strings. The value it assigns to an input string is the semiring sum of the weights of all paths labeled with that string, where the weight of a path is obtained by taking the semiring product of the weights of its constituent transitions, as well as those of its origin and destination states.
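Writing the state weights as an initial vector alpha and a final vector beta, and the transition weights as one matrix per symbol, the semiring sum over all paths reduces to vector-matrix products. The following Python sketch passes the semiring in explicitly; the names `Semiring`, `REAL`, `TROPICAL`, and `wfa_weight` are illustrative, and the real and tropical (min, +) semirings are just two instances.

```python
from functools import reduce
from typing import Callable, Dict, List, NamedTuple, Sequence

class Semiring(NamedTuple):
    plus: Callable[[float, float], float]
    times: Callable[[float, float], float]
    zero: float   # identity of plus ("no path")
    one: float    # identity of times

REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0.0)

def wfa_weight(sr: Semiring, alpha: Sequence[float],
               trans: Dict[str, List[List[float]]],
               beta: Sequence[float], word: str) -> float:
    """Semiring sum, over all paths labelled `word`, of
    alpha[origin] ⊗ transition weights ⊗ beta[destination].

    trans[a][i][j] is the weight of the a-labelled transition i -> j
    (sr.zero when there is none)."""
    n = len(alpha)
    vec = list(alpha)   # vec[j]: combined weight of all path prefixes ending in state j
    for a in word:
        m = trans[a]
        vec = [reduce(sr.plus, (sr.times(vec[i], m[i][j]) for i in range(n)), sr.zero)
               for j in range(n)]
    return reduce(sr.plus, (sr.times(vec[i], beta[i]) for i in range(n)), sr.zero)

INF = float("inf")
COUNT = {"a": [[1.0, 1.0], [0.0, 1.0]]}      # real semiring: counts paths from 0 to 1
SHORTEST = {"a": [[1.0, 4.0], [INF, 0.5]]}   # tropical semiring: cheapest path from 0 to 1

print(wfa_weight(REAL, [1.0, 0.0], COUNT, [0.0, 1.0], "aaa"))        # 3.0
print(wfa_weight(TROPICAL, [0.0, INF], SHORTEST, [INF, 0.0], "aa"))  # 4.5
```

In the tropical example the two paths for "aa" cost 1 + 4 = 5 and 4 + 0.5 = 4.5, and the semiring "sum" (min) selects 4.5.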

Preliminaries

Mohri M. Weighted Finite-State Transducer Algorithms. An Overview[M]// Formal Languages and Applications. Springer Berlin Heidelberg, 2004:551-563.
Mohri M. Weighted Automata Algorithms[M]// Handbook of Weighted Automata. 2009:213-254.

Related Approach

Authors	Year	Title
Balle et al.	2015	Learning Weighted Automata

2.10 Hybrid Automata

In automata theory, a hybrid automaton (plural: hybrid automata or hybrid automatons) is a mathematical model for precisely describing systems in which digital computational processes interact with analog physical processes. A hybrid automaton is a finite state machine with a finite set of continuous variables whose values are described by a set of ordinary differential equations. This combined specification of discrete and continuous behaviors enables dynamic systems that comprise both digital and analog components to be modeled and analyzed.

An Alur-Henzinger hybrid automaton H comprises the following components:

A finite set $X = \{x_1, \dots, x_n\}$ of real-valued variables. The number $n$ is called the dimension of H. Let $\dot{X} = \{\dot{x}_1, \dots, \dot{x}_n\}$ be the set of dotted variables, which represent first derivatives during continuous change, and let $X' = \{x'_1, \dots, x'_n\}$ be the set of primed variables, which represent values at the conclusion of discrete change.
A finite multidigraph (V, E). The vertices in V are called control modes. The edges in E are called control switches.
Three vertex labeling functions init, inv, and flow that assign to each control mode v ∈ V three predicates. Each initial condition init(v) is a predicate whose free variables are from $X$. Each invariant condition inv(v) is a predicate whose free variables are from $X$. Each flow condition flow(v) is a predicate whose free variables are from $X \cup \dot{X}$.
An edge labeling function jump that assigns to each control switch e ∈ E a predicate. Each jump condition jump(e) is a predicate whose free variables are from $X \cup X'$.
A finite set Σ of events, and an edge labeling function event: E → Σ that assigns to each control switch an event.

With these labeling functions, (V, E) is a labeled multidigraph.
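These components can be exercised with a crude simulation. The following Python sketch encodes a one-variable thermostat in the spirit of the usual textbook example (the constants are illustrative): flow and inv become per-mode functions, each control switch carries its event and jump condition with x' = x, and the continuous flow is approximated by forward Euler steps. Taking a switch as soon as its jump condition holds is an assumption of this sketch; in general, the semantics allow a switch at any time while the jump condition and invariants hold. The init condition is not modelled.

```python
from typing import Callable, Dict, List, Tuple

# Illustrative thermostat: one variable x (temperature), control modes "off" and "on".
FLOW: Dict[str, Callable[[float], float]] = {
    "off": lambda x: -0.1 * x,        # flow(off): dx/dt = -0.1 x
    "on": lambda x: 5.0 - 0.1 * x,    # flow(on):  dx/dt = 5 - 0.1 x
}
INV: Dict[str, Callable[[float], bool]] = {
    "off": lambda x: x >= 18.0,       # inv(off)
    "on": lambda x: x <= 22.0,        # inv(on)
}
# Control switches (source, event, jump condition, target); every jump keeps x' = x.
JUMPS: List[Tuple[str, str, Callable[[float], bool], str]] = [
    ("off", "turn_on", lambda x: x < 19.0, "on"),
    ("on", "turn_off", lambda x: x > 21.0, "off"),
]

def simulate(mode: str, x: float, horizon: float, dt: float = 0.01) -> List[Tuple[float, str]]:
    """Forward-Euler simulation that takes a control switch as soon as its
    jump condition holds; returns the (time, event) pairs that occurred."""
    events: List[Tuple[float, str]] = []
    t = 0.0
    while t < horizon:
        for src, event, guard, dst in JUMPS:
            if src == mode and guard(x):
                mode = dst
                events.append((round(t, 2), event))
                break
        x += FLOW[mode](x) * dt          # approximate continuous change
        t += dt
        if not INV[mode](x):
            raise RuntimeError(f"invariant of mode {mode!r} violated at t={t:.2f}")
    return events
```

Starting in mode `off` at x = 20, the temperature decays until it drops below 19, the controller switches on, and the events then alternate between `turn_on` and `turn_off` while x stays within the invariants.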

Preliminaries

Henzinger T A. The Theory of Hybrid Automata[M]// Verification of Digital and Hybrid Systems. Springer Berlin Heidelberg, 2000:278-292.

Related Approach

Authors	Year	Title
Medhat et al.	2015	A framework for mining hybrid automata from input/output traces

2.11 Symbolic Automata (Sigma3 TBD)

The following content comes from http://pages.cs.wisc.edu/~loris/symbolicautomata.html

Classic automata theory builds on the assumption that the alphabet is finite. Unfortunately, practical applications such as XML processing and program trace analysis use values for individual symbols that are typically drawn from an infinite domain. Even when the alphabet is finite, classic automata may sometimes be a bad choice: for example, a deterministic finite automaton modelling a language over the UTF16 alphabet requires 2^16 transitions out of each state!

What are Symbolic Automata and Transducers?

Symbolic Finite Automata (SFAs) are finite state automata in which the alphabet is given by a Boolean algebra that may have an infinite domain, and transitions are labeled with first-order predicates over that algebra. For example, a symbolic automaton can define the following property:

OddG1 = {l | l is a list of odd numbers with length greater than 1}

For SFAs to be closed under Boolean operations and to preserve decidability of equivalence, it must be decidable whether predicates in the algebra are satisfiable. In the example above, predicates are expressed in Presburger arithmetic, which is indeed a decidable theory closed under Boolean operations. Symbolic Finite Transducers (SFTs) extend SFAs to output lists. In an SFT, a transition that reads an input symbol can compute an output that is expressed as a function of the input being read. Such a function has to belong to the underlying alphabet theory. Many variants of SFAs and SFTs have been proposed, and the original page tries to keep up with such extensions.
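A minimal Python sketch of the OddG1 automaton, assuming predicates are plain Python functions over integers. This is enough to run words, but unlike an SFA over a decidable theory it cannot check predicate satisfiability, so it cannot decide emptiness or equivalence. The names `is_odd`, `ODD_G1`, and `sfa_accepts` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Predicate = Callable[[int], bool]

def is_odd(n: int) -> bool:
    return n % 2 == 1          # Python's % makes this correct for negative n too

# Hypothetical SFA for OddG1: transitions are labelled by predicates over the
# infinite domain of integers instead of by individual symbols.
ODD_G1: Dict[str, List[Tuple[Predicate, str]]] = {
    "q0": [(is_odd, "q1")],
    "q1": [(is_odd, "q2")],
    "q2": [(is_odd, "q2")],
}
ACCEPTING: Set[str] = {"q2"}

def sfa_accepts(sfa: Dict[str, List[Tuple[Predicate, str]]],
                accepting: Set[str], start: str, word: Iterable[int]) -> bool:
    q = start
    for x in word:
        # Deterministic SFA: at most one outgoing predicate is satisfied by x.
        for pred, target in sfa.get(q, []):
            if pred(x):
                q = target
                break
        else:
            return False       # no predicate satisfied: reject
    return q in accepting
```

A single predicate label such as `is_odd` stands for infinitely many concrete transitions, which is exactly what keeps the automaton small over large or infinite alphabets.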

How do they relate to classic Automata?

Symbolic Finite Automata are strictly more expressive than deterministic finite automata. Despite this, they are closed under Boolean operations and admit decidable equivalence. In general, for large alphabets symbolic automata outperform their classic counterparts; even complex regular expressions over UTF16 can be analyzed using symbolic automata.

References

We recommend reading this paper to get started. You can also watch this talk.