The behavior of software systems can be modeled as state machines by looking at the log data from these systems. Conventional algorithms, such as L∗, however, require too much memory to process log data when it gets too large. These algorithms must first load all available data i
...
The behavior of software systems can be modeled as state machines by looking at the log data from these systems. Conventional algorithms, such as L∗, however, require too much memory to process log data when it gets too large. These algorithms must first load all available data into memory, which is often way too much.
The approach investigated to deal with this large amount of data is to load the data instead in an on-disk database. The thesis researches the viability of this approach and proposes an algorithm for it.
When log data is present in a database, a state machine can be constructed iteratively by gradually querying more and more data from the database. Each time more data is gathered, partial data can be used to construct a hypothesis for the state machine in question. Such an approach is thus a combination of an active learning part that queries more data and an passive learning part that constructs the state machine from partial data.
Database technologies can contribute to what such an algorithm should look like. Given the vast landscape of different database technologies, some can shape what kinds of queries can be more efficiently executed. This should be taken into account when constructing an algorithm that queries data from a database to construct a state machine.
This thesis starts with an overview of the existing state machine learning algorithms and relevant database technologies. Then a new algorithm is roposed and it is tested on different datasets and compared with existing algorithms.