SAGE is a deterministic, unsupervised learning pipeline that generates attack graphs from intrusion alerts without requiring input knowledge from a security analyst. Using a suffix-based probabilistic deterministic finite automaton (S-PDFA), the system compresses over 1 million alerts into fewer than 500 attack graphs (AGs), which are concise and manageable. Unlike other frequency analysis methods, SAGE does not discard infrequent high-severity alerts, which are crucial for learning the penetration strategies of attackers. This paper compares the baseline algorithm (i.e., the S-PDFA) with an alternative modelling assumption, obtained by replacing the S-PDFA with a PDFA. The aim is to validate the quality of SAGE and to propose possible solutions for PDFA usage that would allow the algorithm to generate AGs in real time. We compare the two models both quantitatively and qualitatively using size, complexity, completeness and interpretability metrics. Our findings show that the AGs generated by the PDFA are more readable and equally complete, while being slightly larger (16%) than those of the baseline S-PDFA. In certain cases, the PDFA also captures different attack strategies better, suggesting that, if further optimized, it could outperform the baseline.