Security Operations Center (SOC) analysts investigate thousands of intrusion alerts on a daily basis, leading to alert fatigue and reduced productivity [1]. While alert correlation techniques help reduce the volume of alerts, they do not show the bigger picture of how an attack happened. Attack graphs (AGs) are visual models of attacker strategies. State-of-the-art approaches to AG generation focus mostly on deriving dependencies between system vulnerabilities, based on network scans and expert knowledge [3]. In real-world operations, however, relying on constant vulnerability scanning and expert-crafted AGs is costly and ineffective. We propose to learn AGs purely from the attacker actions observed through intrusion alerts. In this paper, we develop an unsupervised sequence learning system called SAGE (IntruSion alert-driven Attack Graph Extractor). It constructs alert-driven AGs without any expert input. These AGs unlock a new means of deriving intelligence about attacker strategies without having to investigate thousands of intrusion alerts. Class imbalance remains a major challenge for machine learning-enabled identification of attacker strategies: severe alerts are infrequent, while non-severe alerts (related to network scans) are very frequent. This makes most machine learning solutions inherently unsuitable, since they discard infrequent behavior. Instead, we learn an interpretable suffix-based probabilistic deterministic finite automaton (S-PDFA) using the FlexFringe automaton learning framework [4]. We tune the learning algorithm and transform the alert data so that the resulting model accentuates infrequent severe alerts without discarding any low-severity alerts. The model summarizes attack paths leading to severe attack stages. It can distinguish between alerts with the same signature but different contexts: for example, scanning at the start of an attack and scanning midway through it are treated differently, since they indicate different attack stages.
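Two ideas behind the S-PDFA, reading alert sequences backwards and giving the same signature different states depending on its context, can be illustrated with a minimal Python sketch. It is not FlexFringe and performs no state merging or probability estimation. The episodes, the `stage|service` alert labels, and the helpers `suffix_tree_counts` and `context_states` are all hypothetical. The first helper counts how many episodes reach each node of a suffix tree (a prefix tree built over reversed sequences), which places the final alert of every episode one step from the root. The second replaces learned state merging with a crude stand-in, assumed purely for illustration: an alert's state is the alert together with the `k` alerts that follow it.

```python
from collections import Counter, defaultdict

# Hypothetical alert episodes, one per attacker-victim pair, with each alert
# abstracted to a made-up "stage|service" label (not SAGE's real alphabet).
EPISODES = [
    ["scan|http", "exploit|http", "dataExfil|http"],
    ["scan|http", "scan|ssh", "exploit|ssh", "dataExfil|ssh"],
    ["scan|http", "scan|http"],
]


def suffix_tree_counts(episodes):
    """Count how many episodes pass through each node of a suffix tree.

    A node is identified by the reversed suffix read so far, so the children
    of the root are the *last* alerts of the episodes. Severe alerts, which
    are assumed here to end an episode, therefore sit one step from the root
    regardless of how rarely they occur.
    """
    counts = Counter()
    for episode in episodes:
        reversed_suffix = ()
        for symbol in reversed(episode):
            reversed_suffix += (symbol,)
            counts[reversed_suffix] += 1
    return counts


def context_states(episodes, k=1):
    """Map each alert signature to the set of contexts it occurs in.

    Crude stand-in for learned state merging (not FlexFringe's heuristic):
    an alert's state is the alert plus the k alerts that follow it.
    Occurrences with identical futures share a state; the same signature
    followed by a different future gets a different state.
    """
    states = defaultdict(set)
    for episode in episodes:
        for i, symbol in enumerate(episode):
            states[symbol].add(tuple(episode[i : i + k + 1]))
    return states


if __name__ == "__main__":
    counts = suffix_tree_counts(EPISODES)
    # Nodes of depth one are the alerts adjacent to the root.
    next_to_root = sorted(node[0] for node in counts if len(node) == 1)
    print("alerts adjacent to the root:", next_to_root)
    for context in sorted(context_states(EPISODES)["scan|http"]):
        print("scan|http in context", context)
```

Running the script shows that both `dataExfil` alerts, although each occurs only once, are attached directly to the root (alongside the scan that closes the scan-only episode), and that the single signature `scan|http` falls into four distinct contexts, so a scan that precedes an exploit is kept apart from scans in an episode that never gets past reconnaissance. A state-merging learner would typically merge suffix-tree states whose observed futures are statistically similar, rather than relying on a fixed window `k`.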
Targeted attack graphs are extracted from the S-PDFA on a per-victim, per-objective basis. Using intrusion alerts collected through the Collegiate Penetration Testing Competition [2], we evaluate SAGE’s efficacy on distributed, multi-stage attack