Automatically Generating User-specific Recovery Procedures after Malware Infections
Abstract
Malware poses a serious security risk in today’s digital environment. Defense against malware relies mainly on proactive detection. However, antivirus products often fail to detect new malware for which no signature is available yet. When a machine is infected, the common remediation strategy is to reinstall the system. This, however, costs the user their personal data, which makes it far from an ideal solution.
Academic work on malware remediation has focused on system replay and recovery-oriented computing, both of which rely on heavy monitoring and are therefore unsuitable for an ordinary user’s personal computer. Paleari et al. [31] proposed a remediation methodology that can be applied entirely after the infection. They run the malware sample in a sandbox to observe its behavior and generate a revert operation for each action that modifies the system state. The limitation of this approach is that it cannot account for malware that behaves differently in the sandbox than on the real host.
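To make the "one revert operation per state-modifying action" idea concrete, the following is a minimal sketch, not the method of Paleari et al. [31] or of this work. The action kinds (`file_created`, `file_modified`, `registry_value_set`, `service_installed`), the `Action` record, and the helpers `revert_operation` and `build_procedure` are hypothetical names chosen for illustration; a real sandbox report (for example from CAPE) has its own, richer format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Action:
    kind: str                        # hypothetical action type observed in the sandbox
    target: str                      # file path, registry value, or service name
    old_value: Optional[str] = None  # prior state, if the sandbox recorded it


def revert_operation(action: Action) -> Optional[str]:
    """Map one state-modifying action to a human-readable revert step."""
    if action.kind == "file_created":
        return f"delete file {action.target}"
    if action.kind == "file_modified":
        return f"restore file {action.target} from a clean copy"
    if action.kind == "registry_value_set":
        if action.old_value is None:
            return f"delete registry value {action.target}"
        return f"reset registry value {action.target} to {action.old_value!r}"
    if action.kind == "service_installed":
        return f"remove service {action.target}"
    return None  # read-only or unsupported actions need no revert step


def build_procedure(actions: List[Action]) -> List[str]:
    """Undo actions in reverse order so later changes are rolled back first."""
    steps = []
    for action in reversed(actions):
        op = revert_operation(action)
        if op is not None:
            steps.append(op)
    return steps
```

For example, a sample that drops `C:\Users\alice\AppData\Roaming\updater.exe` and then registers it under `HKCU\Software\Microsoft\Windows\CurrentVersion\Run` yields two steps: first delete the Run value, then delete the file. Reversing the order is a design choice in this sketch so that persistence is removed before the payload it points to.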
In this work, we propose a system that generates user-specific recovery procedures without requiring any prior monitoring. We extend the work of Paleari et al. [31] by incorporating information from the infected machine. We first extract the environment configuration from the infected computer and replicate the same context in the sandbox virtual machine, in order to eliminate environmental influences on the malware’s behavior. After obtaining the behavior from the sandbox, we combine it with forensic evidence from the infected machine to determine which actions actually took place on that system, and then generate the user-specific recovery procedures.
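One way to picture the step of combining sandbox behavior with forensic evidence is as a filter: keep only the sandbox actions whose artifacts are actually found on the infected host. The sketch below assumes that the forensic evidence has already been reduced to a list of artifact paths (files and registry keys) present on the host, and it only handles artifact-creating actions, where presence is the evidence. The names `SandboxAction`, `normalize`, and `user_specific_actions` are hypothetical and are not taken from the described prototype.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SandboxAction:
    kind: str    # hypothetical action type, e.g. "file_created"
    target: str  # path or registry key touched in the sandbox


def normalize(path: str) -> str:
    # Windows paths and registry keys are case-insensitive; also unify separators.
    return path.replace("/", "\\").lower()


def user_specific_actions(
    sandbox_actions: Iterable[SandboxAction],
    host_artifacts: Iterable[str],
) -> Tuple[List[SandboxAction], List[SandboxAction]]:
    """Split sandbox actions into those confirmed by host evidence and the rest."""
    present = {normalize(a) for a in host_artifacts}
    confirmed: List[SandboxAction] = []
    unconfirmed: List[SandboxAction] = []
    for action in sandbox_actions:
        if normalize(action.target) in present:
            confirmed.append(action)
        else:
            unconfirmed.append(action)
    return confirmed, unconfirmed
```

Only the confirmed actions would feed into the recovery procedure; unconfirmed ones can be reported to the user rather than acted on. Replicating the host’s environment in the sandbox (same user name, same paths) is what makes this direct path comparison plausible at all.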
We implement a prototype based on Windows 10 and the CAPE sandbox and evaluate it on 894 malware samples. Our system recovers 51.3% of the changes made by the malware, roughly double the recovery rate achieved by directly matching the sandbox result. Our experimental results also show that the actual behavior on the user’s machine differs significantly from the behavior observed in the sandbox. Our system design makes maximal use of the information exposed in the sandbox, but behavior that never appears in the sandbox remains the biggest limitation of behavior-based recovery.