The use of Internet of Things (IoT) devices has grown steadily since their inception and is expected to keep growing. However, this growth has also attracted malicious actors. Botnet attacks on IoT devices have become more potent each year, exploiting new vulnerabilities and compromising more devices. It is therefore imperative to improve countermeasures. N-BaIoT is a frequently used dataset that covers botnet attacks at various stages of the botnet life cycle. Nevertheless, the state-of-the-art studies that use this dataset have certain limitations that need to be addressed.
One limitation is the lack of detailed feature analysis in most studies. Without such analysis, the behavior of the malicious and benign data in the dataset is poorly understood, and the feature set is rarely optimized. Feature optimization is crucial because it reduces the computational time of the model and makes the model efficient to deploy in real-world applications. Another limitation is the uneven distribution of malicious and benign data, which makes evaluation scores unreliable. Many studies do not address this issue.
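To illustrate why an uneven class distribution can make evaluation scores unreliable, the sketch below compares plain accuracy with balanced accuracy (the mean of per-class recall) for a degenerate detector that labels every sample as malicious. The class counts are hypothetical and are not taken from N-BaIoT, and balanced accuracy is shown only as one common imbalance-aware metric, not necessarily the one used in this thesis.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so each class counts equally regardless of its size."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(y_pred[i] == c for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical split: 950 malicious (label 1) and 50 benign (label 0) samples.
y_true = [1] * 950 + [0] * 50
# A degenerate "detector" that labels every sample as malicious.
y_pred = [1] * len(y_true)

print(f"accuracy          = {accuracy(y_true, y_pred):.2f}")           # 0.95
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.2f}")  # 0.50
```

The detector never recognizes benign traffic, yet plain accuracy reports 95%; the per-class view exposes the failure.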
The main contribution of this thesis is a hybrid ensemble model that detects IoT botnet attacks quickly and accurately. A further aim is to provide a clear analysis of the behavior of the malicious and benign data and to optimize the number of selected features. The hybrid ensemble model is evaluated using a minimal number of features. The comparison covers both unbalanced and balanced datasets for training and testing. Examining the performance obtained on each dataset highlights the effect of the uneven distribution and makes the comparison more extensive and reliable.
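One common way to derive a balanced dataset from an unbalanced one is random undersampling: samples from the larger classes are dropped at random until every class is as large as the smallest one. The sketch below assumes this technique purely for illustration; the thesis may balance its data differently, and the `undersample` helper and the toy feature vectors are hypothetical.

```python
import random
from collections import defaultdict


def undersample(samples, labels, seed=0):
    """Hypothetical helper: randomly drop samples from larger classes until
    every class has as many samples as the smallest class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    n_min = min(len(xs) for xs in by_class.values())
    rng = random.Random(seed)  # fixed seed for a reproducible split

    balanced = []
    for y, xs in by_class.items():
        # Draw n_min samples without replacement from this class.
        for x in rng.sample(xs, n_min):
            balanced.append((x, y))
    rng.shuffle(balanced)  # avoid blocks of a single class

    X_bal = [x for x, _ in balanced]
    y_bal = [y for _, y in balanced]
    return X_bal, y_bal


# Hypothetical data: 950 malicious (1) and 50 benign (0) two-feature vectors.
X = [[i, i % 7] for i in range(1000)]
y = [1] * 950 + [0] * 50
X_bal, y_bal = undersample(X, y)
print(len(X_bal), y_bal.count(0), y_bal.count(1))  # 100 50 50
```

Training and testing once on the original lists and once on the balanced output gives the two settings whose scores can then be compared side by side.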
The results indicate that the selected features and detection models proposed in this research outperform those of other studies. With both the balanced and the unbalanced sets, the performance scores and computational time improve, and this is achieved with fewer features than several other studies use.