Defending against Malicious Websites: Themed Threats, Detection, and Law-Enforcement
Abstract
Malicious websites have become a major cyber threat. Despite substantial effort by researchers and practitioners, some fundamental problems regarding effective defenses against these attacks remain open, such as: What are the emerging trends of malicious websites? How should we cope with the new trends? What do we need to do to help law enforcement deal with them? This dissertation addresses these problems by making three contributions. The first contribution is to characterize the emerging threat of themed malicious websites, which represent one trend, as evidenced by the many malicious websites exploiting the COVID-19 incident. The characterization offers a deep understanding of the attacks, which leads to the second contribution, namely an investigation of how to detect the emerging themed malicious websites exploiting the COVID-19 incident. While the preceding two contributions are from a purely technological point of view, the third contribution investigates the gap between technology and law enforcement with respect to malicious websites. Understanding and addressing this gap is essential because we anticipate that law enforcement will eventually need to be involved in dealing with malicious websites, if it is not already. For this purpose, we focus on investigating how to support law enforcement in dealing with blacklisted websites while highlighting two important factors: one is the trustworthiness of machine learning (ML) methods in predicting malicious websites, which is important because blacklists are not perfect; the other is how to interpret or explain individual predictions (possibly in court), which existing black-box ML models cannot provide. The resulting methodology could be leveraged to cope with malicious websites, towards ultimately eliminating them or at least adequately mitigating them. Finally, we present case studies with real-world datasets to show the usability and efficacy of the proposed methods.