Exploring the Vulnerability of the State-of-the-Art Content Moderation Image Classifiers Against Adversarial Attacks
Abstract
The goal of this research is to assess and characterize the vulnerabilities of deep-learning-based image classifiers in the context of content moderation. While prior assessments have covered adversarial attacks that occlude the offending parts of images to bypass computer-vision-based content moderation, no work has assessed the effectiveness of more sophisticated adversarial attacks that do not alter the content of the images. To this end, I study the effect of various adversarial attacks, applied at different strengths and in combination, on the classification accuracy of several state-of-the-art content moderation APIs that online social media platforms employ to classify pornographic images. The discovered weaknesses have been shared with the respective platforms so that they can address them.
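The evaluation loop described above can be sketched minimally as follows. This is an illustration only, not the thesis's actual experimental code: `classify` stands in for any content moderation API returning a score, the uniform-noise perturbation is one hypothetical attack, and all names are assumptions.

```python
import numpy as np

def perturb(image, epsilon, rng):
    """Apply additive uniform noise of magnitude epsilon (a stand-in
    for one adversarial attack), keeping pixels in the valid [0, 1] range."""
    noise = rng.uniform(-epsilon, epsilon, size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)

def sweep_attack_strengths(image, classify, epsilons, seed=0):
    """Record the classifier's output on the same image as the
    attack strength increases, mirroring the accuracy-vs-strength
    study described in the abstract."""
    rng = np.random.default_rng(seed)
    return {eps: classify(perturb(image, eps, rng)) for eps in epsilons}

# Toy stand-in classifier: flags an image if its mean brightness exceeds 0.5.
toy_classify = lambda img: float(img.mean())

scores = sweep_attack_strengths(np.zeros((8, 8)), toy_classify, [0.0, 0.25, 0.5])
```

Combinations of attacks would compose several `perturb`-style transforms before the single `classify` call; the dictionary of scores then shows at which strength the classifier's decision flips.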