Exploring the Vulnerability of the State-of-the-Art Content Moderation Image Classifiers Against Adversarial Attacks

Date

2023

Authors

Seong, Andrew

Abstract

The goal of this research is to assess and describe the vulnerabilities of deep-learning-based image classifiers in the context of content moderation. While similar assessments have been made of adversarial attacks that cover up the offending parts of images in order to bypass computer-vision-based content moderation, no work has assessed the effectiveness of more sophisticated adversarial attacks that do not alter the content of the images. To achieve this, I study the effect of various adversarial attacks, at different strengths and in combination, on the classification accuracy of state-of-the-art content moderation APIs that online social media platforms employ to classify pornographic images. The discovered weaknesses have been shared with the respective online social media platforms.
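The abstract does not name the specific attacks used, so as an illustrative sketch only, the evaluation loop of applying perturbations at several strengths and in combination might look like the following; `noise_attack` and `quantize_attack` are hypothetical stand-ins for the attacks studied, not the thesis's actual methods:

```python
import numpy as np

def noise_attack(img, eps, seed=0):
    """Additive uniform noise with L-infinity strength eps, clipped to [0, 1]."""
    rng = np.random.default_rng(seed)
    return np.clip(img + rng.uniform(-eps, eps, img.shape), 0.0, 1.0)

def quantize_attack(img, levels=16):
    """Coarse color quantization, a compression-style distortion."""
    return np.round(img * (levels - 1)) / (levels - 1)

def compose(img, attacks):
    """Apply a sequence of attacks to test their combination."""
    for attack in attacks:
        img = attack(img)
    return img

# Sweep attack strengths on a dummy gray image; in the study, each
# perturbed image would instead be submitted to a moderation API and
# its predicted label compared against the clean image's label.
img = np.full((8, 8, 3), 0.5)
for eps in (0.01, 0.05, 0.1):
    perturbed = compose(img, [lambda x: noise_attack(x, eps),
                              lambda x: quantize_attack(x, 16)])
```

Because the perturbations are small and content-preserving, a human viewer sees essentially the same image while the classifier's input distribution shifts, which is what distinguishes these attacks from simple cover-up edits.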

Keywords

Deep-learning, Online social media, Computer vision, Adversarial attacks

Department

Computer Science