Insider Threat Detection: Automating Psychological Assessment
Insiders pose a demonstrable risk to organizations because their often privileged access enables abuses that are inconsistently monitored and difficult to predict. This risk has become salient enough that governing entities have established a de facto group tasked with establishing and maintaining best defensive practices against insider threats. Likewise, established streams of academic research in related areas, such as cyber security and information security, have begun to explore the psychological aspects and corollaries of insider threats, revealing subclinical psychopathy as an identifying personality trait for insider threat potential. This research attempts to anticipate the relevance and utility of this emerging research stream by examining existing measurement processes for related personality measures and developing a less obtrusive measurement approach that organizations could use. Applicability to a broad range of organizational contexts requires the use of a commonly available dataset. Accordingly, spoken language from interviews is used to derive text features that are evaluated by a set of machine learning models built with support vector machine, random forest, and neural network techniques. Data were collected from 204 Amazon Mechanical Turk workers, covering psychopathy measures (EPA-SF and the Maasberg Modified SD3) as well as recorded responses to two job interview questions. Initial analysis revealed a subversive group of respondents who circumvented the collection of their job interview responses by providing silent audio. Comparison of the subversive and non-subversive groups revealed significant differences in psychopathy measures, with the subversive group scoring higher on both instruments.
Evaluation of automatic transcription revealed significant limitations that necessitated manual transcription of the completed responses. Text features were extracted from the manual transcriptions. Machine learning models were built in two phases: the first phase produced 17,040 models and the second an additional 25,560, for a total of 42,600 models at the end of the study. The findings demonstrate the validity of using machine learning to automate the assessment of behavioral traits from job interview responses and basic demographic data. Notably, the addition of spectral features significantly improved average model efficacy. The contributions of this study include a demonstration that automated psychological assessment using commonly available data is a viable alternative to traditional assessment instruments. This study also contributes to machine learning research by demonstrating a model development process in which the random arrangements of the dataset are seeded identically for each technique and hyperparameter set, so that models can be compared over a wide range of random arrangements. Finally, this study contributes to research methodology by providing evidence that machine learning processes developed with input feature spaces informed by theory outperform those developed with a broader input feature space that fully encompasses the informed input feature space.
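
The idea of seeding the random arrangements in a repeated fashion can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the toy data, the 25% test fraction, and the threshold "models" are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. The point it shows is only that every configuration is scored on the same sequence of seeded train/test arrangements, so differences in scores reflect the configuration and not the split.

```python
import random


def seeded_split(n_rows, seed, test_fraction=0.25):
    """Return (train_idx, test_idx); the arrangement depends only on the seed."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # private RNG, so global state is untouched
    cut = int(n_rows * (1 - test_fraction))
    return indices[:cut], indices[cut:]


def threshold_classifier(threshold):
    """Hypothetical stand-in for a real technique: predicts 1 when x > threshold."""
    def fit_predict(train_x, train_y, test_x):
        # A real model would learn from train_x and train_y; this toy ignores them.
        return [1 if x > threshold else 0 for x in test_x]
    return fit_predict


def evaluate(xs, ys, configs, seeds):
    """Score every configuration on the same sequence of seeded arrangements."""
    results = {}
    for name, model in configs.items():
        scores = []
        for seed in seeds:
            train, test = seeded_split(len(xs), seed)
            preds = model([xs[i] for i in train],
                          [ys[i] for i in train],
                          [xs[i] for i in test])
            correct = sum(p == ys[i] for p, i in zip(preds, test))
            scores.append(correct / len(test))
        results[name] = scores
    return results


# Toy data (an assumption for illustration): the label is 1 exactly when x > 0.5.
xs = [i / 20 for i in range(20)]
ys = [1 if x > 0.5 else 0 for x in xs]

configs = {
    "threshold=0.3": threshold_classifier(0.3),
    "threshold=0.5": threshold_classifier(0.5),
}
seeds = range(5)
results = evaluate(xs, ys, configs, seeds)

for name, scores in results.items():
    print(name, [round(s, 2) for s in scores])
```

Because each seed reproduces the same arrangement for every configuration, the per-seed scores can be compared pairwise across techniques and hyperparameter sets, and the number of seeds can be increased to cover a wider range of arrangements.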