Information Retrieval and Semantic Inference from Natural Language Privacy Policies
Abstract
Several state laws, along with app markets such as Apple's App Store and Google Play, require app developers to provide users with legal privacy notices (privacy policies) containing critical requirements that inform users about what kinds of personal information are collected, how the data is used, and with whom the data is shared. Because privacy policies consist of legal terms, often written by a legal team without rigorous insight into the app source code, and because the policy and app code can change independently, privacy policies can become misaligned with the actual data practices. In addition to misinforming users, such inconsistencies between policies and data practices can have legal repercussions. The goal of this work is to capture and formalize the semantics of natural language privacy policies into a knowledge base that can enable (1) transparent software implementation and (2) shared understanding among policy authors, app developers, and regulators. Constructing an empirically valid knowledge base (i.e., a privacy policy ontology) is challenging because it must be both scalable and consistent with the interpretations of multiple users.
This work focuses on the formal representation of privacy policy semantics by applying grounded theory, natural language patterns, and neural networks to the terminology of privacy policies. It also discusses the application of formal ontologies in frameworks for detecting privacy misalignment.