Reducing False Negatives in Taint Analysis via Hybrid Source Inference




Zhang, Xueling




Mobile applications (apps) are widely used and frequently process sensitive data, such as a user's current location, health information, or dating preferences. This access to sensitive data has made privacy a well-known challenge in the mobile ecosystem. Users are usually unaware of, and have little control over, what data is collected and how it is stored and transmitted.

To assess the data practices of mobile apps, the research community has made significant efforts to develop data flow analysis tools that can be applied to app code. These tools are designed to detect tainted information flows from sources, which provide access to sensitive data (e.g., the user's location), to sinks, which are potential channels through which sensitive data could be leaked to an adversary (e.g., a network connection).

Data flow analysis tools require a list of sources and sinks as input and are generally classified into two categories: 1) Dynamic taint analysis, which tracks taint flow at run time. Its effectiveness is limited by execution coverage and run-time overhead. 2) Static taint analysis, which inspects code without running it, is theoretically conservative and intends to detect all possible taint flows. However, static taint analysis often suffers from false negatives for two reasons: (a) static inaccessibility: static code analysis does not have access to code that is only visible or determinable at run time, such as reflection, dynamically loaded code, native code, and code executed on a remote server; (b) incomplete sources: no matter how good the tool is, it can only guarantee data security when its list of sources and sinks is complete. If a source is missing, a malicious app can retrieve the corresponding data without being detected by the analysis tool. Earlier research efforts in this area have primarily focused on tracing sensitive data extracted from an Android device through Android platform APIs, with little work on sensitive data extracted through methods defined by apps or third-party libraries, which are also sources.
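The effect of an incomplete source list can be illustrated with a minimal sketch. The toy analyzer below is not any real tool: it propagates taint through a straight-line list of calls, and all method names (including the hypothetical third-party `com.example.sdk.UserProfile.getHealthRecord` and the sink label `Network.send`) are simplified labels assumed for illustration only.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Stmt(NamedTuple):
    target: Optional[str]   # variable assigned by the call, or None for a bare call
    callee: str             # label of the method being called (illustrative only)
    args: Tuple[str, ...]   # variables passed as arguments


def find_leaks(program: List[Stmt], sources: Set[str], sinks: Set[str]) -> List[Tuple[str, str]]:
    """Return (sink, variable) pairs where a tainted variable reaches a sink."""
    tainted: Set[str] = set()
    leaks: List[Tuple[str, str]] = []
    for stmt in program:
        tainted_args = [a for a in stmt.args if a in tainted]
        if stmt.callee in sinks:
            leaks.extend((stmt.callee, a) for a in tainted_args)
        if stmt.target is not None:
            # The result is tainted if the callee is a source or any argument is tainted.
            if stmt.callee in sources or tainted_args:
                tainted.add(stmt.target)
            else:
                tainted.discard(stmt.target)
    return leaks


program = [
    Stmt("loc", "LocationManager.getLastKnownLocation", ()),
    # Hypothetical third-party library method that also returns sensitive data.
    Stmt("record", "com.example.sdk.UserProfile.getHealthRecord", ()),
    Stmt(None, "Network.send", ("loc",)),
    Stmt(None, "Network.send", ("record",)),
]
sinks = {"Network.send"}
platform_sources = {"LocationManager.getLastKnownLocation"}
all_sources = platform_sources | {"com.example.sdk.UserProfile.getHealthRecord"}

leaks_platform_only = find_leaks(program, platform_sources, sinks)
leaks_all_sources = find_leaks(program, all_sources, sinks)
```

With only the platform API listed as a source, the flow of `record` into the sink goes unreported, which is the false negative described above; adding the library method to the source list exposes it.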

This thesis aims to reduce false negatives in static taint analysis by uncovering sensitive data access through methods defined by apps and third-party libraries rather than by Android platform APIs. Specifically, we use hybrid (combined static and dynamic) code analysis and machine learning techniques to detect such sensitive data access. The exposed data accesses can be used as 1) intermediate sources for an Android API source whose taint path was interrupted by statically inaccessible code, and 2) a new type of source that may lead to leaks.
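As a rough illustration of how run-time observations might expose candidate sources, the sketch below flags methods whose observed return value matches a value known to be sensitive. This value-matching heuristic is an assumption made for the example and is not claimed to be the technique developed in the thesis; the method names and the trace contents are hypothetical.

```python
def infer_sources(runtime_trace, sensitive_values):
    """Return the methods whose observed return value equals a known sensitive value."""
    return {method for method, value in runtime_trace if value in sensitive_values}


# Hypothetical run-time trace of (method, returned value) pairs.
runtime_trace = [
    ("com.example.app.Cache.load", "29.4241,-98.4936"),
    ("com.example.app.Config.theme", "dark"),
    ("com.example.sdk.Tracker.lastFix", "29.4241,-98.4936"),
]
# Value assumed to have been observed from a platform location API during the same run.
sensitive_values = {"29.4241,-98.4936"}

inferred = infer_sources(runtime_trace, sensitive_values)
```

Methods flagged this way could then be added to the source list of a static analyzer, so that a taint path broken by reflection or native code can resume from the app-defined or library-defined method.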






Computer Science