Tackling Build Failures in Continuous Integration

Date
2020
Authors
Hassan, Foyzul
Abstract

In popular continuous integration (CI) practice, coding is followed by building, integration and system testing, pre-release inspection, and artifact deployment. This practice can reduce integration time and speed up the development process. In a CI environment, dedicated infrastructure with build systems such as Make, Ant, Maven, Gradle, and Bazel is used to automate CI tasks such as compilation and test invocation. For continuous integration, developers describe the build process through build scripts such as build.xml for Ant, pom.xml for Maven, and build.gradle for Gradle. But with growing functionality and CI requirements, build scripts can become very complex and may require frequent maintenance. Meanwhile, a large number of continuous integration failures may interrupt normal software development, so they need to be repaired as soon as possible. According to the TravisTorrent dataset of open-source software continuous integration, 22% of code commits include changes to build script files, either to maintain build scripts or to resolve CI failures. CI failures bring both challenges and opportunities to fault localization and program repair techniques. On the challenge side, unlike traditional fault localization scenarios (e.g., unit testing, regression testing) where only source code needs to be considered, CI failures can also be caused by build configuration errors and environment changes. Indeed, the CI practice has made it necessary for developers to automate the software build process with build scripts and configuration files. As a result, build maintenance requires a substantial amount of developer effort, and developer carelessness may also introduce defects into build scripts. On the opportunity side, the continuous integration scenario provides rich code commit history and build logs from previous passing builds and current failing builds. Such information sources are often not available in other application scenarios.
Taking advantage of these additional information sources, we may be able to further improve the accuracy of automatic repair and fault localization for CI failures. Automated program repair techniques have great potential to reduce the cost of resolving software failures, but existing techniques mostly focus on repairing source code, so they cannot directly help resolve CI failures. To address this limitation, we proposed the first approach to automatic patch generation for build scripts, which uses fix patterns automatically generated from existing build script fixes and recommends fix patterns based on build log similarity. Beyond build scripts, CI failures are often a combination of test failures and build failures, and sometimes both source code and various build scripts need to be touched to repair a single failure. To address this issue, we proposed UniLoc (Unified Fault Localization), a unified technique to localize faults in both source code and build scripts given CI failures. Adopting the information retrieval (IR) strategy, UniLoc locates buggy files by treating source code and build scripts as documents to search and build logs as search queries. Instead of naively applying an off-the-shelf IR technique to these software artifacts, UniLoc applies various domain-specific heuristics to optimize the search queries, search space, and ranking formulas for more accurate fault localization. However, UniLoc localizes faults only at the file level and is limited to faults in source code and build scripts. In the future, we plan to extend our fault localization approach to the block level of source code and build scripts to better assist developers and automatic repair approaches. Finally, beyond source code and build scripts, other types of files may also be involved in software repair, especially in other scenarios.
For example, in the fault localization of web applications, we need to consider HTML files, CSS files, client-side JavaScript files, and server-side source code. We plan to adapt our technique to more scenarios with heterogeneous bug locations. Combined with program repair, the proposed CI fault localization technique can be a practical solution for fixing a range of CI failures.
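
The IR formulation described in the abstract, with source files and build scripts as documents and the build log as the query, can be illustrated with a minimal sketch. This is not UniLoc's implementation: it uses plain TF-IDF with cosine similarity as a stand-in and omits the domain-specific heuristics for queries, search space, and ranking. The function names (tokenize, rank_files), the sample files, and the sample log line are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase, lowercase, and keep alphanumeric tokens longer than one character."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^A-Za-z0-9]+", text.lower()) if len(t) > 1]


def rank_files(build_log, files):
    """Rank candidate files by TF-IDF cosine similarity to a build log.

    files maps a file path to its text content. Source files and build
    scripts are treated alike, as documents in one search space.
    Returns (path, score) pairs sorted by descending score.
    """
    docs = {path: Counter(tokenize(content)) for path, content in files.items()}
    n_docs = len(docs)

    # Document frequency: in how many files each term occurs.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    def weight(counts):
        # Smoothed inverse document frequency; an assumption of this sketch.
        return {
            term: tf * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, tf in counts.items()
        }

    def cosine(a, b):
        dot = sum(w * b.get(term, 0.0) for term, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    query = weight(Counter(tokenize(build_log)))
    scored = [(path, cosine(query, weight(counts))) for path, counts in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical project files and a hypothetical failing-build log line.
files = {
    "build.gradle": "dependencies { compile 'com.google.guava:guava:18.0' }",
    "src/main/java/App.java": (
        "public class App { public static void main(String[] args) "
        '{ System.out.println("hi"); } }'
    ),
}
build_log = "Could not resolve com.google.guava:guava:18.0. Required by: project :app"

ranking = rank_files(build_log, files)
for path, score in ranking:
    print(f"{score:.3f}  {path}")
```

In this toy input the log shares the dependency coordinates with the build script, so the build script is ranked above the Java file. A real CI log is far noisier, which is presumably why query optimization and search space reduction matter in practice.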

Description
This item is available only to currently enrolled UTSA students, faculty or staff.
Keywords
Continuous Integration, Fault Localization, Program Repair, Software Build Process
Department
Computer Science