Precise code differencing
A clear understanding of a software system's modification history helps engineers produce less error-prone code and deliver projects on time. Accurate differencing of source code is a long-standing problem in software engineering, as evidenced by the many ongoing efforts to improve results. Source code is hierarchically structured and therefore calls for hierarchical analysis, for which simple text differencing is often unsatisfactory. Many attempts have been made to difference code at an abstract or data-structure level, including at the level of the software's abstract syntax trees (ASTs).
Prior research on differencing program trees includes cdiff, an unnamed partial-match tool, and ChangeDistiller. These tools show promise; however, they do not clearly define the accuracy of their results, and ChangeDistiller in particular uses several heuristics that increase efficiency but can degrade accuracy. To overcome these problems, ZhangShashaComparison was developed as an Eclipse plug-in that applies the Zhang-Shasha tree differencing algorithm to ASTs. Its results are optimal with respect to tree edit distance. A new algorithm was devised to extract the edit script from the Zhang-Shasha output and report the code differences to the end user clearly and precisely. Thus, by comparing the abstract syntax trees of two versions of a source file, hierarchical code changes can be precisely identified and reported, offering a potentially practical way for software engineers to produce better code more quickly.
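To make the notion of tree edit distance concrete, the following is a minimal sketch in Python. It is not the ZhangShashaComparison implementation and not the Zhang-Shasha dynamic program itself. It uses the simpler memoized forest recursion, which yields the same optimal distance for ordered labeled trees but is less efficient. The tuple-based tree representation, the helper names (node, size, forest_distance, tree_edit_distance), the unit edit costs, and the toy ASTs are all assumptions made for illustration.

```python
from functools import lru_cache

# Hypothetical tree representation: (label, (child, child, ...)).
# Tuples are hashable, so whole forests can serve as memoization keys.
def node(label, *children):
    return (label, tuple(children))

def size(tree):
    """Number of nodes in a tree."""
    return 1 + sum(size(child) for child in tree[1])

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Minimum number of unit-cost inserts, deletes, and relabels
    needed to turn ordered forest f1 into ordered forest f2."""
    if not f1:
        return sum(size(t) for t in f2)   # insert everything in f2
    if not f2:
        return sum(size(t) for t in f1)   # delete everything in f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    rest1, rest2 = f1[:-1], f2[:-1]
    return min(
        # Delete the root of the rightmost tree in f1; its children move up.
        forest_distance(rest1 + kids1, f2) + 1,
        # Insert the root of the rightmost tree in f2.
        forest_distance(f1, rest2 + kids2) + 1,
        # Map the two roots onto each other (free if the labels agree).
        forest_distance(rest1, rest2)
        + forest_distance(kids1, kids2)
        + (label1 != label2),
    )

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

# Toy ASTs (assumed shapes) for `x = a + b`, `x = a * b`, and `x = f(a + b)`.
old = node("Assign", node("x"), node("+", node("a"), node("b")))
relabeled = node("Assign", node("x"), node("*", node("a"), node("b")))
wrapped = node("Assign", node("x"),
               node("Call f", node("+", node("a"), node("b"))))

print(tree_edit_distance(old, relabeled))  # 1: relabel "+" to "*"
print(tree_edit_distance(old, wrapped))    # 1: insert the "Call f" node
```

In both toy cases a single node-level operation accounts for the change, which is the kind of hierarchical difference a line-based text diff would report only as a changed line.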