1- Automated Code Review: Empirical Evidence from Experiments and Industry
Umut Cihan
Master's Student
(Supervisor: Asst. Prof. Eray Tüzün)
Computer Engineering Department
Bilkent University
Abstract: Code reviews are essential for software quality. Advances in large language models (LLMs) have enabled AI-powered code reviews, but their reliability and impact in industry remain unclear. This study evaluates LLMs for detecting code correctness and suggesting improvements, while also assessing the adoption of AI-assisted code review tools in practice. The thesis consists of two studies: an experimental evaluation of LLMs on code review tasks and a case study on real-world AI-assisted code reviews. In the experiment, GPT-4o and Gemini 2.0 Flash were tested on 492 AI-generated code blocks and 164 HumanEval benchmark blocks. The models assessed correctness and suggested fixes, with GPT-4o achieving 68.50% accuracy in correctness classification and 67.83% in corrections, outperforming Gemini 2.0 Flash (63.89% and 54.26%, respectively). Performance dropped when problem descriptions were omitted and varied across code types. The case study examined an AI-assisted code review tool based on Qodo PR Agent, deployed to 238 practitioners across ten projects. The analysis focused on 4,335 pull requests, of which 1,568 received automated reviews. Developers engaged with 73.8% of AI-generated comments, though pull request closure time increased. Surveys indicated minor improvements in code quality but highlighted issues such as faulty suggestions and increased review time. LLM-based code reviews aid in detecting issues and improving code but risk introducing errors. A "human-in-the-loop" approach is proposed to balance automation with oversight. Despite these challenges, AI-assisted reviews enhance bug detection and code awareness, offering valuable, albeit imperfect, integration into software development workflows.
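The abstract does not detail the prompting protocol; as a rough illustration only, a correctness-classification query of the kind described might look like the following Python sketch (the function name, prompt wording, and single-word output format are assumptions for illustration, not the study's actual setup):

# Minimal sketch, assuming the OpenAI Python client (openai>=1.0) and an
# OPENAI_API_KEY in the environment; not the thesis' actual pipeline.
from openai import OpenAI

client = OpenAI()

def classify_correctness(problem_description: str, code_block: str) -> str:
    """Ask the model whether the code solves the described problem."""
    prompt = (
        "Problem description:\n" + problem_description + "\n\n"
        "Code:\n" + code_block + "\n\n"
        "Answer with a single word, 'correct' or 'incorrect', indicating "
        "whether the code solves the problem."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

A second query of the same shape could ask the model to propose a corrected version of code it judged incorrect, mirroring the classification-plus-correction setup evaluated in the experiment.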
DATE: April 10, Thursday @ 13:30 Place: EA 409
2- Assessing Software Evolution with the Stickiness Score: Evaluating Code Persistence Across Files, Folders, and Developers
Selen Uysal
Master's Student
(Supervisor: Asst. Prof. Eray Tüzün)
Computer Engineering Department
Bilkent University
Abstract: Software evolution involves continuous code changes, making it essential to understand the factors that influence code stability and persistence. This study introduces a metric called the "Stickiness Score", which measures the longevity of lines of code (LOC) within a project: it reflects how much of the LOC written by a developer, or belonging to a specific file or folder, has persisted over time. The goal is to examine its correlation with various software metrics: contributor count, developer Stickiness Scores (average, commit-weighted average, and LOC-weighted average), cyclomatic complexity, bug-fix count, and static code analysis metrics, including bug and code smell counts. Stickiness Scores for developers, files, and folders are calculated across five open-source projects using Devotion, a tool developed for this study. Spearman correlation tests were used to analyze the relationship between file Stickiness Scores and the specified software metrics. Contributor count exhibited a strong negative correlation with file Stickiness Scores. Commit- and LOC-weighted developer Stickiness Scores showed positive correlations, while unweighted averages produced mixed results. Cyclomatic complexity, bug-fix count, and code smell count showed inconsistent correlations, and bug counts in files showed no significant correlation. In conclusion, files with more contributors or frequent bug-related changes tend to be less sticky, whereas files that receive high-commit or high-LOC contributions from developers with higher stickiness tend to persist longer. The Stickiness Score provides valuable insight into how contributor activity, code complexity, bugginess, and code smells relate to code longevity.
Keywords: Software Development, Code Stickiness, Code Survival, Code Churn, Correlation Analysis, Survival Analysis
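The abstract does not give the exact formula implemented in Devotion; one plausible reading of a file-level Stickiness Score is the fraction of all lines ever added to a file that still survive at the latest revision, which could be approximated from Git history as in the Python sketch below (function names and the exact definition are illustrative assumptions, not Devotion's actual algorithm):

# Illustrative sketch, assuming a plain Git checkout and a path relative to
# the repository root; not the thesis' actual metric definition.
import subprocess

def lines_ever_added(repo: str, path: str) -> int:
    """Total lines added to `path` over its history, from `git log --numstat`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--follow", "--numstat", "--format=", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    added = 0
    for line in out.splitlines():
        parts = line.split("\t")
        if len(parts) == 3 and parts[0].isdigit():  # skip blanks and binary files
            added += int(parts[0])
    return added

def stickiness(repo: str, path: str) -> float:
    """Fraction of all lines ever added to the file that still exist at HEAD."""
    surviving = len(subprocess.run(
        ["git", "-C", repo, "show", f"HEAD:{path}"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines())
    total = lines_ever_added(repo, path)
    return surviving / total if total else 0.0

File scores obtained in this way could then be compared against metrics such as contributor count or bug-fix count with a rank-based test, e.g. scipy.stats.spearmanr, matching the Spearman correlation analysis described in the abstract.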
DATE: April 10, Thursday @ 14:30 Place: EA 409