1. Lagrangian Perturbation Diffusion Steering: Latent Reinforcement Learning for Generative Policies
Muhammet Hikmet Şimşir
Master Student
(Supervisor: Asst. Prof. Özgür S. Öğüz) Computer Engineering Department
Bilkent University
Abstract: Behavior cloning with high-capacity generative policies achieves strong imitation performance, but its effectiveness is often constrained by limited demonstration coverage and sensitivity to distribution shift. While reinforcement learning can improve task performance, directly fine-tuning large action decoders is often unstable and sample-inefficient. We propose Lagrangian Perturbation Diffusion Steering (LP-DS), a lightweight adaptation method that improves a frozen generative policy while preserving its multimodal structure. LP-DS learns a compact noise-space perturbation module that shifts the Gaussian noise inputs before decoding, enabling policy improvement without modifying the action decoder. To prevent off-manifold latent queries and unstable denoising dynamics, we optimize this module with a Lagrangian trust-region objective that maximizes downstream value while constraining perturbation magnitude, yielding stable and sample-efficient learning. Across RoboMimic manipulation, OpenAI Gym locomotion, and Adroit dexterous manipulation benchmarks, LP-DS improves sample efficiency, success rate, and return while maintaining diverse behavior, as quantified by higher action-space entropy under the Kozachenko–Leonenko k-nearest-neighbor estimator, with return improvements of up to 25% over prior baselines. Project page: https://sites.google.com/view/lp-ds/home
DATE: April 13, Monday @ 15:30 Place: EA 502
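The LP-DS abstract describes maximizing downstream value while constraining the size of a noise-space shift that is fed to a frozen decoder. The abstract does not give the exact objective, so the Python sketch below is only a toy illustration of a generic Lagrangian trust-region (primal-dual) update under stated assumptions: a single noise vector z, a hypothetical frozen elementwise decoder, a closed-form toy value function standing in for a learned critic, and a squared-norm constraint ||δ||² ≤ ε. It is not the authors' implementation; all names (decode, value, steer, SCALE, TARGET) are hypothetical.

```python
from typing import List, Tuple

# Toy, hypothetical setup: a frozen elementwise "decoder" a = SCALE * w + BIAS
# maps a noise vector w to an action; only the noise shift delta is optimized.
SCALE = [1.0, 0.5]
BIAS = [0.0, 0.0]
TARGET = [2.0, 1.0]  # hypothetical best action under the toy value function


def decode(w: List[float]) -> List[float]:
    """Frozen decoder: never updated during steering."""
    return [s * wi + b for s, wi, b in zip(SCALE, w, BIAS)]


def value(action: List[float]) -> float:
    """Toy value Q(a) = -||a - TARGET||^2, standing in for a learned critic."""
    return -sum((ai - ti) ** 2 for ai, ti in zip(action, TARGET))


def value_grad_wrt_delta(z: List[float], delta: List[float]) -> List[float]:
    """Chain rule through the frozen decoder: dQ/d(delta_i) = SCALE_i * dQ/d(a_i)."""
    action = decode([zi + di for zi, di in zip(z, delta)])
    return [s * (-2.0 * (ai - ti)) for s, ai, ti in zip(SCALE, action, TARGET)]


def steer(z: List[float], eps: float, outer_steps: int = 500, inner_steps: int = 50,
          lr: float = 0.1, dual_lr: float = 0.5) -> Tuple[List[float], float]:
    """Primal-dual ascent on L(delta, lam) = Q(decode(z + delta)) - lam * (||delta||^2 - eps)."""
    delta = [0.0] * len(z)
    lam = 0.0
    for _ in range(outer_steps):
        # Primal: gradient ascent on L with the multiplier held fixed.
        for _ in range(inner_steps):
            g = value_grad_wrt_delta(z, delta)
            delta = [d + lr * (gi - 2.0 * lam * d) for d, gi in zip(delta, g)]
        # Dual: raise lam when the trust region is violated, lower it (not below 0) otherwise.
        violation = sum(d * d for d in delta) - eps
        lam = max(0.0, lam + dual_lr * violation)
    return delta, lam


if __name__ == "__main__":
    z = [0.0, 0.0]
    delta, lam = steer(z, eps=0.25)
    shifted = [zi + di for zi, di in zip(z, delta)]
    print("||delta||^2 =", sum(d * d for d in delta))  # close to eps: constraint is active
    print("Q before/after:", value(decode(z)), value(decode(shifted)))
```

Here the toy target lies far outside the trust region, so the multiplier settles at a positive value and the shift ends on the boundary ||δ||² ≈ ε; if the target were reachable inside the ball, the dual update would drive the multiplier back to zero. In the setting the abstract describes, δ would presumably come from a learned module and the value gradient from a critic rather than a closed form.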
2. Understanding the Limits of Automated Evaluation for Code Review Bots in Practice
Utku Boran Torun
Master Student
(Supervisor: Assoc. Prof. Eray Tüzün) Computer Engineering Department
Bilkent University
Abstract: Automated code review (ACR) bots are increasingly used in industrial software development to assist developers during pull request (PR) review. As adoption grows, a key challenge is how to evaluate the usefulness of bot-generated comments reliably and at scale. In practice, such evaluation often relies on developer actions and annotations that are shaped by contextual and organizational factors, complicating their use as objective ground truth. We examine the feasibility and limitations of automating the evaluation of LLM-powered ACR bots in an industrial setting. We analyze an industrial dataset from Beko comprising 2,604 bot-generated PR comments, each labeled by software engineers as fixed/wontFix. Two automated evaluation approaches, G-Eval and an LLM-as-a-Judge pipeline, are applied using both binary decisions and a 0–4 Likert-scale formulation, enabling a controlled comparison against developer-provided labels. Across Gemini-2.5-pro, GPT-4.1-mini, and GPT-5.2, both evaluation strategies achieve only moderate alignment with human labels. Agreement ratios range from approximately 0.44 to 0.62, with noticeable variation across models and between binary and Likert-scale formulations, indicating sensitivity to both model choice and evaluation design. Our findings highlight practical limitations in fully automating the evaluation of ACR bot comments in industrial contexts. Developer actions such as resolving or ignoring comments reflect not only comment quality, but also contextual constraints, prioritization decisions, and workflow dynamics that are difficult to capture through static artifacts. Insights from a follow-up interview with a software engineering director further corroborate that developer labeling behavior is strongly influenced by workflow pressures and organizational constraints, reinforcing the challenges of treating such signals as objective ground truth.
DATE: April 13, Monday @ 15:50 Place: EA 502
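The comparison in the ACR abstract reduces to scoring automated verdicts against developer labels. As a minimal sketch, assuming the agreement ratio is the fraction of comments on which the judge's verdict matches the fixed/wontFix label, and assuming a hypothetical cut-off that turns 0 to 4 Likert scores into binary verdicts (the abstract does not state how the two formulations are aligned), the computation looks like this. The helper names, the threshold of 2, and the toy labels are illustrative only.

```python
from typing import Sequence


def likert_to_binary(score: int, threshold: int = 2) -> str:
    """Hypothetical mapping: 0-4 scores at or above `threshold` count as 'fixed'."""
    if not 0 <= score <= 4:
        raise ValueError(f"Likert score out of range: {score}")
    return "fixed" if score >= threshold else "wontFix"


def agreement_ratio(predicted: Sequence[str], human: Sequence[str]) -> float:
    """Fraction of comments on which the automated verdict equals the developer label."""
    if len(predicted) != len(human) or not human:
        raise ValueError("expected two non-empty label sequences of equal length")
    return sum(p == h for p, h in zip(predicted, human)) / len(human)


# Invented toy labels for five bot comments (not data from the study).
human = ["fixed", "wontFix", "fixed", "wontFix", "fixed"]
binary_judge = ["fixed", "fixed", "fixed", "wontFix", "wontFix"]
likert_judge = [4, 1, 1, 0, 2]

print(agreement_ratio(binary_judge, human))  # 0.6
print(agreement_ratio([likert_to_binary(s) for s in likert_judge], human))  # 0.8
```

Because the threshold decides which Likert scores count as fixed, the same judge can yield different agreement ratios under the two formulations, which is one plausible contributor to the sensitivity to evaluation design that the abstract reports.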
3. Characterization of Structural Variation through Assembly-to-Assembly Comparison
Muhammet Rafi Çoktalaş
Master Student
(Supervisor: Assoc. Prof. Can Alkan) Computer Engineering Department
Bilkent University
Abstract: Motivation: Structural variations (SVs) are genomic differences spanning more than 50 nucleotides that drive evolution and significantly influence human health, underlying conditions such as autism, schizophrenia, and cancer. With the recent availability of high-quality de novo assemblies, SV discovery is shifting from short-read alignment to assembly-to-assembly comparison. However, existing assembly-based tools typically rely on whole-genome alignment (WGA), a computationally intensive process that scales poorly to large datasets. Consequently, there is an urgent need for efficient algorithms that can characterize SVs without the overhead of full alignment. Results: We introduce STRiVE, a linearithmic-time algorithm that uses genome assembly sketches, rather than whole-genome alignments, to identify insertions, deletions, and inversions. Inspired by optical mapping, STRiVE treats sketches as sparse genomic landmarks to rapidly detect structural discrepancies. We evaluated STRiVE on simulated data based on the human reference genome (GRCh38) and on real data from the Telomere-to-Telomere CHM13 assembly. STRiVE characterizes SVs in less than one minute per chromosome, including preprocessing. Our algorithm achieved precision and recall above 90%. While performance for insertions and deletions decreased in regions containing segmental duplications, STRiVE maintained robust performance for inversion discovery, with recall remaining above 90% even in complex scenarios in the simulated datasets. We also applied STRiVE to characterize large SVs in the CHM13 dataset, where it achieved the strongest overall recovery on the validated benchmark, detecting 12 of 15 insertions, 4 of 5 deletions, and 5 of 6 inversions. Availability: STRiVE is available at https://github.com/BilkentCompGen/strive
DATE: April 13, Monday @ 16:30 Place: EA 502
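The STRiVE abstract describes treating assembly sketches as sparse landmarks, in the spirit of optical mapping, instead of computing a whole-genome alignment. The abstract does not specify the sketch type or the calling rules, so the Python sketch below is a simplified illustration under assumed choices: lexicographic (k, w)-minimizers as landmarks, landmarks kept only if they occur once in each sketch, and an insertion or deletion called when the spacing between consecutive shared landmarks differs by at least 50 bp (the SV size threshold from the abstract). It covers only insertions and deletions on a single contig; inversion calling, which would need reverse-complement landmark matching, is omitted. All function names and parameters are hypothetical, and this is not STRiVE's algorithm.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def random_dna(n: int, seed: int) -> str:
    """Hypothetical helper: reproducible random DNA for toy experiments."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


def minimizers(seq: str, k: int, w: int) -> List[Tuple[int, str]]:
    """(position, k-mer) for the lexicographically smallest k-mer in every window of w k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        picked.add(min(range(start, start + w), key=kmers.__getitem__))
    return [(i, kmers[i]) for i in sorted(picked)]


def unique_landmarks(sketch: List[Tuple[int, str]]) -> Dict[str, int]:
    """Keep only k-mers that occur once in the sketch, mapped to their position."""
    counts = Counter(kmer for _, kmer in sketch)
    return {kmer: pos for pos, kmer in sketch if counts[kmer] == 1}


def call_indels(ref: str, qry: str, k: int = 21, w: int = 10,
                min_size: int = 50) -> List[Tuple[str, int, int]]:
    """Compare landmark spacing between two assemblies; return (type, ref_pos, size) calls."""
    ref_lm = unique_landmarks(minimizers(ref, k, w))
    qry_lm = unique_landmarks(minimizers(qry, k, w))
    # Shared landmarks as (ref_pos, qry_pos) pairs, sorted along the reference.
    shared = sorted((ref_lm[m], qry_lm[m]) for m in ref_lm.keys() & qry_lm.keys())
    calls = []
    for (r1, q1), (r2, q2) in zip(shared, shared[1:]):
        if q2 <= q1:
            continue  # out-of-order landmarks are not a simple indel; skip them
        diff = (q2 - q1) - (r2 - r1)  # > 0: extra sequence in qry; < 0: sequence missing
        if diff >= min_size:
            calls.append(("INS", r1, diff))
        elif diff <= -min_size:
            calls.append(("DEL", r1, -diff))
    return calls
```

For example, with ref = random_dna(5000, seed=42), calling call_indels(ref, ref[:2000] + ref[2200:]) should report a single deletion of 200 bp anchored just before position 2000. The naive minimizer scan here costs O(n·w) for a sequence of length n; the matching step itself is a sort of the m shared landmarks, O(m log m).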