NeXtStereo: Directionally Driven Channel Expansion Gives Adaptive Real-Time Stereo
Ekin Berk Ekinci
Master's Student
(Supervisor: Asst. Prof. Özgür S. Öğüz) Computer Engineering Department
Bilkent University
Abstract: We present NeXtStereo, a lightweight stereo disparity estimation network designed for real-time depth perception. NeXtStereo builds on Widened ConvNeXtV2 blocks that strengthen cost aggregation while leveraging the scalability and generalization behavior of the ConvNeXt family. In addition, we introduce Directionally Modulated Attention (DMA), a novel attention mechanism that incorporates geometric priors to modulate features using directional cues. Together, these components improve structural detail recovery in challenging regions such as object boundaries, thin structures, and texture-weak areas, without relying on heavy 3D aggregation stacks. We evaluate NeXtStereo on SceneFlow, KITTI 2012/2015, and Middlebury, where it achieves a favorable accuracy/efficiency trade-off among real-time models and improves cross-domain robustness, with NeXtStereo-L achieving the lowest >2px error among the compared methods. We also study adaptation to the MS2 outdoor driving dataset and observe reliable transfer under fine-tuning. Furthermore, NeXtStereo is compatible with convolutional Low-Rank Adaptation (LoRA), enabling parameter-efficient domain adaptation with improved stability compared with relevant real-time stereo matching baselines. Finally, we analyze selective 3D cost aggregation via a targeted ablation that replaces the first 1/4-scale aggregation block with a 3D ConvNeXt-style cost aggregation operator, and we characterize the resulting accuracy/efficiency trade-offs.
DATE: Thursday, January 15 @ 10:00 PLACE: EA 409