3D Stylization and Editing with Generative Models
İpek Öztaş, Master's Student
(Supervisor: Asst. Prof. Ayşegül Dündar Boral)
Computer Engineering Department
Bilkent University
Abstract: Advances in large pretrained generative and reconstruction models have significantly improved image and 3D content creation. However, a key challenge remains: giving users precise, efficient control over these models without costly retraining. This thesis addresses this gap through three interconnected studies that treat the internal representations of pretrained models (attention features, latent velocities, and positional embeddings) as control mechanisms for user-directed editing. First, two complementary approaches to 3D stylization built on large reconstruction models are introduced. The first is training-free: it injects features from a reference style image into the cross-attention layers of a pretrained reconstruction model, achieving effective stylization without training or test-time optimization. The second is a trainable extension that overcomes the limited fidelity imposed by fixed attention parameters; it combines a dual-path cross-attention architecture with a content-edge preservation loss and fine-tunes only a small subset of parameters, preserving the efficiency of the reconstruction backbone. Next, a reference-guided 3D stylization framework operating on structured latent representations is proposed. Within an Euler sampling procedure, it adjusts the velocities predicted by a pretrained flow-based model at selected timesteps, using gradients of a style-consistency loss. Finally, a depth-aware object relocation method performs single-image editing by manipulating the rotary positional embeddings of diffusion transformers so that they encode 3D spatial structure, enabling geometry-consistent object motion with coherent occlusion handling, scene completion, and shadow propagation. Together, these contributions enable controllable 3D stylization and image editing.
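The velocity-adjustment idea behind the second study (correcting a flow model's predicted velocity with the gradient of a style loss at selected Euler steps) can be illustrated with a toy sketch. It is not the thesis implementation. The rectified-flow convention x_t = (1 - t) * x0 + t * noise, the closed-form Gaussian velocity standing in for a pretrained model, the quadratic loss standing in for the style-consistency loss, and all function names (toy_velocity, style_loss, adjust_velocity, guided_euler_sample) are assumptions made for illustration.

```python
import math

# Toy sketch (hypothetical names and setup, not the thesis implementation).
# Assumed convention: x_t = (1 - t) * x0 + t * noise, sampled from t = 1 (noise) to t = 0 (data).

def toy_velocity(x, t, mu, sigma=1.0):
    """Closed-form velocity E[noise - x0 | x_t] when the data are Gaussian N(mu, sigma^2 I).

    Stands in for a pretrained flow model's velocity prediction.
    """
    var_t = (1 - t) ** 2 * sigma ** 2 + t ** 2
    c = (t - (1 - t) * sigma ** 2) / var_t
    return [-m + c * (xi - (1 - t) * m) for xi, m in zip(x, mu)]


def style_loss(x0_hat, style):
    """Stand-in style-consistency loss: half the squared distance to a style target."""
    return 0.5 * sum((a - s) ** 2 for a, s in zip(x0_hat, style))


def adjust_velocity(x, v, t, style, eta):
    """One gradient-descent step on v that lowers the loss of the predicted clean sample."""
    x0_hat = [xi - t * vi for xi, vi in zip(x, v)]          # predicted x0 = x_t - t * v
    grad_v = [-t * (a - s) for a, s in zip(x0_hat, style)]  # dL/dv by the chain rule
    return [vi - eta * g for vi, g in zip(v, grad_v)]


def guided_euler_sample(noise, mu, style, num_steps=50, eta=0.0, guided_steps=(), sigma=1.0):
    """Euler integration from t = 1 to t = 0; v is adjusted only at the listed step indices."""
    dt = 1.0 / num_steps
    x = list(noise)
    for i in range(num_steps):
        t = 1.0 - i * dt                                     # current time, from 1 down to dt
        v = toy_velocity(x, t, mu, sigma)
        if i in guided_steps:
            v = adjust_velocity(x, v, t, style, eta)
        x = [xi - dt * vi for xi, vi in zip(x, v)]           # Euler step from t to t - dt
    return x


noise, mu, style = [0.5, -0.3], [2.0, 2.0], [-1.0, 3.0]
plain = guided_euler_sample(noise, mu, style)
styled = guided_euler_sample(noise, mu, style, eta=1.0, guided_steps=range(25))
print(math.dist(plain, style), math.dist(styled, style))  # guided sample ends closer to the style target
```

Here only the first half of the steps is guided, as one possible choice of "selected timesteps"; which timesteps and step size the thesis actually uses is not specified in this abstract. Taking the gradient with respect to the velocity, rather than the latent itself, keeps the sampler unchanged: the corrected velocity simply replaces the model's prediction in the ordinary Euler update.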
DATE: Thursday, June 18 @ 10:00
PLACE: EA 409