MATH Seminar: “Mathematical Foundations of DeepSeek: A Reinforcement Learning Perspective”, Berkay Anahtarcı, 17:00, 24 February 2025 (EN)

You are cordially invited to the Analysis Seminar organized by the Department of Mathematics.

Speaker: Berkay Anahtarcı (Özyeğin University)

“Mathematical Foundations of DeepSeek: A Reinforcement Learning Perspective.”

Abstract: This talk explores the mathematical foundations of DeepSeek R1, an RL-only model designed for complex reasoning. Unlike models trained with traditional supervised fine-tuning, DeepSeek R1 leverages Group Relative Policy Optimization (GRPO), a novel method that stabilizes Proximal Policy Optimization (PPO) without a critic. GRPO enhances chain-of-thought reasoning by structuring problem-solving into sequential steps. I will analyze its theoretical properties and implications for reasoning-driven reinforcement learning.

Date: Monday, February 24, 2025
Time: 17:00-18:00, GMT+3
Place: Zoom

This is an online seminar. To request the Zoom link, please send a message to goncha@fen.bilkent.edu.tr
