Publication result detail
TYAGI, N.; JOSHI, R.; DAS, S.; SIKORA, P.; MYŠKA, V.; DUTTA, M.
Original title
Reinforcement Learning for Mathematical Reasoning in Small-Scale Language Models with Structured Policy Optimization
English title
Type
Conference proceedings paper indexed in WoS or Scopus
Original abstract
Advancing mathematical reasoning in small-scale language models remains a challenge due to their limited capacity and the high computational demands of standard reinforcement learning methods like Proximal Policy Optimization (PPO). To address this, a resource-efficient training pipeline based on Group Relative Policy Optimization (GRPO) is proposed, enabling fine-tuning of compact models under strict memory constraints. The proposed method introduces structured prompting to explicitly separate reasoning steps from final answers and applies a dual reward system to jointly optimize for format adherence and mathematical correctness. The training incorporates low-overhead techniques such as 8-bit optimization, mixed-precision training, gradient checkpointing, and accelerated decoding for efficient rollout and policy updates. Experimental results show a 50.95% accuracy on a benchmark reasoning dataset—GSM8K, outperforming several larger models—while training entirely on a single GPU. These findings demonstrate that small-scale models, when trained with structured reinforcement learning, can achieve competitive performance in mathematical reasoning tasks. The approach offers a practical pathway for deploying interpretable, reasoning-capable models in low-resource environments.
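The dual reward and the group-relative scoring described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `<reasoning>`/`<answer>` tag layout, the reward weights `w_format` and `w_correct`, and all function names are hypothetical. The advantage computation follows the way GRPO is commonly formulated, normalizing each reward against the mean and standard deviation of its sampled group.

```python
import re
from statistics import mean, pstdev

# Hypothetical tag layout: the abstract says reasoning and final answer are
# separated explicitly, but does not give the actual prompt template.
PATTERN = re.compile(
    r"^\s*<reasoning>(?P<reasoning>.+?)</reasoning>\s*"
    r"<answer>(?P<answer>.+?)</answer>\s*$",
    re.DOTALL,
)


def format_reward(completion: str) -> float:
    """Return 1.0 when the completion follows the assumed tag structure."""
    return 1.0 if PATTERN.match(completion) else 0.0


def correctness_reward(completion: str, target: str) -> float:
    """Return 1.0 when the extracted answer equals the target numerically."""
    match = PATTERN.match(completion)
    if match is None:
        return 0.0
    predicted = match.group("answer").strip().replace(",", "")
    try:
        return 1.0 if float(predicted) == float(target) else 0.0
    except ValueError:
        return 0.0


def dual_reward(completion: str, target: str,
                w_format: float = 0.5, w_correct: float = 2.0) -> float:
    """Weighted sum of both rewards; the weights are illustrative assumptions."""
    return (w_format * format_reward(completion)
            + w_correct * correctness_reward(completion, target))


def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled completions for the prompt "What is 3 + 4?"
group = [
    "<reasoning>3 + 4 = 7</reasoning><answer>7</answer>",
    "<reasoning>3 + 4 = 8</reasoning><answer>8</answer>",
    "The answer is 7",
]
rewards = [dual_reward(c, "7") for c in group]
advantages = group_relative_advantages(rewards)
```

In this sketch a well-formatted but wrong completion still earns the format reward, so it ranks above an unformatted one within the group. Because advantages are computed relative to the group, no separate value network is needed, which is one reason GRPO is lighter on memory than PPO.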
English abstract
Keywords
Emotion Flip Reasoning, Therapeutic Dialogue Modelling, Emotional Trajectory Prediction, Transformer
Keywords in English
Authors
RIV year
2026
Published
5 November 2025
Publisher
IEEE
Place
Florence, Italy
ISBN
979-8-3315-7675-2
Book
2025 17th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT)
Periodical
International Congress on Ultra Modern Telecommunications and Workshops
Country
United States of America
Pages from
272
Pages to
277
Number of pages
6
URL
https://ieeexplore.ieee.org/document/11268643
BibTeX
@inproceedings{BUT200028,
  author="N. {Tyagi} and Rakesh Chandra {Joshi} and S. {Das} and Pavel {Sikora} and Vojtěch {Myška} and Malay Kishore {Dutta}",
  title="Reinforcement Learning for Mathematical Reasoning in Small-Scale Language Models with Structured Policy Optimization",
  booktitle="2025 17th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT)",
  year="2025",
  journal="International Congress on Ultra Modern Telecommunications and Workshops",
  pages="272--277",
  publisher="IEEE",
  address="Florence, Italy",
  doi="10.1109/ICUMT67815.2025.11268643",
  isbn="979-8-3315-7675-2",
  url="https://ieeexplore.ieee.org/document/11268643"
}