New Doctor of Science
- May 20, 2026
- Bojana Rauker
Today, May 20, 2026, doctoral candidate Antonio Tolić successfully defended his doctoral dissertation titled “Gradient Calibration in LSTM Networks for Enhanced Learning Efficiency.”
The defense committee consisted of Prof. Dr. Biljana Mileva Boshkoska and Assoc. Prof. Dr. Panče Panov.
He completed his doctoral dissertation under the supervision of Assoc. Prof. Dr. Sandro Skansi.
Abstract
Recurrent Neural Networks (RNNs), most notably Long Short-Term Memory Networks (LSTMs), have established their efficacy across a wide range of sequential data tasks, especially in applications demanding precise modeling of dependencies emerging from the inherent order of the data and complex behavior patterns. Despite substantial advances in the development of LSTM architectures, processing sequences with long-term dependencies remains nontrivial, as gradients may still vanish or grow to numerically unstable magnitudes when propagated across many time steps. In this context, a new approach to alleviating these difficulties is introduced, in which an LSTM architecture integrates Chrono Initialization (CI) with Layer Normalization (LN) to calibrate gradient propagation and more effectively support the learning of long-range dependencies. CI ensures that the gradients are neither too small nor too large, reducing the likelihood of both vanishing and exploding gradients and thereby enabling stable learning over long sequences. LN further contributes to robustness, leading to more consistent training dynamics and improved model performance across different sequence lengths and under varying input conditions, including shifts in data distribution and scale. The proposed approach was evaluated against LSTM baselines with and without CI applied to the forget- and input-gate biases. In addition, several ablation variants were constructed to isolate the contribution of individual components of the proposed design. All model variants were evaluated on a diverse set of sequential learning tasks, covering multiple task formulations and distinct hyperparameter settings. Throughout this evaluation, the proposed approach consistently demonstrated performance gains over all baselines, yielding greater predictive capability and lower validation loss.
Additionally, the approach contributed to more efficient training, achieving faster convergence while preserving strong generalization performance across different tasks and datasets. Its versatility was demonstrated on classification, regression, and sequence generation tasks. Overall, the proposed enhancements improve long-term dependency modeling and yield more stable training dynamics in LSTMs, thereby addressing the aforementioned gradient-related difficulties. Formal analysis offers deeper insights into the underlying processes involved, thus establishing a robust basis for subsequent improvements in sequential data modeling.
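For readers unfamiliar with the two components named in the abstract, the following is a minimal illustrative sketch, not the dissertation's implementation. It assumes the commonly cited formulation of chrono initialization, in which forget-gate biases are drawn as log(U(1, T_max - 1)) and input-gate biases are set to their negation, together with a plain layer normalization without learned gain or bias. The function names and the `t_max` parameter (an assumed upper bound on the dependency length) are hypothetical.

```python
import math
import random


def chrono_init_biases(hidden_size, t_max, seed=None):
    """Return (forget_bias, input_bias) lists for an LSTM layer.

    Assumed formulation: forget biases are log(u) with u ~ U(1, t_max - 1),
    and input biases are the negated forget biases.
    """
    if t_max <= 2:
        raise ValueError("t_max must be greater than 2")
    rng = random.Random(seed)
    forget_bias = [math.log(rng.uniform(1.0, t_max - 1.0)) for _ in range(hidden_size)]
    input_bias = [-b for b in forget_bias]
    return forget_bias, input_bias


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance.

    Learned gain and bias parameters are omitted to keep the sketch short.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]
```

Under these assumptions, a larger `t_max` yields larger forget-gate biases, so the forget gate starts closer to 1 and the cell state is retained over more time steps, while `layer_norm` would be applied to the gate pre-activations at each step to keep their scale stable.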
Congratulations!

