A Reinforcement Learning Guided Oppositional Mountain Gazelle Optimizer for Time–Cost–Risk Trade-Off Optimization Problems

Eirgash, Mohammad; Tiang, Jun-Jiat; Ateş, Bayram; Sharma, Abhishek; Lim, Wei

doi:10.3390/buildings16010144

Buildings, vol. 16, no. 1, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 16 Issue: 1
Publication Date: 2026
DOI: 10.3390/buildings16010144
Journal Name: Buildings
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Avery, Compendex, INSPEC, Directory of Open Access Journals
Keywords: construction optimization, Mountain Gazelle Optimizer, opposition-based learning, pareto-front solutions, reinforcement learning, time–cost–risk trade-off problems
Karadeniz Technical University Affiliated: Yes

Abstract

Existing metaheuristic approaches often struggle to balance exploration and exploitation and are prone to premature convergence when solving highly conflicting time–cost–safety–risk trade-off problems (TCSRTPs) under complex construction project constraints. These weaknesses can adversely affect project productivity, safety, and the provision of decent jobs in the construction sector. To overcome them, this study introduces a hybrid metaheuristic, the Q-Learning Inspired Mountain Gazelle Optimizer (QL-MGO), for solving multi-objective TCSRTPs in construction project management, supporting the delivery of resilient infrastructure and building projects. QL-MGO enhances the original Mountain Gazelle Optimizer (MGO) by integrating Q-learning with an opposition-based learning strategy, improving the exploration–exploitation balance while reducing computational effort and enhancing resource efficiency in construction scheduling. Each gazelle functions as an adaptive agent that learns effective search behaviors through a state–action–reward structure, which strengthens convergence stability and preserves solution diversity. The core innovation is a dynamic switching mechanism in which Q-learning decides, based on the performance history of the search process, when opposition-based learning should be applied. The performance of QL-MGO is evaluated on 18- and 37-activity construction scheduling problems and compared with NDSII-MGO, NDSII-Jaya, NDSII-TLBO, the multi-objective genetic algorithm (MOGA), and NDSII-Rao-2. The results show that QL-MGO consistently generates superior Pareto fronts. For the 18-activity project, QL-MGO achieves the highest hypervolume (HV) value of 0.945 with a spread of 0.821, outperforming NDSII-Rao-2, MOGA, and NDSII-MGO. Similar results are observed for the 37-activity project, where QL-MGO attains the highest HV of 0.899 with a spread of 0.674, exceeding the performance of NDSII-Jaya, NDSII-TLBO, and NDSII-MGO. Overall, the integration of Q-learning significantly enhances the search capability of MGO, resulting in faster convergence, improved solution diversity, and more reliable multi-objective trade-off solutions. QL-MGO therefore serves as an effective and computationally efficient decision-support tool for construction scheduling that promotes safer, more reliable, and resource-efficient project delivery.
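
The switching idea described in the abstract, where Q-learning decides when opposition-based learning is applied, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the state, action, or reward definitions, so the two states (`improved`, `stagnated`), the two actions (`skip_obl`, `apply_obl`), the +1/-1 reward, the Gaussian perturbation standing in for the MGO update rules, and the single-objective shifted sphere function are all assumptions chosen for illustration. Only the opposite-point formula (lower + upper - x) and the tabular Q-learning update are standard.

```python
import random


def opposite(x, lower, upper):
    """Opposition-based learning: reflect each coordinate within its bounds."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]


class OppositionSwitch:
    """Tabular Q-learning agent that decides whether to apply OBL (illustrative)."""

    STATES = ("improved", "stagnated")    # assumed summary of recent performance
    ACTIONS = ("skip_obl", "apply_obl")   # assumed action set

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.q = {(s, a): 0.0 for s in self.STATES for a in self.ACTIONS}

    def choose(self, state):
        # Epsilon-greedy: explore occasionally, otherwise take the best-valued action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.ACTIONS)
        return max(self.ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning temporal-difference update.
        best_next = max(self.q[(next_state, a)] for a in self.ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


def shifted_sphere(x):
    """Toy single-objective stand-in for the scheduling objectives (assumption)."""
    return sum((v - 3.0) ** 2 for v in x)


def run(iterations=200, dim=5, seed=1):
    rng = random.Random(seed)
    lower, upper = [-5.0] * dim, [15.0] * dim
    x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
    best = shifted_sphere(x)
    switch = OppositionSwitch(seed=seed)
    state = "stagnated"
    for _ in range(iterations):
        action = switch.choose(state)
        # Placeholder move (NOT the MGO operators): Gaussian step clipped to bounds.
        cand = [min(hi, max(lo, xi + rng.gauss(0.0, 0.5)))
                for xi, lo, hi in zip(x, lower, upper)]
        if action == "apply_obl":
            opp = opposite(cand, lower, upper)
            if shifted_sphere(opp) < shifted_sphere(cand):
                cand = opp
        value = shifted_sphere(cand)
        improved = value < best
        if improved:
            x, best = cand, value
        reward = 1.0 if improved else -1.0
        next_state = "improved" if improved else "stagnated"
        switch.update(state, action, reward, next_state)
        state = next_state
    return best, switch.q
```

Calling `run()` returns the best objective value found and the learned Q-table, whose entries indicate in which state the agent came to prefer applying opposition. In a multi-objective setting such as the one the abstract describes, the scalar comparison would have to be replaced by a dominance-based or indicator-based one, which this sketch does not attempt.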