Advanced Engineering Informatics, vol. 62, 2024 (SCI-Expanded)
The predictive performance of machine learning (ML) models degrades when they are trained on class-imbalanced real-world construction datasets, which reduces the accuracy of the decisions that rely on them. In construction projects, collecting a balanced dataset is not always feasible. Integrating generative and prediction models holds potential here: the generative model synthesizes samples of the underrepresented class so that the prediction model receives a balanced input dataset. This study improves the performance of construction prediction models by integrating a generative model that augments the dataset for the underrepresented class. To this end, a variational autoencoder (VAE) was integrated into a multi-head graph attention network (GAT), and a comprehensive construction productivity dataset was collected across different projects and construction activities, each with a particular structure and level of class imbalance. Balancing the class distribution increased the accuracy of the GAT model from 90.6% to 92.5%, from 81.1% to 94.4%, and from 92.2% to 95.4% when trained on the finishing, concrete, and insulation activity networks, respectively.
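The balancing step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`synthetic_counts`, `balance`, `toy_generate`), the class labels, and the toy data are all assumptions made for illustration. In particular, `toy_generate` only jitters an existing minority sample and is a stand-in for drawing from a trained VAE decoder, which is not reproduced here.

```python
import random
from collections import Counter


def synthetic_counts(labels):
    """Return how many synthetic samples each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def balance(samples, labels, generate):
    """Append generated samples until every class is as large as the largest one.

    `generate(cls)` is a hypothetical callable standing in for the generative
    model (in the paper, a VAE); it must return one new sample for class `cls`.
    """
    out_x, out_y = list(samples), list(labels)
    for cls, missing in synthetic_counts(labels).items():
        for _ in range(missing):
            out_x.append(generate(cls))
            out_y.append(cls)
    return out_x, out_y


# Toy, made-up dataset: 8 "high" and 2 "low" productivity records, one feature each.
random.seed(0)
labels = ["high"] * 8 + ["low"] * 2
samples = [[random.random()] for _ in labels]


def toy_generate(cls):
    """Placeholder generator (NOT a VAE): jitter a random real sample of class `cls`."""
    pool = [x for x, y in zip(samples, labels) if y == cls]
    base = random.choice(pool)
    return [v + random.gauss(0.0, 0.01) for v in base]


balanced_x, balanced_y = balance(samples, labels, toy_generate)
balanced_counts = Counter(balanced_y)
```

In this sketch only the minority class receives synthetic samples, and the original records are left untouched. The balanced set would then serve as training input for the prediction model.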