Generating synthetic data with variational autoencoder to address class imbalance of graph attention network prediction model for construction management


Mostofi F., Behzat Tokdemir O., Toğan V.

Advanced Engineering Informatics, vol.62, 2024 (SCI-Expanded, Scopus)

  • Publication Type: Article
  • Volume: 62
  • Publication Date: 2024
  • Doi Number: 10.1016/j.aei.2024.102606
  • Journal Name: Advanced Engineering Informatics
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, Civil Engineering Abstracts
  • Keywords: Class imbalance, Construction productivity prediction, Data augmentation, Generative model, Graph attention network (GAT), Machine learning (ML), Variational autoencoder (VAE)
  • Karadeniz Technical University Affiliated: Yes

Abstract

The predictive performance of machine learning (ML) models degrades when they are trained on class-imbalanced real-world construction datasets, which reduces the accuracy of the decisions that rely on them. In construction projects, collecting a balanced dataset is not always feasible. Integrating generative and prediction models holds potential here, since the generative model can synthesize the underrepresented class and produce a balanced input dataset. This study improves the performance of construction prediction models by integrating a generative model that augments the dataset for the underrepresented class. To this end, a variational autoencoder (VAE) was integrated into a multi-head graph attention network (GAT) and applied to a comprehensive construction productivity dataset collected across different projects and construction activities, each with its own structure and level of class imbalance. Balancing the class distribution led to a significant increase in the predictive performance of the GAT model: accuracy rose from 90.6 % to 92.5 %, from 81.1 % to 94.4 %, and from 92.2 % to 95.4 % when the model was trained on the finishing, concrete, and insulation activity networks, respectively.
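
The balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy features, and the "high"/"low" class labels are all hypothetical, and the `jitter_generator` is only a placeholder standing in for sampling from a trained VAE decoder. The sketch shows the bookkeeping alone, that is, how many synthetic samples each class needs and how they are appended before a downstream classifier (a GAT in the paper) is trained.

```python
import random
from collections import Counter


def synthetic_counts(labels):
    """Return how many synthetic samples each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def balance(features, labels, generate, seed=0):
    """Append generated samples until every class is as large as the majority class.

    `generate(class_samples, rng)` is a hypothetical stand-in for drawing one
    sample from a generative model (for example, a trained VAE decoder).
    """
    rng = random.Random(seed)
    out_x, out_y = list(features), list(labels)
    for cls, deficit in synthetic_counts(labels).items():
        class_samples = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(deficit):
            out_x.append(generate(class_samples, rng))
            out_y.append(cls)
    return out_x, out_y


def jitter_generator(class_samples, rng, scale=0.05):
    """Placeholder generator: a real sample plus small Gaussian noise (NOT a VAE)."""
    base = rng.choice(class_samples)
    return [v + rng.gauss(0.0, scale) for v in base]


# Toy data (assumed for illustration): 6 "high" rows versus 2 "low" rows.
X = [[1.0, 0.2], [0.9, 0.3], [1.1, 0.1], [1.0, 0.25], [0.95, 0.2], [1.05, 0.15],
     [0.2, 0.9], [0.3, 1.0]]
y = ["high"] * 6 + ["low"] * 2

X_bal, y_bal = balance(X, y, jitter_generator)
print(Counter(y_bal))
```

In a setup closer to the one the abstract describes, `jitter_generator` would be replaced by a function that decodes latent vectors sampled from a VAE fitted on the minority class. The counting logic would stay the same.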