Generating synthetic data with variational autoencoder to address class imbalance of graph attention network prediction model for construction management


Mostofi F., Behzat Tokdemir O., TOĞAN V.

Advanced Engineering Informatics, vol.62, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 62
  • Publication Date: 2024
  • DOI: 10.1016/j.aei.2024.102606
  • Journal Name: Advanced Engineering Informatics
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, Civil Engineering Abstracts
  • Keywords: Class imbalance, Construction productivity prediction, Data augmentation, Generative model, Graph attention network (GAT), Machine learning (ML), Variational autoencoder (VAE)
  • Karadeniz Technical University Affiliated: Yes

Abstract

The predictive performance of machine learning (ML) models suffers when they are trained on class-imbalanced real-world construction datasets, which reduces the accuracy of the decisions that rely on them. In construction projects, collecting a balanced dataset is not always feasible. Here, integrating generative and prediction models holds potential: the generative model synthesizes samples of the underrepresented class to produce a balanced input dataset. This study improves the performance of construction prediction models by integrating a generative model that augments the dataset for the underrepresented class. To this end, a variational autoencoder (VAE) was integrated with a multi-head graph attention network (GAT), and a comprehensive construction productivity dataset was collected across different projects and construction activities, each with a particular structure and level of class imbalance. Balancing the class distribution increased the predictive performance of the GAT model: accuracy rose from 90.6% to 92.5%, from 81.1% to 94.4%, and from 92.2% to 95.4% when trained on the finishing, concrete, and insulation activity networks, respectively.
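
The balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the class labels, and the `toy_generator` are hypothetical. In particular, `toy_generator` only draws Gaussian noise as a stand-in for decoding latent samples with a trained VAE, and the sketch assumes every class is topped up to the size of the largest class, which the abstract does not state.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Sample = List[float]


def synthetic_counts(labels: Sequence[str]) -> Dict[str, int]:
    """Return how many synthetic samples each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def balance_with_generator(
    features: Sequence[Sample],
    labels: Sequence[str],
    generate: Callable[[str, int], List[Sample]],
) -> Tuple[List[Sample], List[str]]:
    """Append generated samples until every class is as large as the majority class.

    `generate(cls, n)` is assumed to return n synthetic feature vectors for
    class `cls`, e.g. decoded from a generative model trained on that class.
    """
    balanced_x = [list(x) for x in features]
    balanced_y = list(labels)
    for cls, n_missing in synthetic_counts(labels).items():
        if n_missing == 0:
            continue  # majority class needs no augmentation
        balanced_x.extend(generate(cls, n_missing))
        balanced_y.extend([cls] * n_missing)
    return balanced_x, balanced_y


def toy_generator(cls: str, n: int, dim: int = 3, seed: int = 0) -> List[Sample]:
    """Hypothetical placeholder for a trained VAE decoder: returns Gaussian noise vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
```

For example, calling `balance_with_generator(features, labels, toy_generator)` on a dataset with two "low" and five "high" productivity samples would add three generated "low" samples. The balanced set would then be passed to the prediction model for training.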