Parameters Prediction of Regression Based on Gustafson-Kessel Clustering For Data Sets With Outliers


Akbaş S., Akbaş Y., Erbay Dalkılıç T.

International 10th USBIM Applied Sciences Congress, Dubai, United Arab Emirates, 8 - 12 November 2025, p.279, (Abstract Paper)

  • Publication Type: Conference Paper / Abstract Paper
  • City of Publication: Dubai
  • Country of Publication: United Arab Emirates
  • Pages: p.279
  • Affiliated with Karadeniz Technical University: Yes

Abstract

In classical regression analysis, outliers in the dataset can significantly distort the estimates of the model parameters and reduce the model's usability. Weighted least squares is one of the most fundamental methods used for parameter estimation in this setting. In this study, to reduce the impact of outliers on parameter estimation, the contribution of each observation to the model is determined by membership degrees obtained with the Gustafson-Kessel (GK) clustering algorithm. The proposed method addresses outliers through a clustering-based preprocessing step that takes the natural structure of the data into account. Because the datasets whose parameters are to be estimated often have ellipsoidal distributions, the weighting requires an appropriate distance measure. The GK clustering algorithm uses the Mahalanobis distance and therefore provides a more robust structure against outliers for data with ellipsoidal distributions. In the proposed approach, the dataset is first divided into an optimal number of clusters using the GK algorithm. The membership degrees obtained from the GK algorithm for each cluster are then used as weights for the clustered dataset, and the unknown parameters of the regression models are estimated. To examine the effectiveness of the proposed method, synthetic datasets were generated, and parameters were estimated for them using classical weighted least squares, fuzzy C-means (FCM)-based methods, and the proposed method. The mean squared error and coefficient of determination were computed for the prediction models obtained from each method, and the results were compared according to these criteria. In conclusion, the GK clustering-based regression method offers an effective and flexible alternative for parameter estimation in datasets with a high density of outliers. This approach is expected to have potential applications in diverse fields such as machine learning, engineering modeling, and economic forecasting.
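
The two-stage idea described in the abstract (GK clustering, then membership-weighted least squares) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `gk_cluster` and `weighted_ls`, the fuzzifier `m = 2`, the fixed cluster count `c = 2`, the small ridge term added to each cluster covariance, and the toy data are all assumptions made for illustration. The abstract does not state how the optimal number of clusters is chosen or whether a crisp partition precedes the weighting, so the sketch simply weights every observation by its membership in each cluster.

```python
import numpy as np

def gk_cluster(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Sketch of Gustafson-Kessel clustering. Returns (U, V): U is the
    (c, n) membership matrix, V holds the (c, d) cluster centres."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.random((c, n))
    U /= U.sum(axis=0)                       # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        D2 = np.empty((c, n))
        for i in range(c):
            diff = X - V[i]
            # Fuzzy covariance matrix of cluster i
            F = (Um[i, :, None] * diff).T @ diff / Um[i].sum()
            F += 1e-8 * np.eye(d)            # ridge for stability (assumption)
            # Norm-inducing matrix with unit cluster volume
            A = np.linalg.det(F) ** (1.0 / d) * np.linalg.inv(F)
            # Squared Mahalanobis-type distance of every point to centre i
            D2[i] = np.einsum("nj,jk,nk->n", diff, A, diff)
        D2 = np.fmax(D2, 1e-12)              # guard against division by zero
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

def weighted_ls(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x with weights w."""
    Xd = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)
    return beta

# Toy data (hypothetical): two linear regimes plus a few outliers.
rng = np.random.default_rng(42)
x1 = rng.uniform(0, 5, 60)
y1 = 1.0 + 2.0 * x1 + rng.normal(0, 0.3, 60)
x2 = rng.uniform(6, 10, 60)
y2 = 20.0 - 1.0 * x2 + rng.normal(0, 0.3, 60)
x_out = rng.uniform(0, 10, 8)
y_out = rng.uniform(25, 40, 8)
x = np.concatenate([x1, x2, x_out])
y = np.concatenate([y1, y2, y_out])

U, V = gk_cluster(np.column_stack([x, y]), c=2)
# One regression per cluster, each observation weighted by its membership.
betas = np.array([weighted_ls(x, y, U[i]) for i in range(2)])
print(betas)
```

Each row of `betas` holds the intercept and slope fitted for one cluster. Comparing these fits with an unweighted or FCM-weighted fit on the same data, using mean squared error and the coefficient of determination, would mirror the comparison described in the abstract.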