International 10th USBIM Applied Sciences Congress, Dubai, United Arab Emirates, 8-12 November 2025, p. 279 (Abstract)
In classical regression analysis, outliers in
the dataset can significantly distort the estimated model parameters and
reduce the model's usefulness. Weighted least squares (WLS), which assigns each
observation a weight in the fitting criterion, is one of the most fundamental
methods used for parameter estimation in this setting. In this study, to
reduce the impact of outliers on parameter estimation, the
contribution of each observation to the model is determined by membership degrees
obtained with the Gustafson-Kessel (GK) clustering algorithm. In the proposed method,
the problem of outliers in the dataset is addressed through a clustering-based
preprocessing step that takes the natural structure of the data into account.
Since the datasets to be modeled often have ellipsoidal distributions, the
weighting process requires an appropriate distance measure. Because the GK
clustering algorithm uses a Mahalanobis-type distance based on each cluster's
fuzzy covariance matrix, it adapts to ellipsoidal cluster shapes and provides
greater robustness against outliers for data with such distributions. With the proposed
approach, the dataset is first partitioned into an optimal number of clusters
using the GK algorithm. Then, within each cluster, the GK membership degrees are
used as observation weights, and the unknown parameters of that cluster's
regression model are estimated.
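The clustering and weighting steps above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. GK replaces the Euclidean distance of FCM with d²_ik = (z_k − v_i)ᵀ A_i (z_k − v_i), where A_i = det(F_i)^(1/d) F_i⁻¹ and F_i is the fuzzy covariance matrix of cluster i. Assumed here: clustering runs on the joint input-output space z = [x, y], all cluster volumes are fixed to 1, the fuzzifier is m = 2, the memberships u_ik themselves (not u_ik^m) serve as WLS weights, and the number of clusters c is passed in directly rather than chosen with a validity index. The names `gk_cluster` and `weighted_ls` and the synthetic data are hypothetical.

```python
import numpy as np


def gk_cluster(Z, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Gustafson-Kessel fuzzy clustering (illustrative sketch).

    Z: (N, d) data matrix; c: number of clusters; m: fuzzifier (> 1).
    Returns memberships U with shape (c, N) and centres V with shape (c, d).
    Cluster volumes rho_i are fixed to 1 (assumption).
    """
    rng = np.random.default_rng(seed)
    N, d = Z.shape
    U = rng.random((c, N))
    U /= U.sum(axis=0)  # memberships of each point sum to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ Z) / Um.sum(axis=1, keepdims=True)  # fuzzy-weighted centres
        D2 = np.empty((c, N))
        for i in range(c):
            diff = Z - V[i]
            # Fuzzy covariance matrix F_i of cluster i
            F = (Um[i][:, None] * diff).T @ diff / Um[i].sum()
            F += 1e-9 * np.eye(d)  # tiny ridge for numerical stability (assumption)
            # Norm-inducing matrix A_i = det(F_i)^(1/d) * inv(F_i)
            A = np.linalg.det(F) ** (1.0 / d) * np.linalg.inv(F)
            # Squared Mahalanobis-type distance (z_k - v_i)^T A_i (z_k - v_i)
            D2[i] = np.einsum("nj,jk,nk->n", diff, A, diff)
        D2 = np.fmax(D2, 1e-12)  # avoid division by zero
        # Membership update: u_ik = 1 / sum_j (d_ik^2 / d_jk^2)^(1/(m-1))
        ratio = (D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return U, V


def weighted_ls(X, y, w):
    """WLS with intercept: minimise sum_k w_k * (y_k - [1, x_k] @ beta)^2."""
    Xd = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)  # scaling rows by sqrt(w) turns WLS into ordinary LS
    beta, *_ = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)
    return beta


# Hypothetical synthetic data: the line y = 2 + 3x with a few injected outliers.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 200)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, 200)
y[:10] += 40.0

U, V = gk_cluster(np.column_stack([x, y]), c=2)
# One regression model per cluster, weighted by that cluster's memberships.
betas = [weighted_ls(x[:, None], y, U[i]) for i in range(2)]
for i, b in enumerate(betas):
    print(f"cluster {i}: intercept={b[0]:.3f}, slope={b[1]:.3f}")
```

Because each A_i is scaled to unit determinant, every cluster keeps a fixed volume while its shape and orientation follow the data. This lets GK fit elongated, ellipsoidal clusters that a Euclidean distance would tend to split.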
To evaluate the effectiveness of the proposed method, synthetic datasets were
generated, and their parameters were estimated using classical weighted least
squares, fuzzy C-means (FCM)-based methods, and the proposed method. The mean
squared error (MSE) and the coefficient of determination (R²) of the prediction
models obtained from each method were computed and used as the criteria for
comparing the methods. In conclusion,
the GK clustering-based regression method offers an effective and flexible
alternative for parameter estimation in datasets with a high proportion of outliers.
This approach is expected to have potential applications in diverse fields such
as machine learning, engineering modeling, and economic forecasting.
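The two comparison criteria named above follow their usual definitions, MSE = (1/N) Σ (y_k − ŷ_k)² and R² = 1 − SS_res / SS_tot, where SS_tot is the sum of squared deviations of y from its mean. A minimal sketch is given below; the function names `mse` and `r_squared` and the toy values are hypothetical, and because the abstract does not say whether the criteria were computed on all points, on inliers only, or on held-out data, the sketch simply evaluates whatever arrays are passed in.

```python
import numpy as np


def mse(y, y_hat):
    """Mean squared error: (1/N) * sum_k (y_k - y_hat_k)^2."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Toy observed values and predictions (hypothetical).
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(mse(y, y_hat), r_squared(y, y_hat))
```

For these four points the sketch prints an MSE of approximately 0.025 and an R² of approximately 0.98 (up to floating-point rounding). Lower MSE and higher R² indicate a better fit, so a method that down-weights outliers should improve both on the inlier structure.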