Prediction of breast cancer using machine learning algorithms on different datasets

Yavuz Ö. Ç., Calp M. H., Erkengel H. C.

INGENIERÍA SOLIDARIA, vol.19, no.1, pp.1-32, 2023 (ESCI)

  • Publication Type: Article
  • Volume: 19 Issue: 1
  • Publication Date: 2023
  • Doi Number: 10.16925/2357-6014.2023.01.08
  • Journal Indexes: Emerging Sources Citation Index (ESCI), Scopus, Fuente Academica Plus
  • Page Numbers: pp.1-32
  • Karadeniz Technical University Affiliated: Yes


Introduction: The research paper "Prediction of Breast Cancer using Machine Learning Algorithms on Different Datasets" was developed at Karadeniz Technical University in 2022.

Problem: Breast cancer is becoming increasingly common. It causes emotional and behavioral reactions and can have fatal consequences if it is not detected early. Traditional methods are insufficient here, especially for early diagnosis. This study aims to predict breast cancer with machine learning (ML) algorithms on different datasets and to demonstrate the applicability of these algorithms.

Methodology: Algorithm performance was compared on balanced and unbalanced datasets, using the performance metrics obtained on each dataset. In addition, a model based on the Borda Voting method was developed that combines the results of four algorithms: Naive Bayes (NB), k-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF).

Originality and Limitations of the Research: The model developed in this study combines the outputs of the NB, KNN, DT, and RF algorithms. The goal of this Borda Voting-based combination is to improve the model's performance.

Results: The predictions of each algorithm were written in separate columns of the same spreadsheet, and the most frequent value in each row was taken as the final prediction. The developed model was tested on real data consisting of 60 records, and the results were analyzed.

Conclusion: The results show that the proposed RF model achieved higher performance than similar studies in the literature. The predictions obtained with the developed model also demonstrate that ML algorithms are applicable to breast cancer diagnosis.
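The combination step described in the Results can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes that each algorithm's predictions form one column of class labels ("M"/"B" for malignant/benign is a hypothetical encoding). It also assumes that the final label is the most frequent value in each row, which is a plurality (hard-majority) vote rather than a rank-based Borda count. The abstract does not say how ties are broken, and with four voters a 2-2 split is possible. The sketch therefore adds its own rule and defers to the RF column in a tie. The names `MODEL_ORDER`, `majority_vote`, and `combine_predictions` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical column order for the four algorithms named in the paper.
MODEL_ORDER = ["NB", "KNN", "DT", "RF"]


def majority_vote(row: Sequence[str], tie_breaker_index: int = 3) -> str:
    """Return the most frequent label in one row of per-model predictions.

    Tie handling is an assumption (not stated in the paper): if several
    labels share the top count, the label predicted by the model at
    tie_breaker_index (default 3, i.e. RF) wins when it is among them;
    otherwise the first tied label in column order is returned.
    """
    counts = Counter(row)
    top = max(counts.values())
    # All labels that reach the top count, kept in column order.
    winners = [label for label in row if counts[label] == top]
    if row[tie_breaker_index] in winners:
        return row[tie_breaker_index]
    return winners[0]


def combine_predictions(columns: Dict[str, List[str]]) -> List[str]:
    """Combine per-model prediction columns row by row, like the spreadsheet step."""
    if len({len(col) for col in columns.values()}) != 1:
        raise ValueError("all prediction columns must have the same length")
    rows = zip(*(columns[name] for name in MODEL_ORDER))
    return [majority_vote(row) for row in rows]


if __name__ == "__main__":
    # Made-up predictions for four records, for illustration only.
    columns = {
        "NB":  ["M", "B", "B", "M"],
        "KNN": ["M", "B", "M", "B"],
        "DT":  ["B", "B", "M", "B"],
        "RF":  ["M", "M", "B", "M"],
    }
    print(combine_predictions(columns))  # ['M', 'B', 'B', 'M']
```

In the example, the first two records have a clear majority. The last two are 2-2 splits, so the assumed RF tie-breaker decides them. A different tie rule, or a true Borda count over ranked class scores, could give different final labels.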