Optimizing Large-Scale Data Processing in Smart Manufacturing: A Benchmarking Study on Automotive Industry Data

Yavuz, ZAFER; Bilgin, Turgay

doi:10.30939/ijastech..1676422

Yavuz Z., Bilgin T. T.

International Journal of Automotive Science And Technology, vol. 10, no. 1, pp. 26-39, 2026 (Scopus)

Publication Type: Article / Full Article
Volume: 10 Issue: 1
Publication Date: 2026
DOI Number: 10.30939/ijastech..1676422
Journal Name: International Journal of Automotive Science And Technology
Journal Indexes: Scopus
Pages: pp. 26-39
Affiliated with Karadeniz Technical University: Yes

Abstract

Automotive production lines generate large volumes of data, and analyzing this data is critical for predictive maintenance, energy efficiency, and quality control. However, growing data volumes push traditional methods to their limits, which makes it necessary to evaluate the performance of different libraries. This paper compares the performance characteristics of the Pandas, Dask, Modin, Vaex, and Polars libraries in the Python ecosystem for processing large datasets obtained from welding machines used in modern automotive production systems. The study used real production data from Matay, an automotive parts supplier: approximately 30 days of exhaust production machine data, 17 GB in size and containing 106,167,826 rows. Subsets of different sizes (10K, 100K, 1M, and 10M rows) were created from this dataset, and 11 different experiments were conducted on selected columns. The experiments cover reading data, filtering, sorting, grouping, merging, writing data in different formats (CSV, Parquet), and handling missing data. They were evaluated on three metrics: total execution time, total memory usage, and CPU execution time. Each experiment was repeated 3 times, and the average values were recorded. The results indicate that Polars may be more advantageous for performance-oriented applications across all data scales. Ultimately, the strategic selection of data processing tools is a critical enabler of digital transformation in the automotive industry, facilitating the integration of digital twins and AI-driven quality control into high-performance Industry 4.0 ecosystems.
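
The measurement protocol described in the abstract (three metrics, three repetitions, averaged) can be illustrated with a minimal sketch. This is not the authors' benchmarking code: the function names `benchmark` and `toy_sort_task` are hypothetical, the workload is a pure-Python stand-in for a library operation such as sorting a column, and `tracemalloc` is an assumed choice of memory probe that only sees allocations made through Python's allocator, so it would likely under-report memory used by native engines such as Polars.

```python
import statistics
import time
import tracemalloc
from typing import Callable, Dict


def benchmark(task: Callable[[], object], repeats: int = 3) -> Dict[str, float]:
    """Run `task` `repeats` times and return the mean of each metric."""
    wall_times, cpu_times, peak_mems = [], [], []
    for _ in range(repeats):
        tracemalloc.start()                # track Python-level allocations
        wall_start = time.perf_counter()   # elapsed (wall-clock) time
        cpu_start = time.process_time()    # CPU time of this process
        task()
        cpu_times.append(time.process_time() - cpu_start)
        wall_times.append(time.perf_counter() - wall_start)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peak_mems.append(peak / 1e6)       # bytes -> megabytes
    return {
        "wall_s": statistics.mean(wall_times),
        "cpu_s": statistics.mean(cpu_times),
        "peak_mem_mb": statistics.mean(peak_mems),
    }


def toy_sort_task(n_rows: int = 10_000) -> list:
    # Hypothetical stand-in workload: build a shuffled-looking column
    # of integers and sort it. A real run would call a library
    # operation (read, filter, sort, group, merge, write) here.
    rows = [(i * 7919) % n_rows for i in range(n_rows)]
    return sorted(rows)


if __name__ == "__main__":
    # Two of the smaller subset sizes named in the abstract.
    for size in (10_000, 100_000):
        print(size, benchmark(lambda: toy_sort_task(size)))
```

Tracing allocations slows execution, so a more careful setup might measure time and memory in separate runs. The abstract does not state how memory was measured; a process-level probe would be needed to capture native allocations.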