International Journal of Automotive Science And Technology, vol. 10, no. 1, pp. 26-39, 2026 (Scopus)
Production lines in the automotive sector generate large volumes of data, and analyzing this data is critical for predictive maintenance, energy efficiency, and quality control. However, growing data volumes strain traditional processing methods and make it necessary to evaluate the performance of alternative libraries. This paper compares the performance characteristics of the Pandas, Dask, Modin, Vaex, and Polars libraries in the Python ecosystem for processing large datasets obtained from welding machines used in modern automotive production systems. The study used real production data from Matay, an automotive parts supplier: approximately 30 days of exhaust production machine data, 17 GB in size and containing 106,167,826 rows. Subsets of different sizes (10K, 100K, 1M, and 10M rows) were created from this dataset, and 11 experiments were conducted on selected columns. The experiments cover reading data, filtering, sorting, grouping, merging, writing data in different formats (CSV and Parquet), and handling missing data. Each experiment was evaluated on three metrics: total execution time, total memory usage, and CPU execution time. Each experiment was repeated three times, and the average values were recorded. The results indicate that Polars may be more advantageous for performance-oriented applications across all data scales. More broadly, the strategic selection of data processing tools is a critical enabler of digital transformation in the automotive industry, facilitating the integration of digital twins and AI-driven quality control into high-performance Industry 4.0 ecosystems.
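The measurement protocol described above (three metrics per experiment, averaged over three repetitions) can be illustrated with a minimal sketch. The abstract does not state which tools were used to collect the metrics, so everything below is an assumption: the `benchmark` helper, the `sample_task` stand-in workload, and the use of the standard-library `time` and `tracemalloc` modules are hypothetical choices for illustration, not the authors' method.

```python
import statistics
import time
import tracemalloc
from typing import Any, Callable, Dict


def benchmark(task: Callable[[], Any], repeats: int = 3) -> Dict[str, float]:
    """Run `task` `repeats` times and return the mean of each metric.

    Hypothetical helper: the paper does not specify its measurement tooling.
    """
    wall, cpu, peak = [], [], []
    for _ in range(repeats):
        tracemalloc.start()                      # begin tracking Python allocations
        wall_start = time.perf_counter()         # wall-clock (total execution) time
        cpu_start = time.process_time()          # CPU time of this process
        task()
        cpu.append(time.process_time() - cpu_start)
        wall.append(time.perf_counter() - wall_start)
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peak.append(peak_bytes / 1e6)            # peak traced memory in MB
    return {
        "wall_s": statistics.mean(wall),
        "cpu_s": statistics.mean(cpu),
        "peak_mem_mb": statistics.mean(peak),
    }


def sample_task() -> list:
    # Hypothetical stand-in for a filter-then-sort experiment on one column.
    values = [(i * 7919) % 10_007 for i in range(100_000)]
    return sorted(v for v in values if v % 2 == 0)


if __name__ == "__main__":
    print(benchmark(sample_task))
```

In practice, `sample_task` would be replaced by a library-specific operation such as reading, filtering, or grouping one of the row subsets. Note that `tracemalloc` only tracks allocations made through Python's memory allocators, so it may under-report memory used by libraries with native backends; a process-level measure such as resident set size may be more appropriate for a comparison of this kind.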