Real-Time Processing of Big Data Streams: Lifecycle, Tools, Tasks, and Challenges


GÜRCAN F., BERİGEL M.

2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Kizilcahamam, Turkey, 19-21 October 2018, pp. 284-289

  • Publication Type: Conference Paper / Full Text
  • City: Kizilcahamam
  • Country: Turkey
  • Page Numbers: pp.284-289
  • Karadeniz Technical University Affiliated: Yes

Abstract

In today's technological environments, the vast majority of big data-driven applications and solutions are based on real-time processing of streaming data, so the real-time processing and analytics of big data streams play a crucial role in their development. From this perspective, this paper defines a lifecycle for real-time big data processing. It describes existing tools, tasks, and frameworks by associating them with the phases of the lifecycle: data ingestion, data storage, stream processing, analytical data store, and analysis and reporting. The paper also investigates real-time big data processing tools, namely Flume, Kafka, NiFi, Storm, Spark Streaming, S4, Flink, Samza, HBase, Hive, Cassandra, Splunk, and SAP HANA. In addition, it discusses current challenges of real-time big data processing, such as "volume, variety and heterogeneity", "data capture and storage", "inconsistency and incompleteness", "scalability", "real-time processing", "data visualization", "skill requirements", and "privacy and security". This paper may provide valuable insights into the lifecycle, related tools and tasks, and challenges of real-time big data processing.