Big Data Streaming: Phases

Kartikeya Mishra
2 min read · Aug 1, 2020

Big Data streaming involves processing continuous streams of data in order to extract real-time insights.

Big Data streaming is important because some data requires action within seconds or milliseconds of the triggering incident. For example:

• A delay in tsunami prediction can cost lives.

• A delay in traffic-jam prediction costs commuters extra time.

• An advertisement loses effectiveness if it is not targeted correctly.

NoSQL databases are commonly used to meet the challenges posed by stream processing. For example, a recommendation system needs a storage layer that can store and fetch user data with minimal latency while holding a large volume of data.
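The access pattern described above can be sketched in a few lines. This is a minimal illustration only: a plain Python dict stands in for the key-value interface a NoSQL store such as Redis or HBase provides, and all names and data are hypothetical.

```python
class UserProfileStore:
    """In-memory stand-in for a low-latency NoSQL key-value store."""

    def __init__(self):
        self._data = {}

    def put(self, user_id, profile):
        # O(1) write keyed by user ID
        self._data[user_id] = profile

    def get(self, user_id):
        # O(1) read; unknown users return an empty profile
        return self._data.get(user_id, {})

store = UserProfileStore()
store.put("user42", {"recently_viewed": ["laptop", "mouse"]})
profile = store.get("user42")
```

The point of the sketch is the shape of the interface: constant-time reads and writes keyed by user ID, which is what keeps recommendation lookups within a latency budget.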

Real-Time Processing of Big Data:

Real-time processing consists of the continuous input, processing, and reporting of data.

• The process consists of a sequence of repeated operations in which the data streams are transferred into memory.

• Real-time processing is crucial for maintaining the functionality of automated systems that handle intensive data streams with varied data structures.

• Examples include bank ATMs, radar systems, disaster management systems, the Internet of Things, and social networks.
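The continuous input → processing → reporting cycle can be shown as a toy loop. Everything here is illustrative: `sensor_readings` simulates an incoming stream, and each reading is acted on as it arrives rather than being batched.

```python
def sensor_readings():
    """Simulated continuous input: yields readings one at a time."""
    for value in [12.1, 48.0, 101.5, 7.3]:
        yield value

def process_stream(readings, threshold=100.0):
    """Process each reading immediately and report alerts in real time."""
    alerts = []
    for value in readings:
        if value > threshold:  # per-record processing step
            alerts.append(f"ALERT: reading {value} exceeds {threshold}")
    return alerts

alerts = process_stream(sensor_readings())
```

A real system would replace the list with an unbounded source (a socket, a topic, a sensor feed), but the per-record structure of the loop is the same.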

Real-Time Big Data Processing Lifecycle:

The Real-Time Big Data Processing Lifecycle has five phases:

Real-Time Data Ingestion Phase —

Big data is ingested from heterogeneous data sources. In this phase, many real-time processing paradigms use a message ingestion store to act as a buffer.

Tools for Data Ingestion Phase: Flume and Kafka
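The buffering role of the ingestion store can be sketched with a thread-safe queue. This is not the Kafka or Flume API; `queue.Queue` simply plays the part a Kafka topic or Flume channel would play, letting the producer ingest at its own rate while a consumer drains independently.

```python
import queue
import threading

buffer = queue.Queue()  # bounded in real systems; unbounded here for brevity

def producer(events):
    for event in events:
        buffer.put(event)  # ingest without waiting for the consumer
    buffer.put(None)       # sentinel: end of stream

def consumer():
    consumed = []
    while True:
        event = buffer.get()
        if event is None:
            break
        consumed.append(event)  # downstream processing would go here
    return consumed

t = threading.Thread(target=producer, args=(["click", "view", "purchase"],))
t.start()
received = consumer()
t.join()
```

Decoupling the two sides is the whole point: if the consumer briefly falls behind, records accumulate in the buffer instead of being dropped at the source.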

Data Storage Phase —

The Data Storage Phase covers the operations for storing real-time data streams that have different data structures.

Stream Processing Phase —

In the Stream Processing Phase, real-time big data is processed and structured for real-time analysis and decision making. Various frameworks and paradigms are used in this phase, according to the nature of the real-time application.

Tools for the Stream Processing Phase: Spark Streaming, Storm, S4
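One of the most common operations these frameworks provide is windowed aggregation. Below is a minimal tumbling-window word count in plain Python, a hand-rolled sketch of the idea rather than any framework's API; the events, window size, and field names are made up.

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds=10):
    """Group (timestamp, word) events into fixed windows and count per window."""
    windows = {}
    for ts, word in events:
        # Each event falls into exactly one non-overlapping window.
        window_start = (ts // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[word] += 1
    return {w: dict(c) for w, c in sorted(windows.items())}

events = [(1, "error"), (4, "ok"), (12, "error"), (15, "error")]
counts = tumbling_window_counts(events)
# counts -> {0: {'error': 1, 'ok': 1}, 10: {'error': 2}}
```

Frameworks such as Spark Streaming express the same computation declaratively and add the hard parts this sketch omits: distribution, fault tolerance, and late-arriving data.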

Analytical Data Store Phase —

The Analytical Data Store Phase stores and serves the processed data in a structured format so that it can be queried by analysis tools.

Tools for the Analytical Data Store Phase: HBase, Hive
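A small sketch of what "store and serve in a structured format" means in practice: processed results land in a table that analysis tools can query with SQL. Here `sqlite3` stands in for a store such as Hive or HBase, and the schema and rows are illustrative.

```python
import sqlite3

# In-memory database standing in for an analytical data store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE window_counts (window_start INTEGER, word TEXT, count INTEGER)"
)

# Hypothetical output of an upstream stream-processing job.
rows = [(0, "error", 1), (0, "ok", 1), (10, "error", 2)]
conn.executemany("INSERT INTO window_counts VALUES (?, ?, ?)", rows)

# Analysts can now run structured queries over the stream's output.
total_errors = conn.execute(
    "SELECT SUM(count) FROM window_counts WHERE word = 'error'"
).fetchone()[0]
```

The structured schema is what lets downstream reporting tools ask questions the streaming job never anticipated.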

Analysis and Reporting Phase —

The Analysis and Reporting Phase aims to turn the processed data into information and insights that support decision making.

Data Processing Architectures:

A good data processing architecture should have the following qualities:

• Fault tolerance and scalability.

• Support for both batch and incremental updates.

• Extensibility.
