Big Data Streaming: Phases
Big Data streaming involves processing continuous streams of data in order to extract real-time insights.
Big Data streaming is important because some data requires action within seconds or milliseconds of the triggering incident. For example :
• A delay in tsunami prediction can cost lives.
• A delay in traffic jam prediction costs commuters extra time.
• An advertisement can lose its effect if it is not targeted correctly and in time.
NoSQL databases are commonly used to meet the challenges posed by stream processing. For example, to build a recommendation system we need a storage method that can store and fetch data for a user with minimal latency and can hold a large amount of data.
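The low-latency lookup requirement can be sketched with a simple in-memory key-value store. This is only an illustration of the access pattern a NoSQL store (such as Redis or HBase) provides; the class and data below are hypothetical.

```python
# Hypothetical sketch: an in-memory key-value store standing in for a
# NoSQL database that serves recommendations with minimal latency.

class RecommendationStore:
    """Maps a user id to a list of recommended item ids."""

    def __init__(self):
        self._store = {}  # user_id -> list of item ids

    def put(self, user_id, items):
        self._store[user_id] = list(items)

    def get(self, user_id):
        # O(1) average-case lookup keeps read latency minimal
        return self._store.get(user_id, [])

store = RecommendationStore()
store.put("user42", ["item1", "item7", "item9"])
print(store.get("user42"))  # ['item1', 'item7', 'item9']
```

A real NoSQL store adds persistence, replication, and horizontal partitioning on top of this key-based access pattern.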
Real-Time Processing of Big Data :
Real-time processing consists of the continuous input, processing, analysis, and reporting of data.
• The process consists of a sequence of repeated operations in which the data streams are transferred into memory.
• Real-time processing is crucial for maintaining the functionality of automated systems that handle intensive data streams and varied data structures.
• Examples include bank ATMs, radar systems, disaster management systems, the Internet of Things, and social networks.
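The continuous input-process-output cycle described above can be sketched in a few lines. The generator stands in for an unbounded stream (e.g. sensor readings from an IoT device), and the alert threshold is purely illustrative.

```python
# Minimal sketch of the real-time cycle: continuous input, processing,
# and output, using a generator as a stand-in for an unbounded stream.

def sensor_stream(readings):
    for r in readings:  # in production this loop would never terminate
        yield r

def process(reading):
    # flag readings above a hypothetical alert threshold
    return {"value": reading, "alert": reading > 100}

results = [process(r) for r in sensor_stream([42, 150, 7])]
print(results)
```

In a real system the processed records would be pushed onward immediately rather than collected into a list.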
Real-Time Big Data Processing Lifecycle :
Real-Time Big Data Processing Lifecycle has five phases :
Real-time Data Ingestion Phase —
Big Data is ingested from heterogeneous data sources. In this phase, many real-time processing paradigms use a message ingestion store to act as a buffer.
Tools for Data Ingestion Phase: Flume and Kafka
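The buffering role that a message ingestion store such as Kafka plays can be sketched in-process: producers append events, and consumers read them at their own pace. Python's thread-safe queue stands in for the broker here; the event names are illustrative.

```python
# Sketch of the buffering idea behind a message ingestion store:
# producers write events into a buffer, consumers drain it downstream.

import queue

buffer = queue.Queue()

# producer side: ingest events from heterogeneous sources
for event in ("click", "purchase", "click"):
    buffer.put(event)

# consumer side: drain the buffer for downstream processing
consumed = []
while not buffer.empty():
    consumed.append(buffer.get())

print(consumed)  # ['click', 'purchase', 'click']
```

The buffer decouples the ingestion rate from the processing rate, so a burst of incoming events does not overwhelm the downstream stages.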
Data Storage Phase —
The Data Storage Phase covers the storage of real-time data and streams that have different data structures.
Stream Processing Phase —
In the Stream Processing Phase, real-time big data is processed and structured for real-time analysis and decision making. Various frameworks and paradigms are used in this phase according to the nature of the real-time application.
Tools For Stream Processing Phase: Spark Streaming, Storm, S4
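A core operation these frameworks provide is aggregation over fixed (tumbling) time windows. The sketch below shows the idea in plain Python; the window length, timestamps, and event names are assumptions for illustration, not any framework's API.

```python
# Sketch of tumbling-window aggregation, the kind of operation that
# frameworks such as Spark Streaming or Storm perform over a stream.

from collections import Counter

WINDOW = 10  # window length in seconds (assumed)

events = [  # (timestamp_seconds, label)
    (1, "error"), (4, "ok"), (9, "error"),
    (12, "ok"), (18, "error"),
]

windows = {}
for ts, label in events:
    key = ts // WINDOW  # index of the tumbling window this event falls in
    windows.setdefault(key, Counter())[label] += 1

print(windows[0]["error"])  # 2 errors in the first 10-second window
```

Real frameworks add the hard parts: out-of-order events, distributed state, and fault recovery.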
Analytical Data Store Phase —
The Analytical Data Store Phase stores and serves processed data in a structured format so that it can be queried by analysis tools.
Tools For Analytical Data Store Phase: HBase, Hive
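The idea of serving processed results in a queryable, structured form can be sketched with SQLite standing in for a store such as Hive; the table schema and numbers are illustrative only.

```python
# Sketch of an analytical data store: processed results are written in
# a structured (tabular) form so analysis tools can query them.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (window INTEGER, errors INTEGER)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?)",
    [(0, 2), (1, 1), (2, 5)],
)

# an analysis or reporting tool would issue queries like this one
total = conn.execute("SELECT SUM(errors) FROM metrics").fetchone()[0]
print(total)  # 8
```

Hive exposes the same query model (SQL-like HiveQL) over data stored at big-data scale in HDFS.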
Analysis and Reporting Phase —
The Analysis and Reporting Phase provides the information and insights that drive decision making.
Data Processing Architectures :
A good data processing architecture should have the following qualities :
• Fault tolerance and scalability.
• Support for both batch and incremental updates.
• Extensibility.