Apache Apex for Real Time Stream Processing

Apache Apex, which graduated from the Apache Incubator in April 2016, is gaining popularity among stream processing engines. It stands out for its ability to process large volumes of data substantially faster than its counterparts.
Synerzip regularly takes on work in emerging technologies and gained experience with Apache Apex in its early stages.
Use Case –
One of the regular needs of the IT security domain is to monitor and analyze real-time firewall data. This also includes the requirement to run large and complex queries against the entire data set to explore and analyze the data for business decisions.
Apex Solution –
Apache Apex suits this problem domain because of its low latency and the many data connectors available through Apache Malhar. Data visualization can be done with Kibana. One possible solution is described below.
NOTE: A Flume operator is not yet part of Apache Malhar.
The firewall data (the log of incoming and outgoing traffic) is the input to the application. The firewall is configured to send this data as Syslog messages to Flume. Apache Flume works efficiently for collecting, aggregating, and moving large amounts of log data. It also acts as a connector that delivers the firewall data to Kafka, which serves as the message queue.
Data arriving from Kafka needs to be transformed before it is loaded into Elasticsearch, a distributed, open source search and analytics engine designed for horizontal scalability, reliability, and easy management. The Apache Apex application performs this transformation: it reads and processes the data from Kafka and writes the processed data to Elasticsearch, where it can be searched. The search results are finally visualized using Kibana.
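The transformation step can be pictured with a small, dependency-free sketch. It assumes a hypothetical log layout of space-separated key=value pairs; the article does not specify the firewall's actual format, and real formats vary by vendor. The class name FirewallLogParser and the field names in the comment are illustrative only and are not taken from the Synerzip application.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical transformation step: turns one key=value log line into a field map. */
final class FirewallLogParser {

    private FirewallLogParser() {}

    // Assumed input layout (illustration only):
    //   "src=10.0.0.5 dst=8.8.8.8 action=allow bytes=512"
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String token : line.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                // Text before the first '=' is the field name, the rest is the value.
                fields.put(token.substring(0, eq), token.substring(eq + 1));
            }
            // Tokens without a key are skipped.
        }
        return fields;
    }
}
```

A map of named fields like this is a convenient shape for a document that will be indexed and queried later, which is why the sketch stops there rather than producing a formatted string.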
The Synerzip team wrote the Apex application that reads the data from Kafka, processes it, and writes it to Elasticsearch.

Data coming in from Syslog to Kafka –
Figure 1: Data coming from Syslog to Kafka
Apache Apex Application –
The application is a Directed Acyclic Graph (DAG) containing one input operator (Kafka), one output operator (Elasticsearch), and multiple custom operators that receive data from Kafka. The custom operators process the data and forward it to Elasticsearch, where it is persisted.
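The shape of such a topology can be sketched with a toy graph model. This is plain Java and deliberately does not use the Apex API, where the platform's own DAG classes do this wiring. The class OperatorGraph and the operator names used in the usage note are hypothetical. The sketch registers streams as edges and returns the operators in an order where every stream flows forward, which only exists when the graph is acyclic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of an operator graph; it does not use the Apex API. */
final class OperatorGraph {
    // Operator name -> names of its downstream operators, in insertion order.
    private final Map<String, List<String>> downstream = new LinkedHashMap<>();

    /** Adds a stream (edge) between two operators, registering both of them. */
    OperatorGraph addStream(String from, String to) {
        downstream.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        downstream.computeIfAbsent(to, k -> new ArrayList<>());
        return this;
    }

    /** Orders operators so each stream flows forward; throws if there is a cycle. */
    List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String op : downstream.keySet()) {
            inDegree.put(op, 0);
        }
        for (List<String> targets : downstream.values()) {
            for (String target : targets) {
                inDegree.merge(target, 1, Integer::sum);
            }
        }
        // Operators with no incoming stream (the input operator) start first.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String op = ready.remove();
            order.add(op);
            for (String target : downstream.get(op)) {
                if (inDegree.merge(target, -1, Integer::sum) == 0) {
                    ready.add(target);
                }
            }
        }
        if (order.size() != downstream.size()) {
            throw new IllegalStateException("operator graph contains a cycle");
        }
        return order;
    }
}
```

For example, adding streams from a "kafkaInput" operator to two custom operators, and from each of those to an "elasticsearchOutput" operator, yields an order that starts with the input operator and ends with the output operator.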
The custom operators process the stream of incoming data and aggregate it according to the business logic. This aggregated data is then sent to Elasticsearch. Aggregation is performed over a configurable period of time, which can differ for each custom operator. The team wrote multiple custom operators to process different types of firewall messages.
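Aggregation over a configurable period can be illustrated with a minimal counter that groups events into fixed, non-overlapping time windows. This is a sketch under stated assumptions, not the Synerzip operators: the class WindowedCounter, the choice of counting per key, and the use of event timestamps are all hypothetical. An Apex operator would normally rely on the platform's window callbacks rather than on timestamps; timestamps are used here only to keep the example free of dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical aggregation logic: counts events per key in fixed time windows. */
final class WindowedCounter {
    private final long windowMillis;   // configurable aggregation period
    private boolean started = false;
    private long windowStart;          // start of the currently open window
    private Map<String, Long> counts = new LinkedHashMap<>();

    WindowedCounter(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
    }

    /**
     * Adds one event. Returns the counts of the window that just closed when
     * this event opens a new window, otherwise null.
     * Assumes timestamps arrive in non-decreasing order.
     */
    Map<String, Long> add(String key, long timestampMillis) {
        Map<String, Long> finished = null;
        if (!started) {
            started = true;
            windowStart = alignToWindow(timestampMillis);
        } else if (timestampMillis >= windowStart + windowMillis) {
            finished = counts;
            counts = new LinkedHashMap<>();
            windowStart = alignToWindow(timestampMillis);
        }
        counts.merge(key, 1L, Long::sum);
        return finished;
    }

    /** Returns and clears the counts of the still-open window, e.g. at shutdown. */
    Map<String, Long> flush() {
        Map<String, Long> open = counts;
        counts = new LinkedHashMap<>();
        started = false;
        return open;
    }

    // Rounds a timestamp down to the start of its window.
    private long alignToWindow(long timestampMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, windowMillis);
    }
}
```

Because the period is a constructor argument, each operator instance can use its own window length, which mirrors the per-operator configuration described above.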
Figure 2: Apache Apex application
Figure 3: Output in Kibana
To learn more about Apache Apex, visit the Apache Apex project website.
Demo
Check out these demo applications:
Demo 1 – This Apache Apex application reads firewall logs from Kafka, processes them, and pushes them to Elasticsearch using the Apex Elasticsearch output operator.
Demo 2 – This Apache Apex application aggregates a live stream of Syslog data coming from the firewall and stores the aggregated data in Elasticsearch.
Why Apache Apex?
Apache Apex was chosen over the alternatives because it is highly scalable, performs well, is fault tolerant and stateful, and, most importantly, is easy to operate.
With Apache Apex, Synerzip observed that data could be easily monitored and analyzed to meet the clients’ growing business needs, while improving performance by up to 50x at times.
Apache Apex is gradually gaining momentum, both in headlines and in real-world adoption by data science companies for managing and analyzing their big data.
