Target Audience
This talk is aimed at software architects and application developers who build streaming analytics applications that use data-parallel computations to identify issues and aggregate trends across many incoming data streams. These applications must track and analyze incoming messages from thousands of data sources so that they can (a) maintain a dynamic, in-memory model of each data source’s behavior, (b) provide immediate feedback and alerts when issues are discovered, and (c) perform aggregate analysis to boost overall situational awareness.
Numerous applications can benefit from this type of streaming analytics, especially those with large numbers (thousands) of data sources. Examples include fleet tracking for rental car and trucking companies, asset tracking during disaster recovery, logistics for retail outlets, contact tracing for large companies, fraud detection in banking transactions, ecommerce recommendations, healthcare device tracking, security and intrusion detection, and more.
Purpose of the Talk
The talk describes a software construct called a real-time digital twin, which runs on an in-memory data grid, and shows how it integrates streaming analytics with data-parallel computations. This construct provides a highly scalable architecture for simultaneously extracting dynamic information from thousands of data streams and continuously feeding it to MapReduce computations. The results of these computations can then be visualized immediately in charts that reveal dynamic trends.
The talk uses code samples and demos to show how real-time digital twins can simplify streaming analytics applications by providing a straightforward framework for organizing application code. This approach offloads to the execution platform key functionality that would otherwise challenge the application developer: managing memory-based state, integrating data-parallel computations, avoiding scalability bottlenecks, and ensuring high availability.
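As an illustration of how application code might be organized around this construct, the following is a minimal, single-process sketch in Java. It is not the API of any particular in-memory data grid: the names `Telemetry`, `SourceTwin`, `TwinHost`, and `TwinSketch`, the temperature threshold, and the three-reading alert rule are all assumptions made for the example. `TwinHost` stands in for the execution platform, which in a real deployment would also distribute twins across servers and keep them highly available.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical message from one data source, e.g. one truck in a fleet.
final class Telemetry {
    final String sourceId;
    final double engineTempC;

    Telemetry(String sourceId, double engineTempC) {
        this.sourceId = sourceId;
        this.engineTempC = engineTempC;
    }
}

// Hypothetical twin: in-memory state plus message-handling logic for ONE data source.
final class SourceTwin {
    static final double HOT_THRESHOLD_C = 110.0;    // assumed threshold
    static final int HOT_READINGS_BEFORE_ALERT = 3; // assumed alert rule

    final String sourceId;
    int messageCount;
    int consecutiveHotReadings;

    SourceTwin(String sourceId) {
        this.sourceId = sourceId;
    }

    // Updates this source's state and returns an alert when the rule fires.
    Optional<String> processMessage(Telemetry msg) {
        messageCount++;
        if (msg.engineTempC > HOT_THRESHOLD_C) {
            consecutiveHotReadings++;
        } else {
            consecutiveHotReadings = 0;
        }
        if (consecutiveHotReadings == HOT_READINGS_BEFORE_ALERT) {
            return Optional.of("ALERT " + sourceId + ": engine overheating");
        }
        return Optional.empty();
    }
}

// Stand-in for the platform: routes each message to the twin for its source,
// creating the twin on first contact, and collects any alerts.
final class TwinHost {
    private final Map<String, SourceTwin> twins = new HashMap<>();
    private final List<String> alerts = new ArrayList<>();

    void deliver(Telemetry msg) {
        SourceTwin twin = twins.computeIfAbsent(msg.sourceId, SourceTwin::new);
        twin.processMessage(msg).ifPresent(alerts::add);
    }

    int twinCount() {
        return twins.size();
    }

    List<String> alerts() {
        return alerts;
    }
}

final class TwinSketch {
    public static void main(String[] args) {
        TwinHost host = new TwinHost();
        for (double temp : new double[] {112.0, 115.0, 118.0}) {
            host.deliver(new Telemetry("truck-1", temp));
        }
        host.deliver(new Telemetry("truck-2", 85.0));
        host.alerts().forEach(System.out::println);
    }
}
```

In this sketch the application developer writes only `SourceTwin.processMessage`; state lookup, twin creation, and message routing live in the host, which mirrors the division of labor described above.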
Technologies Covered
The talk describes object-oriented APIs for building applications in languages such as Java and C# that run on an in-memory data grid (IMDG). It explains how these applications execute on the IMDG, how they use the IMDG to store and access state information, and how MapReduce computations are implemented within the IMDG. It also covers scalability considerations for the IMDG and techniques for ensuring high availability at all stages of the streaming analytics pipeline. Finally, it compares this approach to pipelined and graph-oriented streaming analytics architectures, such as Apache Flink and Apache Beam.
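To make the idea of a MapReduce computation over twin state concrete, here is a small sketch that uses Java parallel streams as a stand-in for a data-parallel engine. It is an assumption-laden illustration, not the IMDG's actual API: `TwinSnapshot`, `AggregateSketch`, and the `region` and `alerting` fields are hypothetical. The map step selects each alerting twin and keys it by region; the reduce step counts twins per region.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical read-only view of one twin's state at the time of the computation.
final class TwinSnapshot {
    final String sourceId;
    final String region;
    final boolean alerting;

    TwinSnapshot(String sourceId, String region, boolean alerting) {
        this.sourceId = sourceId;
        this.region = region;
        this.alerting = alerting;
    }
}

final class AggregateSketch {
    // Map: keep alerting twins and key them by region.
    // Reduce: count the twins under each key.
    // A TreeMap keeps the result ordered by region name.
    static Map<String, Long> alertsByRegion(List<TwinSnapshot> twins) {
        return twins.parallelStream()
                .filter(t -> t.alerting)
                .collect(Collectors.groupingBy(
                        t -> t.region,
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<TwinSnapshot> twins = List.of(
                new TwinSnapshot("truck-1", "east", true),
                new TwinSnapshot("truck-2", "east", true),
                new TwinSnapshot("truck-3", "west", true),
                new TwinSnapshot("truck-4", "west", false));
        System.out.println(alertsByRegion(twins));
    }
}
```

Run repeatedly against current twin state, a computation of this shape could feed a chart of alerts per region. On an IMDG the same map and reduce steps would execute across the grid's servers, where the twins are stored, instead of over a local list.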
What the Audience Will Learn
The audience will learn about a new model for streaming analytics and how it compares to other approaches in addressing the challenges faced by applications with thousands of data sources. They will see how this model applies to numerous use cases and how to use it to create data-parallel computations that continuously access streaming state. They will also see how the model enables individualized feedback for thousands of data sources as well as increased situational awareness from aggregate analysis.