Stream Processing with In-Memory Data Grids: Creating the Digital Twin
This talk is targeted at application developers who want to explore the use of in-memory computing for streaming analytics. It describes a key limitation of current techniques such as Spark Streaming, namely their difficulty in tracking streaming context, and presents a new approach that overcomes this limitation: implementing a digital twin using an in-memory data grid. It explains how the object-oriented architecture of in-memory data grids makes them well suited to applications that implement digital twins. The audience should gain an understanding of a new design technique for IMC applications, learn how to make use of it, and explore the advantages it offers for streaming analytics. This technique gives developers new leverage and may prompt the big data community to rethink current approaches to stream processing.
Abstract
Businesses that track data from live systems, such as patient monitoring networks or wind turbine farms, need insights in less than a second to react to fast-changing conditions, make mission-critical decisions, and capitalize on new opportunities. Popular software platforms for streaming analytics (e.g., Apache Storm, Spark Streaming, and legacy complex event processing (CEP) engines) let applications extract insights from data streams but are not well suited to modeling the underlying real-time context in which streaming data must be evaluated. As a result, they can fail to deliver crucial value in helping steer the behavior of these systems. With their object-oriented architecture, in-memory data grids (IMDGs) are now poised to overcome these limitations and enable streaming analytics to deliver significantly higher value than previously thought possible.
The key to deeper introspection into the dynamic behavior of live systems for sub-second feedback is to shift the focus from solely examining incoming data streams to analyzing the combination of data streams and the data sources that generate them. This lets streams be evaluated in a richer context and yields significantly more valuable insights. Gartner used the term “digital twins” in its report “Top 10 Strategic Technology Trends for 2017” to refer to software-based representations of real-world entities, such as the patients or wind turbines in the examples above. For example, consider a patient monitoring system receiving telemetry from a population of patients with remote pacemakers. By creating a digital twin of each patient that tracks medical history, lifestyle, and current medications, the system can extract more information from this telemetry and filter out unnecessary alerts. Likewise, modeling the specific characteristics and condition of a wind turbine helps streaming analytics interpret its telemetry and predict whether a blade failure is imminent.
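To make the patient example concrete, here is a minimal sketch of what a digital twin for one pacemaker patient might look like. It is not taken from any product; the class `PatientTwin`, its fields, and the alert rules (a fixed resting limit, a hypothetical 10 bpm adjustment for beta blockers, and alerting only after three consecutive high readings) are illustrative assumptions. The point is that the twin holds per-patient context across events, so each incoming reading is judged against that context rather than in isolation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PatientTwin:
    """Hypothetical digital twin of one pacemaker patient (illustrative fields only)."""
    patient_id: str
    resting_hr_limit: int = 100            # bpm above which a resting reading is suspicious
    medications: set = field(default_factory=set)
    active_lifestyle: bool = False         # e.g., the patient exercises regularly
    recent_high_readings: int = 0          # context accumulated across events

    def process_telemetry(self, heart_rate: int, activity: str) -> Optional[str]:
        """Update the twin with one reading; return an alert message or None."""
        # An elevated rate during known exercise is expected for an active patient.
        if activity == "exercise" and self.active_lifestyle:
            self.recent_high_readings = 0
            return None
        # Hypothetical rule: beta blockers lower the expected resting rate.
        limit = self.resting_hr_limit - (10 if "beta_blocker" in self.medications else 0)
        if heart_rate > limit:
            self.recent_high_readings += 1
        else:
            self.recent_high_readings = 0
        # Alert only on a sustained pattern, filtering out one-off spikes.
        if self.recent_high_readings >= 3:
            return f"{self.patient_id}: sustained heart rate above {limit} bpm"
        return None
```

For a patient created as `PatientTwin("p-17", medications={"beta_blocker"}, active_lifestyle=True)`, a reading of 130 bpm during exercise produces no alert, while three consecutive resting readings above 90 bpm do. A stateless stream filter applying one global threshold would have flagged the exercise reading and missed the medication-adjusted limit.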
It is cumbersome and often inefficient to implement digital twins with traditional stream processing technologies because they lack an integrated, object-oriented storage model. IMDGs, however, provide a highly effective platform for incorporating digital twins into streaming analytics. IMDGs combine object-oriented, in-memory data storage for hosting digital twins with fast data access and integrated computing to implement streaming updates and fine-grained, sub-second analysis. These capabilities both simplify development and maximize performance by avoiding unnecessary data motion. Moreover, unlike Spark, IMDGs are designed to meet the stringent high availability requirements of live systems.
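The following single-process sketch illustrates the architecture described above under stated assumptions; it is not the API of ScaleOut or any other IMDG. `SimulatedGrid`, `DigitalTwin`, and `TurbineTwin` are hypothetical names. Twins are stored in hash-partitioned in-memory stores, and each incoming event is routed to the partition that owns its twin and processed there, so the twin's state is updated in place instead of being fetched, shipped to a separate compute tier, and written back. Application code only implements `process_event`; routing and storage stay in the "grid" layer. A real IMDG would spread partitions across servers and replicate them for high availability, which this sketch omits.

```python
import zlib
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Optional


class DigitalTwin(ABC):
    """Interface implemented by application-specific twin code (hypothetical)."""

    @abstractmethod
    def process_event(self, event: Dict[str, Any]) -> Optional[str]:
        ...


class SimulatedGrid:
    """Single-process stand-in for an IMDG: twins live in hash-partitioned stores,
    and each event runs against its twin in the partition that owns it."""

    def __init__(self, num_partitions: int, twin_factory: Callable[[str], DigitalTwin]):
        self.partitions: List[Dict[str, DigitalTwin]] = [{} for _ in range(num_partitions)]
        self.twin_factory = twin_factory

    def partition_for(self, key: str) -> int:
        # Deterministic hash so a given key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def ingest(self, key: str, event: Dict[str, Any]) -> Optional[str]:
        store = self.partitions[self.partition_for(key)]
        twin = store.get(key)
        if twin is None:
            # The first event from a data source creates its twin in place.
            twin = self.twin_factory(key)
            store[key] = twin
        # The twin is updated where it is stored; only the event and result move.
        return twin.process_event(event)


class TurbineTwin(DigitalTwin):
    """Hypothetical wind turbine twin tracking recent blade vibration."""

    def __init__(self, turbine_id: str, vibration_limit: float = 5.0):
        self.turbine_id = turbine_id
        self.vibration_limit = vibration_limit
        self.readings: List[float] = []

    def process_event(self, event: Dict[str, Any]) -> Optional[str]:
        # Keep a sliding window of the five most recent readings.
        self.readings = (self.readings + [event["vibration"]])[-5:]
        avg = sum(self.readings) / len(self.readings)
        if len(self.readings) == 5 and avg > self.vibration_limit:
            return f"{self.turbine_id}: average vibration {avg:.1f} exceeds limit"
        return None
```

Creating `SimulatedGrid(4, TurbineTwin)` and ingesting vibration readings 4.0 through 8.0 for turbine `"t-1"` yields no result for the first four events and an alert on the fifth, when the five-reading average (6.0) exceeds the assumed limit of 5.0.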
This talk explains how IMDGs can be used to provide a highly effective platform for building stream processing applications that incorporate digital twins. With code samples, it shows how a digital twin can be constructed and used to ingest and analyze incoming data streams to provide immediate feedback to a live system. This approach is compared with other popular stream processing architectures, such as Spark and Apache Flink, whose design makes it difficult to implement a digital twin.
The talk also explores three specific advantages of IMDGs over other stream processing architectures: the ability to cleanly separate application-specific code from the grid’s orchestration layer using object-oriented techniques, the integration of in-memory data storage and computation to combine event ingestion with sub-second analysis, and increased performance and scalability resulting from the grid’s ability to minimize data motion. Examples in e-commerce and the Industrial Internet of Things are used to illustrate the importance of digital twins and these key advantages.
Dr. William L. Bain founded ScaleOut Software in 2003 to develop in-memory data grid and in-memory computing products. As CEO, he has led the creation of numerous innovations for integrating data-parallel computing with in-memory data storage. Bill holds a Ph.D. in electrical engineering from Rice University. Over a 38-year career focused on parallel computing, he has contributed to advancements at Bell Labs Research, Intel, and Microsoft, and holds several patents in computer architecture and distributed computing. Bill founded and ran three start-up companies prior to ScaleOut Software. The most recent, Valence Research, which developed and distributed Web load-balancing software, was acquired by Microsoft Corporation, and its software is a key feature within the Windows Server operating system. As an investor and member of the screening committee for the Seattle-based Alliance of Angels, Bill is actively involved in entrepreneurship and the angel community.
Bill has presented at the prior three IMCS conferences in San Francisco.
Recent talks presented by Bill Bain:
• In-Memory Computing Summit Amsterdam 2017: Stream Processing with In-Memory Data Grids: Creating the Digital Twin
• DEVintersection Spring 2017: Supercomputing with Microsoft’s Task Parallel Library
• In-Memory Computing Summit 2016: Implementing User-Defined Data Structures in In-Memory Data Grids
• Database Month New York April 2016: Using Memory-Based NoSQL Data Structures to Eliminate the Network Bottleneck
• IBM POWER8 ISV Testimonial 2015: POWER8 and ScaleOut Software: In-memory computing for operational intelligence
• In-Memory Computing Summit 2015: Implementing Operational Intelligence Using In-Memory, Data-Parallel Computing
• Database Month New York May 2015: Using In-Memory, Data-Parallel Computing for Operational Intelligence
• Big Data Spain 2014: Real Time Analytics with MapReduce And In-Memory
• Strata+Hadoop World 2014: Using Operational Intelligence to Track 10M Cable TV Viewers in Real Time
URLs of previous presentations:
• In-Memory Computing Summit Amsterdam 2017: https://imcsummit.org/eu/sessions/stream-processing-memory-data-grids-c…
• In-Memory Computing Summit 2016: https://imcsummit.org/2016/videos-and-slides/implementing-user-defined-…
• Database Month New York April 2016: http://www.databasemonth.com/database/nosql-data, https://youtu.be/2KfiQPkuemM
• IBM POWER8 ISV Testimonial 2015: https://www.youtube.com/watch?v=7q5ERajssvs
• In-Memory Computing Summit 2015: http://www.slideshare.net/imcsummit/imcs2015-1-devimplementing-operatio…
• Database Month New York May 2015: http://www.databasemonth.com/database/scaleout-data, https://youtu.be/xaFcJmu1yqg
• Big Data Spain 2014: https://www.youtube.com/watch?v=52smTmprT7w
• Strata + Hadoop 2014: http://conferences.oreilly.com/strata/stratany2014/public/content/solut…, https://www.youtube.com/watch?v=nOSk5nnzUpA