Ingesting Streaming Data for Analysis in Apache Ignite
Apache Ignite provides a distributed platform for a wide variety of workloads, but often the hard part is simply getting data into the database in the first place. The sheer range of data sources and formats presents a challenge to any data engineer; in addition, 'data drift', the constant and inevitable mutation of the incoming data's structure and semantics, can break even the best-engineered integration.
This session, aimed at data architects, data engineers and developers, will explore how to use the open source StreamSets Data Collector to build robust data pipelines. Attendees will learn how to collect data from cloud platforms such as Amazon and Salesforce, as well as from devices, relational databases and other sources; stream it continuously to Ignite; and then use features such as Ignite's continuous queries to perform streaming analysis.
We'll start by covering the basics of reading files from disk, move on to relational databases, then look at more challenging sources such as APIs and message queues. You will learn how to:
- Build data pipelines to ingest a wide variety of data into Apache Ignite
- Anticipate and manage data drift to ensure that data keeps flowing
- Perform simple and complex ad-hoc queries in Ignite via SQL
- Write applications using Ignite to run continuous queries, combining data from multiple sources
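
As a rough illustration of the structural side of data drift, the sketch below compares an incoming record against a baseline of expected field types and reports fields that were added, removed or changed type. It is not StreamSets Data Collector's or Ignite's API; the helper names (field_types, detect_drift) and the sample records are hypothetical, and it assumes Python 3.9 or later.

```python
from typing import Any


def field_types(record: dict[str, Any]) -> dict[str, str]:
    """Map each field name to the name of its Python type."""
    return {name: type(value).__name__ for name, value in record.items()}


def detect_drift(expected: dict[str, str], record: dict[str, Any]) -> dict[str, list[str]]:
    """Compare a record's structure against the expected field types."""
    actual = field_types(record)
    return {
        # Fields present in the record but not in the baseline.
        "added": sorted(actual.keys() - expected.keys()),
        # Fields the baseline expects but the record lacks.
        "removed": sorted(expected.keys() - actual.keys()),
        # Fields present in both whose type has changed.
        "retyped": sorted(
            name
            for name in actual.keys() & expected.keys()
            if actual[name] != expected[name]
        ),
    }


# Hypothetical sample data: the baseline comes from an earlier record.
baseline = field_types({"id": 1, "name": "sensor-a", "temp": 21.5})
drifted = {"id": "1", "name": "sensor-a", "humidity": 40}

report = detect_drift(baseline, drifted)
print(report)
```

A pipeline could use a report like this to route unexpected records aside or adapt the target schema rather than fail outright; semantic drift, where a field keeps its name and type but changes meaning, would need checks beyond this sketch.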