Harnessing the power of Spark for enterprise data engineering and analytics
In our work across clients, we have built several large-scale Spark-based data pipelines and analytics solutions, learning first-hand how to bring scalable in-memory computing to business users. We have had to build heavily engineered solutions to meet demanding SLAs for enterprise data processing and fine-tune Spark jobs, while also helping users transition from legacy technologies to Spark-based analytics workbenches. In this talk, we will share some of the typical problem statements our clients bring to us for enterprise data processing and analytics enablement, our brief perspective on the solutions, and, more importantly, our experiences and lessons from the technical implementations. This will cover areas such as key performance pitfalls in Spark for typical data management jobs, helping users learn the dos and don'ts of using Spark for analytics, integrating data management and AI algorithms in pipelines, and experiences from implementing some interesting frameworks for analytics such as Spark Modular View.