SnappyData: Apache Spark meets Embedded In-Memory Database
I will introduce an in-memory big data processing platform that integrates seamlessly with Apache Spark.
Apache Spark is an excellent distributed computing framework. However, it has to read the data from a data store each time it processes it, and it has to write the results to some data store afterwards. This reading and writing takes time, which makes Spark hard to use for real-time analytics.
SnappyData can solve this problem. SnappyData integrates the features of a distributed in-memory database into the Spark JVM, so both distributed computing and data store features are available in one cluster. In other words, the data already resides in the distributed in-memory database cluster, and you can run Spark processing directly on that database.
The advantages of using SnappyData are as follows:
- Simple (both distributed computing and database features are available in one cluster)
- Fast (no need to access a separate data store when reading or writing data)
- Spark tuning (optimized DAG generation for some workloads, through extensions to Spark SQL)
- Mutable DataFrames (a DataFrame can be updated)
- Real-time state sharing between Spark workers and transactions
- Unified data access API through tables and SQL
- Synopsis Data Engine (returns results quickly at the expense of accuracy)
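
The last item, trading accuracy for speed, can be illustrated with a small sketch. This is not SnappyData's API or its actual algorithm. It is plain Python, assuming the simplest possible approach: estimate an aggregate from a uniform random sample instead of scanning every row. The function name `approximate_mean` and the `sample_fraction` parameter are hypothetical names chosen for this illustration.

```python
import random
import statistics


def approximate_mean(values, sample_fraction, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Illustrative only: a hypothetical helper, not part of SnappyData.
    Returns (estimate, standard_error).
    """
    rng = random.Random(seed)
    # Keep at least two rows so that a standard deviation can be computed.
    sample_size = min(len(values), max(2, int(len(values) * sample_fraction)))
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    # Standard error of the sample mean: a rough measure of the lost accuracy.
    standard_error = statistics.stdev(sample) / sample_size ** 0.5
    return estimate, standard_error


# Synthetic data standing in for a large table column.
data_rng = random.Random(42)
data = [data_rng.uniform(0, 100) for _ in range(100_000)]

exact = statistics.fmean(data)                                  # scans all rows
estimate, stderr = approximate_mean(data, sample_fraction=0.01)  # scans about 1%

print(f"exact={exact:.2f} estimate={estimate:.2f} (+/- {stderr:.2f})")
```

The estimate reads only about 1% of the rows and reports an error bound alongside the answer, which is the kind of trade-off the list item describes.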
SnappyData is an extension of Spark. Therefore, in addition to SnappyData's own features, Spark's various features are also available.