Redis In-Memory Data Processing
Redis is an in-memory key-value database written in C. Its main advantages are performance, simplicity, and extensibility. It provides common data types (such as Strings, Hashes, and Lists) and lets you work with the data through simple commands such as GET and SET, which retrieve exactly the key you asked for or update a single key in place.
One shortcoming of Redis is that it does not support cross-key or cross-shard aggregation queries, even though there are several use cases in which users might want to, for example, count how many times a property appears in a subset of all hashes, or group by some value and return a unique count.
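To make the two example queries concrete, here is a minimal Python sketch that runs them over plain dictionaries standing in for Redis hashes. The key names, the field names (`country`, `plan`), and both helper functions are hypothetical; they only illustrate what the queries compute, not how Redis would execute them.

```python
from collections import defaultdict

# Hypothetical data: each entry stands in for one Redis hash (key -> field/value map).
hashes = {
    "user:1": {"country": "US", "plan": "free"},
    "user:2": {"country": "US", "plan": "pro"},
    "user:3": {"country": "DE", "plan": "free"},
    "user:4": {"country": "DE", "plan": "free"},
    "order:1": {"country": "US"},
}


def count_field_value(hashes, prefix, field, value):
    """Count hashes whose key starts with `prefix` and whose `field` equals `value`."""
    return sum(
        1
        for key, fields in hashes.items()
        if key.startswith(prefix) and fields.get(field) == value
    )


def unique_count_by(hashes, prefix, group_field, counted_field):
    """Group hashes by `group_field` and count distinct `counted_field` values per group."""
    groups = defaultdict(set)
    for key, fields in hashes.items():
        if key.startswith(prefix) and group_field in fields and counted_field in fields:
            groups[fields[group_field]].add(fields[counted_field])
    return {group: len(values) for group, values in groups.items()}


# "How many user hashes have plan == free?"
free_users = count_field_value(hashes, "user:", "plan", "free")

# "For each country, how many distinct plans are in use?"
plans_per_country = unique_count_by(hashes, "user:", "country", "plan")
```

Both queries touch many keys at once, which is exactly what single-key commands such as GET and SET cannot express.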
To run such aggregations today, a user typically chooses one of three patterns:
- Retrieve all the data from Redis and run the aggregation on the client side.
- Analyze the data in another system, such as Spark or Hadoop, and run the queries there.
- Use Redis's embedded Lua capabilities and write Lua code that performs the aggregation on the server side.
The first and second options require moving all the data over the network. The third option works well on a single Redis server, but it does not support clustering and requires writing a Lua script, and Lua is not a language most engineers and data scientists are familiar with.
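The cost of the first pattern can be sketched as follows. The function relies only on two calls that redis-py clients expose, `scan_iter` and `hgetall`. The `FakeRedis` class is a hypothetical in-process stand-in so the sketch runs without a server, and the key pattern and field names are made up for illustration.

```python
from fnmatch import fnmatchcase


class FakeRedis:
    """Hypothetical in-process stand-in for a Redis client (assumes decoded strings)."""

    def __init__(self, data):
        self._data = data
        self.hashes_fetched = 0  # how many full hashes were "sent over the network"

    def scan_iter(self, match="*"):
        # Mirrors the shape of redis-py's scan_iter(match=...): yields matching keys.
        return (key for key in list(self._data) if fnmatchcase(key, match))

    def hgetall(self, name):
        # Mirrors redis-py's hgetall(name): returns the whole hash as a dict.
        self.hashes_fetched += 1
        return dict(self._data.get(name, {}))


def count_on_client(client, pattern, field, value):
    """Pull every matching hash to the client and count matches locally."""
    total = 0
    for key in client.scan_iter(match=pattern):
        if client.hgetall(key).get(field) == value:
            total += 1
    return total


client = FakeRedis({
    "user:1": {"plan": "free"},
    "user:2": {"plan": "pro"},
    "user:3": {"plan": "free"},
    "order:1": {"plan": "free"},
})
free_users = count_on_client(client, "user:*", "plan", "free")
```

Every hash matching the pattern is transferred in full even though the answer is a single number. Against a real server, the same function should work with a redis-py client created with `decode_responses=True`, so that field names and values come back as strings rather than bytes.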
Over the last couple of months we, at Redislabs, have been testing a new programming module. This module will allow users to run aggregation queries on all or part of the data stored in a Redis database. The aggregation logic runs on the server side, and only the aggregated result is returned to the user. This avoids passing all the data to the client, Spark, or Hadoop over the network and makes the computation much faster.
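The idea of aggregating next to the data can be illustrated with a small simulation. This is not the module's API, which is not described here; the shard layout, function names, and fields are all assumptions. Each shard reduces its own hashes to a small partial result, and only those partial results are merged by the caller.

```python
from collections import Counter

# Hypothetical cluster: each dict stands in for the hashes held by one shard.
shards = [
    {"user:1": {"country": "US"}, "user:2": {"country": "DE"}},
    {"user:3": {"country": "US"}, "user:4": {"country": "US"}},
    {"user:5": {"country": "DE"}},
]


def aggregate_on_shard(shard, field):
    """Runs next to the data: reduce one shard's hashes to per-value counts."""
    return Counter(fields[field] for fields in shard.values() if field in fields)


def merge_partials(partials):
    """Runs on the caller: combine the small per-shard results."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts instead of replacing them
    return total


# Only one small Counter per shard crosses the (simulated) network.
partials = [aggregate_on_shard(shard, "country") for shard in shards]
users_per_country = merge_partials(partials)
```

Plain counts merge by addition, which keeps the per-shard results small. A unique count would need each shard to return a set of values (or an approximate sketch of it) so that duplicates across shards are not counted twice.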
Finally, following the introduction of Streams support in Redis, the module will also provide a streaming API that lets the user trigger an execution plan on a Stream event.