Everything We Learned About In-Memory Data Layout While Building VoltDB
The team behind the H-Store academic database and the VoltDB commercial database has been building and refining in-memory data storage for nearly ten years, trying many different ways to organize data in memory. We’ve had successes and failures, along with lots of fascinating experiments. While we still have unanswered questions and ongoing experiments, this talk will share the highlights of what we have learned so far. It will be technical in nature, useful both to those who store lots of data in memory and to those who simply want to appreciate what goes into solving problems like these.
We will start by defining ways to measure success beyond finding a balance between speed, compactness, and parallelism. Specifically, which consequences of our layout choices make customers successful?
Next, what types of layout choices make sense for different workloads? Beyond columns versus rows, we’ll discuss covering indexes, fixed- vs. variable-size rows, dealing with 8-byte pointers, and mutable vs. immutable data. We’ll ask what makes sense for uniform access, and what makes sense when data is divisible into hot and cold segments. How do decisions change when workloads need to scan many tuples? When does it make sense to keep multiple copies of data in different formats? Does it ever make sense to cache in-memory data?
Finally, and this will be the fun part, what are the traps and pitfalls to avoid? This talk will touch on garbage collection mitigation, fragmentation of mutable data, the traps of third-party code and other problems we’ve run into over the years.
John Hugg has spent his entire career working with databases and information management at a number of startups, including Vertica Systems and now VoltDB. As the founding engineer at VoltDB, he was involved in all of the key early design decisions and worked collaboratively with the new VoltDB team as well as with academic researchers exploring similar problems. In addition to his engineering role, John has become a primary evangelist for the technology, speaking at conferences worldwide and authoring blog posts on VoltDB, as well as on the state of OLTP and stream processing more broadly.