Best Practices for Monitoring Distributed In-Memory Systems
When you add a distributed cluster between existing systems or new APIs, you introduce many moving parts that can be almost impossible to track when you troubleshoot performance issues or failures. Learn how veterans monitor the components of a distributed cluster for network, memory, and node-specific issues, and how they troubleshoot and resolve them. By the end of this session, you'll have a handy checklist and a set of tools to consider for your own deployments. This session will cover:
- How to monitor applications, cluster node logs and metrics, the JVM, the operating system, and the network
- The best tools for different scenarios, including:
  - Log-based monitoring, including Logstash, Elasticsearch, Kibana, or Splunk
  - Grafana
  - Application monitoring (throughput, latency, and GC)
  - Node-local metrics monitoring (memory, GC, and CPU)
  - Network issue monitoring (checking node connectivity and latency)
  - GridGain Web Console
- Tips and tricks for configuring and optimizing monitoring
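
As a rough illustration of the node-local metrics item in the list above (memory, GC, CPU), here is a minimal Java sketch that reads those figures from the standard `java.lang.management` platform MXBeans. The class name `NodeLocalMetrics`, the `snapshot()` helper, and the key=value output format are assumptions made for this example, not something the session prescribes. A real deployment would more likely ship these values to one of the tools listed above than print them.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.util.Locale;

// Hypothetical helper for illustration only; the name and output format are assumptions.
final class NodeLocalMetrics {

    /** Returns a one-line snapshot of heap, GC, and load figures for the current JVM. */
    static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();

        // Sum counts and accumulated pause time across all collectors.
        // Both getters return -1 when the value is undefined, so clamp at zero.
        long gcCount = 0;
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcCount += Math.max(0, gc.getCollectionCount());
            gcMillis += Math.max(0, gc.getCollectionTime());
        }

        // getSystemLoadAverage() returns a negative value on platforms that do not support it.
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        return String.format(Locale.ROOT,
                "heapUsedBytes=%d heapMaxBytes=%d gcCount=%d gcMillis=%d loadAvg=%.2f cpus=%d",
                heap.getUsed(), heap.getMax(), gcCount, gcMillis,
                os.getSystemLoadAverage(), os.getAvailableProcessors());
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

Emitting a single key=value line keeps the output easy to parse in a log pipeline. Sampling it on a schedule and comparing successive `gcMillis` values gives a simple view of how much time the node spends in garbage collection.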