Understanding the differences between Alertmanager’s group_wait, group_interval, and repeat_interval
Alertmanager is an application that handles alerts sent by client applications such as Prometheus. It can also group, deduplicate, silence, and inhibit alerts, which makes it a useful addition to any modern monitoring infrastructure.
That being said, configuring it can be a little daunting, given the many configuration options available and the somewhat vague explanations of some of the terms.
While configuring Alertmanager, I came across three confusing terms: group_wait, group_interval, and repeat_interval.
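To show where these three settings live, here is a minimal sketch of a routing block in an Alertmanager configuration file. The receiver name and the group_by labels are hypothetical placeholders; the three durations are the defaults listed in the Alertmanager documentation, as far as I can tell, and the comments paraphrase what each one controls.

```yaml
# Minimal sketch only: receiver name and group_by labels are made up for illustration.
route:
  receiver: team-default                # hypothetical receiver name
  group_by: ['alertname', 'cluster']    # hypothetical grouping labels
  group_wait: 30s        # how long to wait before sending the first notification for a new group
  group_interval: 5m     # how long to wait before notifying about new alerts added to a group
                         # that has already been notified
  repeat_interval: 4h    # how long to wait before re-sending a notification that was
                         # already sent successfully

receivers:
  - name: team-default   # must match the receiver referenced in the route
```

In short, group_wait applies once when a group is created, group_interval applies whenever the contents of an existing group change, and repeat_interval applies when nothing has changed but the alerts are still firing.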
Node-exporter setup with Systemd
For those who aren’t familiar, node-exporter is a Prometheus exporter that exposes hardware and OS metrics from *NIX kernels.
To get it up and running, there’s a simple guide in the official Prometheus docs. The problem is that executing the binary directly isn’t reliable in a production environment: nothing restarts the node_exporter process if it crashes or the host reboots, so there’s no guarantee that it keeps running.
This is where systemd comes in.