Understanding the differences between Alertmanager’s group_wait, group_interval and repeat_interval

Alertmanager is an application that handles alerts sent by client applications such as Prometheus. It can also perform alert grouping, deduplication, silencing and inhibition. It’s definitely a useful addition to any modern monitoring infrastructure.

That being said, configuring it can be a little daunting, given the many configuration options available and the somewhat vague explanations of some of the terms.

While configuring Alertmanager, I came across three confusing settings: group_wait, group_interval and repeat_interval.
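To make the three settings concrete, here is a minimal Python sketch that computes when a single alert group would send notifications. It is a simplified model and an assumption on my part, not Alertmanager’s actual code: alerts never resolve, the group is first evaluated group_wait after it is created, then re-evaluated every group_interval, and a notification goes out when new alerts have joined or when repeat_interval has passed since the last one. The function name `notification_times` is hypothetical.

```python
from bisect import bisect_right


def notification_times(arrivals, group_wait, group_interval, repeat_interval, horizon):
    """Return the times (seconds) at which one alert group sends a notification.

    Hypothetical, simplified model (not Alertmanager's implementation):
    - `arrivals` is a sorted list of times when new alerts join the group;
      the first arrival creates the group.
    - Alerts never resolve, so the group only grows.
    - The group is evaluated at creation + group_wait, then every group_interval.
    - At each evaluation, notify if new alerts joined since the last
      notification, or if repeat_interval has elapsed since the last one.
    """
    if not arrivals:
        return []
    sent = []
    last_sent = None
    notified_count = 0  # number of alerts already covered by a notification
    t = arrivals[0] + group_wait  # first evaluation waits group_wait to batch alerts
    while t <= horizon:
        current_count = bisect_right(arrivals, t)  # alerts in the group at time t
        has_new_alerts = current_count > notified_count
        repeat_due = last_sent is not None and t - last_sent >= repeat_interval
        if has_new_alerts or repeat_due:
            sent.append(t)
            last_sent = t
            notified_count = current_count
        t += group_interval  # later evaluations happen every group_interval
    return sent


if __name__ == "__main__":
    # Alerts arrive at t=0 and t=100; group_wait=30s, group_interval=5m,
    # repeat_interval=1h, simulated for two hours.
    print(notification_times([0, 100], 30, 300, 3600, 7200))
```

In this model the example prints `[30, 330, 3930]`: the first alert is sent after group_wait, the second alert waits for the next group_interval tick instead of restarting group_wait, and the repeat notification fires on the first tick at or after repeat_interval has elapsed.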


Node-exporter setup with systemd

For those who aren’t familiar, node-exporter is a Prometheus exporter that exposes hardware and OS metrics from *NIX kernels.

To get it up and running, there’s a simple guide in the official Prometheus docs. The problem with that approach is that running node-exporter by executing the binary directly isn’t reliable in a production environment: nothing ensures that the node_exporter process keeps running, for example after it crashes or the machine reboots.

This is where systemd comes in. systemd is an init system and system manager, and it comes with a management tool called systemctl for managing processes, checking their status and configuration, and changing the system state.
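As a rough illustration of what systemd needs in order to supervise node-exporter, here is a small Python sketch that renders a unit file. The binary path `/usr/local/bin/node_exporter`, the `node_exporter` service user and the function name `render_node_exporter_unit` are assumptions for illustration; the directives themselves (`ExecStart`, `User`, `Restart=on-failure`, `WantedBy=multi-user.target`) are standard systemd unit settings.

```python
def render_node_exporter_unit(binary_path="/usr/local/bin/node_exporter",
                              user="node_exporter"):
    """Build the text of a systemd unit file for node_exporter.

    Hypothetical helper: the default binary path and service user are
    assumptions; adjust them to your installation.
    """
    sections = [
        ("Unit", [
            ("Description", "Prometheus Node Exporter"),
            # Start after the network is up, since the exporter serves HTTP.
            ("Wants", "network-online.target"),
            ("After", "network-online.target"),
        ]),
        ("Service", [
            ("User", user),
            ("ExecStart", binary_path),
            # Restart the exporter if it exits uncleanly (e.g. non-zero exit or crash).
            ("Restart", "on-failure"),
        ]),
        ("Install", [
            # Start the service during normal multi-user boot once enabled.
            ("WantedBy", "multi-user.target"),
        ]),
    ]
    lines = []
    for name, entries in sections:
        lines.append(f"[{name}]")
        lines.extend(f"{key}={value}" for key, value in entries)
        lines.append("")  # blank line after each section
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_node_exporter_unit())
```

Assuming the output is saved as `/etc/systemd/system/node_exporter.service`, running `systemctl daemon-reload` followed by `systemctl enable --now node_exporter` would start the exporter and have systemd restart it on failure and start it again at boot.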
