Orphan vs Zombie vs Daemon processes

What are processes? A process is a program in execution, and a program is a piece of code, anywhere from a single line to millions of lines long, written in a programming language. When a UNIX machine powers up, the kernel is loaded and completes its initialization. Once initialization is complete, the kernel creates a set of processes in user space and schedules the system management daemon process (usually named init), which has PID 1 and is responsible for running the right complement of services and daemons at any given time.
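To make the process lifecycle a little more concrete, here is a minimal Python sketch of one of the three kinds of process named in the title: a zombie, meaning a child that has exited but has not yet been reaped by its parent. It assumes a Linux machine with /proc mounted, and the helper names `process_state` and `make_zombie` are made up for illustration only.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The state field comes right after the parenthesised command name,
    # which may itself contain spaces, so split on the last ")".
    return stat.rsplit(")", 1)[1].split()[0]


def make_zombie(timeout=2.0):
    """Fork a child that exits at once; return (state before reaping, exit code)."""
    pid = os.fork()
    if pid == 0:
        # Child: terminate immediately, skipping Python cleanup handlers.
        os._exit(7)

    # Parent: poll until the kernel marks the exited child as a zombie ("Z").
    deadline = time.monotonic() + timeout
    state = process_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = process_state(pid)

    # Reaping the child with waitpid removes its process table entry.
    _, status = os.waitpid(pid, 0)
    return state, os.WEXITSTATUS(status)


if __name__ == "__main__":
    state, code = make_zombie()
    print(f"state before reaping: {state}, exit code: {code}")
```

The child stays in the process table in the zombie state until the parent calls `waitpid`, which is why the sketch reads the state before reaping it.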
Read more →

Managing access to multiple EKS clusters using API servers' private endpoints with AWS VPN

I manage multiple EKS clusters (multi-environment, multi-tenant) at work, and access to them is via Bastion instances deployed within each cluster's VPC. However, this approach can become unmaintainable over time, because the number of Bastion instances grows with the number of clusters we manage, and each instance needs additional monitoring and maintenance effort. This led to the idea of removing all Bastion instances and configuring direct access to the API servers instead.
Read more →

Debugging containers using nsenter

If you have ever managed a Kubernetes cluster, chances are you have encountered pods that just don't want to behave the way they are supposed to. You checked the logs and traced the problem back to the source code. Logic checks out ✅ You started narrowing down the causes. Networking issue? Configuration issue? You entered the container and decided to use ping to identify network connectivity issues. / $ ping google.com PING google.
Read more →

Visualizing alert metrics on Grafana

When it comes to Prometheus and alerts, the typical use case is to send alerts to Alertmanager for handling (deduplication, grouping) and routing to various services such as Slack, PagerDuty, etc. However, there are situations where we need to analyze alert patterns, and being able to visualize how often alerts fire can be very useful. In this post, I will share how we can visualize alert metrics on Grafana using various PromQL operators and functions.
Read more →

Debugging a misfiring Prometheus alert

Last week at work, I encountered an alert that was misfiring. Or so I thought…
Read more →