Troubleshooting
If you are troubleshooting alerts triggered on a production cluster that must be resolved immediately, go to the Acting on Alerts page.
About Kubernetes
On this page we list commands and workflows that assist in monitoring and debugging problems in Kubernetes.
Please see the Kubernetes Documentation for further information on how to administer a Kubernetes environment.
Application Status
The Axual environment is built from containerized microservices. The containers are managed by Kubernetes. Kubernetes will list the status of each container and restart containers if health checks fail.
To list the status of each Axual module, run the following commands:
kubectl -n kafka get pods
kubectl -n nginx get pods
The Axual Kubernetes Ingress (reverse proxy and security filter) runs inside the nginx namespace. Your company may have its own Ingress configuration; check with your administrators whether the nginx namespace is used.
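To verify whether the nginx namespace is present on your cluster, you can list all namespaces (a minimal check using a standard kubectl command):
kubectl get namespaces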
In the output of the above commands, the status of each pod should be Running or Completed. If you have just started the system, it will take some time for each pod to reach the Running status. Some pods perform one-time initialization tasks; once that initialization is complete, they stop with the status Completed.
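To quickly spot unhealthy pods, you can filter on the pod phase; a small sketch using standard kubectl field selectors (Succeeded is the phase behind the Completed status):
kubectl -n kafka get pods --field-selector=status.phase!=Running,status.phase!=Succeeded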
Debugging
If a pod does not start, then querying the pod status is generally the first step in debugging.
From the get pods command, copy the name of the pod and use that in the command below.
kubectl -n kafka describe pod <pod>
In the output of this command, check the pod status and the Events. The events sometimes show a reason for the pod failure that does not appear in the logs, so always check a failing pod startup with the describe command.
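It can also help to review recent events for the entire namespace, sorted by time; a small sketch using standard kubectl options:
kubectl -n kafka get events --sort-by=.metadata.creationTimestamp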
If there are general errors coming from the pod, check the pod logs. Your company should have its own logging solution; at Axual, for example, we use the Elastic/Fluentd/Kibana stack.
We also have a general Logging page that describes how to configure the Kubernetes Pod logs.
Using the kubectl logs command, it is possible to view the logs of Pods directly. Refer to the official documentation for more information.
kubectl -n kafka logs <pod>
# OR
kubectl -n kafka logs <pod> --previous
Use this command if you have a reproducible error that should be clearly visible in the logs.
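When reproducing an error, it often helps to follow the logs live or to target a specific container; a short sketch, where <container> is a placeholder for a container name in a multi-container pod:
kubectl -n kafka logs <pod> -f --tail=100
# OR, for a specific container in a multi-container pod
kubectl -n kafka logs <pod> -c <container>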
If you need more detailed logs, adjust the pod logging levels.
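How logging levels are adjusted differs per service. Purely as an illustration, if a service read its level from a hypothetical LOG_LEVEL environment variable, it could be raised like this (the variable name is an assumption, not a documented Axual setting):
# LOG_LEVEL is a hypothetical variable name; check the service's own configuration
kubectl -n kafka set env deployment/<deployment-name> LOG_LEVEL=DEBUG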
Restarting a service
It is possible that a service becomes unresponsive even though Kubernetes reports it as running. If you are certain that the service is not responding, identify and restart the appropriate pod.
Step 1: Find the pod. Run the command:
kubectl -n kafka get pods
Step 2: Attempt to debug the cause of the issue.
Step 3: As a last resort, restart the Pod.
For Kafka Pods, never delete the Pods directly; instead, use the Strimzi annotation strimzi.io/manual-rolling-update.
kubectl -n kafka rollout restart deployments/<deployment-name>
# OR
kubectl -n kafka delete pod <pod>
Each service is part of a ReplicaSet and will be restarted automatically when its pod is removed.
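For the Kafka Pods mentioned in the note above, a rolling restart is requested through the Strimzi Cluster Operator instead of deleting the Pod. A sketch, where the pod name is a placeholder and the exact resource to annotate may differ per Strimzi version:
kubectl -n kafka annotate pod <kafka-pod> strimzi.io/manual-rolling-update=true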
In situ configuration changes
When troubleshooting, the otherwise excellent GitOps workflow can become a burden, because it significantly slows down the speed at which configuration changes can be made or tested.
In troubleshooting scenarios it helps to modify Kubernetes Objects like ConfigMaps and Deployments directly, for example via kubectl edit or Argo CD.
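A minimal sketch, assuming the ConfigMap and Argo CD application names are placeholders and that the argocd CLI is available; if Argo CD auto-sync is enabled, it may revert manual edits on its next sync, so temporarily disabling the sync policy can help:
kubectl -n kafka edit configmap <configmap-name>
# If Argo CD auto-sync is enabled, disable it temporarily so manual edits are not reverted
argocd app set <application-name> --sync-policy none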