Troubleshooting
Evaluate cluster and node logging
Master Node(s)
ETCD
Most etcd installations also include etcdctl, which can aid in monitoring the state of the cluster. If you’re unsure where to find it, execute the following:
find / -name etcdctl
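On TLS-secured clusters, etcdctl also needs client certificates alongside the endpoint list; these can be supplied as environment variables so subsequent commands stay short. A sketch assuming a kubeadm-provisioned cluster (certificate paths will differ on other installations):
export ETCDCTL_API=3
export ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key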
Leverage this tool to check the cluster status:
etcdctl --write-out=table --endpoints=$ENDPOINTS endpoint status
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://127.0.0.1:2379 | 4e30a295f2c3c1a4 |   3.5.0 |  8.1 MB |      true |      false |         3 |       7903 |               7903 |        |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
The cluster this was executed on has only one master node, hence the single result. You will normally receive a response for each etcd member in the cluster.
Alternatively, leverage kubectl get componentstatuses:
kubectl get componentstatuses #ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-1               Healthy   {"health":"true"}
etcd-0               Healthy   {"health":"true"}
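Since ComponentStatus is deprecated, querying the API server's health endpoints directly is the modern alternative:
kubectl get --raw='/readyz?verbose'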
Etcd may also be running as a Pod:
kubectl logs etcd-ubuntu -n kube-system
Kube-apiserver
This is dependent on the environment in which the Kubernetes platform has been installed. For systemd-based systems:
journalctl -u kube-apiserver
Or
cat /var/log/kube-apiserver.log
Or for instances where Kube-API server is running as a static pod:
kubectl logs kube-apiserver-k8s-master-03 -n kube-system
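Tip: on kubeadm-based clusters, the static Pod manifests for the control plane components typically live in /etc/kubernetes/manifests (the kubeadm default; other installers may differ). Listing this directory shows which components run as static Pods:
ls /etc/kubernetes/manifests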
Kube-Scheduler
For systemd-based systems:
journalctl -u kube-scheduler
Or
cat /var/log/kube-scheduler.log
Or for instances where Kube-Scheduler is running as a static pod:
kubectl logs kube-scheduler-k8s-master-03 -n kube-system
Kube-Controller-Manager
For systemd-based systems:
journalctl -u kube-controller-manager
Or
cat /var/log/kube-controller-manager.log
Or for instances where the Kube-Controller-Manager is running as a static pod:
kubectl logs kube-controller-manager-k8s-master-03 -n kube-system
Worker Node(s)
CNI
Obviously, this is dependent on the CNI in use for the cluster you’re working on. However, using Flannel as an example:
journalctl -u flanneld
If running as a pod, however:
kubectl logs --namespace kube-system <POD-ID> -c kube-flannel
Or, taking Weave Net as another example:
kubectl logs --namespace kube-system weave-net-pwjkj -c weave
Kube-Proxy
For systemd-based systems:
journalctl -u kube-proxy
Or
cat /var/log/kube-proxy.log
Or for instances where kube-proxy is running as a Pod (commonly via a DaemonSet):
kubectl logs <kube-proxy-pod-name> -n kube-system
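Since each node runs its own kube-proxy Pod, a label selector saves looking up the generated Pod name; this sketch assumes the k8s-app=kube-proxy label that kubeadm applies by default:
kubectl logs -n kube-system -l k8s-app=kube-proxy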
Kubelet
journalctl -u kubelet
Or
cat /var/log/kubelet.log
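On most installations the kubelet itself runs as a systemd service, so its service state is also worth checking:
systemctl status kubelet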
Container Runtime
Similar to the CNI, this depends on which container runtime has been deployed, but using Docker as an example:
For systemd-based systems:
journalctl -u docker.service
Or
cat /var/log/docker.log
Hint: list the contents of /etc/systemd/system if it’s a systemd-based service (containerd.service may be here).
Cluster Logging
At a cluster level, kubectl get events provides a good overview.
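Sorting by creation timestamp across all namespaces makes recent activity easier to spot:
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp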
Understand how to monitor applications
This section is a bit open-ended, as it highly depends on what you have deployed and the topology of the application. Typically, however, an application runs as a number of inter-connected microservices; consequently, we monitor our applications by monitoring the underlying objects that comprise them, such as those below (example commands follow the list):
- Pods
- Deployments
- Services
- etc
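As a sketch of what that looks like in practice (kubectl top assumes the metrics-server add-on is installed):
kubectl get deployments,services -n <namespace>
kubectl get pods -n <namespace> -o wide
kubectl top pods -n <namespace>
kubectl top nodes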
Manage container stdout & stderr logs
Kubernetes handles and redirects any output generated from a container's stdout and stderr streams. These get directed through a logging driver, which determines where these logs are stored. Exact behavior differs between Docker distributions (such as RHEL's flavor of Docker), but commonly these drivers will write to a file in JSON format:
root@ubuntu:~# docker info | grep "Logging Driver"
Logging Driver: json-file
The location for these logs is typically /var/log/containers but can be tweaked. The entries in this directory are symlinks to the per-Pod log files under /var/log/pods:
root@ubuntu:~# ls -la /var/log/containers/
total 44
drwxr-xr-x 2 root root 4096 Feb 8 19:18 .
drwxrwxr-x 11 root syslog 4096 Feb 12 00:00 ..
lrwxrwxrwx 1 root root 100 Feb 8 19:17 coredns-74ff55c5b-j4trd_kube-system_coredns-5d65324791ffcdf45d3552d875c6834f9a305c5be84b18745cb1657f784e5dd0.log -> /var/log/pods/kube-system_coredns-74ff55c5b-j4trd_4afbef57-5592-4edb-96af-9d17f595d160/coredns/0.log
lrwxrwxrwx 1 root root 100 Feb 8 19:17 coredns-74ff55c5b-wrgkr_kube-system_coredns-b2fcfa679e9725dbe601bc1a0f218121a9c44b91d7300bbb57039a4edd219991.log -> /var/log/pods/kube-system_coredns-74ff55c5b-wrgkr_b64ac6d5-654b-4194-b6a8-5f8aa4c3cbe2/coredns/0.log
lrwxrwxrwx 1 root root 81 Feb 8 19:16 etcd-ubuntu_kube-system_etcd-fcc5bc99932f380781776baa125b6f3be035e18fcec520afb827102e2afce1cd.log -> /var/log/pods/kube-system_etcd-ubuntu_f608198a8b73b3cf090bd15e2823df04/etcd/0.log
lrwxrwxrwx 1 root root 101 Feb 8 19:16 kube-apiserver-ubuntu_kube-system_kube-apiserver-a264bbd54b7f23c8d424b0b368a48fdd1c5dcecc72fca95a460c146b2b5d85f5.log -> /var/log/pods/kube-system_kube-apiserver-ubuntu_212641053a16fa2bb404ccde20f6eaf0/kube-apiserver/0.log
lrwxrwxrwx 1 root root 119 Feb 8 19:17 kube-controller-manager-ubuntu_kube-system_kube-controller-manager-a4ef7fe2b52272ea77f8de2da0989a9bcee757ae778fc08f1786b26b45bf13e1.log -> /var/log/pods/kube-system_kube-controller-manager-ubuntu_7bbe7d37f1b2c7586237165580c2f5c3/kube-controller-manager/0.log
lrwxrwxrwx 1 root root 102 Feb 8 19:05 kube-flannel-ds-rfsfs_kube-system_install-cni-f8762e22fbf17925432682bdb1259a066208c62fa695d09cd6ee9b0cef3d36ba.log -> /var/log/pods/kube-system_kube-flannel-ds-rfsfs_2892d4e3-e326-4b4b-90c0-396fb80863ca/install-cni/0.log
lrwxrwxrwx 1 root root 103 Feb 8 19:05 kube-flannel-ds-rfsfs_kube-system_kube-flannel-f96e019717814d7360e1aacd275cac121c13e0ee94cc5c93dcb35365608e6f83.log -> /var/log/pods/kube-system_kube-flannel-ds-rfsfs_2892d4e3-e326-4b4b-90c0-396fb80863ca/kube-flannel/0.log
lrwxrwxrwx 1 root root 96 Feb 8 19:17 kube-proxy-l52f9_kube-system_kube-proxy-bfe08cb8663b46551e8608c094194ec61d03edfa7d25a6f414c07ed6563ada89.log -> /var/log/pods/kube-system_kube-proxy-l52f9_d2b73ed1-5df4-4a18-9595-20798db4f110/kube-proxy/0.log
lrwxrwxrwx 1 root root 101 Feb 8 19:17 kube-scheduler-ubuntu_kube-system_kube-scheduler-4a695e53684f4591ec9385d6944f7841c0329aa49be220e5af6304da281cb41a.log -> /var/log/pods/kube-system_kube-scheduler-ubuntu_69cd289b4ed80ced4f95a59ff60fa102/kube-scheduler/0.log
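These files are what kubectl logs reads under the hood. A few useful variations, using standard kubectl flags:
kubectl logs -f <podname> # follow the log stream
kubectl logs --previous <podname> # logs from the previous container instance after a crash
kubectl logs <podname> -c <container> # a specific container in a multi-container Pod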
Troubleshoot application failure
This is a somewhat ambitious topic to cover, as how we approach troubleshooting application failures varies by the architecture of the application, which resources/API objects we're leveraging, and whether the application produces logs. However, good starting points would include running things like:
kubectl describe <object>
kubectl logs <podname>
kubectl get events
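For example, a typical triage of a failing web application might look like the following (web-server here is a hypothetical Deployment name):
kubectl describe pod web-server # check Events for scheduling, image pull or probe failures
kubectl logs web-server --previous # output from a crash-looping container's last run
kubectl rollout status deployment/web-server # check whether a rollout is stuck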
Troubleshoot cluster component failure
Covered in "Evaluate cluster and node logging"
Troubleshoot networking
DNS Resolution
Pods and Services will automatically have a DNS record registered against coredns in the cluster: "A" records for IPv4 and "AAAA" records for IPv6. The format of these is:
pod-ip-address.my-namespace.pod.cluster-domain.example
my-svc-name.my-namespace.svc.cluster-domain.example
Pod DNS records resolve to a single entity, even if the Pod contains multiple containers, as they share the same networking space.
Service DNS records resolve to the respective service object.
Pods will automatically have their DNS resolution configured based on coredns settings. This can be validated by opening a shell to the pod and inspecting /etc/resolv.conf:
> kubectl exec -it web-server -- sh
/ # cat /etc/resolv.conf
nameserver 10.43.0.10
search default.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal
options ndots:5
10.43.0.10 being the coredns service object:
> kubectl get svc -n kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.43.0.10   <none>        53/UDP,53/TCP,9153/TCP   16d
To test resolution, we can run a Pod that includes nslookup. For the Pod below:
> kubectl get po -o wide
NAME         READY   STATUS    RESTARTS   AGE     IP           NODE              NOMINATED NODE   READINESS GATES
web-server   1/1     Running   0          2d20h   10.42.1.31   ip-172-31-36-67   <none>           <none>
Knowing the format of the A record:
pod-ip-address.my-namespace.pod.cluster-domain.example
We should be able to resolve 10-42-1-31.default.pod.cluster.local. Tip: to determine the cluster domain, inspect the coredns ConfigMap. Below, it indicates cluster.local:
> kubectl get cm coredns -n kube-system -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
Create a Pod with the tools required:
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
Test lookup:
> kubectl exec -i -t dnsutils -- nslookup 10-42-1-31.default.pod.cluster.local
Server: 10.43.0.10
Address: 10.43.0.10#53
Name: 10-42-1-31.default.pod.cluster.local
Address: 10.42.1.31
Similarly, for a service, in this case a service called nginx-service that resides in the default namespace:
> kubectl exec -i -t dnsutils -- nslookup nginx-service.default.svc.cluster.local
Server: 10.43.0.10
Address: 10.43.0.10#53
Name: nginx-service.default.svc.cluster.local
Address: 10.43.0.223
> kubectl get svc
NAME            TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
nginx-service   ClusterIP   10.43.0.223   <none>        80/TCP    9m15s
CNI Issues
Mainly covered earlier when acquiring logs for the CNI. However, one issue that might occur is a CNI that is incorrectly initialised, or not initialised at all. This may cause workloads to remain in a Pending status:
kubectl get po -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx   0/1     Pending   0          57s   <none>   <none>   <none>           <none>
kubectl describe <pod> can help identify issues with IP address assignment from the CNI.
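Node status is another useful signal here; a node whose CNI has not initialised will typically report NotReady, with the reason visible in its conditions:
kubectl get nodes
kubectl describe node <node-name>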
Port Checking
Similar to leveraging nslookup to validate DNS resolution in our cluster, we can lean on other tools to perform further diagnostics. All we need is a Pod that has a utility like netcat, telnet, etc.
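As a minimal sketch, assuming the nicolaka/netshoot image (any image bundling these utilities works) and the nginx-service from earlier, netcat can verify that a Service port is reachable:
kubectl run netshoot --rm -it --image=nicolaka/netshoot --restart=Never -- nc -zv nginx-service.default.svc.cluster.local 80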