Kubernetes Troubleshooting I

Restore ETCD

This process is not well documented in the official docs, and I messed it up in my CKA exam:

1- Check the config of the etcd process. You may need some of its details (cert paths, data dir, member name) for the restore process

$ kubectl describe pod -n kube-system etcd-master

2- Stop api-server if not running kubeadm

$ service kube-apiserver stop

3- Check help for all restore options. Keep in mind you will need (very likely) to provide certs for auth.

$ ETCDCTL_API=3 etcdctl snapshot restore -h

4- Restore ETCD using a previous backup:

/// TOKEN is any word
/// --initial-cluster takes NAME=PEER_URL; the peer URL is left out here, take it from step 1
$ ETCDCTL_API=3 etcdctl --endpoints xxx snapshot restore FILE \
--cacert xxx --cert xx --key xxx \
--data-dir /NEW/DIR \
--initial-cluster-token TOKEN \
--name master \
--initial-cluster=master=


5- Add the new flag and update the volume paths in the ETCD config. If it is a static pod, the manifest is in /etc/kubernetes/manifests on the master node.

--initial-cluster-token TOKEN

++ point volumeMounts/volumes to the new path /NEW/DIR !!!!

6- Restart services if not running kubeadm

$ systemctl daemon-reload
$ service etcd restart
$ service kube-apiserver start

7- Checks

/// if using kubeadm, the docker container for etcd should restart
$ docker ps -a | grep -i etcd

/// check etcd is running showing members:
$ ETCDCTL_API=3 etcdctl member list --cacert xxx --cert xx --key xxx
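
Steps 1, 4 and 5 can be tied together with a few shell helpers. This is only a minimal sketch, assuming a kubeadm-style static pod manifest where every etcd flag sits on its own "- --flag=value" line and the data dir is mounted through a hostPath "path:" line. The helper names (etcd_flag, print_restore_cmd, patch_etcd_data_dir) are made up for this example, and the restore command is only printed, not executed. The backup is written outside the manifests directory, since the kubelet treats the files in that directory as static pod manifests.

```shell
# Print the value of one "- --FLAG=value" line from an etcd manifest.
# etcd_flag is a hypothetical helper, not part of etcdctl or kubectl.
etcd_flag() {
  manifest=$1
  flag=$2
  sed -n "s|^[[:space:]]*-[[:space:]]*--${flag}=||p" "$manifest" | head -n 1
}

# Print (do not run) a restore command that reuses the member name and
# the initial cluster found in the manifest.
print_restore_cmd() {
  manifest=$1
  snapshot=$2
  new_dir=$3
  token=$4
  printf 'ETCDCTL_API=3 etcdctl snapshot restore %s --data-dir %s --initial-cluster-token %s --name %s --initial-cluster=%s\n' \
    "$snapshot" "$new_dir" "$token" \
    "$(etcd_flag "$manifest" name)" \
    "$(etcd_flag "$manifest" initial-cluster)"
}

# Point the hostPath of the data volume to the restored directory.
# A copy of the manifest is kept in backup_dir, which is assumed to be
# outside /etc/kubernetes/manifests.
patch_etcd_data_dir() {
  manifest=$1
  old_dir=$2
  new_dir=$3
  backup_dir=$4
  backup="$backup_dir/$(basename "$manifest").bak"
  cp "$manifest" "$backup" || return 1
  # Only lines ending in exactly "path: OLD_DIR" are rewritten.
  sed "s|path: ${old_dir}\$|path: ${new_dir}|" "$backup" > "$manifest"
}
```

Usage (etcd.yaml is the manifest name on kubeadm clusters; OLD/DIR and BACKUP/DIR are placeholders):

$ print_restore_cmd /etc/kubernetes/manifests/etcd.yaml FILE /NEW/DIR TOKEN
$ patch_etcd_data_dir /etc/kubernetes/manifests/etcd.yaml /OLD/DIR /NEW/DIR /BACKUP/DIR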

Sidecar logging

Based on this doc. The application writes some logs to files instead of stdout/stderr, so you add a sidecar container that tails one of those files and prints it to its own stdout.

Container with a sidecar:

apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - name: count
    image: busybox
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$i: $(date)" >> /var/log/1.log;
        echo "$(date) INFO $i" >> /var/log/2.log;
        i=$((i+1));
        sleep 1;
      done
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: sidecar-1
    image: busybox
    args: [/bin/sh, -c, 'tail -n+1 -f /var/log/1.log']
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  volumes:
  - name: varlog
    emptyDir: {}

Now you can see the contents of “/var/log/1.log” through the “sidecar-1” container:

$ kubectl logs counter -c sidecar-1
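
To see what the sidecar does without a cluster, the same loop can be run locally. This is just a sketch under a few assumptions: a temp directory stands in for the shared emptyDir volume, the loop stops after three iterations instead of running forever, and the "-f" of the sidecar is dropped so the command returns.

```shell
# Temp directory standing in for the shared /var/log volume.
log_dir=$(mktemp -d)

# Same writes as the "count" container, limited to three iterations.
i=0
while [ "$i" -lt 3 ]; do
  echo "$i: $(date)" >> "$log_dir/1.log"
  echo "$(date) INFO $i" >> "$log_dir/2.log"
  i=$((i+1))
done

# "tail -n +1" starts at the first line of the file, so nothing written
# before the reader started is lost. The sidecar adds -f to keep following.
sidecar_output=$(tail -n +1 "$log_dir/1.log")
echo "$sidecar_output"
```

Only 1.log shows up in the output. 2.log would need a second sidecar of its own.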

CPU/Memory of a POD

Based on these links: link1, link2, link3

If you want to use “kubectl top” you need to install “metrics-server”

$ kubectl top pod --all-namespaces

Keep in mind that “kubectl top” shows the metrics of a pod as reported by metrics-server. That information comes from cAdvisor (part of the kubelet), which collects the actual resource usage of the containers in each pod.

And as per link3, “kubectl top” is not the same as running “top” inside the container.
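
If you want a single number out of “kubectl top”, its output can be piped through a small filter. A minimal sketch, assuming the command is run with --all-namespaces and --no-headers so that the memory value (for example 15Mi) is the fourth column. sum_top_memory_mi is a made-up helper name.

```shell
# Sum the memory column of "kubectl top pod --all-namespaces --no-headers"
# read from stdin, and print the total in Mi.
# Assumption: memory is column 4 (it is column 3 without --all-namespaces).
sum_top_memory_mi() {
  awk '
    {
      value = $4
      if (value ~ /Gi$/)      { sub(/Gi$/, "", value); total += value * 1024 }
      else if (value ~ /Mi$/) { sub(/Mi$/, "", value); total += value }
      else if (value ~ /Ki$/) { sub(/Ki$/, "", value); total += value / 1024 }
    }
    END { printf "%dMi\n", total }
  '
}
```

Usage:

$ kubectl top pod --all-namespaces --no-headers | sum_top_memory_mi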

Node NotReady

Based on this link:

$ kubectl get nodes
$ kubectl describe nodes XXX

$ ssh node

/// check the kubelet logs
$ cat /var/log/kubelet.log

/// if kubelet runs as a service
$ journalctl -u kubelet
$ systemctl status kubelet
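
With many nodes, the first command can be filtered to show only the ones that need a look. A minimal sketch, assuming the usual column order of “kubectl get nodes” (NAME first, STATUS second). not_ready_nodes is a made-up helper name.

```shell
# Read "kubectl get nodes --no-headers" from stdin and print the names of
# the nodes whose STATUS does not start with "Ready".
# "Ready,SchedulingDisabled" still counts as ready here.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}
```

Usage:

$ kubectl get nodes --no-headers | not_ready_nodes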