Kubernetes Troubleshooting I

Restore ETCD

This process is not well documented in the official docs, and I messed it up in my CKA exam:

1- Check the config of the etcd process. You may need some of these details for the restore.

$ kubectl describe pod -n kube-system etcd-master
...
--name=master
--initial-cluster=master=https://127.0.0.1:2380
--initial-advertise-peer-urls=https://127.0.0.1:2380
...
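
The flags worth writing down can be filtered out of that output instead of read by eye. This is a minimal sketch, not an official tool: "etcd_restore_flags" is a hypothetical helper name, and the manifest path in the usage comment assumes a kubeadm-style static pod.

```shell
# Hypothetical helper: read text on stdin and print only the etcd flags
# that are useful later for the restore (name, data dir, cluster and cert settings).
etcd_restore_flags() {
  grep -oE -- '--(name|data-dir|initial-cluster|initial-cluster-token|initial-advertise-peer-urls|trusted-ca-file|cert-file|key-file)=[^[:space:]]+'
}

# Usage on the master node (needs a working kubectl):
#   kubectl describe pod -n kube-system etcd-master | etcd_restore_flags
# or, assuming a kubeadm static pod, read the manifest directly:
#   etcd_restore_flags < /etc/kubernetes/manifests/etcd.yaml
```

The pattern needs two dashes right before the flag name, so peer flags such as --peer-cert-file are not picked up by accident.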

2- Stop the api-server if you are not running kubeadm

$ service kube-apiserver stop

3- Check the help for all the restore options. Keep in mind that you will very likely need to provide certs for auth.

$ ETCDCTL_API=3 etcdctl snapshot restore -h

4- Restore ETCD using a previous backup:

$ ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 snapshot restore FILE \
--cacert xxx --cert xx --key xxx \
--data-dir /NEW/DIR \
--initial-cluster-token TOKEN \
--name master \
--initial-cluster=master=https://127.0.0.1:2380 \
--initial-advertise-peer-urls=https://127.0.0.1:2380

(TOKEN can be any word)

USE HTTPS!!!!
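
Since it is easy to mistype one of these flags under exam pressure, it can help to assemble the command from the values collected in step 1 and read it before running it. This is a minimal sketch for a single-member cluster: "print_restore_cmd" is a hypothetical helper, it only prints the command, and the cert flags are left out for brevity.

```shell
# Hypothetical helper: print (not run) a restore command for a single-member cluster.
# Arguments: snapshot file, new data dir, cluster token, member name, peer URL.
print_restore_cmd() {
  snapshot=$1 data_dir=$2 token=$3 name=$4 peer_url=$5
  # Refuse plain http:// peer URLs, matching the warning above.
  case "$peer_url" in
    https://*) ;;
    *) echo "peer URL must start with https://" >&2; return 1 ;;
  esac
  printf 'ETCDCTL_API=3 etcdctl snapshot restore %s --data-dir %s --initial-cluster-token %s --name %s --initial-cluster=%s=%s --initial-advertise-peer-urls=%s\n' \
    "$snapshot" "$data_dir" "$token" "$name" "$name" "$peer_url" "$peer_url"
}

# Usage (FILE, /NEW/DIR and TOKEN are the same placeholders as in step 4):
#   print_restore_cmd FILE /NEW/DIR TOKEN master https://127.0.0.1:2380
```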

5- Add the new flags and update the volume paths in the etcd config. If etcd runs as a static pod, the manifest is in /etc/kubernetes/manifests on the master node.

--data-dir=/NEW/DIR
--initial-cluster-token=TOKEN

++ Also point volumeMounts/volumes at the new path /NEW/DIR !!!!
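
The path change can be done in one pass so that no occurrence is forgotten. This is a minimal sketch: "retarget_data_dir" is a hypothetical helper, and /var/lib/etcd as the old data dir is an assumption (check your own manifest). It is a plain text substitution, so review the result before copying it back.

```shell
# Hypothetical helper: replace every occurrence of the old etcd data dir with the
# new one in a manifest read from stdin. This rewrites the --data-dir flag and any
# volumeMounts/volumes path that contains the old directory.
retarget_data_dir() {
  old=$1 new=$2
  sed "s#${old}#${new}#g"
}

# Usage (the old path and the manifest file name are assumptions):
#   retarget_data_dir /var/lib/etcd /NEW/DIR < /etc/kubernetes/manifests/etcd.yaml > /tmp/etcd.yaml
#   diff /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml
```

The --initial-cluster-token line is new, so it still has to be added by hand.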

6- Restart the services if you are not running kubeadm

$ systemctl daemon-reload
$ service etcd restart
$ service kube-apiserver start

7- Checks

/// if using kubeadm, the docker container for etcd should restart on its own
$ docker ps -a | grep -i etcd

/// check that etcd is running by listing its members:
$ ETCDCTL_API=3 etcdctl member list --cacert xxx --cert xx --key xxx
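
The member list check can be turned into a pass/fail result. This is a minimal sketch: "all_members_started" is a hypothetical helper, and it assumes the default comma-separated output where the second field is the member status.

```shell
# Hypothetical helper: read `etcdctl member list` output on stdin and succeed only
# if there is at least one member and every member reports "started".
# Assumes the default output format: ID, status, name, peer URLs, client URLs, ...
all_members_started() {
  awk -F', ' 'NF >= 2 { total++; if ($2 == "started") ok++ }
              END { exit !(total > 0 && ok == total) }'
}

# Usage (same placeholder cert values as above):
#   ETCDCTL_API=3 etcdctl member list --cacert xxx --cert xx --key xxx | all_members_started \
#     && echo "etcd members OK"
```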

Sidecar logging

Based on this doc. Your application writes its logs to files, and you want them on stdout/stderr where “kubectl logs” can read them, so you add a second container (a sidecar) that streams those files.

Container with a sidecar:

apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - name: count
    image: busybox
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$i: $(date)" >> /var/log/1.log;
        echo "$(date) INFO $i" >> /var/log/2.log;
        i=$((i+1));
        sleep 1;
      done
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: sidecar-1
    image: busybox
    args: [/bin/sh, -c, 'tail -n+1 -f /var/log/1.log']
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  volumes:
  - name: varlog
    emptyDir: {}

Now you can see the logs of “/var/log/1.log” going via “sidecar-1”

$ kubectl logs counter sidecar-1
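
To see what the two containers do without a cluster, the same idea can be tried locally. This is only a rough stand-in: a temp directory plays the shared emptyDir volume, the writer loop stops after three lines, and "-f" is dropped from tail so that the command exits.

```shell
# Local stand-in for the pod: a temp dir plays the role of the shared emptyDir volume.
varlog=$(mktemp -d)

# "count" container: append three lines to the log file instead of looping forever.
i=0
while [ "$i" -lt 3 ]; do
  echo "$i: line from the app" >> "$varlog/1.log"
  i=$((i+1))
done

# "sidecar-1" container: "-n +1" starts at the first line of the file,
# so nothing written before the sidecar started is skipped.
sidecar_output=$(tail -n +1 "$varlog/1.log")
echo "$sidecar_output"

rm -rf "$varlog"
```

In the real pod the sidecar keeps following the file, and whatever it prints is what “kubectl logs” shows for that container.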

CPU/Memory of a POD

Based on these links: link1, link2, link3

If you want to use “kubectl top” you need to install “metrics-server”

$ kubectl top pod --all-namespaces

Keep in mind that “kubectl top” shows metrics for a given pod. That information is based on reports from cAdvisor, which collects the real resource usage of the pods.

And as per link3, “kubectl top” is not the same as running “top” inside the container.
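
When the list is long, the output can be filtered for the pods that stand out. This is a minimal sketch: "pods_over_cpu" is a hypothetical helper, and it assumes the columns NAMESPACE, NAME, CPU(cores), MEMORY(bytes) with CPU reported in millicores such as "25m".

```shell
# Hypothetical helper: read `kubectl top pod --all-namespaces` output on stdin and
# print namespace/name for pods whose CPU usage is above a threshold in millicores.
# Assumes a header line and CPU values in the third column like "25m".
pods_over_cpu() {
  awk -v limit="$1" 'NR > 1 {
    cpu = $3
    sub(/m$/, "", cpu)                      # strip the millicore suffix
    if (cpu + 0 > limit + 0) print $1 "/" $2
  }'
}

# Usage (100 millicores is an arbitrary example threshold):
#   kubectl top pod --all-namespaces | pods_over_cpu 100
```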

Node NotReady

Based on this link:

$ kubectl get nodes
$ kubectl describe nodes XXX

$ ssh node
   -> check the kubelet logs:
     $ cat /var/log/kubelet.log
     $ journalctl -u kubelet   // or: systemctl status kubelet --> if kubelet runs as a service
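
With many nodes, the ones to ssh into can be picked out of the node list first. This is a minimal sketch: "not_ready_nodes" is a hypothetical helper, and it assumes the default “kubectl get nodes” columns where STATUS is the second field.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print the names
# of nodes whose STATUS does not start with "Ready" (for example NotReady).
# A cordoned but healthy node ("Ready,SchedulingDisabled") is not reported.
not_ready_nodes() {
  awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}

# Usage:
#   kubectl get nodes | not_ready_nodes
```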