{"id":425,"date":"2020-09-20T21:09:59","date_gmt":"2020-09-20T20:09:59","guid":{"rendered":"https:\/\/blog.thomarite.uk\/?p=425"},"modified":"2020-11-08T12:27:10","modified_gmt":"2020-11-08T12:27:10","slug":"kubernetes-troubleshooting-i","status":"publish","type":"post","link":"https:\/\/blog.thomarite.uk\/index.php\/2020\/09\/20\/kubernetes-troubleshooting-i\/","title":{"rendered":"Kubernetes Troubleshooting I"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Restore ETCD<\/h2>\n\n\n\n<p>This process is not well documented in the official docs, and I messed it up in my CKA exam:<\/p>\n\n\n\n<p>1- Check the configuration of the etcd process. You will probably need some of these details for the restore:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">$ kubectl describe pod -n kube-system etcd-master\n...\n--name=master\n--initial-cluster=master=https:\/\/127.0.0.1:2380\n--initial-advertise-peer-urls=https:\/\/127.0.0.1:2380\n...<\/pre>\n\n\n\n<p>2- If the cluster was not built with kubeadm, stop the kube-apiserver service:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">$ service kube-apiserver stop<\/pre>\n\n\n\n<p>3- Check the help for all restore options. Keep in mind that commands that talk to a running etcd (e.g. snapshot save, member list) will very likely need certificates for authentication. &#8220;snapshot restore&#8221; itself only reads the local snapshot file, so the certificate flags are harmless there but not strictly required.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">$ ETCDCTL_API=3 etcdctl snapshot restore -h<\/pre>\n\n\n\n<p>4- Restore ETCD using a previous backup:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">$ ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 snapshot restore FILE \\\n  --cacert xxx --cert xx --key xxx \\\n  --data-dir \/NEW\/DIR \\\n  --initial-cluster-token TOKEN \\\n  --name master \\\n  --initial-cluster=master=https:\/\/127.0.0.1:2380 \\\n  --initial-advertise-peer-urls=https:\/\/127.0.0.1:2380\n\n# TOKEN can be any word. USE HTTPS in the peer URLs!!!!<\/pre>\n\n\n\n<p>5- Add the new flags and update the volume paths in the etcd configuration. 
If etcd runs as a static pod (e.g. with kubeadm), its manifest is in \/etc\/kubernetes\/manifests on the master node.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">--data-dir=\/NEW\/DIR\n--initial-cluster-token=TOKEN\n\n++ update volumeMounts\/volumes (hostPath) to the new path \/NEW\/DIR !!!!<\/pre>\n\n\n\n<p>6- Restart the services if the cluster was not built with kubeadm:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">$ systemctl daemon-reload\n$ service etcd restart\n$ service kube-apiserver start<\/pre>\n\n\n\n<p>7- Checks:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># with kubeadm, the etcd container should restart automatically\n$ docker ps -a | grep -i etcd\n\n# check that etcd is running by listing its members:\n$ ETCDCTL_API=3 etcdctl member list --cacert xxx --cert xx --key xxx<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Sidecar logging<\/h2>\n\n\n\n<p>Based on this <a href=\"https:\/\/kubernetes.io\/docs\/concepts\/cluster-administration\/logging\/\">doc<\/a>: if an application writes some of its logs to files instead of stdout\/stderr, you can add a sidecar container that tails one of those files and streams it to its own stdout, where &#8220;kubectl logs&#8221; can read it.<\/p>\n\n\n\n<p>Pod with a sidecar container:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">apiVersion: v1\nkind: Pod\nmetadata:\n  name: counter\nspec:\n  containers:\n  - name: count\n    image: busybox\n    args:\n    - \/bin\/sh\n    - -c\n    - >\n      i=0;\n      while true;\n      do\n        echo \"$i: $(date)\" >> \/var\/log\/1.log;\n        echo \"$(date) INFO $i\" >> \/var\/log\/2.log;\n        i=$((i+1));\n        sleep 1;\n      done\n    volumeMounts:\n    - name: varlog\n      mountPath: \/var\/log\n  - name: sidecar-1\n    image: busybox\n    args: [\/bin\/sh, -c, 'tail -n+1 -f \/var\/log\/1.log']\n    volumeMounts:\n    - name: varlog\n      mountPath: \/var\/log\n  volumes:\n  - name: varlog\n    emptyDir: {}<\/pre>\n\n\n\n<p>Now the logs of &#8220;\/var\/log\/1.log&#8221; can be read via 
&#8220;sidecar-1&#8221;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">$ kubectl logs counter sidecar-1<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">CPU\/Memory of a Pod<\/h2>\n\n\n\n<p>Based on these links: <a href=\"https:\/\/kubernetes.io\/docs\/tasks\/configure-pod-container\/assign-cpu-resource\/\">link1<\/a>, <a href=\"https:\/\/kubernetes.io\/docs\/tasks\/debug-application-cluster\/resource-usage-monitoring\/\">link2<\/a>, <a href=\"https:\/\/stackoverflow.com\/questions\/51641310\/kubernetes-top-vs-linux-top\">link3<\/a><\/p>\n\n\n\n<p>To use &#8220;kubectl top&#8221;, the &#8220;metrics-server&#8221; add-on must be installed in the cluster:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">$ kubectl top pod --all-namespaces<\/pre>\n\n\n\n<p>Keep in mind that &#8220;kubectl top&#8221; shows the actual CPU and memory usage of each pod, not its requests or limits. That information comes from cAdvisor (built into the kubelet), which collects the real resource usage of the pod's containers.<\/p>\n\n\n\n<p>As explained in link3, the numbers shown by &#8220;kubectl top&#8221; are not the same as those from running &#8220;top&#8221; inside the container.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Node NotReady<\/h2>\n\n\n\n<p>Based on this <a href=\"https:\/\/kubernetes.io\/docs\/tasks\/debug-application-cluster\/debug-cluster\/\">link<\/a>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">$ kubectl get nodes\n$ kubectl describe nodes XXX\n\n# then ssh to the node and check the kubelet logs\n$ ssh node\n$ cat \/var\/log\/kubelet.log    # if the kubelet logs to a file\n$ systemctl status kubelet    # if the kubelet runs as a systemd service\n$ journalctl -u kubelet<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Restore ETCD This is a process not well documented in the official docs and I messed up in my CKA exam: 1- check config of etcd process. 
Maybe you will need some details for the restore process $ kubectl describe pod -n kube-system etcd-master &#8230; &#8211;name=master &#8211;initial-cluster=master=https:\/\/127.0.0.1:2380 &#8211;initial-advertise-peer-urls=https:\/\/127.0.0.1:2380 &#8230; 2- Stop api-server if not running &hellip; <a href=\"https:\/\/blog.thomarite.uk\/index.php\/2020\/09\/20\/kubernetes-troubleshooting-i\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Kubernetes Troubleshooting I&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[27],"tags":[],"class_list":["post-425","post","type-post","status-publish","format-standard","hentry","category-kubernetes"],"_links":{"self":[{"href":"https:\/\/blog.thomarite.uk\/index.php\/wp-json\/wp\/v2\/posts\/425","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.thomarite.uk\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.thomarite.uk\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.thomarite.uk\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.thomarite.uk\/index.php\/wp-json\/wp\/v2\/comments?post=425"}],"version-history":[{"count":1,"href":"https:\/\/blog.thomarite.uk\/index.php\/wp-json\/wp\/v2\/posts\/425\/revisions"}],"predecessor-version":[{"id":426,"href":"https:\/\/blog.thomarite.uk\/index.php\/wp-json\/wp\/v2\/posts\/425\/revisions\/426"}],"wp:attachment":[{"href":"https:\/\/blog.thomarite.uk\/index.php\/wp-json\/wp\/v2\/media?parent=425"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.thomarite.uk\/index.php\/wp-json\/wp\/v2\/categories?post=425"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.thomarite.uk\/index.php\/wp-json\/wp\/v2\/tags?post=425"}],"curies":[{"nam
e":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}