CKA

I am studying for the Kubernetes certification CKA. These are some notes:

1- CORE CONCEPTS

1.1- Cluster Architecture

Master node: manage, plan, schedule and monitor. These are the main components:

etcd: db as k-v
scheduler
controller-manager: node-controller, replication-controller
apiserver: makes communications between all parts
docker

Worker node: host apps as containers. Main components:

kubelet (captain of the ship)
kube-proxy: maintains network rules on each node so that services can reach pods on any node

1.2- ETCD

It is a distributed key-value store (database). TCP 2379. Stores info about nodes, pods, configs, secrets, accounts, roles, bindings etc. Everything related to the cluster.

Basic commands:

client: ./etcdctl set key1 value1   (API v2; with API v3 use "put" instead of "set")
        ./etcdctl get key1

Install Manual:

1- wget "github binary path to etcd"
2- setup config file: important "--advertise-client-urls: IP:2379"
                      a lot of certs needed!!!

Install via kubeadm already includes etcd:

$ kubectl get pods -n kube-system | grep etcd

// get all keys from etcd
$ kubectl exec etcd-master -n kube-system -- etcdctl get / --prefix --keys-only

etcd can be set up as a cluster, but this is for another section.

1.3- Kube API Server

You can install a binary (like etcd) or use it via kubeadm.

It has many options and it defines certs for all connections!!!

1.4- Kube Controller-Manager

You can install a binary (like etcd) or use kubeadm. It gets all the info via the API server. Watch status of pods, remediate situations. Parts:

node-controller
replication-controller

1.5- Kube Scheduler

Decides which pod goes to which node. You can install a binary or via kubeadm.

1.6- Kubelet

It is like the “captain” of the “ship” (node). Communicates with the kube-cluster via the api-server.

Important: kubeadm doesn't install kubelet

1.7- Kube-Proxy

In a cluster, each pod can reach any other pod -> you need a pod network!

It runs in each node. Creates rules in each node (iptables) to use “services”

1.8- POD

It is the smallest kube object.

1 pod =~ 1 container + helper container

It can be created via a “kubectl run” or via yaml file.

apiVersion: v1
kind: Pod
metadata:
  name: postgres-pod
  labels:
    name: postgres-pod
    app: demo-voting-app
spec:
  containers:
    - name: postgres
      image: postgres
      ports:
        - containerPort: 5432
      env:
        - name: POSTGRES_USER
          value: "postgres"
        - name: POSTGRES_PASSWORD
          value: "postgres"

Commands:

$ kubectl create -f my-pod.yaml
$ kubectl get pods
$ kubectl describe pod postgres-pod

It always contains “apiVersion”, “kind”, “metadata” and “spec”.

1.9 ReplicaSet

Object in charge of monitoring pods, HA, loadbalancing, scaling. It is a replacement of “replication-controller”. Inside spec.template you “copy/paste” the pod definition.

The important part is “selector.matchLabels”, where you decide which pods are going to be managed by this replicaset

Example:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: my-rs
  labels:
    app: myapp
spec:
  replicas: 3
  selector: # can also match pods created before the RS - main difference between RS and RC
    matchLabels:
      app: myapp   # find pods whose labels match this
  template:
    metadata:
      name: myapp-pod
      labels:
        app: myapp
    spec:
      containers:
      - name: nginx-controller
        image: nginx

Commands:

$ kubectl create -f my-rs.yaml
$ kubectl get replicaset
$ kubectl scale --replicas=4 replicaset my-rs
$ kubectl replace -f my-rs.yaml

1.10- Deployments

It is an object that creates a pod + replicaset. It provides the upgrade (rolling updates) feature to the pods.

The file is identical to the RS one; only the “kind” changes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  labels:
    app: myapp
spec:
  replicas: 3
  selector: # can also match pods created before the RS - main difference between RS and RC
    matchLabels:
      app: myapp   # find pods whose labels match this
  template:
    metadata:
      name: myapp-pod
      labels:
        app: myapp
    spec:
      containers:
      - name: nginx-controller
        image: nginx

Commands:

$ kubectl create -f my-deployment.yaml
$ kubectl get deployments
$ kubectl get replicaset
$ kubectl get pods

1.11- Namespace

It is a way to create different environments in the cluster. ie: production, testing, features, etc. You can control the resource allocations for the “ns”

By default you have 3 namespaces:

kube-system: where all control-plane pods are installed
default:
kube-public:

The “ns” is used in DNS.

db-service.dev.svc.cluster.local
---------  --- ---  -----------
svc name   ns  type domain(default)

10-10-1-3.default.pod.cluster.local
--------- ---     ---  -----------
pod IP    ns      type  domain(default)

Keep in mind that POD DNS names are just the “IP” in “-” format.
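
The two DNS name formats above can be sketched as simple string builders. This is only an illustration of the naming pattern; the function names are made up for this note, and "cluster.local" is assumed as the default domain.

```python
def service_dns_name(service: str, namespace: str = "default",
                     domain: str = "cluster.local") -> str:
    """Build <svc name>.<ns>.svc.<domain>."""
    return f"{service}.{namespace}.svc.{domain}"


def pod_dns_name(pod_ip: str, namespace: str = "default",
                 domain: str = "cluster.local") -> str:
    """Build <pod IP with dashes>.<ns>.pod.<domain>."""
    dashed_ip = pod_ip.replace(".", "-")
    return f"{dashed_ip}.{namespace}.pod.{domain}"


if __name__ == "__main__":
    print(service_dns_name("db-service", "dev"))
    print(pod_dns_name("10.10.1.3"))
```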

You can add “namespace: dev” into the “metadata” section of yaml files. By default, namespace=default.

$ kubectl get pods --namespace=xx (by default is used "default" namespace)

Create “ns”:

namespace-dev.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: dev

$ kubectl create -f namespace-dev.yaml
or

$ kubectl create namespace dev

Change “ns” in your context if you don't want to type it in each kubectl command:

$ kubectl config set-context $(kubectl config current-context) --namespace=dev

See all objects in all ns:

$ kubectl get pods --all-namespaces

$ kubectl get ns --no-headers | wc -l

1.12- Resource Quotas

You can limit the total resources (cpu, memory, number of pods, etc) that a namespace can consume.

Example:

apiVersion: v1
kind: ResourceQuota
metadata:
 name: compute-quota
 namespace: dev
spec:
 hard:
   pods: "10"
   requests.cpu: "4"
   requests.memory: 5Gi
   limits.cpu: "10"
   limits.memory: 10Gi

Commands:

$ kubectl create -f compute-quota.yaml
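
A rough sketch of how a quota check works: a new pod is rejected if it would push the namespace over any "hard" value. This is only a simplified model, not the real admission code; the dictionary keys and plain numbers (cores, Gi) are assumptions made to avoid unit parsing.

```python
# Simplified copy of the "hard" section above (cpu in cores, memory in Gi).
QUOTA = {"pods": 10, "requests.cpu": 4.0, "requests.memory_gi": 5.0}


def fits_quota(used: dict, new_pod: dict, quota: dict = QUOTA) -> bool:
    """Return True if adding new_pod keeps every tracked resource within quota."""
    for resource, hard_limit in quota.items():
        total = used.get(resource, 0) + new_pod.get(resource, 0)
        if total > hard_limit:
            return False
    return True


if __name__ == "__main__":
    used = {"pods": 9, "requests.cpu": 3.5, "requests.memory_gi": 4.0}
    small = {"pods": 1, "requests.cpu": 0.5, "requests.memory_gi": 1.0}
    big = {"pods": 1, "requests.cpu": 1.0, "requests.memory_gi": 1.0}
    print(fits_quota(used, small), fits_quota(used, big))
```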

1.13 Services

It is an object. It connects pods to external users or other pods.

Types:

NodePort: like docker port-mapping
ClusterIP: like a virtual IP that is reachable to all pods in the cluster.
LoadBalancer: only available in Cloud providers

1.13.1 NodePort

Like a virtual server. SessionAffinity: yes. Random Algorithm for scheduling.

Important parts:

targetPort: This is the pod port.
port: This is the service port (most of the time, it is the same as targetPort).
nodePort: This is the port opened on the node (the one that clients outside the cluster hit via NODE_IP:nodePort)

Example:

apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  type: NodePort
  ports:
  - targetPort: 80
    port: 80
    nodePort: 30080   # range: 30000-32767
  selector:
    app: myapp        # both labels must
    type: front-end   # match the pods !!!!

The important bits are the “spec.ports” and “spec.selector” definitions. The “selector” is used to match on labels from pods where we want to apply this service.

Commands:

// declarative
$ kubectl create -f service-definition.yml
$ kubectl get services

// imperative
$ kubectl expose deployment simple-webapp-deployment --name=webapp-service --target-port=8080 --type=NodePort \
--dry-run=client -o yaml > svc.yaml   # nothing is created, it only writes the YAML !!!

Example of creating pod and service imperative way:

$ kubectl run redis --image=redis:alpine --labels=tier=db
$ kubectl expose pod redis --name redis-service --port 7379 --target-port 6379

1.13.2 ClusterIP

It is used for access to several pods (VIP). This is the default service type.

Example:

apiVersion: v1
kind: Service
metadata:
  name: back-end
spec:
  type: ClusterIP // (default)
  ports:
  - targetPort: 80
    port: 80
  selector:
    app: myapp
    type: back-end

Commands:

$ kubectl create -f service-definition.yml
$ kubectl get services

1.13.3 Service Bound

Whatever service type you use, you want to be sure it is actually in use. You can check that by seeing whether the service is bound to any pods. The binding is configured by “selector”, but to confirm it is correct, use the command below. You must see endpoints to prove your service is attached to some pods.

$ kubectl describe service XXX | grep -i endpoint

1.13.4 Microservice Architecture Example

Based on this “diagram”:

voting-app     result-app
 (python)       (nodejs)
   |(1)           ^ (4)
   v              |
in-memoryDB       db
 (redis)       (postgresql)
    ^ (2)         ^ (3)
    |             |
    ------- -------
          | |
         worker
          (.net)

These are the steps we need to define:

1- deploy containers   -> deploy PODs (deployment)
2- enable connectivity -> create service clusterIP for redis
                          create service clusterIP for postgres
3- external access     -> create service NodePort for voting
                          create service NodePort for result

1.14- Imperative vs Declarative

imperative: how to do things (step by step)

$ kubectl run/create/expose/edit/scale/set …
$ kubectl replace -f x.yaml !!! x.yaml has been updated

declarative: just what to do (no how to do) –> infra as code / ansible, puppet, terraform, etc

$ kubectl apply -f x.yaml <--- it creates/updates

1.15 – kubectl and options

--dry-run: By default, as soon as the command is run, the resource is created. If you simply want to test your command, use the --dry-run=client option. This will not create the resource; instead, it tells you whether the resource can be created and whether your command is right.

-o yaml: This will output the resource definition in YAML format on screen.

$ kubectl explain pod --recursive ==> all options available

$ kubectl logs [-f] POD_NAME [CONTAINER_NAME]

$ kubectl -n prod exec -it PODNAME -- cat /log/app.log
$ kubectl -n prod logs PODNAME

1.16- Kubectl Apply

There are three types of files:

local file: This is our yaml file
live object config: This is the object generated from our local file and it is what you see when using “get”
last applied config: This is used to find out when fields are REMOVED from the local file

“kubectl apply” compares the three files above to find out what to add/delete.
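
A minimal sketch of that three-way comparison, assuming flat dictionaries instead of real nested objects. The function name and the field names are made up for this note; the real merge logic in kubectl is more involved.

```python
def three_way_diff(local: dict, live: dict, last_applied: dict) -> dict:
    """Work out what an apply would change on the live object.

    - set:    fields in the local file that are missing or different in live
    - delete: fields we applied last time, now removed from the local file
    Fields that only exist in live (e.g. status) are left untouched.
    """
    to_set = {k: v for k, v in local.items() if live.get(k) != v}
    to_delete = sorted(k for k in last_applied if k not in local and k in live)
    return {"set": to_set, "delete": to_delete}


if __name__ == "__main__":
    local = {"image": "nginx:1.17", "replicas": 3}
    last_applied = {"image": "nginx:1.16", "replicas": 3, "label": "old"}
    live = {"image": "nginx:1.16", "replicas": 3, "label": "old", "status": "Running"}
    print(three_way_diff(local, live, last_applied))
```

Without the last applied config there would be no way to tell "label" (removed by us) apart from "status" (added by the cluster).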

2- SCHEDULING

2.1- Manual Scheduling

what does the scheduler pick up? It finds pods without “nodeName” in the spec section, then finds a node for each one.
without a scheduler, you set “nodeName” yourself, and only at creation time
after creation, you can only change it via an API call (a Binding object)

Check you have a scheduler running:

$ kubectl -n kube-system get pods | grep -i scheduler

2.2 Labels and Selectors

group and select things together.
section “labels” (under “metadata”) in yaml files

how to filter via cli:

$ kubectl get pods --selector key=value --selector k1=v1
$ kubectl get pods --selector key=value,k1=v1
$ kubectl get pods -l key=value -l k1=v1

In Replicasets/Services, the labels need to match!

--
spec:
 replicas: 3
 selector:
  matchLabels:
    app: App1 <----
 template:       |
   metadata:     |-- need to match !!!
    labels:      |
     app: App1 <---
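
The matching rule can be sketched as follows: a pod is selected when every key/value in “matchLabels” is present in the pod labels (extra pod labels are fine). This is a simplified illustration; the function names are made up for this note.

```python
def selector_matches(match_labels: dict, pod_labels: dict) -> bool:
    """True if every selector label is present on the pod with the same value."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


def select_pods(match_labels: dict, pods: dict) -> list:
    """Return the names of the pods (name -> labels) matched by the selector."""
    return sorted(name for name, labels in pods.items()
                  if selector_matches(match_labels, labels))


if __name__ == "__main__":
    pods = {
        "web-1": {"app": "App1", "tier": "front-end"},
        "web-2": {"app": "App1"},
        "db-1": {"app": "App2"},
    }
    print(select_pods({"app": "App1"}, pods))
```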

2.3 Taints and Tolerations

set restrictions to check what pods can go to nodes. It doesn’t tell the POD where to go!!!

you set “taint” in nodes
you set “toleration” in pods

Commands:

$ kubectl taint nodes NODE_NAME key=value:taint-effect
$ kubectl taint nodes node1 app=blue:NoSchedule <== apply
$ kubectl taint nodes node1 app=blue:NoSchedule- <== remove(-) !!!
$ kubectl describe node node1 | grep -i taint  <== display taints

*taint-effect = what happens to PODS that DO NOT tolerate this taint? Three types:

- NoSchedule: pods that don't tolerate the taint will not be scheduled on the node
- PreferNoSchedule: the scheduler will try to avoid placing the pod on the node, but it is not guaranteed
- NoExecute: new pods will not be scheduled here, and pods already running on the node are evicted if they don't tolerate the new taint (the node could already have pods before the taint was applied)

Apply toleration in pod, in yaml, it is defined under “spec”:

spec:
 tolerations:
 - key: "app"
   operator: "Equal"
   value: "blue"
   effect: "NoSchedule"
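
A simplified sketch of how a toleration is matched against a taint. It only models the “Equal” and “Exists” operators with an explicit key, and the function names are made up for this note, so take it as an illustration rather than the scheduler's real logic.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """True if this toleration covers this taint (key, value and effect)."""
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("key") != taint["key"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True  # any value is accepted
    return toleration.get("value") == taint.get("value")


def can_schedule(pod_tolerations: list, node_taints: list) -> bool:
    """A pod fits on a node if every hard taint is tolerated."""
    for taint in node_taints:
        if taint["effect"] == "PreferNoSchedule":
            continue  # soft preference, never blocks the pod
        if not any(tolerates(t, taint) for t in pod_tolerations):
            return False
    return True


if __name__ == "__main__":
    taints = [{"key": "app", "value": "blue", "effect": "NoSchedule"}]
    toleration = {"key": "app", "operator": "Equal", "value": "blue",
                  "effect": "NoSchedule"}
    print(can_schedule([toleration], taints), can_schedule([], taints))
```

Note that a toleration only lets the pod in: the pod can still land on any untainted node.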

In general, the master node never gets pods (only the static pods for control-plane)

$ kubectl describe node X | grep -i taint

2.4 Node Selector

tell pods where to go (different from taint/toleration)

First, apply on a node a label:

$ kubectl label nodes NODE key=value
$ kubectl label nodes NODE size=Large

Second, apply on pod under “spec” the entry “nodeSelector”:

...
spec:
  nodeSelector:
    size: Large

2.5 Node Affinity

extension of “node selector” with “and” “or” logic ==> more complex !!!!

apply on pod:
....
spec:
 affinity:
   nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:  # hard rule; the soft variant is preferredDuringSchedulingIgnoredDuringExecution (it uses a weight/preference list instead)
      nodeSelectorTerms:
      - matchExpressions:
        - key: size
          operator: In    # or NotIn (e.g. with value Small) or Exists (no values needed)
          values:
          - Large
          - Medium

DuringScheduling: pod is being created
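
The “and”/“or” logic can be sketched like this: the entries of “nodeSelectorTerms” are ORed, while the expressions inside one “matchExpressions” list are ANDed. Only the three operators mentioned above are modelled, and the function names are made up for this note.

```python
def expression_matches(expr: dict, node_labels: dict) -> bool:
    """Evaluate one matchExpressions entry against the labels of a node."""
    key, operator = expr["key"], expr["operator"]
    if operator == "Exists":
        return key in node_labels
    if operator == "In":
        return node_labels.get(key) in expr["values"]
    if operator == "NotIn":
        return node_labels.get(key) not in expr["values"]
    raise ValueError(f"unsupported operator: {operator}")


def node_matches(node_selector_terms: list, node_labels: dict) -> bool:
    """Terms are ORed; the expressions inside each term are ANDed."""
    return any(
        all(expression_matches(e, node_labels) for e in term["matchExpressions"])
        for term in node_selector_terms
    )


if __name__ == "__main__":
    terms = [{"matchExpressions": [
        {"key": "size", "operator": "In", "values": ["Large", "Medium"]},
    ]}]
    print(node_matches(terms, {"size": "Large"}), node_matches(terms, {"size": "Small"}))
```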

2.6 Resource Limits

Pod needs by default: cpu (0.5), mem (256Mi) and disk

By default: max cpu = 1 // max mem = 512Mi (these defaults only apply when a LimitRange is defined in the namespace)

Important regarding going over the limit:

if pod uses more cpu than limit -> throttle
                 mem            -> terminate (OOM)

Example:

pod
---
spec:
  containers:
  - name: x
    image: x
    resources:
      requests:
        memory: "1Gi"
        cpu: 1
      limits:
        memory: "2Gi"
        cpu: 2

2.7 DaemonSets

It is like a replicaset (only kind changes). run 1 pod in each node: ie monitoring, logs viewer, networking (weave-net), kube-proxy!!!

It uses NodeAffinity and default scheduler to schedule pods in nodes.

$ kubectl get daemonset

if you add    a node, the daemonset creates that pod
       delete                       deletes

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: monitoring-daemon
spec:
  selector:
    matchLabels:
      app: monitoring-agent
  template:
    metadata:
      labels:
        app: monitoring-agent
    spec:
      containers:
      - name: monitoring-agent
        image: monitoring-agent

2.8 Static PODs

kubelet in a node, can create pods using files in /etc/kubernetes/manifests automatically. But, it can’t do replicasets, deployments, etc

The path for the static pods folder is defined in kubelet config file

kubelet.service <- config file
...
--config=kubeconfig.yaml \ or
--pod-manifest-path=/etc/kubernetes/manifests


kubeconfig.yaml
---
staticPodPath: /etc/kubernetes/manifests

You can check with “docker ps -a” on the master for the docker containers running the static pods.

Static pods are mainly used by master nodes for installing pods related to the kube cluster (control-plane: controller, apiserver, etcd, ..)

Important:

you can’t delete static pods via kubectl. Only by deleting the yaml file from the folder “/etc/kubernetes/manifests”
the pods created via yaml in that folder will have “-master” added to the name if you are in the master node when using “kubectl get pods”, or “-nodename” if it is another node.

Comparison Static-Pod vs Daemon-Set

static pod           vs          daemon-set
----------                       -----------
- created by kubelet              - created by kube-api (DaemonSet controller)
- deploy control-plane components - deploy monitoring, logging
    as static pods                     agents on nodes
- ignored by kube-scheduler       - ignored by kube-scheduler (before v1.12)

2.9 Multiple Schedulers

You can write your own scheduler.

How to create it:

kube-scheduler.service
--scheduler-name=custom-scheduler

/etc/kubernetes/manifests/kube-scheduler.yaml --> copy and modify
--- (a scheduler is a pod!!!)
apiVersion: v1
kind: Pod
metadata:
  name: my-custom-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=false
    - --scheduler-name=my-custom-scheduler
    - --lock-object-name=my-custom-scheduler
    image: xxx
    name: kube-scheduler
    ports:
    -  containerPort: XXX

Assign new scheduler to pod:

---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
  schedulerName: my-custom-scheduler

How to see logs:

$ kubectl get events ---> view scheduler logs
$ kubectl logs my-custom-scheduler -n kube-system

3- LOGGING AND MONITORING

Monitoring cluster components. There is nothing built-in (Oct 2018).

pay: datadog, dynatrace
Opensource Options: metrics server, prometheus, elastic stack, etc

3.1- metrics server

one per cluster. data kept in memory. kubelet (via cAdvisor) sends data to metric-server.

install: > minikube addons enable metrics-server //or
           other envs: git clone "github path to binary"
                       kubectl create -f deploy/1.8+/

view: > kubectl top node/pod

4- APPLICATION LIFECYCLE MANAGEMENT

4.1- Rolling updates / Rollout

rollout -> a new revision. This is the reason you create “deployment” objects.

There are two strategies:

recreate: destroy all, then create all -> outage! (scale to 0, then scale to X)
rolling update (default): replace the pods gradually, a few at a time -> no outage (It creates a new replicaset and then starts introducing new pods)
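
A toy simulation of the two strategies, to show why one causes an outage and the other doesn't. It assumes the rolling update replaces exactly one pod per step (in a real deployment the pace is configurable), and all the names are made up for this note.

```python
def recreate(replicas: int) -> list:
    """(old, new) pod counts: all old pods go away before any new pod starts."""
    return [(replicas, 0), (0, 0), (0, replicas)]


def rolling_update(replicas: int) -> list:
    """(old, new) pod counts when one old pod is swapped for a new one per step."""
    steps = [(replicas, 0)]
    old, new = replicas, 0
    while old > 0:
        old -= 1                 # take one old pod down
        steps.append((old, new))
        new += 1                 # bring one new pod up
        steps.append((old, new))
    return steps


def min_available(steps: list) -> int:
    """Lowest number of running pods seen during the rollout."""
    return min(old + new for old, new in steps)


if __name__ == "__main__":
    print("recreate:", min_available(recreate(3)))
    print("rolling: ", min_available(rolling_update(3)))
```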

How to apply a new version?

1) Declarative: make change in deployment yaml file
kubectl apply -f x.yaml (recommended)

or

2) Imperative: 
kubectl create deployment nginx-deploy --image=nginx:1.16
kubectl set image deployment/nginx-deploy nginx=nginx:1.17 --record

How to check status of the rollout

status:   $ kubectl rollout status deployment/NAME
history:  $ kubectl rollout history deployment/NAME
rollback: $ kubectl rollout undo deployment/NAME

4.2- Application commands in Docker and Kube

From a “Dockerfile”:

---
FROM ubuntu
# cli arguments are appended to the entrypoint
ENTRYPOINT ["sleep"]
# default value, used if you don't pass any value in "docker run .."
CMD ["5"]
---

With the docker image created above, you can create a container like this:

$ docker run --name ubuntu-sleeper ubuntu-sleeper 10

So now, kubernetes yaml file:

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-sleeper-pod
spec:
  containers:
  -  name: ubuntu-sleeper
     image: ubuntu-sleeper
     command: ["sleep"]   # this overrides ENTRYPOINT in docker
     args: ["10"]         # this overrides CMD [x] in docker; another example: ["--color=blue"]
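
How the image defaults and the pod fields combine can be sketched as a small function. The function name is made up for this note; the rule it models is that “command” replaces ENTRYPOINT and “args” replaces CMD, and when only “command” is given the image CMD is ignored.

```python
def container_argv(image_entrypoint: list, image_cmd: list,
                   command: list = None, args: list = None) -> list:
    """Return the final process arguments for a container."""
    if command is not None:
        # command replaces ENTRYPOINT; the image CMD is dropped unless args is set
        return list(command) + list(args or [])
    if args is not None:
        # only args given: keep the image ENTRYPOINT, replace CMD
        return list(image_entrypoint) + list(args)
    # nothing overridden: image defaults
    return list(image_entrypoint) + list(image_cmd)


if __name__ == "__main__":
    entrypoint, cmd = ["sleep"], ["5"]
    print(container_argv(entrypoint, cmd))
    print(container_argv(entrypoint, cmd, args=["10"]))
    print(container_argv(entrypoint, cmd, command=["sleep"], args=["10"]))
```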

4.3- Environment variables

You define them inside the spec.containers.container section:

spec:
 containers:
 - name: x
   image: x
   ports:
   - containerPort: x
   env:
   - name: APP_COLOR
     value: pink

4.4- ConfigMap

Defining env vars one by one can be tedious, so config maps are the way to manage them a bit better. You don't have to define all the env vars in each pod… just one entry now.

First, create configmap object:

imperative $ kubectl create configmap NAME \
                       --from-literal=KEY=VALUE \
                       --from-literal=KEY2=VALUE2 \
                       or
                       --from-file=FILE_NAME
FILE_NAME
key1: val1
key2: val2

declarative $ kubectl create -f cm.yaml
            $ kubectl get configmaps

cat app-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  KEY1: VAL1
  KEY2: VAL2

Apply configmap to a container in three ways:

1) Via "envFrom": all vars

spec:
  containers:
  - name: xxx
    envFrom:   // all values
    -  configMapRef:
         name: app-config

2) Via "env", to import only specific vars

spec:
 containers:
 - name: x
   image: x
   ports:
   - containerPort: x
   env:
   - name: APP_COLOR  # get one var from a configmap, don't import everything
     valueFrom:
       configMapKeyRef:
         name: app-config
         key: APP_COLOR

3) Volume:

volumes:
- name: app-config-volume
  configMap:
    name: app-config

Check “explain” for more info:

$ kubectl explain pods --recursive | grep envFrom -A3

4.5- Secrets

Secrets are only encoded in base64, so they are not really secure. It just avoids having sensitive info in clear text…

A secret is only sent to a node if a pod on that node requires it.
Kubelet stores the secret into a tmpfs so that the secret is not written to disk storage. Once the Pod that depends on the secret is deleted, kubelet will delete its local copy of the secret data as well:
https://kubernetes.io/docs/concepts/configuration/secret/#risks

How to create secrets:

imperative $ kubectl create secret generic NAME \
                       --from-literal=KEY=VAL \
                       --from-literal=KEY2=VAL2 
                       or
                       --from-file=FILE
cat FILE
DB_Pass: password

declarative $ kubectl create -f secret.yaml

cat secret.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
data:
  DB_Pass: cGFzc3dvcmQ=   # $ echo -n 'password' | base64               // ENCODE !!!!
                          # $ echo -n 'cGFzc3dvcmQ=' | base64 --decode  // DECODE !!!!
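
The same encode/decode round trip, sketched in Python to make it obvious that base64 is an encoding and not encryption: anybody who can read the secret object can decode it. The function names are made up for this note.

```python
import base64


def encode_secret(value: str) -> str:
    """Same as: echo -n 'value' | base64"""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_secret(encoded: str) -> str:
    """Same as: echo -n 'encoded' | base64 --decode"""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    encoded = encode_secret("password")
    print(encoded)                 # cGFzc3dvcmQ=
    print(decode_secret(encoded))  # password
```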

You can apply secrets in three ways:

1) as "envFrom" to import all params from secret object

spec:
  containers:
  - name: xxx 
    envFrom: 
    - secretRef:
        name: app-secret

2) Via "env" to declare only one secret param

spec:
  containers:
  - name: x
    image: x
    env:
    - name: APP_COLOR
      valueFrom:
        secretKeyRef:
          name: app-secret
          key: DB_Pass

3) Volumes:

spec:
  containers:
  - command: ["sleep", "4800"]
    image: busybox
    name: secret-admin
    volumeMounts:
    - name: secret-volume
      mountPath: "/etc/secret-volume"
      readOnly: true
  volumes:
  - name: secret-volume
    secret:
      secretName: app-secret # each key from the secret is created
                             # as a file in the volume.
                             # The content of the file is the secret value.


$ ls -ltr /etc/secret-volume
DB_Host
DB_User
DB_Password

4.6- Multi-container Pods

Scenarios where your app needs an agent, ie: web server + log agent

apiVersion: v1
kind: Pod
metadata:
  name: simple-webapp
  labels:
    name: simple-webapp
spec:
 containers:
 - name: simple-webapp
   image: simple-webapp
   ports:
   - containerPort: 8080
 - name: log-agent
   image: log-agent

4.7- Init Container

You use an init container when you want to set up something before the other containers are created. Once the init containers complete their job, the other containers are created.

An initContainer is configured in a pod like all other containers, except that it is specified inside an initContainers section

You can configure multiple such initContainers as well, like we did for multi-container pods. In that case each init container is run one at a time in sequential order.

https://kubernetes.io/docs/concepts/workloads/pods/init-containers/

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  initContainers:
  - name: init-myservice
    image: busybox
    command: ['sh', '-c', 'git clone <some-repository-that-will-be-used-by-application> ;']
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']

5- CLUSTER MAINTENANCE

5.1- Drain Node

If you need to upgrade/reboot a node, you need to move the pods to somewhere else to avoid an outage.

Commands:

$ kubectl drain NODE -> pods are moved to other nodes and the node doesn't
                           receive anything new
$ kubectl uncordon NODE -> node can receive pods now

$ kubectl cordon NODE -> it doesn't drain the node, it just stops the node from receiving new pods

“kube-controller-manager” checks the status of the nodes. By default, kcm waits 5 minutes before evicting the pods of a node that is down:

$ kube-controller-manager --pod-eviction-timeout=5m0s (by default) time the master waits for a node to come back up

5.2- Kubernetes upgrade

You need to check the version you are running:

$ kubectl get nodes --> version: v_major.minor.patch

Important: kube supports only the three latest minor versions (the current one and the two before it), ie:

new current v1.12 -> support v1.11 and v1.10 ==> v1.9 is not supported!!!

Important: nothing can be higher version than kube-apiserver, ie:

kube-apiserver=x (v1.10)
- controller-manager, kube-scheduler can be x or x-1 (v1.10 , v1.9)
- kubelet, kube-proxy can be x, x-1 or x-2 (v1.8, v1.9, v1.10)
- kubectl can be x+1,x,x-1 !!!

Upgrade path: one minor upgrade at each time: v1.9 -> v1.10 -> v1.11 etc
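
The version rules above can be written down as a small check. This is only a sketch of the rules as stated in these notes; the names are made up, and it assumes versions look like "v1.X" and only compares the minor number.

```python
# Allowed minor-version distance from kube-apiserver (min, max).
ALLOWED_SKEW = {
    "kube-controller-manager": (-1, 0),
    "kube-scheduler": (-1, 0),
    "kubelet": (-2, 0),
    "kube-proxy": (-2, 0),
    "kubectl": (-1, 1),
}


def minor(version: str) -> int:
    """Extract the minor number from a string such as 'v1.10' or 'v1.10.3'."""
    return int(version.lstrip("v").split(".")[1])


def skew_ok(component: str, component_version: str, apiserver_version: str) -> bool:
    """True if the component version is allowed next to this kube-apiserver."""
    low, high = ALLOWED_SKEW[component]
    return low <= minor(component_version) - minor(apiserver_version) <= high


def upgrade_path(current: str, target: str) -> list:
    """One minor version at a time, e.g. v1.9 -> v1.10 -> v1.11."""
    return [f"v1.{m}" for m in range(minor(current) + 1, minor(target) + 1)]


if __name__ == "__main__":
    print(skew_ok("kubelet", "v1.8", "v1.10"))
    print(skew_ok("kube-scheduler", "v1.8", "v1.10"))
    print(upgrade_path("v1.9", "v1.11"))
```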

Summary Upgrade:

1- upgrade master node
2- upgrade worker nodes (modes)
- all nodes at the same time
or
- one node at each time
- add new nodes with the new sw version, move pods to it, delete old node

5.2.1- Upgrade Master

From v1.11 to v1.12

$ kubeadm upgrade plan --> it gives you the info about the upgrade

$ apt-get update

$ apt-get install -y kubeadm=1.12.0-00

$ kubeadm upgrade apply v1.12.0

$ kubectl get nodes (it gives you version of kubelet!!!!)

$ apt-get install -y kubelet=1.12.0-00 // you need to do this if you have "master" in "kubectl get nodes"

$ systemctl restart kubelet

$ kubectl get nodes --> you should see "master" with the new version 1.12

5.2.2- Upgrade Worker

From v1.11 to v1.12

master:                     node-1
---------------------       -----------------------
$ kubectl drain node-1
                            apt-get update
                            apt-get install -y kubeadm=1.12.0-00
                            apt-get install -y kubelet=1.12.0-00
                            kubeadm upgrade node \
                                 [config --kubelet-version v1.12.0]
                            systemctl restart kubelet
$ kubectl uncordon node-1
$ apt-mark hold package

5.3- Backup Resources

$ kubectl get all --all-namespaces -o yaml > all-deploy-service.yaml

There are other tools like “velero” from Heptio that can do it. Out of scope for CKA.

5.4- Backup/Restore ETCD – Difficult

“etcd” is important because it stores all the cluster info.

The difficult part is to get the certificates parameters to get the etcd command working.

– You can get some clues from the static pod definition of etcd:

/etc/kubernetes/manifests/etcd.yaml: Find under exec.command

– or do a ps -ef | grep -i etcd and see the parameters used by other commands

verify command:
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                      --cert=/etc/kubernetes/pki/etcd/server.crt \
                      --key=/etc/kubernetes/pki/etcd/server.key \
                      --endpoints=127.0.0.1:2379 member list

create backup:
ETCDCTL_API=3 etcdctl snapshot save SNAPSHOT-BACKUP.db \
                    --endpoints=https://127.0.0.1:2379 \
                    --cacert=/etc/etcd/ca.crt \
                    --cert=/etc/etcd/etcd-server.crt \
                    --key=/etc/etcd/etcd-server.key

verify backup:
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                      --cert=/etc/kubernetes/pki/etcd/server.crt \
                      --key=/etc/kubernetes/pki/etcd/server.key \
                      --endpoints=127.0.0.1:2379 \
                      snapshot status PATH/FILE -w table

Summary:

etcd backup:
1- documentation: find the basic command for the API version
2- ps -ef | grep etcd --> get path for certificates
3- run command
4- verify backup

5.4.1- Restore ETCD

// 1- Stop api server
$ service kube-apiserver stop

// 2- apply etcd backup
$ ETCDCTL_API=3 etcdctl snapshot restore SNAPSHOT-BACKUP.db \
                  --endpoints=https://127.0.0.1:2379 \
                  --cacert=/etc/etcd/ca.crt \
                  --cert=/etc/etcd/etcd-server.crt \
                  --key=/etc/etcd/etcd-server.key \
                  --data-dir /var/lib/etcd-from-backup \
                  --initial-cluster master-1=https://127.0.0.1:2380,master-2=https://x.x.x.y:2380 \
                  --initial-cluster-token NEW_TOKEN \
                  --name=master \
                  --initial-advertise-peer-urls https://127.0.0.1:2380

// 3- Check backup folder
$ ls -ltr /var/lib/etcd-from-backup -> you should see a folder "member"

// 4- Update the etcd manifest file. The changes will apply immediately as it is a static pod

$ vim /etc/kubernetes/manifests/etcd.yaml
...
--data-dir=/var/lib/etcd-from-backup (update this line with new path)
--initial-cluster-token=NEW_TOKEN (add this line)
…
volumeMounts:
- mountPath: /var/lib/etcd-from-backup (update this line with new path)
  name: etcd-data
…
volumes:
- hostPath:
    path: /var/lib/etcd-from-backup (update this line with new path)
    type: DirectoryOrCreate
  name: etcd-data

// 5- Reload services
$ systemctl daemon-reload
$ service etcd restart
$ service kube-apiserver start

Important: In cloud envs like aws, gcp you don't have access to etcd…

6- SECURITY

6.1- Security Primitives

kube-apiserver: who can access: files, certs, ldap, service accounts
                what can they do: RBAC authorization, ABAC autho

6.2- Authentication

Kubectl :

users: admin, devs                   --> kubectl can't create accounts
service accounts: 3rd parties (bots) --> kubectl can create accounts

You can use a static file for authentication – NOT RECOMMENDED

file x.csv:
   password, user, uid, gid --> --basic-auth-file=x.csv

token token.csv:
   token, user, uid, gid --> --token-auth-file=token.csv

Use of auth files in kube-api config:

kube-apiserver.yaml
---
spec:
  containers:
  - command: 
    … 
    - --basic-auth-file=x.csv 
    // or
    - --token-auth-file=x.csv

Use of auth in API calls:

$ curl -v -k https://master-node-ip:6443/api/v1/pods -u "user1:password1"
$ curl -v -k https://master-node-ip:6443/api/v1/pods \
    --header "Authorization: Bearer TOKEN"
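
What the two curl calls put on the wire can be sketched as follows: “-u user:password” becomes a Basic header with the base64 of “user:password”, and the token goes as-is in a Bearer header. The function names are made up for this note.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Header sent by: curl -u "user:password" ..."""
    credentials = f"{user}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}


def bearer_auth_header(token: str) -> dict:
    """Header sent by: curl --header "Authorization: Bearer TOKEN" ..."""
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    print(basic_auth_header("user1", "password1"))
    print(bearer_auth_header("TOKEN"))
```

In both cases the credentials are trivially readable by anybody who sees the request, which is one more reason why the static files are not recommended.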

6.3- TLS / Generate Certs

openssl commands to create required files:

gen key:        openssl genrsa -out admin.key 2048
gen public key: openssl rsa -in admin.key -pubout > mybank.pem
gen csr:        openssl req -new -key admin.key -out admin.csr \
                   -subj "/CN=kube-admin/O=system:masters"
             (admin, scheduler, controller-manager, kube-proxy,etc)

Generate cert with SAN:

0) Gen key: 
openssl genrsa -out apiserver.key 2048

1) Create openssl.cnf with SAN info
[req]
req_extensions = v3_req
[v3_req]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation
subjectAltName = @alt_names
[alt_names]
DNS.1 = kubernetes
DNS.2 = kubernetes.default
IP.1 = 10.96.1.1
IP.2 = 172.16.0.1

2) Gen CSR:
openssl req -new -key apiserver.key -subj "/CN=kube-apiserver" -out apiserver.csr -config openssl.cnf

3) Sign CSR with CA:
openssl x509 -req -in apiserver.csr -CA ca.crt -CAkey ca.key -out apiserver.crt

Self-Signed Cert: Sign the CSR with own key to generate the cert:

$ openssl x509 -req -in ca.csr -signkey ca.key -out ca.crt

Use certs to query the API:

$ curl https://kube-apiserver:6443/api/v1/pods --key admin.key --cert admin.crt --cacert ca.crt

Kube-api server config related to certs…:

--etcd-cafile=
--etcd-certfile=
--etcd-keyfile=
…
--kubelet-certificate-authority=
--kubelet-client-certificate=
--kubelet-client-key=
…
--client-ca-file=
--tls-cert-file=
--tls-private-key-file=
…

Kubelet-nodes:

server cert name => kubelet-nodeX.crt
                    kubelet-nodeX.key

client cert name => Group: System:Nodes name: system:node:node0x

kubeadm can generate all certs for you:

cat /etc/kubernetes/manifests/kube-apiserver.yaml
spec:
  containers:
  - command:
    - --client-ca-file=
    - --etcd-cafile
    - --etcd-certfile
    - --etcd-keyfile
    - --kubelet-client-certificate
    - --kubelet-client-key
    - --tls-cert-file
    - --tls-private-key-file

How to check CN, SAN and date in cert?

$ openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout

Where you check if there are issues with certs in a core service:

if installed manually: > journalctl -u etcd.service -l
if installed kubeadm: > kubectl logs etcd-master

6.4- Certificates API

Generating certificates is quite cumbersome, so kubernetes has a Certificates API to generate the certs for users, etc

How to create a certificate for a user:

1) gen key for user
openssl genrsa -out new-admin.key 2048

2) gen csr for user
openssl req -new -key new-admin.key -subj "/CN=jane" -out new-admin.csr

3) create "CertificateSigningRequest" kubernetes object:

cat new-admin-csr.yaml
---
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: jane
spec:
  groups:
  - system:authenticated
  usages:
  - digital signature
  - key encipherment
  - server auth
  request: (cat new-admin.csr | base64)

kubectl create -f new-admin-csr.yaml

4) approve the new certificate; it is not approved automatically:
kubectl get csr
kubectl certificate approve jane

5) show the certificate to send to the user
kubectl get csr jane -o yaml --> take "certificate:" from the status and decode it (echo ".." | base64 --decode)

The CA cert and key used by the Certificates API are set in the controller-manager config file:

kube-controller-manager.yaml
--cluster-signing-cert-file=
--cluster-signing-key-file=
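
The two base64 steps in the procedure above (encoding the CSR for the "request:" field, decoding "certificate:" for the user) can be sketched in Python. This is only a minimal illustration: the function names are made up for this note, and the sample bytes are a placeholder, not a real CSR.

```python
import base64


def encode_request(csr_pem: bytes) -> str:
    """Base64-encode a PEM CSR as a single line, as needed for spec.request."""
    return base64.b64encode(csr_pem).decode("ascii")


def decode_certificate(certificate_b64: str) -> bytes:
    """Decode the base64 value found under status.certificate."""
    return base64.b64decode(certificate_b64)


if __name__ == "__main__":
    # Placeholder bytes standing in for the contents of new-admin.csr.
    sample = (
        b"-----BEGIN CERTIFICATE REQUEST-----\n"
        b"MIIB...\n"
        b"-----END CERTIFICATE REQUEST-----\n"
    )
    encoded = encode_request(sample)
    print(encoded)  # one line, no embedded newlines
    print(decode_certificate(encoded).decode("ascii"))
```

The point of the single-line output is that the value can be pasted directly after "request:" in the YAML file.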

6.5- Kubeconfig

kubectl queries the API every time you run a command, and it uses certs. You don't have to type the certs every time because they are configured in the kubectl config at $HOME/.kube/config.

The kubeconfig file has three sections: clusters, users and contexts (that join users with clusters). And you can have several of each one.

kubeconfig example:

apiVersion: v1
kind: Config
current-context: my-kube-admin@my-kube-playground  # format: user@cluster

clusters:
  - name: my-kube-playground
    cluster:
      certificate-authority: PATH/ca.crt
      # or
      certificate-authority-data: $(cat ca.crt | base64)
      server: https://my-kube-playground:6443

contexts:  # user@cluster
  - name: my-kube-admin@my-kube-playground
    context:
      user: my-kube-admin
      cluster: my-kube-playground
      namespace: production

users:
  - name: my-kube-admin
    user:
      client-certificate: PATH/admin.crt
      client-key: PATH/admin.key
      # or
      client-certificate-data: $(cat admin.crt | base64)
      client-key-data: $(cat admin.key | base64)

You can test other user certs:

$ curl https://kube-apiserver:6443/api/v1/pods --key admin.key \
                                     --cert admin.crt --cacert ca.crt

$ kubectl get pods --server https://my-kube-playground:6443 \
                   --client-key admin.key \
                   --client-certificate admin.crt \
                   --certificate-authority ca.crt

Use and view kubeconfig file:

$ kubectl get pods [--kubeconfig PATH/FILE]

$ kubectl config view [--kubeconfig PATH/FILE] <-- show kubectl config file

$ kubectl config use-context prod-user@prod <-- changes current-context in the file too!
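
To make the relation between the three sections concrete, here is a small Python sketch of how a context name resolves to a cluster and a user. It works on a plain dict shaped like a kubeconfig file; the function name and the sample values are illustrative assumptions, and this is not how kubectl itself is implemented.

```python
from typing import Optional

# Sample data shaped like a kubeconfig file (values are placeholders).
KUBECONFIG = {
    "current-context": "my-kube-admin@my-kube-playground",
    "clusters": [
        {"name": "my-kube-playground",
         "cluster": {"server": "https://my-kube-playground:6443"}},
    ],
    "contexts": [
        {"name": "my-kube-admin@my-kube-playground",
         "context": {"cluster": "my-kube-playground",
                     "user": "my-kube-admin",
                     "namespace": "production"}},
    ],
    "users": [
        {"name": "my-kube-admin",
         "user": {"client-certificate": "PATH/admin.crt",
                  "client-key": "PATH/admin.key"}},
    ],
}


def resolve_context(config: dict, name: Optional[str] = None) -> dict:
    """Follow context -> cluster and context -> user by name."""
    name = name or config["current-context"]
    ctx = next(c["context"] for c in config["contexts"] if c["name"] == name)
    cluster = next(c["cluster"] for c in config["clusters"]
                   if c["name"] == ctx["cluster"])
    user = next(u["user"] for u in config["users"] if u["name"] == ctx["user"])
    return {
        "server": cluster["server"],
        "user": ctx["user"],
        "namespace": ctx.get("namespace", "default"),
        "client-certificate": user.get("client-certificate"),
    }


if __name__ == "__main__":
    print(resolve_context(KUBECONFIG))
```

A context is only a pair of names plus an optional namespace; the real connection details live in the cluster and user entries it points to.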

6.6- API groups

This is a basic diagram of the API. Main thing is the difference between “api” (core group) and “apis” (named groups, where newer features are organized by API group):

/metrics  /healthz  /version  /api                /apis          /logs
                             (core)               (named)
                              /v1                   |
                      namespace pods rc      /apps /extensions ... (api groups)
                      pv pvc binding...      /v1                  /v1
                                              |
                                     /deployments /replicaset  (resources)
                                          |
                                     -list,get,create,delete,update (verbs)

You can reach the API via curl but using the certs…

$ curl https://localhost:6443 -k --key admin.key --cert admin.crt \
                                 --cacert ca.crt
$ curl https://localhost:6443/apis -k | grep "name"

You can make your life easier using a kubectl proxy that uses the kubectl credentials to access the kube-apiserver

$ kubectl proxy -> launch a proxy in 8001 to avoid use auth each time
                   as it uses the ones from kube config file

$ curl http://localhost:8001 -k

Important:

                    kube proxy  != kubeCTL proxy (reach kubeapi)
    (service running on node for 
     pods connectivity)

6.7- Authorization

What you can do. There are several methods to arrange authorization:

Node authorizer: (defined in certificate: Group: system:nodes CN: system:node:node01)

ABAC (Attribute-Based Access Control): difficult to manage. each user has a policy…
{"kind": "Policy", "spec": {"user": "dev-user", "namespace": "", "resource": "pods", "apiGroup": ""}}

RBAC (Role-Based Access Control): the standard approach. create roles, assign users to roles

Webhook: use external 3rd party: ie "open policy agent"

AlwaysAllow, AlwaysDeny

You define the method in the kubeapi config file:

--authorization-mode=AlwaysAllow (default)
or
--authorization-mode=Node,RBAC,Webhook (the modes are tried in this order for each request until one of them allows it)

6.8- RBAC

You need to define a role and a role binding (who uses which role) objects. This is “namespaced”.

dev-role.yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dev
  namespace: xxx
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list", "get", "create", "update", "delete"]
  resourceNames: ["blue", "orange"]  # if you want to filter at pod level
                                     # too: only access to blue,orange
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create"]

$ kubectl create -f dev-role.yaml

dev-binding.yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-binding
  namespace: xxx
subjects:
- kind: User
  name: dev-user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: dev
  apiGroup: rbac.authorization.k8s.io

$ kubectl create -f dev-binding.yaml

Info about roles/rolebind:

$ kubectl get roles
               rolebindings
          describe role dev
                   rolebinding dev-binding

Important: How to test the access of a user?

$ kubectl auth can-i create deployments [--as dev-user] [-n prod]
                     update pods
                     delete nodes
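
As a mental model of what "kubectl auth can-i" answers, the rules of a role can be evaluated like this in Python. This is a simplified sketch, not the real authorizer: the function names are invented for this note, and it ignores subresources, non-resource URLs and the special handling of list/create with resourceNames.

```python
def rule_allows(rule, verb, resource, api_group="", name=None):
    """True if a single RBAC rule covers the request."""
    def covers(allowed, value):
        return "*" in allowed or value in allowed

    if not covers(rule.get("apiGroups", []), api_group):
        return False
    if not covers(rule.get("resources", []), resource):
        return False
    if not covers(rule.get("verbs", []), verb):
        return False
    names = rule.get("resourceNames")
    if names and name not in names:
        return False
    return True


def can_i(rules, verb, resource, api_group="", name=None):
    """RBAC is additive: the request passes if any rule allows it."""
    return any(rule_allows(r, verb, resource, api_group, name) for r in rules)


# Rules equivalent to the "dev" role used in these notes.
DEV_RULES = [
    {"apiGroups": [""], "resources": ["pods"],
     "verbs": ["list", "get", "create", "update", "delete"],
     "resourceNames": ["blue", "orange"]},
    {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["create"]},
]

if __name__ == "__main__":
    print(can_i(DEV_RULES, "get", "pods", name="blue"))
    print(can_i(DEV_RULES, "delete", "nodes"))
```

There are no "deny" rules in RBAC: anything not matched by some rule is simply not allowed.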

6.9- Cluster Roles

This is for cluster-scoped resources (non-namespaced): nodes, pv, csr, namespaces, clusterroles, clusterrolebindings

You can see the full list for each with:

$ kubectl api-resources --namespaced=true/false

The process is the same, we need to define a cluster role and a cluster role binding:

cluster-admin-role.yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-administrator
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list", "get", "create", "delete"]

cluster-admin-role-bind.yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-admin-role-bind
subjects:
- kind: User
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-administrator
  apiGroup: rbac.authorization.k8s.io

Important: You can also use a “cluster role” to give a user access to namespaced resources (ie pods): bound with a cluster role binding, it gives access to all pods in all namespaces.

6.10- Images Security

Secure access to images used by pods. An image can be in docker, google repo, etc

image: docker.io/nginx/nginx
           |       |     |
       registry  user  image
                account

from google: gcr.io/kubernetes-e2e-test-images/dnsutils

You can use a private repository:

$ docker login private.io
  user:
  pass:

$ docker run private.io/apps/internal-app

How to define the private registry credentials as a secret with kubectl:

kubectl create secret docker-registry regcred \
--docker-server= \
--docker-username= \
--docker-password= \
--docker-email=

How to use a specific registry in a pod?

spec:
  containers:
  - name: nginx
    image: private.io/apps/internal-app
  imagePullSecrets:
  - name: regcred

6.11- Security Contexts

Like in docker, you can assign security params (like user, group id, etc) in kube containers. You can set the security params at pod or container level:

at pod level:
----
spec:
  securityContext:
    runAsUser: 1000

at container level:
---
spec:
  containers:
  - name: ubuntu
    securityContext:
      runAsUser: 100  # user id
      capabilities:   # <=== ONLY AT CONTAINER LEVEL!
        add: ["MAC_ADMIN"]

6.12- Network Policies

This is like a firewall: an iptables-style implementation of access control at network level. By default, regardless of the network plugin, any pod can reach any other pod in the cluster (without adding any route into the pod).

Network policies are supported in kube-router, calico, romana and weave-net. It is not supported in flannel (yet)

You have ingress (traffic received by a pod) and egress (traffic generated by a pod) rules. You match the rule to a pod using labels with podSelector:

networkpolicy: apply a network rule on pods with label role: db to allow only traffic from pods with label name: api-pod into port 3306

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-policy
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  ingress:
  - from: 
    - podSelector:
        matchLabels:
          name: api-pod
    ports:
    - protocol: TCP
      port: 3306

$ kubectl apply -f xxx
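
The matching logic of the policy above can be sketched in Python. It is a toy model under loud assumptions: only podSelector peers in the same namespace are handled (no namespaceSelector or ipBlock), only this single policy is considered, and the function names are made up for illustration.

```python
def labels_match(match_labels, pod_labels):
    """True if every selector label is present on the pod with the same value."""
    return all(pod_labels.get(k) == v for k, v in match_labels.items())


def ingress_allowed(policy, dst_labels, src_labels, port, protocol="TCP"):
    """Decide whether src may reach dst:port according to this one policy."""
    spec = policy["spec"]
    # A policy only restricts the pods that its podSelector selects.
    if not labels_match(spec["podSelector"].get("matchLabels", {}), dst_labels):
        return True
    for rule in spec.get("ingress", []):
        peers = rule.get("from")
        ports = rule.get("ports")
        peer_ok = peers is None or any(
            labels_match(p.get("podSelector", {}).get("matchLabels", {}),
                         src_labels)
            for p in peers
        )
        port_ok = ports is None or any(
            p.get("port") == port and p.get("protocol", "TCP") == protocol
            for p in ports
        )
        if peer_ok and port_ok:
            return True
    return False


# Dict form of the db-policy manifest from these notes.
DB_POLICY = {
    "spec": {
        "podSelector": {"matchLabels": {"role": "db"}},
        "policyTypes": ["Ingress"],
        "ingress": [{
            "from": [{"podSelector": {"matchLabels": {"name": "api-pod"}}}],
            "ports": [{"protocol": "TCP", "port": 3306}],
        }],
    }
}

if __name__ == "__main__":
    print(ingress_allowed(DB_POLICY, {"role": "db"}, {"name": "api-pod"}, 3306))
    print(ingress_allowed(DB_POLICY, {"role": "db"}, {"name": "web"}, 3306))
```

Once a pod is selected by a policy with policyTypes Ingress, everything not explicitly allowed by a rule is dropped for that pod.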

6.13- Commands: kubectx / kubens

I haven’t seen any lab requesting their usage. They are not required for the exam but may be useful in real envs.

Kubectx
reference: https://github.com/ahmetb/kubectx

With this tool, you don't have to make use of lengthy “kubectl config” commands to switch between contexts. This tool is particularly useful to switch context between clusters in a multi-cluster environment.

Installation:
sudo git clone https://github.com/ahmetb/kubectx /opt/kubectx
sudo ln -s /opt/kubectx/kubectx /usr/local/bin/kubectx

Kubens
This tool allows users to switch between namespaces quickly with a simple command.
sudo git clone https://github.com/ahmetb/kubectx /opt/kubectx
sudo ln -s /opt/kubectx/kubens /usr/local/bin/kubens

7- STORAGE

7.1- Storage in Docker

In docker, the containers and image directories are under /var/lib/docker.

Docker follows a layered architecture (each line in Dockerfile is a layer):

$ docker build --> Read Only (image layer)
$ docker run -> new layer: it is rw (container layer) - lost once the container is removed

So docker follows a “copy-on-write” strategy by default. If you want to be able to access that storage after the docker container is destroyed, you can use volumes:

> docker volume create data_volume 
    --> /var/lib/docker/volumes/data_volume
> docker run -v data_volume:/var/lib/mysql mysql
    --> volume mounting -> dir created in docker folders
> docker run --mount type=bind,source=/data/mysql,target=/var/lib/mysql mysql --> path mounting, dir not created in docker folders

volume driver: local, azure, gce, aws ebs, glusterfs, vmware, etc

storage drivers: enable the layer driver: aufs, zfs, btrfs, device mapper, overlay, overlay2

7.2- Volumes, PersistentVolumes and PV claims.

Volume: Data persistence after container is destroyed

spec:
  containers:
  - image: alpine
    volumeMounts:
    - mountPath: /opt
      name: data-volume ==> /data -> alpine:/opt

  volumes:
  - name: data-volume
    hostPath:
      path: /data
      type: Directory

Persistent volumes: cluster pool of volumes that users can request part of it

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-vol1
spec:
  accessModes:
    - ReadWriteOnce (ReadOnlyMany, ReadWriteMany)
  capacity:
    storage: 1Gi
  hostPath:
    path: /tmp/data
  persistentVolumeReclaimPolicy: Retain (default) [Delete, Recycle]

$ kubectl create -f xxx
$ kubectl get persistentvolume [pv]

PV claims: use of a pv. Each pvc is bound to one pv.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi

$ kubectl create -f xxx
$ kubectl get persistentvolumeclaim [pvc]  
      ==> If status is "bound" you have matched a PV

Use a PVC in a pod:

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: myfrontend
    image: nginx
    volumeMounts:
    - mountPath: "/var/www/html"
      name: mypd
  volumes:
  - name: mypd
    persistentVolumeClaim:
      claimName: myclaim

Important: a PVC will bind to one PV that fits its requirements. Use “get pvc” to check status.
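
What "fits its requirements" means can be sketched roughly in Python: the PV must offer the requested access modes and at least the requested capacity. This is a simplified model under assumptions: it ignores storageClassName, selectors and volumeMode, it assumes the smallest fitting PV is picked, and the names are invented for this note.

```python
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(quantity: str) -> int:
    """Convert a binary-suffixed quantity such as '500Mi' to bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def find_pv(claim, volumes):
    """Return the name of the smallest PV that satisfies the claim, or None."""
    wanted = parse_quantity(claim["storage"])
    candidates = [
        pv for pv in volumes
        if set(claim["accessModes"]) <= set(pv["accessModes"])
        and parse_quantity(pv["storage"]) >= wanted
    ]
    if not candidates:
        return None  # the claim would stay Pending
    return min(candidates, key=lambda pv: parse_quantity(pv["storage"]))["name"]


VOLUMES = [
    {"name": "pv-vol1", "accessModes": ["ReadWriteOnce"], "storage": "1Gi"},
    {"name": "pv-vol2", "accessModes": ["ReadWriteOnce"], "storage": "10Gi"},
]

if __name__ == "__main__":
    claim = {"accessModes": ["ReadWriteOnce"], "storage": "500Mi"}
    print(find_pv(claim, VOLUMES))
```

Because the binding is one to one, a 500Mi claim bound to a 1Gi PV takes the whole PV; the remaining capacity cannot be used by another claim.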

7.3- Storage Class

dynamic provisioning of storage in clouds:

sc-definition -> pvc-definition -> pod-definition 
     ==> we dont need pv-definition! it is created automatically

Example:

sc-definition
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gcp-storage <===========1
provisioner: kubernetes.io/gce-pd
parameters: (depends on provider!!!!)
  type:
  replication-type:

pvc-def
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim <=========2
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: gcp-storage <======1
  resources:
    requests:
      storage: 500Mi

use pvc in pod
---
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: myfrontend
    image: nginx
    volumeMounts:
    - mountPath: "/var/www/html"
      name: mypd <=======3
  volumes:
  - name: mypd <========3
    persistentVolumeClaim:
      claimName: myclaim <===========2

8- NETWORKING

8.1 Linux Networking Basics

$ ip link (show interfaces)

$ ip addr add 192.168.1.10/24 dev eth0
$ route

$ ip route add 192.168.2.0/24 via 192.168.1.1
$ ip route add default via 192.168.1.1
               0.0.0.0/0

// enabling forwarding
$ echo 1 > /proc/sys/net/ipv4/ip_forward
$ vim /etc/sysctl.conf
  net.ipv4.ip_forward = 1

8.2 Linux DNS basics

$ cat /etc/resolv.conf 
nameserver 192.168.1.1
search mycompany.com prod.mycompany.com

$ nslookup x.x.x.x
$ dig

8.3 Linux Namespace

// create ns
ip netns add red
ip netns add blue
ip netns (list ns)
ip netns exec red ip link // ip -n red link
ip netns exec red arp

// create virtual ethernet between ns and assign port to them
ip link add veth-red type veth peer name veth-blue 
  (ip -n red link del veth-red)
ip link set veth-red netns red
ip link set veth-blue netns blue

// assign IPs to each end of the veth
ip -n red addr add 192.168.1.11/24 dev veth-red
ip -n blue addr add 192.168.1.12/24 dev veth-blue

// enable links
ip -n red link set veth-red up
ip -n blue link set veth-blue up

// test connectivity
ip netns exec red ping 192.168.1.12

======

// create bridge
ip link add v-net-0 type bridge

// enable bridge
ip link set dev v-net-0 up // (remove the earlier direct link first: ip -n red link del veth-red)

// create and attach links to bridge from each ns
ip link add veth-red type veth peer name veth-red-br
ip link add veth-blue type veth peer name veth-blue-br

ip link set veth-red netns red
ip link set veth-red-br master v-net-0

ip link set veth-blue netns blue
ip link set veth-blue-br master v-net-0

ip -n red addr add 192.168.1.11/24 dev veth-red
ip -n blue addr add 192.168.1.12/24 dev veth-blue

ip -n red link set veth-red up
ip -n blue link set veth-blue up

ip addr add 192.168.1.1/24 dev v-net-0

ip netns exec blue ip route add 192.168.2.0/24 via 192.168.1.1
ip netns exec blue ip route add default via 192.168.1.1

iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -j MASQUERADE
iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination 192.168.1.11:80

8.4 Docker Networking

Three types:

- none: no connectivity
- host: share host network
- bridge: internal network is created and host is attached
   docker network ls --> bridge  -| are the same thing
   ip link           --> docker0 -|

iptables -t nat -A DOCKER -p tcp --dport 8080 -j DNAT --to-destination 192.168.1.11:80

8.5 Container Network Interface

Container runtime must create network namespace:
- identify the network the container must attach to
- container runtime invokes the network plugin (bridge) when a container is added/deleted
- network config in JSON format

CNI: 
 must support command line arguments add/del/check
 must support parameters container id, network ns
 manage IP
 results in specific format

**docker does not implement CNI** (it uses its own model, CNM)

kubernetes with docker: the container is created on the "none" network and then the CNI plugin ("bridge") is invoked

8.6 Cluster Networking

Most common ports:

etcd: 2379 (clients), 2380 (peers)
kube-api: 6443
kubelet: 10250
kube-scheduler: 10251
kube-controller: 10252
services: 30000-32767

Configure weave-network:

$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

$ kubectl get pod -n kube-system | grep -i weave (one per node)

cluster-networking doc: doesn't give you steps to configure any CNI….

8.7 Pod Networking

every pod should have an ip.
every pod should be able to communicate with every other pod in the same node and other nodes (without nat)

Networking config in kubelet:

--cni-conf-dir=/etc/cni/net.d
--cni-bin-dir=/opt/cni/bin
./net-script.sh add <container> <namespace>

8.8 CNI Weave-net

installs an agent in each node. deploy as pods in nodes

$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')" 

$ kubectl get pods -n kube-system | grep weave-net

ipam weave:

where do pods and bridges get their IPs?
plugin: host-local -> provide free ips from node

8.9 Service Networking

“service” is a cluster-wide object. The service has an IP. Kube-proxy on each node creates iptables rules.

ClusterIP: IP reachable by all pods in the cluster

$ ps -ef | grep kube-apiserver   --> look for --service-cluster-ip-range=x.x.x.x/y
!! pod network shouldn't overlap with service-cluster-ip-range
$ iptables -L -t nat | grep xxx
$ cat /var/log/kube-proxy.log

NodePort: same port in all nodes, sent to the pod

IPs for pod: check logs of pod weave:

$ kubectl -n kube-system logs weave-POD weave 
    --> the pod has two container so you need to specify one of them

IPs for services –> check kube-api-server config

8.10 CoreDNS

For pods and services in the cluster (nodes are managed externally)

kube dns: hostname    namespace  type  root           ip address
          web-service apps       svc   cluster.local  x.x.x.x (service)
          10-244-2-5  default    pod   cluster.local  x.x.x.y (pod)

fqdn: web-service.apps.svc.cluster.local
      10-244-2-5.default.pod.cluster.local
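
The naming scheme in the table above is mechanical, so it can be written as two tiny Python helpers. The function names are made up for illustration, and "cluster.local" is assumed as the cluster domain.

```python
def service_fqdn(name: str, namespace: str, root: str = "cluster.local") -> str:
    """FQDN of a service: <service>.<namespace>.svc.<root>."""
    return f"{name}.{namespace}.svc.{root}"


def pod_fqdn(ip: str, namespace: str, root: str = "cluster.local") -> str:
    """FQDN of a pod: the IP with dots replaced by dashes, under 'pod'."""
    return f"{ip.replace('.', '-')}.{namespace}.pod.{root}"


if __name__ == "__main__":
    print(service_fqdn("web-service", "apps"))
    print(pod_fqdn("10.244.2.5", "default"))
```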

dns implementation in kubernetes uses coredns (two pods for HA)

cat /etc/coredns/Corefile
.:53 {
  errors   # plugins
  health
  kubernetes cluster.local in-addr.arpa ip6.arpa {
     pods insecure   # create record for pod as 10-2-3-1 instead of 10.2.3.1
     upstream
     fallthrough in-addr.arpa ip6.arpa
  }
  prometheus :9153
  proxy . /etc/resolv.conf   # for external queries (google.com) from a pod
  cache 30
  reload
}

$ kubectl get configmap -n kube-system

pods dns config:

cat /etc/resolv.conf => nameserver IP 
    <- it is the IP from $ kubectl get service -n kube-system | grep dns
                         this come from the kubelet config:
                         /var/lib/kubelet/config.yaml:
                           clusterDNS:
                           - 10.96.0.10

$ host ONLY_FQDN

8.11 Ingress

Using a service of type “LoadBalancer” is only possible in cloud envs like GCP, AWS, etc

When you create a LoadBalancer service, the cloud provider creates a proxy/loadbalancer to access that service. So you can end up with a hierarchy of loadbalancers in the cloud provider… –> too complex ==> sol: Ingress

ingress = controller + resources. Not deployed by default

supported controller: GCP HTTPS Load Balancer (GCE) and NGINX (used in kubernetes)

8.11.1 Controller

1) nginx --> deployment file:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-ingress-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      name: nginx-ingress
  template:
    metadata:
      labels:
        name: nginx-ingress
    spec:
      containers:
      - name: nginx-ingress-controller
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.21.0
        args:
        - /nginx-ingress-controller
        - --configmap=$(POD_NAMESPACE)/nginx-configuration
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443

2) nginx configmap used in deployment
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration

3) service
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
  - port: 443
    targetPort: 443
    protocol: TCP
    name: https
  selector:
    name: nginx-ingress

4) service account (auth): roles, clusterroles, rolebinding, etc
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nginx-ingress-serviceaccount

8.11.2 Options to deploy ingress rules

option1) 1rule/1backend: In this case the selector from the service, gives us the pod

ingress-wear.yaml
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-wear
spec:
  backend:
    serviceName: wear-service
    servicePort: 80


option 2) split traffic via URL: 1 Rule / 2 paths

           www.my-online-store.com
          /wear              /watch
                    |
                    V
                  nginx
                    |
           ----------------------
           |                     |
          svc                   svc
          wear                  vid
          ====                  ====
           |                      |
        wear-pod               vid-pod


ingress-wear-watch.yaml
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-wear-watch
spec:
  rules:
  - http: 
      paths: 
      - path: /wear
        backend:
          serviceName: wear-service
          servicePort: 80
      - path: /watch
        backend:
          serviceName: watch-service
          servicePort: 80

$ kubectl describe ingress NAME
    ==> watchout the default backend !!!! 
        if nothing matches, it goes there!!!
        you need to define a default backend



option 3) split by hostname: 2 Rules / 1 path each

wear.my-online-store.com           watch.my-online-store.com
        |------------------------------------|
                           |
                           V
                         nginx
                           |
                ----------------------
                |                    |
               svc                  svc
               wear                 vid
               ====                 ====
                |                    |
            wear-pod               vid-pod


ingress-wear-watch.yaml
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-wear-watch
spec:
  rules:
  - host: wear.my-online-store.com 
    http: 
      paths: 
      - backend:
          serviceName: wear-service
          servicePort: 80
  - host: watch.my-online-store.com
    http: 
      paths: 
      - backend:
          serviceName: watch-service
          servicePort: 80

ingress examples: https://kubernetes.github.io/ingress-nginx/examples/

8.12 Rewrite

I haven't seen any question about this in the mock labs but just in case: Rewrite url nginx:

For example: replace(path, rewrite-target)
using: http://<ingress-service>:<ingress-port>/wear 
   --> http://<wear-service>:<port>/

In our case: replace("/wear","/")

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test-ingress
  namespace: critical-space
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - http: 
      paths: 
      - path: /wear
        backend:
          serviceName: wear-service
          servicePort: 8282

with regex
replace("/something(/|$)(.*)", "/$2")

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
  name: rewrite
  namespace: default
spec:
  rules:
  - host: rewrite.bar.com 
    http: 
      paths: 
      - backend:
          serviceName: http-svc
          servicePort: 80
        path: /something(/|$)(.*)
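
The regex rewrite can be tried out in Python to see what the backend receives. This only imitates the capture-group substitution: nginx writes the second group as $2, Python writes it as \2, and the function name is invented for this note.

```python
import re

# Same pattern as the ingress path; group 2 is everything after "/something/".
PATTERN = re.compile(r"^/something(/|$)(.*)")


def rewrite(path: str) -> str:
    """Apply the equivalent of rewrite-target: /$2 to a request path."""
    return PATTERN.sub(r"/\2", path)


if __name__ == "__main__":
    for request_path in ["/something", "/something/", "/something/new"]:
        print(request_path, "->", rewrite(request_path))
```

The (/|$) group is what keeps a path such as /somethingelse from matching: after "/something" there must be either a slash or the end of the path.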

9- Troubleshooting

9.1 App failure

- make an application diagram
- test the services: curl, kubectl describe service (compare with yaml)
- status pod (restarts), describe pod, pod logs (-f)

9.2 Control plane failure

- get nodes, get pods -n kube-system

- master: service kube-apiserver/kube-controller-manager/
                                            kube-scheduler  status
          kubeadm: kubectl logs kube-apiserver-master -n kube-system
          service: sudo journalctl -u kube-apiserver

- worker: service kubelet/kube-proxy status


- Are there static pods configured in the kubelet config?
   1 check /etc/systemd/system/kubelet.service.d/10-kubeadm.conf for config file
   2 check static pod path in kubelet config

9.3 Worker node failure

- get nodes, describe nodes x (check status column)
- top, df -h, service kubelet status, kubelet certificates, kubelet service running?
- kubectl cluster-info

10- JSONPATH

10.1 Basics

$ = root dictionary
results are always in [] // list

$.car.price -> ["1000"]
---
{
  "car": {
    "color": "blue",
    "price": "1000"
   },
  "bus": {
    "color": "red",
    "price": "1200"
   }
}

$[0] -> ["car"]
---
[
 "car",
 "bus",
 "bike"
]

$[?(@>40)] == get all numbers greater than 40 in the array -> [45, 60]
---
[
 12,
 45,
 60
]

$.car.wheels[?(@.location == "xxx")].model

// find prize winner named Malala
$.prizes[?(@)].laureates[?(@.firstname == "Malala")]

wildcard
---
$[*].model 
$.*.wheels[*].model

find the first names of all winners of year 2014
$.prizes[?(@.year == 2014)].laureates[*].firstname

lists
---
$[0:3] (start:end) -> 0,1,2 (first 3 elements)
$[0:8:2] (start:end:step) -> 0,0+2=2,2+2=4,4+2=6 -> 
                                elements in position 0,2,4,6
$[-1:0] = last element
$[-1:] = last element
$[-3:] = last 3 elements
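
The start:end:step notation behaves much like Python list slicing, which makes it easy to check the examples above. This is an analogy only, not a JSONPath implementation, and the helper name is made up. One difference to keep in mind: in Python the form [-1:0] returns an empty list, so only [-1:] is used here for the last element.

```python
def jsonpath_slice(items, start=None, end=None, step=None):
    """Python-slice equivalent of the JSONPath $[start:end:step] notation."""
    return items[slice(start, end, step)]


if __name__ == "__main__":
    data = list(range(10))  # positions 0..9
    print(jsonpath_slice(data, 0, 3))     # $[0:3]
    print(jsonpath_slice(data, 0, 8, 2))  # $[0:8:2]
    print(jsonpath_slice(data, -1))       # $[-1:]
    print(jsonpath_slice(data, -3))       # $[-3:]
```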

10.2 Jsonpath in Kubernetes

$ kubectl get pods -o json

$ kubectl get nodes -o=jsonpath='{.items[*].metadata.name}{"\n"}{.items[*].status.capacity.cpu}'
master node01
4      4

$ kubectl get nodes -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.capacity.cpu}{"\n"}{end}'
master 4
node01 4

$ kubectl get nodes -o=custom-columns=NODE:.metadata.name,
                                      CPU:.status.capacity.cpu …
NODE CPU
master 4
node01 4

$ kubectl get nodes --sort-by=.metadata.name

$ kubectl config view --kubeconfig=/root/my-kube-config \
            -o=jsonpath='{.users[*].name}' > /opt/outputs/users.txt

$ kubectl config view --kubeconfig=my-kube-config \
       -o jsonpath="{.contexts[?(@.context.user=='aws-user')].name}" > \
                     /opt/outputs/aws-context-name

Documentation.

11- Install, Config and Validate Kube Cluster

All based on this.

11.1- Basics

education: minikube
           kubeadm/gcp/aws

on-prem: kubeadm

laptop: minikube: deploys VMs (that are ready) - single node cluster
        kubeadm: require VMS to be ready - single/multi node cluster

turnkey solution: you provision, configure and maintain VMs. 
                  Use scripts to deploy cluster (KOPS in AWS)
                 ie: openshift (redhat), Vagrant, VMware PKS, Cloud Foundry

hosted solutions: (kubernetes as a service) provider provision and maintains VMs, install kubernetes: ie GKE in GCP

11.2 HA for Master

api-server --> need LB (active-active)

active/passive
$ kube-controller-manager --leader-elect true [options]
  --leader-elect-lease-duration 15s
  --leader-elect-renew-deadline 10s
  --leader-elect-retry-period 2s

etcd: inside the masters (2 nodes total) or in separate nodes (4 nodes total)

11.3 HA for ETCD

the leader etcd writes and sends the info to the others
leader election - RAFT:
   quorum = floor(n/2) + 1 -> minimum number of nodes that must accept a
                       transaction for it to be successful.
   recommend: 3 etcd nodes minimum => ODD NUMBER
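
The quorum formula explains the odd-number recommendation. A short Python sketch (function names chosen for this note) computes the quorum and how many nodes can be lost for each cluster size:

```python
def quorum(n: int) -> int:
    """Minimum number of nodes that must agree: floor(n/2) + 1."""
    return n // 2 + 1


def fault_tolerance(n: int) -> int:
    """How many nodes can fail while the cluster still has quorum."""
    return n - quorum(n)


if __name__ == "__main__":
    print("nodes quorum tolerated-failures")
    for nodes in range(1, 8):
        print(nodes, quorum(nodes), fault_tolerance(nodes))
```

Going from 3 to 4 nodes raises the quorum from 2 to 3 but still tolerates only one failure, so the extra even node adds cost without adding resilience.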

$ export ETCDCTL_API=3
$ etcdctl put key value
$ etcdctl get key
$ etcdctl get / --prefix --keys-only

11.4 Lab Deployment

LAB setup (5nodes)
  1 LB
  2 master nodes (with etcd)
  2 nodes
  weave-net

> download kubernetes latest release from github
> uncompress
> cd kubernetes
> cluster/get-kube-binaries.sh --> downloads the latest binaries for your system.
> cd server; tar -zxvf server-linux-xxx
> ls kubernetes/server/bin

Plan:
1- deploy etcd cluster
2- deploy control plane components (api-server, controller-manager, scheduler)
3- configure haproxy (for apiserver)

        haproxy
           |
 -------------------------
 |                       |
 M1:                     M2:
 api                     api
 etcd                    etcd
 controller-manager      controller-manager
 scheduler               scheduler

 W1:                      W2:
 gen certs                TLS Bootstrap:
 config kubelet             - w2 creates and configure certs itself
 renew certs                - config kubelet
 config kube-proxy          - w2 to renew certs by itself
                            - config kube-proxy


TLS bootstrap:
1- in Master
 - create bootstrap token and associate it to group "system:bootstrappers"
 - assign role "system:node-bootstrapper" to group "system:bootstrappers"
 - assign role "system:certificates.k8s.io:certificatesigningrequests:nodeclient" to group "system:bootstrappers"
 - assign role "system:certificates.k8s.io:certificatesigningrequests:selfnodeclient" to group "system:nodes"

2- kubelet.service
   --bootstrap-kubeconfig="/var/lib/kubelet/bootstrap-kubeconfig" 
       // This is for getting the certs to join the cluster!!
   --rotate-certificates=true // this is for the client certs used to join the cluster (CSR automatic approval)
   --rotate-server-certificates=true // these are the certs we created in the master and copied to the worker manually
the server cert requires CSR manual approval !!!

> kubectl get csr
> kubectl certificate approve csr-XXX


bootstrap-kubeconfig
---
apiVersion: v1
clusters:
- cluster:
    certificate-authority: /var/lib/kubernetes/ca.crt
    server: https://192.168.5.30:6443 //(api-server lb IP)
  name: bootstrap
contexts:
- context:
    cluster: bootstrap
    user: kubelet-bootstrap
  name: bootstrap
current-context: bootstrap
kind: Config
preferences: {}
users:
- name: kubelet-bootstrap
  user:
    token: XXXXXXXXXX

11.5 Testing

11.5.1 manual test

$ kubectl get nodes
              pods -n kube-system (coredns, etcd, kube-apiserver, controller-manager, proxy, scheduler, weave)

$ service kube-apiserver status
          kube-controller-manager
          kube-scheduler
          kubelet
          kube-proxy

$ kubectl create deployment nginx --image=nginx
          get pods
          scale --replicas=3 deploy/nginx
          get pods

$ kubectl expose deployment nginx --port=80 --type=NodePort
          get service
$ curl http://worker-1:31850

11.5.2 kubetest

end to end test: 1000 tests (12h) // conformance: 160 tests (1.5h)

1- prepare: creates a namespace for this test
2- creates test pod in this namespace, waits for the pods to come up
3- test: executes curl on one pod to reach the ip of another pod over http
4- record result

$ go get -u k8s.io/test-infra/kubetest
$ kubetest --extract=v1.11.3 (your kubernetes version)
$ cd kubernetes
$ export KUBE_MASTER_IP="192.168.26.10:6443"
$ export KUBE_MASTER=kube-master
$ kubetest --test --provider=skeleton > test-out.txt // takes 12 hours
$ kubetest --test --provider=skeleton --test_args="--ginkgo.focus=[Conformance]" > testout.txt // takes 1.5 hours


$ kubeadm join 172.17.0.93:6443 --token vab2bs.twzblu86r60qommq \
--discovery-token-ca-cert-hash sha256:3c9b88fa034a6f894a21e49ea2e2d52435dd71fa5713f23a7c2aaa83284b6700

12- Official cheatsheet

here

Internet: ID Theft

I have read a bit about ID theft on the internet, but today I read an article about a big figure in this type of crime.

I didn't realise that ID theft was more profitable than just stealing credit cards, etc. And as well, much more damaging for the victim. The economic damage caused by these actions at a national level, as in the USA, is really interesting.

At least it seems the cyber criminal wants to get clean and help with a guide in his LinkedIn profile. The info may not be super up to date, but the focus on strong passwords, password managers and two-factor authentication is key for me (apart from having antivirus, up-to-date software, etc.)

SNI and ESNI

I am subscribed to this site to get news about SSL/TLS. I am not great at security so want to try to read things like this.

This week there was an article about GFC blocking encrypted SNI. Obviously I had to read about what ESNI was via the Cloudflare link.

From that article, I recognized the SANs from certificates (renewing a certificate with SANs is more expensive, that’s how I learned about them). They consider it a hack, I am not 100% sure why. I thought having encrypted DNS should be enough, but I forgot that the TLS negotiation is not encrypted, so the SNI you are sending can be seen. The picture below clarified it for me:

So for more details about ESNI, I had to read another entry. You need TLS 1.3, DNSSEC and DoT/DoH to get the whole thing working. And not everybody supports eSNI (rfc3546). As far as I can see, my GC browser doesn't support it and only FF does.

So if I want to get this working on my end I need to encrypt my DNS and use FF. Somehow, I must have been playing with this before because I noticed I had already installed stubby for configuring DNS over TLS. But it wasn’t in use as my resolv.conf is updated every time my laptop wakes up. So I have to change it manually:

cat /etc/resolv.conf
# Generated by NetworkManager
# Check stubby is running
# $ sudo netstat -lnptu | grep stubby
# you can test having wireshark and check tcp 853 to 1.1.1.1 and not seeing
# any udp 53.
# dig @127.0.0.1 www.google.com
search mynet
nameserver 127.0.0.1

# netstat -lnptu | grep stubby
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN 478658/stubby
tcp6 0 0 ::1:53 :::* LISTEN 478658/stubby
udp 0 0 127.0.0.1:53 0.0.0.0:* 478658/stubby
udp6 0 0 ::1:53 :::* 478658/stubby
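
Since NetworkManager rewrites resolv.conf on every wake-up, a small check can tell whether the resolver still points at stubby. This is only a minimal Python sketch, assuming stubby listens on a loopback address as in the netstat output above; the function names are made up for this example and are not part of stubby or NetworkManager:

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Return the addresses found in the 'nameserver' lines of a resolv.conf."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines start with '#', so their first field is never "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_only_local_resolver(resolv_conf_text):
    """True if there is at least one nameserver and all of them are loopback."""
    servers = nameservers(resolv_conf_text)
    if not servers:
        return False
    # Drop any IPv6 zone suffix (e.g. "%eth0") before parsing the address.
    return all(ipaddress.ip_address(s.split("%")[0]).is_loopback for s in servers)
```

To use it, pass the file content, for example uses_only_local_resolver(open("/etc/resolv.conf").read()). If it returns False after a resume, the file has been regenerated and queries are probably going out in clear text again.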

After that change, I tried to test it but I couldn't see any traffic on tcp 853. The stubby service was running but something wasn’t ok.

Aug 31 17:34:44 athens stubby[11294]: Could not schedule query: None of the configured upstreams could be used to send queries on the spe>
Aug 31 17:34:44 athens stubby[11294]: Could not schedule query: None of the configured upstreams could be used to send queries on the spe>
Aug 31 17:34:44 athens stubby[11294]: Could not schedule query: None of the configured upstreams could be used to send queries on the spe>
Aug 31 17:34:44 athens stubby[11294]: Could not schedule query: None of the configured upstreams could be used to send queries on the spe>

So I decided to check the config. My config was the default one, so it was using some specific servers. I enabled the Google and Cloudflare resolvers and restarted stubby. After that, we have tcp 853!

# vim /etc/stubby/stubby.yml


# tcpdump -i wlp2s0 tcp port 853
...
18:40:42.680280 IP 192.168.1.158.32850 > one.one.one.one.domain-s: Flags [S], seq 2282297719, win 64240, options [mss 1460,sackOK,TS val 1220711339 ecr 0,nop,wscale 7,tfo cookiereq,nop,nop], length 0
18:40:42.683573 IP one.one.one.one.domain-s > 192.168.1.158.32850: Flags [S.], seq 4197575255, ack 2282297720, win 65535, options [mss 1460,nop,nop,sackOK,nop,wscale 10], length 0
18:40:42.926432 IP 192.168.1.158.39920 > one.one.one.one.domain-s: Flags [S], seq 3775203823, win 64240, options [mss 1460,sackOK,TS val 4179354929 ecr 0,nop,wscale 7,tfo cookiereq,nop,nop], length 0
18:40:42.929220 IP one.one.one.one.domain-s > 192.168.1.158.39920: Flags [S.], seq 911192268, ack 3775203824, win 65535, options [mss 1460,nop,nop,sackOK,nop,wscale 10], length 0
18:40:47.496031 IP 192.168.1.158.49154 > dns.google.domain-s: Flags [S], seq 4032010100, win 64240, options [mss 1460,sackOK,TS val 224906238 ecr 0,nop,wscale 7,tfo cookiereq,nop,nop], length 0
18:40:47.499698 IP dns.google.domain-s > 192.168.1.158.49154: Flags [S.], seq 4016982215, ack 4032010101, win 60192, options [mss 1380,sackOK,TS val 1421566573 ecr 224906238,nop,wscale 8,tfo cookie b0b482362b412e4b,nop,nop], length 0
18:40:47.499728 IP 192.168.1.158.49154 > dns.google.domain-s: Flags [.], ack 1, win 502, options [nop,nop,TS val 224906242 ecr 1421566573], length 0
18:40:47.499886 IP 192.168.1.158.49154 > dns.google.domain-s: Flags [P.], seq 1:261, ack 1, win 502, options [nop,nop,TS val 224906242 ecr 1421566573], length 260
18:40:47.503025 IP dns.google.domain-s > 192.168.1.158.49154: Flags [.], ack 261, win 240, options [nop,nop,TS val 1421566577 ecr 224906242], length 0
18:40:47.514228 IP dns.google.domain-s > 192.168.1.158.49154: Flags [P.], seq 1:3174, ack 261, win 240, options [nop,nop,TS val 1421566585 ecr 224906242], length 3173
18:40:47.514283 IP 192.168.1.158.49154 > dns.google.domain-s: Flags [.], ack 3174, win 480, options [nop,nop,TS val 224906256 ecr 1421566585], length 0

It is very clear, and very verbose. I have “suspender” enabled in GC so there are not many tabs in the background doing things… At my former employer, the firewall stats showed that DNS was the most used protocol in our corporate network…

So once I have DNSSEC enabled, let’s run the eSNI test.

This is from GC:

So good thing DNSSEC and TLS1.3 are fine. Expected that eSNI is failing.

For FF, eSNI is not enabled by default, and it took me a while to find a blog that showed the correct steps to configure it. This is the winner. I needed two changes in about:config and a restart of FF. And this is the result for the same test page:

So it is nice to have the whole setup working with FF. It would be great if GC had eSNI support. But this still has to be supported by the destination web server.

ZSWAP

Yesterday I read an article about zswap for the first time. I thought it was something new, but a quick search showed that it started around 2013 as per this article.

I have a 2015 Dell XPS13 with i7, 128GB SSD and 8GB RAM. Sometimes my system struggles with memory, and when swapping kicks in the system gets stuck. I have tuned swappiness in the past without much improvement (or was it on my former company laptop??). Anyway this is my current swappiness:

# cat /proc/sys/vm/swappiness
60
# sysctl vm.swappiness
vm.swappiness = 60

Even having “suspender” enabled in Chrome, I just have over 2GB RAM free.

$ top
top - 10:23:14 up 6 days, 1:54, 1 user, load average: 0.23, 0.41, 0.44
Tasks: 333 total, 1 running, 332 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.7 us, 0.2 sy, 0.0 ni, 99.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 56.6/7933.8 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
MiB Swap: 1.3/6964.0 [| ]

Ok, check you have ZSWAP available in your kernel.

$ uname -a
Linux x 5.7.0-2-amd64 #1 SMP Debian 5.7.10-1 (2020-07-26) x86_64 GNU/Linux

$ cat /boot/config-$(uname -r) | grep -i zswap
CONFIG_ZSWAP=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_DEFLATE is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZO=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_842 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4HC is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT="lzo"
CONFIG_ZSWAP_ZPOOL_DEFAULT_ZBUD=y
# CONFIG_ZSWAP_ZPOOL_DEFAULT_Z3FOLD is not set
# CONFIG_ZSWAP_ZPOOL_DEFAULT_ZSMALLOC is not set
CONFIG_ZSWAP_ZPOOL_DEFAULT="zbud"
# CONFIG_ZSWAP_DEFAULT_ON is not set

$ cat /sys/module/zswap/parameters/enabled
N

If you have “N” in the last command and “CONFIG_ZSWAP=y”, then your system supports zswap but it is not enabled.

$ echo Y | sudo tee /sys/module/zswap/parameters/enabled

$ cat /sys/module/zswap/parameters/enabled
Y

Now you can tune some parameters to increase compression rate (3:1)

# list all parameters
grep . /sys/module/zswap/parameters/*

# change  compression params
echo z3fold | sudo tee /sys/module/zswap/parameters/zpool
echo lzo | sudo tee /sys/module/zswap/parameters/compressor

How to check zswap is working? I followed this email thread to find some clues:

# cd /sys/kernel/debug/zswap

/sys/kernel/debug/zswap# grep . *
duplicate_entry:0
pool_limit_hit:0
pool_total_size:237568
reject_alloc_fail:0
reject_compress_poor:0
reject_kmemcache_fail:0
reject_reclaim_fail:0
same_filled_pages:33
stored_pages:151
written_back_pages:0

###############
## 12h later ##
###############

/sys/kernel/debug/zswap# grep . *
duplicate_entry:0
pool_limit_hit:0
pool_total_size:17944576
reject_alloc_fail:0
reject_compress_poor:8
reject_kmemcache_fail:0
reject_reclaim_fail:0
same_filled_pages:4146
stored_pages:16927
written_back_pages:0

So it seems some values are increasing.
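
To get something more concrete than "values are increasing", the counters can be turned into an approximate compression ratio. The sketch below is my own reading of the counters, not something from the email thread: it assumes 4 KiB pages, that pool_total_size is in bytes, and that same_filled_pages are counted inside stored_pages but take no space in the pool. With the "12h later" numbers it gives roughly 2.9:1, close to the 3:1 limit of z3fold.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual value on x86_64


def parse_stats(text):
    """Parse 'name:value' lines as printed by 'grep . *' into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def compression_ratio(stats):
    """Uncompressed bytes held in the pool divided by the pool size.

    Assumption: same-filled pages are included in stored_pages but are not
    stored in the pool, so they are subtracted before comparing sizes.
    """
    if stats["pool_total_size"] == 0:
        return None
    compressed_pages = stats["stored_pages"] - stats["same_filled_pages"]
    return compressed_pages * PAGE_SIZE / stats["pool_total_size"]


after_12h = """\
pool_total_size:17944576
same_filled_pages:4146
stored_pages:16927
"""

if __name__ == "__main__":
    print(round(compression_ratio(parse_stats(after_12h)), 2))
```

The same functions work on the full output of the grep command, as the extra counters are simply ignored.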

The difficult part seems to be making this change survive a reboot. The easy way is just to update “/etc/rc.local”.

# cat /etc/rc.local
....
# enabling zswap - 2020-08
echo Y >/sys/module/zswap/parameters/enabled
echo z3fold >/sys/module/zswap/parameters/zpool

The Art of Resilience

One day when I was a child, I heard an interview (TV or radio, not sure) with a basketball player who had been playing for a long time without an injury. That was reported as something extraordinary. I think the interview said the motto of this player was “My body and mind are a temple so I look after them very well”. I can’t say who the player was, or whether it was the NBA or something else. I don't think it was a famous player either. Or maybe this is something that my mind made up. Not sure, but that sentence has been with me since then, although it has taken years to fully understand it. For many years, I have been trying to look after myself (body and mind) as best as I can. And there is always room to improve and things not to forget.

For that reason I read this book. I had already read Ross’ first book back in 2018, so the new one was appealing.

One of the central subjects of his adventure is taking stoicism as a philosophical base. And that is something I feel quite close to lately.

Apart from the philosophy, there are many points important for succeeding in such a challenge (without being sick!)

preparation: getting wintered
control your pace
strength training / stress
manage pain
manage fear
humor
importance of food (hunger)
importance of digestion
sleep
your pyramid of needs (Maslow’s)

As the author says, there is no superpower or birth gift. It is just you and the cocktail above to achieve whatever you want.

Never split the difference

Just finished this book. I heard about it from my goland training… and actually it is quite good. You see negotiation techniques from an FBI negotiator extrapolated to the business world. One of the first things I noticed is how he highlights the importance of our lack of rationality when making decisions, based on “Thinking, fast and slow”.

I dont consider myself a good negotiator or bargainer but you can always learn something new like about how to negotiate a pay rise 🙂

The focus on these points is interesting:

Mirroring the other side: create rapport
Labelling: create trust reusing words. Proof you are listening.
Look for the “No“: This is quite unusual as you are always pushed for the “yes”. The “No” provides a lot info for getting to the real deal.
Use “How”, “What”. Avoid “Why”.
Body language is very important and how you say things. Keep feelings at bay. Remember the night show’s DJ voice. Be ready to take a punch.
Create the illusion of control in the other side
Find the liar/time-waster. Ensure the next steps.
Ackerman Bargaining: start at 65% of your target price. Increase to 85%, 95% and 100%. Use empathy and different ways to say “no”. Use precise, nonround numbers for the final offer.
Not all is about the money. You can use non monetary items to get the deal (free publicity, etc)
Find Unknown Unknowns (aka the black swan)

As the author says, it is better “no deal” than a “bad deal”.

I hope I can remember these things the next time I am in a negotiation situation.

journalctl -fu

I rebooted my laptop today and realised that docker wasn't running… It was running before the reboot and I didn’t upgrade anything related to docker (or so I thought)

$ docker ps -a
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
$

Let’s check status and start if needed:

root@athens:/var/log# service docker status
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-08-21 08:34:03 BST; 7min ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 12015 (code=exited, status=1/FAILURE)
Aug 21 08:34:03 athens systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Aug 21 08:34:03 athens systemd[1]: Stopped Docker Application Container Engine.
Aug 21 08:34:03 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:34:03 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:34:03 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:34:42 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:34:42 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:34:42 athens systemd[1]: Failed to start Docker Application Container Engine.
root@athens:/var/log#
root@athens:/var/log#
root@athens:/var/log# service docker start
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
root@athens:/var/log# systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-08-21 08:41:20 BST; 5s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Process: 35305 ExecStart=/usr/sbin/dockerd -H fd:// $DOCKER_OPTS (code=exited, status=1/FAILURE)
Main PID: 35305 (code=exited, status=1/FAILURE)
Aug 21 08:41:19 athens systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 08:41:19 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:41:19 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:41:20 athens systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Aug 21 08:41:20 athens systemd[1]: Stopped Docker Application Container Engine.
Aug 21 08:41:20 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:41:20 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:41:20 athens systemd[1]: Failed to start Docker Application Container Engine.
root@athens:/var/log#

Ok, so not much info… let's check the recommended details:

root@athens:/var/log# journalctl -xe
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit docker.socket has begun execution.
░░
░░ The job identifier is 4236.
Aug 21 08:41:20 athens systemd[1]: Listening on Docker Socket for the API.
░░ Subject: A start job for unit docker.socket has finished successfully
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit docker.socket has finished successfully.
░░
░░ The job identifier is 4236.
Aug 21 08:41:20 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:41:20 athens systemd[1]: docker.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit docker.service has entered the 'failed' state with result 'exit-code'.
Aug 21 08:41:20 athens systemd[1]: Failed to start Docker Application Container Engine.
░░ Subject: A start job for unit docker.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit docker.service has finished with a failure.
░░
░░ The job identifier is 4113 and the job result is failed.
Aug 21 08:41:20 athens systemd[1]: docker.socket: Failed with result 'service-start-limit-hit'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit docker.socket has entered the 'failed' state with result 'service-start-limit-hit'.
root@athens:/var/log# systemctl status docker.service log
Unit log.service could not be found.
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-08-21 08:41:20 BST; 1min 2s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Process: 35305 ExecStart=/usr/sbin/dockerd -H fd:// $DOCKER_OPTS (code=exited, status=1/FAILURE)
Main PID: 35305 (code=exited, status=1/FAILURE)
Aug 21 08:41:19 athens systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 08:41:19 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:41:19 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:41:20 athens systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Aug 21 08:41:20 athens systemd[1]: Stopped Docker Application Container Engine.
Aug 21 08:41:20 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:41:20 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:41:20 athens systemd[1]: Failed to start Docker Application Container Engine.
root@athens:/var/log#

So “journalctl -xe” and “systemctl status docker.service log” gave nothing useful….

So I searched for “docker.socket: Failed with result ‘service-start-limit-hit'” as it was the message that looked most suspicious. I landed here and tried a command to get more logs that I didn't know: “journalctl -fu docker”

root@athens:/var/log# journalctl -fu docker
-- Logs begin at Sun 2020-02-02 21:12:23 GMT. --
Aug 21 08:42:41 athens dockerd[35469]: proto: duplicate proto type registered: io.containerd.cgroups.v1.RdmaStat
Aug 21 08:42:41 athens dockerd[35469]: proto: duplicate proto type registered: io.containerd.cgroups.v1.RdmaEntry
Aug 21 08:42:41 athens systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 08:42:41 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:42:41 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:42:41 athens systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Aug 21 08:42:41 athens systemd[1]: Stopped Docker Application Container Engine.
Aug 21 08:42:41 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:42:41 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:42:41 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:44:32 athens systemd[1]: Starting Docker Application Container Engine…
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.Metrics
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.HugetlbStat
Aug 21 08:44:32 athens dockerd[35538]: unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character '"' after object key:value pair
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.PidsStat
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.CPUStat
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.CPUUsage
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.Throttle
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.MemoryStat
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.MemoryEntry
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.BlkIOStat
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.BlkIOEntry
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.RdmaStat
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.RdmaEntry
Aug 21 08:44:32 athens systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 08:44:32 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:44:32 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:44:32 athens systemd[1]: docker.service: Scheduled restart job, restart counter is at 1.
Aug 21 08:44:32 athens systemd[1]: Stopped Docker Application Container Engine.

And now, yes, I could see the docker logs properly… I found the culprit (a syntax error in “/etc/docker/daemon.json”) and fixed it. I am pretty sure the last time I played with “/etc/docker/daemon.json” I restarted docker and it was fine…
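
The dockerd message does not say where the JSON is broken, so it helps to run the file through a JSON parser before restarting the service. A minimal sketch using only the Python standard library; the broken sample is a made-up daemon.json with a missing comma, not my real file:

```python
import json


def check_json(text):
    """Return None if text is valid JSON, else a short description of the error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno point at the place where the parser gave up
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Hypothetical daemon.json with a missing comma after the first pair
broken = '{\n  "data-root": "/var/lib/docker"\n  "debug": true\n}'
fixed = '{\n  "data-root": "/var/lib/docker",\n  "debug": true\n}'

if __name__ == "__main__":
    print(check_json(broken))
    print(check_json(fixed))
```

From the shell, "python3 -m json.tool /etc/docker/daemon.json" does a similar check without writing any code.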

Anyway, I learned a new command to troubleshoot services: “journalctl -fu SERVICE” (-u selects the unit and -f follows the log as new entries arrive).

Protobuf/gNMI

As usual, I am following Anton’s blog and now I want to follow his series about Protobuf/gNMI. All the merit and hard work belong to the author. I am just doing copy/paste. All his code related to this topic is in his github repo:

First time I heard about protobuf was in the context of telemetry from Arista LANZ (44.3.7)

Now it is my chance to get some knowledge about it. Protobuf is a data encoding format (like JSON) meant mainly for speed. The major difference is that it is a binary format. We are going to use Protobuf to encode YANG/OpenConfig data, and the transport protocol is going to be gNMI.
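
To get a feeling for what "binary" means here, this is a small hand-made sketch of how Protobuf puts an unsigned integer field on the wire: a key byte (field number and wire type) followed by the value as a varint. It does not use the protobuf library, and the field number 3 for "mtu" is only an assumption for the example, not taken from Anton's .proto file:

```python
def encode_varint(value):
    """Encode a non-negative int as a Protobuf varint (7 bits per byte, LSB first)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


def encode_uint_field(field_number, value):
    """Encode one varint field: key = (field_number << 3) | wire_type, wire type 0."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


if __name__ == "__main__":
    # Hypothetical: 'mtu' as field number 3 with the value 1514
    print(encode_uint_field(3, 1514).hex())
    print(len('"mtu": 1514'), "bytes as JSON text")
```

The field name never travels on the wire, only its number, which is why both sides need the same compiled .proto file to make sense of the bytes.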

Index

0- Create python env
1- Install protobuf
2- Create and compile protobuf file for the OpenConfig modules openconfig-interfaces.yang.
3- Create python script to write protobuf message based on the model compiled earlier
4- Create python script to read that protobuf message
5- Use gNMI: Create python script to get interface configuration from cEOS

0- Create Python Env

$ mkdir protobuf
$ cd protobuf
$ python3 -m virtualenv ENV
$ source ENV/bin/activate
$ python -m pip install grpcio
$ python -m pip install grpcio-tools
$ python -m pip install pyang

1- Install protobuf

For debian:

$ sudo aptitude install protobuf-compiler
$ protoc --version
libprotoc 3.12.3

2- Create and compile protobuf file

This is quite a difficult part. You need to install “pyang” for python and clone the openconfig repo. Keep in mind that I have manually removed the “ro” entries from the output below:

$ ls -ltr
total 11
-rw-r--r-- 1 tomas tomas 1240 Aug 19 18:37 README.md
-rw-r--r-- 1 tomas tomas 11358 Aug 19 18:37 LICENSE
drwxr-xr-x 3 tomas tomas 4 Aug 19 18:37 release
drwxr-xr-x 4 tomas tomas 12 Aug 19 18:37 doc
drwxr-xr-x 3 tomas tomas 4 Aug 19 18:37 third_party
$
$ pyang -f tree -p ./release/models/ ./release/models/interfaces/openconfig-interfaces.yang
module: openconfig-interfaces
  +--rw interfaces
     +--rw interface* [name]
        +--rw name             -> ../config/name
        +--rw config
        |  +--rw name?            string
        |  +--rw type             identityref
        |  +--rw mtu?             uint16
        |  +--rw loopback-mode?   boolean
        |  +--rw description?     string
        |  +--rw enabled?         boolean
        +--rw hold-time
        |  +--rw config
        |  |  +--rw up?     uint32
        |  |  +--rw down?   uint32
        +--rw subinterfaces
           +--rw subinterface* [index]
              +--rw index     -> ../config/index
              +--rw config
                 +--rw index?         uint32
                 +--rw description?   string
                 +--rw enabled?       boolean

So this is the YANG model that we want to transform into protobuf.

To be honest, if I have to match that output with the content of the file itself, I don't understand it.

As Anton mentions, you need to check the official protobuf guide and protobuf python guide to create the proto file for the interface YANG model. These two links explain the structure of our new proto file.

On one hand, I think I understand the process of converting YANG to Protobuf. But I should try something myself to be sure 🙂

The .proto code doesn’t appear properly formatted in my blog so you can see it in the fig above or in github.

Compile:

$ protoc -I=. --python_out=. openconfig_interfaces.proto
$ ls -ltr | grep openconfig_interfaces
-rw-r--r-- 1 tomas tomas 1247 Aug 20 14:01 openconfig_interfaces.proto
-rw-r--r-- 1 tomas tomas 20935 Aug 20 14:03 openconfig_interfaces_pb2.py

3- Create python script to write protobuf

The script has a dict “intend” that is used to populate the proto message. Once the message is populated with the info, it is written to a file as a byte stream.

$ python create_protobuf.py oc_if.bin
$ file oc_if.bin
oc_if.bin: data

4- Create python script to read protobuf

This is based on the next blog entry of Anton’s series.

The script that reads the protobuf message is here.

$ python read_protobuf.py oc_if.bin
{'interfaces': {'interface': [{'name': 'Ethernet1', 'config': {'name': 'Ethernet1', 'type': 0, 'mtu': 1514, 'description': 'ABC', 'enabled': True, 'subinterfaces': {'subinterface': [{'index': 0, 'config': {'index': 0, 'description': 'DEF', 'enabled': True}}]}}}, {'name': 'Ethernet2', 'config': {'name': 'Ethernet2', 'type': 0, 'mtu': 1514, 'description': '123', 'enabled': True, 'subinterfaces': {'subinterface': [{'index': 0, 'config': {'index': 0, 'description': '456', 'enabled': True}}]}}}]}}
$

5- Use gNMI with cEOS

This part is based on the third blog from Anton.

The challenge here is how he found out what files to use.

$ ls -ltr gnmi/proto/gnmi/
total 62
-rw-r--r-- 1 tomas tomas 21907 Aug 20 15:10 gnmi.proto
-rw-r--r-- 1 tomas tomas 125222 Aug 20 15:10 gnmi.pb.go
-rw-r--r-- 1 tomas tomas 76293 Aug 20 15:10 gnmi_pb2.py
-rw-r--r-- 1 tomas tomas 4864 Aug 20 15:10 gnmi_pb2_grpc.py
$
$ ls -ltr gnmi/proto/gnmi_ext/
total 14
-rw-r--r-- 1 tomas tomas 2690 Aug 20 15:10 gnmi_ext.proto
-rw-r--r-- 1 tomas tomas 19013 Aug 20 15:10 gnmi_ext.pb.go
-rw-r--r-- 1 tomas tomas 10191 Aug 20 15:10 gnmi_ext_pb2.py
-rw-r--r-- 1 tomas tomas 83 Aug 20 15:10 gnmi_ext_pb2_grpc.py
$

I can see that the blog and github don't match, and I can’t really follow them. Based on both, I have created a script to get the interface config from one cEOS switch using the gNMI interface:

$ cat gnmi_get_if_config.py 
#!/usr/bin/env python

# Modules
import grpc
from bin.gnmi_pb2_grpc import gNMIStub
from bin.gnmi_pb2 import GetRequest
import json
import pprint

# Own modules
from bin.PathGenerator import gnmi_path_generator

# Variables
path = {'inventory': 'inventory.json'}
info_to_collect = ['openconfig-interfaces:interfaces']


# User-defined functions
def json_to_dict(path):
    with open(path, 'r') as f:
        return json.loads(f.read())


# Body
if __name__ == '__main__':
    inventory = json_to_dict(path['inventory'])

    for td_entry in inventory['devices']:
        metadata = [('username', td_entry['username']), ('password', td_entry['password'])]

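        # Credentials travel as per-call metadata (see stub.Get below); the second
        # positional argument of insecure_channel is for channel options, not metadata.
        channel = grpc.insecure_channel(f'{td_entry["ip_address"]}:{td_entry["port"]}')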
        grpc.channel_ready_future(channel).result(timeout=5)

        stub = gNMIStub(channel)

        for itc_entry in info_to_collect:
            print(f'Getting data for {itc_entry} from {td_entry["hostname"]} over gNMI...\n')

            intent_path = gnmi_path_generator(itc_entry)
            print("gnmi_path:\n")
            print(intent_path)
            # type=0 is DataType ALL and encoding=4 is JSON_IETF in gnmi.proto
            gnmi_message_request = GetRequest(path=[intent_path], type=0, encoding=4)
            gnmi_message_response = stub.Get(gnmi_message_request, metadata=metadata)
            # the value in the gNMI response is JSON serialised as a string of bytes
            x = gnmi_message_response.notification[0].update[0].val.json_ietf_val
            # decode the bytes into a string and then parse it into Python objects
            y = json.loads(x.decode('utf-8'))
            #import ipdb; ipdb.set_trace()
            # print nicely json
            pprint.pprint(y)

This is my cEOS config:

r01#show management api gnmi
Enabled: Yes
Server: running on port 3333, in default VRF
SSL Profile: none
QoS DSCP: none
r01#
r01#
r01#show version
cEOSLab
Hardware version:
Serial number:
Hardware MAC address: 0242.ac8d.adef
System MAC address: 0242.ac8d.adef
Software image version: 4.23.3M
Architecture: i686
Internal build version: 4.23.3M-16431779.4233M
Internal build ID: afb8ec89-73bd-4410-b090-f000f70505bb
cEOS tools version: 1.1
Uptime: 6 weeks, 1 days, 3 hours and 13 minutes
Total memory: 8124244 kB
Free memory: 1923748 kB
r01#
r01#
r01#show ip interface brief
Address
Interface IP Address Status Protocol MTU Owner

Ethernet1 10.0.12.1/30 up up 1500
Ethernet2 10.0.13.1/30 up up 1500
Loopback1 10.0.0.1/32 up up 65535
Loopback2 192.168.0.1/32 up up 65535
Vlan100 1.1.1.1/24 up up 1500
r01#

And it seems to work:

$ python gnmi_get_if_config.py
Getting data for openconfig-interfaces:interfaces from r01 over gNMI…
gnmi_path:
origin: "openconfig-interfaces"
elem {
name: "interfaces"
}
{'openconfig-interfaces:interface': [{'config': {'arista-intf-augments:load-interval': 300,
'description': '',
'enabled': True,
'loopback-mode': False,
'mtu': 0,
'name': 'Ethernet2',
'openconfig-vlan:tpid': 'openconfig-vlan-types:TPID_0X8100',
'type': 'iana-if-type:ethernetCsmacd'},
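
The helper gnmi_path_generator is imported in the script but its code is not shown in this post. Judging only from the gnmi_path printed above (an origin plus a list of elem names), this is a rough guess of the kind of parsing it has to do. It is a hypothetical stand-in that returns plain dicts instead of gNMI Path messages, and it assumes the only "module:" prefix is at the start of the path and that key values contain no "/":

```python
import re

# Matches one "[key=value]" selector inside a path element
KEY_RE = re.compile(r"\[([^=\]]+)=([^\]]+)\]")


def split_gnmi_path(xpath):
    """Split 'origin:elem1/elem2[key=value]' into an origin and a list of elems."""
    origin, sep, rest = xpath.partition(":")
    if not sep:
        # No "module:" prefix, so the whole string is the path
        origin, rest = "", xpath
    elems = []
    for part in rest.strip("/").split("/"):
        name = part.split("[", 1)[0]
        keys = dict(KEY_RE.findall(part))
        elems.append({"name": name, "key": keys})
    return {"origin": origin, "elem": elems}


if __name__ == "__main__":
    print(split_gnmi_path("openconfig-interfaces:interfaces"))
    print(split_gnmi_path("openconfig-interfaces:interfaces/interface[name=Ethernet1]/config"))
```

In the real helper each dict would presumably become a PathElem inside a gNMI Path message, but that part depends on the generated gnmi_pb2 module.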

Summary

It has been interesting to play with Protobuf and gNMI, but I have just scratched the surface.

Notes

My test env is here.

Other Info

ripe78 cisco telemetry.

cisco live 2019 intro to gRPC

gRPC and GPB for network engineers here.

SR and TI-LFA

Segment Routing (SR) and Topology Independent Loop Free Alternates (TI-LFA)

Intro

As part of having an MPLS SR lab, I wanted to test FRR (Fast Rerouting) solutions. Arista provides support for FRR TI-LFA based on this link. Unfortunately, if you are not a customer you can’t see that 🙁

But there are other links where you can read about TI-LFA. The two from Juniper confuse me when calculating P/Q groups in pre-convergence time…

https://blogs.juniper.net/en-us/industry-solutions-and-trends/segment-routing-sr-and-topology-independent-loop-free-alternates-ti-lfa

https://storage.googleapis.com/site-media-prod/meetings/NANOG79/2196/20200530_Bonica_The_Evolution_Of_v1.pdf

The documents above explain the evolution from Loop Free Alternates (LFA) to Remote LFA (RLFA) and finally to TI-LFA.

TI-LFA overcomes the limitations of RLFA using SR paths as repair tunnels.
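
To make the P/Q idea less abstract for myself, here is a toy calculation. It is my own simplified sketch, not taken from the Juniper or Cisco documents: link protection only, symmetric costs, a single link between neighbours, and a hypothetical 6-node ring instead of my lab topology. A node is in the P-space of S if none of the shortest paths from S to it crosses the protected link S-E, and in the Q-space of the destination if none of its shortest paths to the destination crosses that link.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path cost from source to every node. graph = {node: {neighbor: cost}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neigh, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neigh, float("inf")):
                dist[neigh] = nd
                heapq.heappush(heap, (nd, neigh))
    return dist


def p_space(graph, s, e):
    """Nodes S reaches without any shortest path using the link S-E."""
    ds, de = dijkstra(graph, s), dijkstra(graph, e)
    link = graph[s][e]
    return {x for x in graph if ds[x] < link + de[x]}


def q_space(graph, s, e, dest):
    """Nodes that reach dest without any shortest path using the link S-E."""
    dd, ds = dijkstra(graph, dest), dijkstra(graph, s)  # symmetric costs assumed
    link = graph[s][e]
    return {x for x in graph if dd[x] < ds[x] + link + dd[e]}


def ring(names, cost=10):
    """Build a ring topology where every link has the same cost."""
    graph = {n: {} for n in names}
    for i, a in enumerate(names):
        b = names[(i + 1) % len(names)]
        graph[a][b] = cost
        graph[b][a] = cost
    return graph


if __name__ == "__main__":
    g = ring("ABCDEF")
    p = p_space(g, "A", "B")
    q = q_space(g, "A", "B", "B")
    print("P:", sorted(p), "Q:", sorted(q), "PQ:", sorted(p & q))
```

In this ring P is {A, E, F} and Q is {B, C, D}, so there is no PQ node and RLFA has no repair tunnel. But E (in P) is adjacent to D (in Q), so an SR repair path can use the node segment of E followed by the adjacency segment E-D, which is the kind of case TI-LFA is meant to cover.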

I have also tried to read the IETF draft, and I didn’t understand things any better 🙁

And I doubt I am going to improve on it here 🙂

Cisco also has good presentations (longer and denser) about SR and TI-LFA.

https://www.ciscolive.com/c/dam/r/ciscolive/us/docs/2016/pdf/BRKRST-3020.pdf

https://www.segment-routing.net/tutorials/2016-09-27-topology-independent-lfa-ti-lfa/

Juniper docs always mention “pre-convergence” but Cisco uses “post-convergence”. I think “post” is clearer.

EOS TI-LFA Limitations

Backup paths are not computed for prefix segments that do not have a host mask (/32 for v4 and /128 for v6).
When TI-LFA is configured, the number of anycast segments generated by a node cannot exceed 10.
Computing TI-LFA backup paths for proxy node segments is not supported.
Backup paths are not computed for node segments corresponding to multi-homed prefixes. The multi-homing could be the result of them being anycast node segments, loopback interfaces on different routers advertising SIDs for the same prefix, node segments leaked between levels and thus being seen as originated from multiple L1-L2 routers.
Backup paths are only computed for segments that are non-ECMP.
Only IS-IS interfaces that are using the point-to-point network type are eligible for protection.
The backup paths are only computed with respect to link/node failure constraints. SRLG constraint is not yet supported.
Link/node protection only supported in the default VRF owing to the lack of non-default VRF support for IS-IS segment-routing.
Backup paths are computed in the same IS-IS level topology as the primary path.
Even with IS-IS GR configured, ASU2, SSO, agent restart are not hitless events for IS-IS SR LFIB routes or tunnels being protected by backup paths.

LAB

Based on this, I built a lab using 4.24.1.1F 64 bits on EVE-NG. All links have the default ISIS cost of 10 (loopbacks are 1) and we have TI-LFA node-protection enabled globally.

The configs are quite simple. This is l1r9. The only change is the IP addressing. The links in the diagram show the third octet of the link address range.

!
service routing protocols model multi-agent
!
hostname l1r9
!
spanning-tree mode mstp
!
aaa authorization exec default local
!
no aaa root
!
vrf instance MGMT
!
interface Ethernet1
no switchport
ip address 10.0.10.2/30
isis enable CORE
isis network point-to-point
!
interface Ethernet2
no switchport
ip address 10.0.11.2/30
isis enable CORE
isis network point-to-point
!
interface Ethernet3
no switchport
ip address 10.0.12.1/30
isis enable CORE
isis network point-to-point
!
interface Ethernet4
no switchport
ip address 10.0.13.1/30
isis enable CORE
isis network point-to-point
!
interface Loopback1
description CORE Loopback
ip address 10.0.0.9/32
node-segment ipv4 index 9
isis enable CORE
isis metric 1
!
interface Management1
vrf MGMT
ip address 192.168.249.18/24
!
ip routing
ip routing vrf MGMT
!
ip route vrf MGMT 0.0.0.0/0 192.168.249.1
!
mpls ip
!
mpls label range isis-sr 800000 65536
!
router isis CORE
net 49.0000.0001.0010.0000.0000.0009.00
is-type level-2
log-adjacency-changes
timers local-convergence-delay protected-prefixes
set-overload-bit on-startup wait-for-bgp
!
address-family ipv4 unicast
bfd all-interfaces
fast-reroute ti-lfa mode node-protection
!
segment-routing mpls
router-id 10.0.0.9
no shutdown
adjacency-segment allocation sr-peers backup-eligible
!
management api http-commands
protocol unix-socket
no shutdown
!
vrf MGMT
no shutdown
!
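
The node-segment labels that appear in the outputs further down (800009 for 10.0.0.9/32, for example) follow from two lines of this config: the label range "isis-sr 800000 65536" and the node-segment index. A small sketch of that arithmetic, assuming every router in the lab uses the same label range (the constant and function names are mine):

```python
SRGB_BASE = 800000  # from "mpls label range isis-sr 800000 65536"
SRGB_SIZE = 65536


def node_sid_label(index: int, base: int = SRGB_BASE, size: int = SRGB_SIZE) -> int:
    """Map a node-segment index to its MPLS label inside the IS-IS SR label range."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} is outside the label range of size {size}")
    return base + index


if __name__ == "__main__":
    # l1r9 is configured with "node-segment ipv4 index 9"
    print(node_sid_label(9))
```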

Using this script (based on nornir/napalm), I gather the output of all these commands from all routers:

"show isis segment-routing prefix-segments" -> shows if protection is enabled for these segments

"show isis segment-routing adjacency-segments" -> shows if protection is enabled for these segments

"show isis interface" -> shows state of protection configured

"show isis ti-lfa path" -> shows the repair path with the list of all the system IDs from the P-node to the Q-node for every destination/constraint tuple. You will see that even though node protection is configured, a link-protecting LFA is computed too. This is to fall back to link-protecting LFAs whenever the node-protecting LFA becomes unavailable.

"show isis ti-lfa tunnel" -> The TI-LFA repair tunnels are just internal constructs that are shared by multiple LFIB routes that compute similar repair paths. This command displays TI-LFA repair tunnels with the primary and backup via information.

"show isis segment-routing tunnel" -> command displays all the IS-IS SR tunnels. The field ‘TI-LFA tunnel index’ shows the index of the TI-LFA tunnel protecting the SR tunnel. The same TI-LFA tunnel that protects the LFIB route also protects the corresponding IS-IS SR tunnel.

"show tunnel fib" -> displays the tunnels programmed in the tunnel FIB, including the TI-LFA tunnels along with the protected IS-IS SR tunnels.

"show mpls lfib route" -> displays the backup information along with the primary vias for all node/adjacency segments that have TI-LFA backup paths computed.

"show ip route" -> When services like LDP pseudowires, BGP LU, L2 EVPN or L3 MPLS VPN use IS-IS SR tunnels as an underlay, they are automatically protected by TI-LFA tunnels that protect the IS-IS SR tunnels. The ‘show ip route’ command displays the hierarchy of the overlay-underlay-TI-LFA tunnels like below.

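This is not the script mentioned above, only a sketch of its reporting half: given the per-command output of one device as a dictionary (NAPALM's `cli()` method returns one keyed by command), it renders a report in a banner/"command ="/separator layout similar to the output shown in this post. The function name and the sample data are hypothetical.

```python
SEPARATOR = "=" * 80
BANNER = "/" * 73


def format_report(device: str, outputs: dict) -> str:
    """Render {command: output} for one device as a plain-text report."""
    lines = [BANNER, f"/// Device: {device}", BANNER, ""]
    for command, output in outputs.items():
        lines += [f"command = {command}", "", output.rstrip(), "", SEPARATOR, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample; in a real run this dict would come from the router
    sample = {"show isis interface": "IS-IS Instance: CORE VRF: default"}
    print(format_report("l1r3", sample))
```

Keeping the formatting separate from the collection makes it easy to diff the reports taken before and after a failure.
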
This is the output of l1r3 in the initial state (no failures):

/////////////////////////////////////////////////////////////////////////
///                               Device: l1r3                         //
/////////////////////////////////////////////////////////////////////////

command = show isis segment-routing prefix-segments


System ID: 0000.0000.0003			Instance: 'CORE'
SR supported Data-plane: MPLS			SR Router ID: 10.0.0.3

Node: 11     Proxy-Node: 0      Prefix: 0       Total Segments: 11

Flag Descriptions: R: Re-advertised, N: Node Segment, P: no-PHP
                   E: Explicit-NULL, V: Value, L: Local
Segment status codes: * - Self originated Prefix, L1 - level 1, L2 - level 2
  Prefix                      SID Type       Flags                   System ID       Level Protection
  ------------------------- ----- ---------- ----------------------- --------------- ----- ----------
  10.0.0.1/32                   1 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0001  L2    node      
  10.0.0.2/32                   2 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0002  L2    node      
* 10.0.0.3/32                   3 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0003  L2    unprotected
  10.0.0.4/32                   4 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0004  L2    node      
  10.0.0.5/32                   5 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0005  L2    node      
  10.0.0.6/32                   6 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0006  L2    node      
  10.0.0.7/32                   7 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0007  L2    node      
  10.0.0.8/32                   8 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0008  L2    node      
  10.0.0.9/32                   9 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0009  L2    node      
  10.0.0.10/32                 10 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0010  L2    node      
  10.0.0.11/32                 11 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0011  L2    node      

================================================================================

command = show isis segment-routing adjacency-segments


System ID: l1r3			Instance: CORE
SR supported Data-plane: MPLS			SR Router ID: 10.0.0.3
Adj-SID allocation mode: SR-adjacencies
Adj-SID allocation pool: Base: 100000     Size: 16384
Adjacency Segment Count: 4
Flag Descriptions: F: Ipv6 address family, B: Backup, V: Value
                   L: Local, S: Set

Segment Status codes: L1 - Level-1 adjacency, L2 - Level-2 adjacency, P2P - Point-to-Point adjacency, LAN - Broadcast adjacency

Locally Originated Adjacency Segments
Adj IP Address  Local Intf     SID   SID Source                 Flags     Type  Protection
--------------- ----------- ------- ------------ --------------------- -------- ----------
      10.0.1.1         Et1  100000      Dynamic   F:0 B:1 V:1 L:1 S:0   P2P L2        node
      10.0.2.1         Et2  100001      Dynamic   F:0 B:1 V:1 L:1 S:0   P2P L2        node
      10.0.5.2         Et4  100002      Dynamic   F:0 B:1 V:1 L:1 S:0   P2P L2        node
      10.0.3.2         Et3  100003      Dynamic   F:0 B:1 V:1 L:1 S:0   P2P L2        node


================================================================================

command = show isis interface


IS-IS Instance: CORE VRF: default

  Interface Loopback1:
    Index: 12 SNPA: 0:0:0:0:0:0
    MTU: 65532 Type: loopback
    Area Proxy Boundary is Disabled
    Node segment Index IPv4: 3
    BFD IPv4 is Enabled
    BFD IPv6 is Disabled
    Hello Padding is Enabled
    Level 2:
      Metric: 1 (Passive Interface)
      Authentication mode: None
      TI-LFA protection is disabled for IPv4
      TI-LFA protection is disabled for IPv6
  Interface Ethernet1:
    Index: 13 SNPA: P2P
    MTU: 1497 Type: point-to-point
    Area Proxy Boundary is Disabled
    BFD IPv4 is Enabled
    BFD IPv6 is Disabled
    Hello Padding is Enabled
    Level 2:
      Metric: 10, Number of adjacencies: 1
      Link-ID: 0D
      Authentication mode: None
      TI-LFA node protection is enabled for the following IPv4 segments: node segments, adjacency segments
      TI-LFA protection is disabled for IPv6
  Interface Ethernet2:
    Index: 14 SNPA: P2P
    MTU: 1497 Type: point-to-point
    Area Proxy Boundary is Disabled
    BFD IPv4 is Enabled
    BFD IPv6 is Disabled
    Hello Padding is Enabled
    Level 2:
      Metric: 10, Number of adjacencies: 1
      Link-ID: 0E
      Authentication mode: None
      TI-LFA node protection is enabled for the following IPv4 segments: node segments, adjacency segments
      TI-LFA protection is disabled for IPv6
  Interface Ethernet3:
    Index: 15 SNPA: P2P
    MTU: 1497 Type: point-to-point
    Area Proxy Boundary is Disabled
    BFD IPv4 is Enabled
    BFD IPv6 is Disabled
    Hello Padding is Enabled
    Level 2:
      Metric: 10, Number of adjacencies: 1
      Link-ID: 0F
      Authentication mode: None
      TI-LFA node protection is enabled for the following IPv4 segments: node segments, adjacency segments
      TI-LFA protection is disabled for IPv6
  Interface Ethernet4:
    Index: 16 SNPA: P2P
    MTU: 1497 Type: point-to-point
    Area Proxy Boundary is Disabled
    BFD IPv4 is Enabled
    BFD IPv6 is Disabled
    Hello Padding is Enabled
    Level 2:
      Metric: 10, Number of adjacencies: 1
      Link-ID: 10
      Authentication mode: None
      TI-LFA node protection is enabled for the following IPv4 segments: node segments, adjacency segments
      TI-LFA protection is disabled for IPv6

================================================================================

command = show isis ti-lfa path

TI-LFA paths for IPv4 address family
   Topo-id: Level-2
   Destination       Constraint                     Path           
----------------- --------------------------------- -------------- 
   l1r2              exclude node 0000.0000.0002    Path not found 
                     exclude Ethernet2              l1r6           
   l1r8              exclude Ethernet4              l1r4           
                     exclude node 0000.0000.0007    l1r4           
   l1r9              exclude Ethernet4              l1r4           
                     exclude node 0000.0000.0007    l1r4           
   l1r11             exclude Ethernet4              l1r4           
                     exclude node 0000.0000.0007    l1r4           
   l1r10             exclude Ethernet3              l1r7           
                     exclude node 0000.0000.0004    l1r7           
   l1r1              exclude node 0000.0000.0001    Path not found 
                     exclude Ethernet1              Path not found 
   l1r6              exclude Ethernet4              l1r2           
                     exclude node 0000.0000.0007    l1r2           
   l1r7              exclude node 0000.0000.0007    Path not found 
                     exclude Ethernet4              l1r10          
   l1r4              exclude Ethernet3              l1r9           
                     exclude node 0000.0000.0004    Path not found 
   l1r5              exclude Ethernet2              l1r7           
                     exclude node 0000.0000.0002    l1r7           


================================================================================

command = show isis ti-lfa tunnel

Tunnel Index 2
   via 10.0.5.2, 'Ethernet4'
      label stack 3
   backup via 10.0.3.2, 'Ethernet3'
      label stack 3
Tunnel Index 4
   via 10.0.3.2, 'Ethernet3'
      label stack 3
   backup via 10.0.5.2, 'Ethernet4'
      label stack 3
Tunnel Index 6
   via 10.0.3.2, 'Ethernet3'
      label stack 3
   backup via 10.0.5.2, 'Ethernet4'
      label stack 800009 800004
Tunnel Index 7
   via 10.0.5.2, 'Ethernet4'
      label stack 3
   backup via 10.0.3.2, 'Ethernet3'
      label stack 800010 800007
Tunnel Index 8
   via 10.0.2.1, 'Ethernet2'
      label stack 3
   backup via 10.0.5.2, 'Ethernet4'
      label stack 800006 800002
Tunnel Index 9
   via 10.0.5.2, 'Ethernet4'
      label stack 3
   backup via 10.0.2.1, 'Ethernet2'
      label stack 3
Tunnel Index 10
   via 10.0.2.1, 'Ethernet2'
      label stack 3
   backup via 10.0.5.2, 'Ethernet4'
      label stack 3

================================================================================

command = show isis segment-routing tunnel

 Index    Endpoint         Nexthop      Interface     Labels       TI-LFA       
                                                                   tunnel index 
-------- --------------- ------------ ------------- -------------- ------------ 
 1        10.0.0.1/32      10.0.1.1     Ethernet1     [ 3 ]        -            
 2        10.0.0.2/32      10.0.2.1     Ethernet2     [ 3 ]        8            
 3        10.0.0.7/32      10.0.5.2     Ethernet4     [ 3 ]        7            
 4        10.0.0.4/32      10.0.3.2     Ethernet3     [ 3 ]        6            
 5        10.0.0.9/32      10.0.5.2     Ethernet4     [ 800009 ]   2            
 6        10.0.0.10/32     10.0.3.2     Ethernet3     [ 800010 ]   4            
 7        10.0.0.11/32     10.0.5.2     Ethernet4     [ 800011 ]   2            
 8        10.0.0.8/32      10.0.5.2     Ethernet4     [ 800008 ]   2            
 9        10.0.0.6/32      10.0.5.2     Ethernet4     [ 800006 ]   9            
 10       10.0.0.5/32      10.0.2.1     Ethernet2     [ 800005 ]   10           


================================================================================

command = show tunnel fib


Type 'IS-IS SR', index 1, endpoint 10.0.0.1/32, forwarding None
   via 10.0.1.1, 'Ethernet1' label 3

Type 'IS-IS SR', index 2, endpoint 10.0.0.2/32, forwarding None
   via TI-LFA tunnel index 8 label 3
      via 10.0.2.1, 'Ethernet2' label 3
      backup via 10.0.5.2, 'Ethernet4' label 800006 800002

Type 'IS-IS SR', index 3, endpoint 10.0.0.7/32, forwarding None
   via TI-LFA tunnel index 7 label 3
      via 10.0.5.2, 'Ethernet4' label 3
      backup via 10.0.3.2, 'Ethernet3' label 800010 800007

Type 'IS-IS SR', index 4, endpoint 10.0.0.4/32, forwarding None
   via TI-LFA tunnel index 6 label 3
      via 10.0.3.2, 'Ethernet3' label 3
      backup via 10.0.5.2, 'Ethernet4' label 800009 800004

Type 'IS-IS SR', index 5, endpoint 10.0.0.9/32, forwarding None
   via TI-LFA tunnel index 2 label 800009
      via 10.0.5.2, 'Ethernet4' label 3
      backup via 10.0.3.2, 'Ethernet3' label 3

Type 'IS-IS SR', index 6, endpoint 10.0.0.10/32, forwarding None
   via TI-LFA tunnel index 4 label 800010
      via 10.0.3.2, 'Ethernet3' label 3
      backup via 10.0.5.2, 'Ethernet4' label 3

Type 'IS-IS SR', index 7, endpoint 10.0.0.11/32, forwarding None
   via TI-LFA tunnel index 2 label 800011
      via 10.0.5.2, 'Ethernet4' label 3
      backup via 10.0.3.2, 'Ethernet3' label 3

Type 'IS-IS SR', index 8, endpoint 10.0.0.8/32, forwarding None
   via TI-LFA tunnel index 2 label 800008
      via 10.0.5.2, 'Ethernet4' label 3
      backup via 10.0.3.2, 'Ethernet3' label 3

Type 'IS-IS SR', index 9, endpoint 10.0.0.6/32, forwarding None
   via TI-LFA tunnel index 9 label 800006
      via 10.0.5.2, 'Ethernet4' label 3
      backup via 10.0.2.1, 'Ethernet2' label 3

Type 'IS-IS SR', index 10, endpoint 10.0.0.5/32, forwarding None
   via TI-LFA tunnel index 10 label 800005
      via 10.0.2.1, 'Ethernet2' label 3
      backup via 10.0.5.2, 'Ethernet4' label 3

Type 'TI-LFA', index 2, forwarding None
   via 10.0.5.2, 'Ethernet4' label 3
   backup via 10.0.3.2, 'Ethernet3' label 3

Type 'TI-LFA', index 4, forwarding None
   via 10.0.3.2, 'Ethernet3' label 3
   backup via 10.0.5.2, 'Ethernet4' label 3

Type 'TI-LFA', index 6, forwarding None
   via 10.0.3.2, 'Ethernet3' label 3
   backup via 10.0.5.2, 'Ethernet4' label 800009 800004

Type 'TI-LFA', index 7, forwarding None
   via 10.0.5.2, 'Ethernet4' label 3
   backup via 10.0.3.2, 'Ethernet3' label 800010 800007

Type 'TI-LFA', index 8, forwarding None
   via 10.0.2.1, 'Ethernet2' label 3
   backup via 10.0.5.2, 'Ethernet4' label 800006 800002

Type 'TI-LFA', index 9, forwarding None
   via 10.0.5.2, 'Ethernet4' label 3
   backup via 10.0.2.1, 'Ethernet2' label 3

Type 'TI-LFA', index 10, forwarding None
   via 10.0.2.1, 'Ethernet2' label 3
   backup via 10.0.5.2, 'Ethernet4' label 3

================================================================================

command = show mpls lfib route

MPLS forwarding table (Label [metric] Vias) - 14 routes 
MPLS next-hop resolution allow default route: False
Via Type Codes:
          M - MPLS via, P - Pseudowire via,
          I - IP lookup via, V - VLAN via,
          VA - EVPN VLAN aware via, ES - EVPN ethernet segment via,
          VF - EVPN VLAN flood via, AF - EVPN VLAN aware flood via,
          NG - Nexthop group via
Source Codes:
          G - gRIBI, S - Static MPLS route,
          B2 - BGP L2 EVPN, B3 - BGP L3 VPN,
          R - RSVP, LP - LDP pseudowire,
          L - LDP, M - MLDP,
          IP - IS-IS SR prefix segment, IA - IS-IS SR adjacency segment,
          IL - IS-IS SR segment to LDP, LI - LDP to IS-IS SR segment,
          BL - BGP LU, ST - SR TE policy,
          DE - Debug LFIB

 IA  100000   [1]
                via M, 10.0.1.1, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                 interface Ethernet1
 IA  100001   [1]
                via TI-LFA tunnel index 8, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.2.1, Ethernet2, label imp-null(3)
                    backup via 10.0.5.2, Ethernet4, label 800006 800002
 IA  100002   [1]
                via TI-LFA tunnel index 7, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.5.2, Ethernet4, label imp-null(3)
                    backup via 10.0.3.2, Ethernet3, label 800010 800007
 IA  100003   [1]
                via TI-LFA tunnel index 6, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.3.2, Ethernet3, label imp-null(3)
                    backup via 10.0.5.2, Ethernet4, label 800009 800004
 IP  800001   [1], 10.0.0.1/32
                via M, 10.0.1.1, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                 interface Ethernet1
 IP  800002   [1], 10.0.0.2/32
                via TI-LFA tunnel index 8, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.2.1, Ethernet2, label imp-null(3)
                    backup via 10.0.5.2, Ethernet4, label 800006 800002
 IP  800004   [1], 10.0.0.4/32
                via TI-LFA tunnel index 6, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.3.2, Ethernet3, label imp-null(3)
                    backup via 10.0.5.2, Ethernet4, label 800009 800004
 IP  800005   [1], 10.0.0.5/32
                via TI-LFA tunnel index 10, swap 800005 
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.2.1, Ethernet2, label imp-null(3)
                    backup via 10.0.5.2, Ethernet4, label imp-null(3)
 IP  800006   [1], 10.0.0.6/32
                via TI-LFA tunnel index 9, swap 800006 
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.5.2, Ethernet4, label imp-null(3)
                    backup via 10.0.2.1, Ethernet2, label imp-null(3)
 IP  800007   [1], 10.0.0.7/32
                via TI-LFA tunnel index 7, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.5.2, Ethernet4, label imp-null(3)
                    backup via 10.0.3.2, Ethernet3, label 800010 800007
 IP  800008   [1], 10.0.0.8/32
                via TI-LFA tunnel index 2, swap 800008 
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.5.2, Ethernet4, label imp-null(3)
                    backup via 10.0.3.2, Ethernet3, label imp-null(3)
 IP  800009   [1], 10.0.0.9/32
                via TI-LFA tunnel index 2, swap 800009 
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.5.2, Ethernet4, label imp-null(3)
                    backup via 10.0.3.2, Ethernet3, label imp-null(3)
 IP  800010   [1], 10.0.0.10/32
                via TI-LFA tunnel index 4, swap 800010 
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.3.2, Ethernet3, label imp-null(3)
                    backup via 10.0.5.2, Ethernet4, label imp-null(3)
 IP  800011   [1], 10.0.0.11/32
                via TI-LFA tunnel index 2, swap 800011 
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.5.2, Ethernet4, label imp-null(3)
                    backup via 10.0.3.2, Ethernet3, label imp-null(3)

================================================================================

command = show ip route


VRF: default
Codes: C - connected, S - static, K - kernel, 
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
       R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
       O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
       NG - Nexthop Group Static Route, V - VXLAN Control Service,
       DH - DHCP client installed default route, M - Martian,
       DP - Dynamic Policy Route, L - VRF Leaked,
       RC - Route Cache Route

Gateway of last resort is not set

 I L2     10.0.0.1/32 [115/11] via 10.0.1.1, Ethernet1
 I L2     10.0.0.2/32 [115/11] via 10.0.2.1, Ethernet2
 C        10.0.0.3/32 is directly connected, Loopback1
 I L2     10.0.0.4/32 [115/11] via 10.0.3.2, Ethernet3
 I L2     10.0.0.5/32 [115/21] via 10.0.2.1, Ethernet2
 I L2     10.0.0.6/32 [115/21] via 10.0.5.2, Ethernet4
 I L2     10.0.0.7/32 [115/11] via 10.0.5.2, Ethernet4
 I L2     10.0.0.8/32 [115/31] via 10.0.5.2, Ethernet4
 I L2     10.0.0.9/32 [115/21] via 10.0.5.2, Ethernet4
 I L2     10.0.0.10/32 [115/21] via 10.0.3.2, Ethernet3
 I L2     10.0.0.11/32 [115/31] via 10.0.5.2, Ethernet4
 C        10.0.1.0/30 is directly connected, Ethernet1
 C        10.0.2.0/30 is directly connected, Ethernet2
 C        10.0.3.0/30 is directly connected, Ethernet3
 I L2     10.0.4.0/30 [115/20] via 10.0.2.1, Ethernet2
 C        10.0.5.0/30 is directly connected, Ethernet4
 I L2     10.0.6.0/30 [115/20] via 10.0.3.2, Ethernet3
 I L2     10.0.7.0/30 [115/30] via 10.0.2.1, Ethernet2
                               via 10.0.5.2, Ethernet4
 I L2     10.0.8.0/30 [115/20] via 10.0.5.2, Ethernet4
 I L2     10.0.9.0/30 [115/30] via 10.0.5.2, Ethernet4
 I L2     10.0.10.0/30 [115/20] via 10.0.5.2, Ethernet4
 I L2     10.0.11.0/30 [115/30] via 10.0.5.2, Ethernet4
 I L2     10.0.12.0/30 [115/30] via 10.0.3.2, Ethernet3
                                via 10.0.5.2, Ethernet4
 I L2     10.0.13.0/30 [115/30] via 10.0.5.2, Ethernet4


================================================================================

In l1r3 we can see:

show isis segment-routing prefix-segments: all prefix segments are under “node” protection (apart from its own, 10.0.0.3/32).
show isis segment-routing adjacency-segments: all adjacency segments are under “node” protection.
show isis interface: all IS-IS enabled interfaces (apart from Loopback1) have TI-LFA node protection enabled for IPv4.
show isis ti-lfa path: here we can see link and node protection towards every destination in our IS-IS domain (all P routers in our BGP-free core). When node protection is not possible, link protection is calculated. The exception is l1r1: it has a single link into the network, so if that link is lost there is no backup at all.
show isis ti-lfa tunnel: this can be confusing. These are the TI-LFA tunnels: the first two lines of each entry refer to the path being protected, and the last two lines are really the tunnel configuration. Another interesting thing here is the two-label stack on some backup tunnels (index 6, 7, 8). This is a way to avoid a loop. The index is used in the next command.
show isis segment-routing tunnel: here we see the current SR tunnels and the corresponding backup (the index refers to the command above). Label [3] is the implicit-null label. Pay attention to the endpoint “10.0.0.2/32” (as per fig2 below). The primary path is via eth2 and the backup is via tunnel index 8 (via eth4, towards l1r7). If you check the path to “10.0.0.2/32, label 800002” from l1r7 (output after fig2), you can see it points back to l1r3, so we would have a loop! For this reason the backup tunnel index 8 in l1r3 has a label stack (800006 800002) that avoids this loop. Once l1r7 receives the packet and checks the segment labels, it forwards it based on 800006 via eth2 (l1r6), and then l1r6 uses 800002 to finally reach l1r2 (via l1r5).

l1r7# show isis segment-routing tunnel
 Index    Endpoint         Nexthop      Interface     Labels       TI-LFA
                                                                   tunnel index
-------- --------------- ------------ ------------- -------------- ------------
 1        10.0.0.9/32      10.0.10.2    Ethernet3     [ 3 ]        3
 2        10.0.0.6/32      10.0.8.1     Ethernet2     [ 3 ]        1
 3        10.0.0.3/32      10.0.5.1     Ethernet1     [ 3 ]        2
 4        10.0.0.10/32     10.0.10.2    Ethernet3     [ 800010 ]   7
 5        10.0.0.11/32     10.0.10.2    Ethernet3     [ 800011 ]   4
 6        10.0.0.4/32      10.0.5.1     Ethernet1     [ 800004 ]   11
 7        10.0.0.8/32      10.0.8.1     Ethernet2     [ 800008 ]   -
 -                         10.0.10.2    Ethernet3     [ 800008 ]   -
 8        10.0.0.2/32      10.0.5.1     Ethernet1     [ 800002 ]   9
 9        10.0.0.5/32      10.0.8.1     Ethernet2     [ 800005 ]   8
 10       10.0.0.1/32      10.0.5.1     Ethernet1     [ 800001 ]   10
l1r7#
l1r7#show mpls lfib route 800006
...
 IP  800006   [1], 10.0.0.6/32
                via TI-LFA tunnel index 1, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.8.1, Ethernet2, label imp-null(3)
                    backup via 10.0.10.2, Ethernet3, label 800008 800006
l1r7#
l1r7#show mpls lfib route 800002
...
 IP  800002   [1], 10.0.0.2/32
                via TI-LFA tunnel index 9, swap 800002
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.5.1, Ethernet1, label imp-null(3)
                    backup via 10.0.8.1, Ethernet2, label imp-null(3)

show tunnel fib: you can see all the “IS-IS SR” and “TI-LFA” tunnels defined. It is like a merge of “show isis segment-routing tunnel” and “show isis ti-lfa tunnel”.
show mpls lfib route: you can see the programmed labels and the TI-LFA backups. I got confused seeing “imp-null” together with a pop/swap action for the same entry…
show ip route: nothing really interesting without L3VPNs.
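
The loop-avoidance behaviour of backup tunnel index 8 (label stack 800006 800002) can be walked through with a toy model. This is only a sketch: the l1r7 entries follow the l1r7 outputs above, while the l1r6 and l1r5 entries are my assumptions, inferred from the description that l1r6 reaches l1r2 via l1r5. It ignores real details such as ECMP and the failure itself.

```python
# Per-router label actions: top label -> (action, next hop).
# "swap" keeps the same label value because every router uses the same label range.
# l1r6 and l1r5 entries are assumptions, not taken from router output.
LFIB = {
    "l1r7": {800006: ("pop", "l1r6"), 800002: ("swap", "l1r3")},
    "l1r6": {800002: ("swap", "l1r5")},
    "l1r5": {800002: ("pop", "l1r2")},
}


def forward(first_hop, stack, max_hops=10):
    """Follow the label stack hop by hop and return the routers visited."""
    path = [first_hop]
    node, labels = first_hop, list(stack)
    for _ in range(max_hops):
        if not labels or node not in LFIB:
            break
        action, next_hop = LFIB[node][labels[0]]
        if action == "pop":
            labels.pop(0)  # remove the top label, expose the next one
        path.append(next_hop)
        node = next_hop
    return path


if __name__ == "__main__":
    # l1r3 sends the packet to l1r7 over Ethernet4 after the failure towards l1r2
    print(forward("l1r7", [800002]))          # single label: goes back to l1r3
    print(forward("l1r7", [800006, 800002]))  # TI-LFA stack: steered via l1r6
```

With only 800002 the packet bounces straight back to l1r3; with the extra 800006 on top it is first steered to l1r6, from where the shortest path to l1r2 no longer crosses l1r3.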

Testing

OK, to really test TI-LFA you need to generate labelled traffic, with a packet rate high enough to see whether you are close to the promised 50ms recovery.

So I had to make some changes:

create a L3VPN CUST-A (evpn) in l1r3 and l1r9, so they are PEs
l1r1 and l1r11 are CPE in VRF CUST-A

All other devices are unchanged.

We need to test with and without TI-LFA enabled. The test is to ping from l1r1 to l1r11 and drop the link l1r3-l1r7, first with TI-LFA enabled on l1r3 and then with it disabled.

Routing changes with TI-LFA enabled


BEFORE DROPPING LINK
======

l1r3#show ip route vrf CUST-A

 B I      10.0.13.0/30 [200/0] via 10.0.0.9/32, IS-IS SR tunnel index 5, label 116384
                                  via TI-LFA tunnel index 4, label 800009
                                     via 10.0.5.2, Ethernet4, label imp-null(3)
                                     backup via 10.0.3.2, Ethernet3, label imp-null(3)
 C        192.168.0.3/32 is directly connected, Loopback2
 B I      192.168.0.9/32 [200/0] via 10.0.0.9/32, IS-IS SR tunnel index 5, label 116384
                                    via TI-LFA tunnel index 4, label 800009
                                       via 10.0.5.2, Ethernet4, label imp-null(3)
                                       backup via 10.0.3.2, Ethernet3, label imp-null(3)

AFTER DROPPING LINK
======

l1r3#show ip route vrf CUST-A

 B I      10.0.13.0/30 [200/0] via 10.0.0.9/32, IS-IS SR tunnel index 5, label 116384
                                  via TI-LFA tunnel index 11, label 800009
                                     via 10.0.3.2, Ethernet3, label imp-null(3)
                                     backup via 10.0.2.1, Ethernet2, label 800005
 C        192.168.0.3/32 is directly connected, Loopback2
 B I      192.168.0.9/32 [200/0] via 10.0.0.9/32, IS-IS SR tunnel index 5, label 116384
                                    via TI-LFA tunnel index 11, label 800009
                                       via 10.0.3.2, Ethernet3, label imp-null(3)

Ping results

TI-LFA enabled in L1R3  TEST1
=========================

bash-4.2# ping -f 10.0.13.2
PING 10.0.13.2 (10.0.13.2) 56(84) bytes of data.
..................^C                                                                                                      
--- 10.0.13.2 ping statistics ---
1351 packets transmitted, 1333 received, 1% packet loss, time 21035ms
rtt min/avg/max/mdev = 21.081/348.764/1722.587/487.280 ms, pipe 109, ipg/ewma 15.582/67.643 ms
bash-4.2# 


NO TI-LFA enabled in L1R3  TEST1
=========================

bash-4.2# ping -f 10.0.13.2
PING 10.0.13.2 (10.0.13.2) 56(84) bytes of data.
.............................................E...................................................................................^C            
--- 10.0.13.2 ping statistics ---
2274 packets transmitted, 2172 received, +1 errors, 4% packet loss, time 36147ms
rtt min/avg/max/mdev = 20.965/88.300/542.279/86.227 ms, pipe 34, ipg/ewma 15.903/73.403 ms
bash-4.2#

Summary Testing

With TI-LFA enabled in l1r3, we have lost 18 packets (around 280ms)

Without TI-LFA in l1r3, we have lost 102 packets (around 1621ms =~ 1.6s)

Keeping in mind that this lab runs on VMs (vEOS) inside another VM (EVE-NG), this is not a bad result.

It is still far from 50ms, but it shows the improvement from enabling TI-LFA.
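
The outage figures above come from the ping summaries: lost packets multiplied by the average interval between probes. A minimal sketch of that arithmetic, assuming the probes are evenly spaced and all the loss happens in one burst (the function name is mine):

```python
def outage_estimate(transmitted: int, received: int, elapsed_ms: float):
    """Return (lost packets, estimated outage in ms) from a ping summary."""
    lost = transmitted - received
    # Average spacing between probes over the whole run
    interval_ms = elapsed_ms / transmitted
    return lost, lost * interval_ms


if __name__ == "__main__":
    tests = {
        "TI-LFA": (1351, 1333, 21035),
        "no TI-LFA": (2274, 2172, 36147),
    }
    for name, stats in tests.items():
        lost, outage = outage_estimate(*stats)
        print(f"{name}: {lost} packets lost, ~{outage:.0f} ms")
```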

Docker + Kubernetes

For some time, I have wanted to take a look at Kubernetes. There is a lot of talk about microservices in the cloud and, after attending some meetups, I wasn't sure what all this was about, so I signed up for kodekloud to learn about it.

So far, I have completed the beginners course for Docker and Kubernetes. To be honest, I think the product is very good value for money.

I have been using docker a bit over the last couple of months but still wanted to learn a bit more to improve my knowledge.

I was surprised to read that Kubernetes pods rely on docker images.

Docker Notes

Docker commands

docker run -it xxx (interactive+pseudoterminal)
docker run -d xxx (detach)
docker attach ID (attach)
docker run --name TEST xxx (provide name to container)
docker run -p 80:5000 xxx (maps host port 80 to container port 5000)

docker run -v /opt/datadir:/var/lib/mysql mysql (map a host folder to container folder for data persistence)

docker run -e APP_COLOR=blue xxx (pass env var to the container)
docker inspect "container"  -> check IP, env vars, etc
docker logs "container"

docker build . -t account_name/app_name
docker login
docker push account_name/app_name

docker -H=remote-docker-engine:2375 xxx

cgroups: restrict resources in container
  docker run --cpus=.5  xxx (no more than 50% CPU)
  docker run --memory=100m xxx (no more than 100M memory)

Docker File

----
FROM ubuntu
ENTRYPOINT ["sleep"]
# default argument: used when you don't pass any value in "docker run ..."
CMD ["5"]
----
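
How ENTRYPOINT and CMD combine can be modelled in a few lines. This is only a sketch of the exec-form behaviour (JSON arrays, as in the Dockerfile above), not Docker code; the function name is mine.

```python
def effective_command(entrypoint, cmd, run_args=()):
    """Exec-form ENTRYPOINT + CMD: arguments given to "docker run" replace CMD."""
    arguments = list(run_args) if run_args else list(cmd)
    return list(entrypoint) + arguments


if __name__ == "__main__":
    print(effective_command(["sleep"], ["5"]))          # docker run image
    print(effective_command(["sleep"], ["5"], ["10"]))  # docker run image 10
```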

Docker Compose

$ cat docker-compose.yml
version: "3"
services:
 db:
  image: postgres
  environment:
    - POSTGRES_PASSWORD=mysecretpassword
 wordpress:
  image: wordpress
  links:
    - db
  ports:
    - 8085:80


verify file: $ docker-compose config

Docker Volumes

docker volume create NAME  --> create /var/lib/docker/volumes/NAME

docker run -v NAME:/var/lib/mysql mysql  (docker volume)
or
docker run -v PATH:/var/lib/mysql mysql  (local folder)
or
docker run --mount type=bind,source=/data/mysql,target=/var/lib/mysql mysql

Docker Networking

networks: --network=xxx
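 bridge (default): private internal network on the host, containers attached to it can talk to each other
 none: isolation, no network
 host: the container shares the host network stack (no port mapping needed)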

docker network create --driver bridge --subnet x.x.x.x/x NAME
docker network ls
               inspect

Docker Swarm

I didn't use this, I just covered the theory. This is for clustering docker hosts: manager, workers.

   manager: docker swarm init
   workers: docker swarm join --token xxx
   manager: docker service create --replicas=3 my-web-server

Kubernetes Notes

container + orchestration
 (docker)    (kubernetes)

node: virtual or physical, kube is installed here
cluster: set of nodes
master: node that manages the cluster

kube components:
  api,
  etcd (key-value store),
  scheduler (distribute load),
  kubelet (agent),
  controller (brain: check status),
  container runtime (sw to run containers: docker)

master: api, etcd, controller, scheduler,
   $ kubectl cluster-info
           get nodes -o wide (extra info)
node: kubelet, container

Setup Kubernetes with minikube

Setting up kubernetes doesnt look like an easy task so there are tools to do that like microk8s, kubeadm (my laptop needs more RAM, can’t handle 1master+2nodes) and minikube.

minikube needs: virtualbox (I couldn't make it work with kvm2…) and kubectl

Install kubectl

I assume virtualbox is already installed

$ curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl"

$ chmod +x ./kubectl
$ sudo mv ./kubectl /usr/local/bin/kubectl
$ kubectl version --client

Install minikube

$ grep -E --color 'vmx|svm' /proc/cpuinfo   // verify your CPU supports
                                               virtualization
$ curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 \
>   && chmod +x minikube
$ sudo install minikube /usr/local/bin/

Start/Status minikube

$ minikube start --driver=virtualbox  --> it takes time!!!! 2cpu + 2GB ram !!!!
😄  minikube v1.12.3 on Debian bullseye/sid
✨  Using the virtualbox driver based on user configuration
💿  Downloading VM boot image ...
    > minikube-v1.12.2.iso.sha256: 65 B / 65 B [-------------] 100.00% ? p/s 0s
    > minikube-v1.12.2.iso: 173.73 MiB / 173.73 MiB [] 100.00% 6.97 MiB p/s 25s
👍  Starting control plane node minikube in cluster minikube
💾  Downloading Kubernetes v1.18.3 preload ...
    > preloaded-images-k8s-v5-v1.18.3-docker-overlay2-amd64.tar.lz4: 510.91 MiB
🔥  Creating virtualbox VM (CPUs=2, Memory=2200MB, Disk=20000MB) ...
🐳  Preparing Kubernetes v1.18.3 on Docker 19.03.12 ...
🔎  Verifying Kubernetes components...
🌟  Enabled addons: default-storageclass, storage-provisioner
🏄  Done! kubectl is now configured to use "minikube"

$ minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured

$ kubectl get nodes
NAME       STATUS   ROLES    AGE     VERSION
minikube   Ready    master   5m52s   v1.18.3

$ minikube stop  // stop the virtualbox VM to free up resources once you finish

Basic Test

$ kubectl create deployment hello-minikube --image=k8s.gcr.io/echoserver:1.10
deployment.apps/hello-minikube created

$ kubectl get deployments
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
hello-minikube   1/1     1            1           22s

$ kubectl expose deployment hello-minikube --type=NodePort --port=8080
service/hello-minikube exposed

$ minikube service hello-minikube --url
http://192.168.99.100:30020

$ kubectl delete services hello-minikube
$ kubectl delete deployment hello-minikube
$ kubectl get pods

Pods

Based on documentation:

Pods are the smallest deployable units of computing that you can create and manage in Kubernetes.
A Pod is a group of one or more containers, with shared storage/network resources, and a specification for how to run the containers. A Pod's contents are always co-located and co-scheduled, and run in a shared context. A Pod models an application-specific "logical host": it contains one or more application containers which are relatively tightly coupled. In non-cloud contexts, applications executed on the same physical or virtual machine are analogous to cloud applications executed on the same logical host.

$ kubectl run nginx --image=nginx
$ kubectl describe pod nginx
$ kubectl get pods -o wide
$ kubectl delete pod nginx

Pods – Yaml

Pod yaml structure:

pod-definition.yml:
---
apiVersion: v1
kind: (type of object: Pod, Service, ReplicaSet, Deployment)
metadata: (only valid k-v)
 name: myapp-pod
 labels: (any kind of k-v)
   app: myapp
   type: front-end
spec:
  containers:
   - name: nginx-container
     image: nginx

Example:

$ cat pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
    type: frontend
spec:
  containers:
  - name: nginx
    image: nginx

$ kubectl apply -f pod.yaml 
$ kubectl get pods

Replica-Set

Based on documentation:

A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods.

> cat replicaset-definition.yml
---
 apiVersion: apps/v1
 kind: ReplicaSet
 metadata:
   name: myapp-replicaset
   labels:
     app: myapp
     type: front-end
 spec:
   template:
     metadata:      -------
       name: nginx         |
       labels:             |
         app: nginx        |
         type: front-end   |-> POD definition
     spec:                 |
       containers:         |
       - name: nginx       |
         image: nginx  -----
   replicas: 3
   selector:       <-- main difference from replication-controller
     matchLabels:
       type: front-end
       
> kubectl create -f replicaset-definition.yml
> kubectl get replicaset
> kubectl get pods

> kubectl delete replicaset myapp-replicaset
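
The selector is what ties the ReplicaSet to its pods. A minimal Python sketch of matchLabels as I understand it: a pod is selected only if it carries every key/value pair of the selector, compared as exact strings. The function name matches is just for illustration.

```python
def matches(match_labels: dict, pod_labels: dict) -> bool:
    """True when every selector pair is present, unchanged, on the pod."""
    return all(pod_labels.get(key) == value
               for key, value in match_labels.items())

selector = {"type": "front-end"}

print(matches(selector, {"app": "myapp", "type": "front-end"}))
print(matches(selector, {"app": "nginx", "type": "frontend"}))
```

Note that "frontend" and "front-end" are different values, so the labels in the pod template must be spelled exactly like the selector or the ReplicaSet is rejected.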

How to scale via replica-set

> kubectl replace -f replicaset-definition.yml  (first update file to replicas: 6)

> kubectl scale --replicas=6 -f replicaset-definition.yml  // no need to modify file

> kubectl scale --replicas=6 replicaset myapp-replicaset   // no need to modify file

> kubectl edit replicaset myapp-replicaset (NAME of the replicaset!!!)

> kubectl describe replicaset myapp-replicaset

> kubectl get rs new-replica-set -o yaml > new-replica-set.yaml ==> returns the rs definition in yaml!
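
Whatever command is used to change the number, the idea behind scaling is the same: compare the desired replicas with the pods that currently match and fix the difference. A toy Python sketch of that comparison (not the real controller, which has its own rules to pick which pods to delete; here the last ones in the list are removed):

```python
def reconcile(desired: int, running: list) -> tuple:
    """Return (number_of_pods_to_create, pods_to_delete)."""
    diff = desired - len(running)
    if diff >= 0:
        return diff, []
    return 0, running[diff:]  # surplus pods, taken from the end of the list

pods = ["myapp-1", "myapp-2", "myapp-3"]
print(reconcile(6, pods))  # scale up from 3 to 6
print(reconcile(2, pods))  # scale down from 3 to 2
```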

Deployments

Based on documentation:

A Deployment provides declarative updates for Pods and ReplicaSets.
You describe a desired state in a Deployment, and the Deployment Controller changes the actual state to the desired state at a controlled rate. You can define Deployments to create new ReplicaSets, or to remove existing Deployments and adopt all their resources with new Deployments.

Example:

cat deployment-definition.yml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deployment
  labels:
    app: myapp
    type: front-end
spec:
  template:
    metadata:
      name: myapp-pod
      labels:
        app: myapp
        type: front-end
    spec:
      containers:
      - name: nginx-controller
        image: nginx
  replicas: 3
  selector:
    matchLabels:
      type: front-end


> kubectl create -f deployment-definition.yml
> kubectl get deployments
> kubectl get replicaset
> kubectl get pods

> kubectl get all

Update/Rollback

From documentation.

By default, it follows a "rolling update": pods are replaced gradually (new ones are created as old ones are removed), so this doesn't cause an outage.

$ kubectl create -f deployment.yml --record
$ kubectl rollout status deployment/myapp-deployment
$ kubectl rollout history deployment/myapp-deployment
$ kubectl rollout undo deployment/myapp-deployment ==> rollback!!!
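
How gradual the rolling update is depends on maxSurge and maxUnavailable, which both default to 25% according to the Deployment documentation (maxSurge is rounded up, maxUnavailable is rounded down). A small Python sketch of the resulting limits; it only handles percentages written as fractions, and the function name is mine:

```python
import math

def rolling_update_bounds(replicas: int, max_surge: float = 0.25,
                          max_unavailable: float = 0.25) -> dict:
    """Pod-count limits that hold during a rolling update."""
    surge = math.ceil(replicas * max_surge)               # rounded up
    unavailable = math.floor(replicas * max_unavailable)  # rounded down
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }

print(rolling_update_bounds(3))
print(rolling_update_bounds(10))
```

With replicas: 3 this gives at most 4 pods in total and never fewer than 3 available, which is why the update does not cause an outage.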

Networking

Pod networking is not handled natively by kubernetes, you need a network plugin (CNI) like calico, weave, etc. More info here. I have not covered this in detail yet. It looks complex (a network engineer talking…)

Services

Based on documentation:

An abstract way to expose an application running on a set of Pods as a network service.
With Kubernetes you don't need to modify your application to use an unfamiliar service discovery mechanism. Kubernetes gives Pods their own IP addresses and a single DNS name for a set of Pods, and can load-balance across them.
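
The "single DNS name" follows a fixed pattern. A short Python sketch that builds it, assuming the usual default cluster domain cluster.local and the default namespace (the helper name service_fqdn is mine):

```python
def service_fqdn(service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"

print(service_fqdn("back-end"))
print(service_fqdn("redis", namespace="vote"))  # hypothetical namespace
```

Pods in the same namespace can normally use just the short service name.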

types:
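   NodePort: like docker port-mapping (opens the same port on every node)
   ClusterIP: internal virtual IP, reachable only from inside the cluster (default type)
   LoadBalancer: external load balancer provided by the cloud platform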

Examples:

nodeport
--------
service: like a virtual server
  targetport - in the pod: 80
  service - 80
  nodeport: 30080 (in the node)
  
service-definition.yml
apiVersion: v1
kind: Service
metadata:
  name: mypapp-service
spec:
  type: NodePort
  ports:
  - targetPort: 80
    port: 80
    nodePort: 30080  (range: 30000-32767)
  selector:
    app: myapp
    type: front-end

> kubectl create -f service-definition.yml
> kubectl get services
> minikube service mypapp-service
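
A Python sketch of the three ports in the NodePort example and the path a request follows. It is only an illustration: the function name is made up, it requires nodePort to be set (the real API can pick one automatically), and it assumes the default 30000-32767 range.

```python
NODE_PORT_MIN, NODE_PORT_MAX = 30000, 32767  # default NodePort range

def traffic_path(node_ip: str, port_spec: dict) -> str:
    """Describe the hops for one entry of the Service 'ports' list."""
    node_port = port_spec["nodePort"]
    if not NODE_PORT_MIN <= node_port <= NODE_PORT_MAX:
        raise ValueError(
            f"nodePort {node_port} is outside {NODE_PORT_MIN}-{NODE_PORT_MAX}")
    port = port_spec["port"]
    # targetPort falls back to the value of port when it is omitted
    target_port = port_spec.get("targetPort", port)
    return f"{node_ip}:{node_port} -> service:{port} -> pod:{target_port}"

print(traffic_path("192.168.99.100",
                   {"targetPort": 80, "port": 80, "nodePort": 30080}))
```

Only "port" is mandatory in the manifest. From outside the cluster you always use the node IP plus the nodePort.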


clusterip: 
---------
service-definition.yml
apiVersion: v1
kind: Service
metadata:
  name: back-end
spec:
  type: ClusterIP
  ports:
  - targetPort: 80
    port: 80
  selector:
    app: myapp
    type: back-end


loadbalancer: needs a cloud provider (gcp, aws, azure...) !!!!
-----------
service-definition.yml
apiVersion: v1
kind: Service
metadata:
  name: back-end
spec:
  type: LoadBalancer
  ports:
  - targetPort: 80
    port: 80
    nodePort: 30080
  selector:
    app: myapp


> kubectl create -f service-definition.yml
> kubectl get services

Microservices architecture example

Diagram
=======

voting-app     result-app
 (python)       (nodejs)
   |(1)           ^ (4)
   v              |
in-memoryDB       db
 (redis)       (postgresql)
    ^ (2)         ^ (3)
    |             |
    ------- -------
          | |
         worker
          (.net)

1- deploy containers -> deploy PODs (deployment)
2- enable connectivity -> create service clusterIP for redis
                          create service clusterIP for postgres
3- external access -> create service NodePort for voting
                      create service NodePort for result

Code here. Steps:

$ kubectl create -f voting-app-deployment.yml
$ kubectl create -f voting-app-service.yml

$ kubectl create -f redis-deployment.yml
$ kubectl create -f redis-service.yml

$ kubectl create -f postgres-deployment.yml
$ kubectl create -f postgres-service.yml

$ kubectl create -f worker-deployment.yml

$ kubectl create -f result-app-deployment.yml
$ kubectl create -f result-app-service.yml

Verify:

$ minikube service voting-service --url
http://192.168.99.100:30004
$ minikube service result-service --url
http://192.168.99.100:30005