IPv6 BIG TCP / Replace TCP in DC: Homa

This week a colleague passed me this link about a kubernetes cluster running on Cilium. The interesting point is that the high throughput is achieved with BIG TCP and IPv6!

The summary (copied) is:

TCP segments in the OS are up to 65K and the NIC hardware does the segmentation – we do this now, but the 65K is a limitation of IPv4 addressing. BIG TCP uses IPv6 and allows much larger TCP segments within the OS, currently 512K but theoretically higher. End result – better perf (>20% higher in this video) and latency (2.2x faster through the OS).

Then I saw this other video from John Ousterhout. It covers a similar topic to the Kubernetes video above, as K8S is used mainly in datacenters.

High performance:
– data throughput: full link speed for large messages
– low tail latency: <10us for short messages? (DC)
– message throughput: 100M short messages per second? (DC)

TCP issues in DC:
1- stream oriented (no load balancing) -> message based
2- connection oriented (can break infiniband!, expensive) -> connectionless
3- fair scheduling (bw sharing) -> run to completion (SRPT)
4- sender-driven congestion control (based on buffer occupancy) -> receiver-driven congestion control
5- in-order delivery -> no ordering requirements

As well, the move to the NIC is important (there is already a lot of NIC offloading).

His proposal for Homa looks very nice, but I like how he explains how difficult it is going to be for it to succeed. Still worth trying.

CCNA DevNet Notes

1) Python Requests status code checks:

r.status_code == requests.codes.ok
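
For context, a minimal sketch (hypothetical URL) of how that check is typically used:

import requests

r = requests.get('https://api.example.com/items')   # hypothetical endpoint
if r.status_code == requests.codes.ok:               # same as checking for 200
    print(r.json())
else:
    r.raise_for_status()                             # raises requests.HTTPError on 4xx/5xx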

2) Docker publish ports:

$ docker run -p 127.0.0.1:80:8080/tcp ubuntu bash

This binds port 8080 of the container to TCP port 80 on 127.0.0.1 of the host machine. You can also specify udp and sctp ports. The Docker User Guide explains in detail how to manipulate ports in Docker.

3) HTTP status codes:

1xx informational
2xx Successful
 201 created
 204 no content (request processed by the server, no content returned)
3xx Redirect
 301 moved permanently - future requests should be directed to the given URI
 302 found - requested resource resides temporally under a different URI
 304 not modified
4xx Client Error
 400 bad request
 401 unauthorized (user not authenticated or authentication failed)
 403 forbidden (need permissions)
 404 not found
5xx Server Error
 500 internal server err - generic error message
 501 not implemented
 503 service unavailable

4) Python dictionary filters:

my_dict = {8:'u',4:'t',9:'z',10:'j',5:'k',3:'s'}

# filter(function,iterables)
new_dict = dict(filter(lambda val: val[0] % 3 == 0, my_dict.items()))

print("Filter dictionary:",new_filt)

5) HTTP Authentication

Basic: For "Basic" authentication the credentials are constructed by first combining the username and the password with a colon (aladdin:opensesame), and then by encoding the resulting string in base64 (YWxhZGRpbjpvcGVuc2VzYW1l).

Authorization: Basic YWxhZGRpbjpvcGVuc2VzYW1l

---
import base64

auth_type = 'Basic'
creds = '{}:{}'.format(user, password)    # note: 'pass' is a reserved word in Python
creds_b64 = base64.b64encode(creds.encode()).decode()
header = {'Authorization': '{} {}'.format(auth_type, creds_b64)}

Bearer:

Authorization: Bearer <TOKEN>
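
As a quick check, a minimal sketch (hypothetical URL and token) of sending both header types with requests; the second call lets requests build the Basic header itself:

import base64
import requests

creds_b64 = base64.b64encode(b'aladdin:opensesame').decode()   # YWxhZGRpbjpvcGVuc2VzYW1l
r1 = requests.get('https://api.example.com/', headers={'Authorization': 'Basic ' + creds_b64})   # hypothetical URL
r2 = requests.get('https://api.example.com/', auth=('aladdin', 'opensesame'))
r3 = requests.get('https://api.example.com/', headers={'Authorization': 'Bearer MY_TOKEN'})      # hypothetical token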

6) “diff -u file1.txt file2.txt”. link1 link2

The unified format is an option you can add to display the output without redundant context lines:

$ diff -u file1.txt file2.txt
--- file1.txt   2018-01-11 10:39:38.237464052 +0000
+++ file2.txt   2018-01-11 10:40:00.323423021 +0000
@@ -1,4 +1,4 @@
 cat
-mv
-comm
 cp
+diff
+comm
  • The first file is indicated by ---
  • The second file is indicated by +++
  • The first two lines of this output show us information about file 1 and file 2. They list the file name, modification date, and modification time of each of our files, one per line.
  • The lines below display the content of the files and how to modify file1.txt to make it identical to file2.txt.
  • - (minus) – the line needs to be deleted from the first file.
  • + (plus) – the line needs to be added to the first file.
  • The next line has two at signs @@ followed by a line range from the first file (in our case lines 1 through 4, separated by a comma) prefixed by "-", then a space, then a line range from the second file prefixed by "+", and two at signs @@ at the end. The file content that follows tells us which lines remain unchanged and which lines need to be added or deleted (indicated by the symbols) in file 1 to make it identical to file 2.
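
A similar unified diff can be generated from Python with difflib; a minimal sketch using the same two word lists as above:

import difflib

file1 = ['cat\n', 'mv\n', 'comm\n', 'cp\n']
file2 = ['cat\n', 'cp\n', 'diff\n', 'comm\n']

# unified_diff yields the same "---/+++/@@" style lines shown above
print(''.join(difflib.unified_diff(file1, file2, fromfile='file1.txt', tofile='file2.txt')))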

7) Python Testing: Assertions

.assertEqual(a, b)	a == b
.assertTrue(x)	        bool(x) is True
.assertFalse(x)	        bool(x) is False
.assertIs(a, b)	        a is b
.assertIsNone(x)	x is None
.assertIn(a, b)	        a in b
.assertIsInstance(a, b)	isinstance(a, b)

*** .assertIs(), .assertIsNone(), .assertIn(), and .assertIsInstance() all have opposite methods, named .assertIsNot(), and so forth.
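
A minimal sketch of a few of these assertions inside a test case:

import unittest

class TestExamples(unittest.TestCase):
    def test_assertions(self):
        self.assertEqual(2 + 2, 4)
        self.assertTrue([1])                # non-empty list -> bool() is True
        self.assertIsNone(None)
        self.assertIn('a', 'cat')
        self.assertIsInstance(3.14, float)
        self.assertIsNot([], [])            # two distinct list objects

if __name__ == '__main__':
    unittest.main()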

ARP Storms – EVPN

We have had an issue with broadcast storms in our network. Checking the CoPP setup in the switches, we could see massive ARP drops. This is a good link to learn how to check CoPP drops in NXOS.

N9K# show copp status
N9K# show policy-map interface control-plane | grep 'dropped [1-9]' | diff

Having so many ARP drops by CoPP is bad because very likely good ARP requests are going to be dropped.

Initially I thought it was related to ARP problems in EVPN like this link. But after taking a packet capture in a switch from an interface connected to a server, I could see that over 90% of the ARP traffic coming from the server was not getting a reply… Checking in different switches, I could see the same pattern all over the place.

So why was the server making so many ARP requests?

After some time, I managed to get help from a sysadmin with access to the servers so we could troubleshoot the problem.

But, how do you find the process that is triggering the ARP requests? I didn't make the effort to think about it and started searching for an easy answer. This post gave me a clue.

ss does show you connections that have not yet been resolved by arp. They are in state SYN-SENT. The problem is that such a state is only held for a few seconds then the connection fails, so you may not see it. You could try rapid polling for it with

while ! ss -p state syn-sent | grep 1.1.1.100; do sleep .1; done

Somehow I couldn't see anything with "ss" so I tried netstat, as it also shows you the status of the TCP connection (I wonder what would happen if the connection was UDP instead???)

Initially I tried "netstat -a" and it was too slow to show me the "SYN-SENT" status.

Shame on me, I had to search for how to get it to show the ports quickly, here:

watch netstat -ntup | grep -i syn_sent | awk '{print $4,$5,$6,$7}'

It was slow because it was trying to resolve all IPs to hostnames… :facepalm. That is fixed with "-n" (no-resolve).

Anyway, with the command above, I finally managed to see the processes that were in "SYN_SENT" state.

This is not the real thing, just an example:

#  netstat -ntup | grep -i syn_sent 
tcp        0      1 192.168.1.203:35460     4.4.4.4:23              SYN_SENT    98690/telnet        
# 

We could see that the destination port was TCP 179, so something in the node was trying to talk BGP! They were "bird" processes. As the node belonged to a kubernetes cluster, we could see a calico container as the CNI. Then we connected to the container and checked the bird config. We could see clearly that the IPs that didn't get an ARP reply were configured there.

So in summary, basic TCP:

Very briefly, TCP is L4, then it goes down to L3 (IP). To get to L2, you need to know the MAC for the IP, and that is what triggers the ARP request. Once the MAC is learned, it is cached for the next request. For that reason the first time you make a connection it is slow (ping, traceroute, etc.)
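
To see that caching from a Linux host, here is a rough sketch (Linux-only, it reads /proc/net/arp; the IP is the local host used later in the UDP notes) that triggers a connection and then prints the kernel ARP cache:

import socket

# Even if the port is closed, the kernel has to resolve the MAC via ARP
# before it can send the SYN to a host on the local subnet.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(2)
try:
    s.connect(('192.168.1.2', 22))
except OSError:
    pass
finally:
    s.close()

# The learned (or incomplete) entry is now cached by the kernel.
with open('/proc/net/arp') as f:
    print(f.read())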

Now we need to work out why the calico/bird config is that way, fix it to only use the IPs of real BGP speakers, and then verify the ARP storms stop.

Hopefully, I will learn a bit about calico.

Notes for UDP:

If I generate a UDP connection to a non-existing IP

$ nc -u 4.4.4.4 4000

netstat tells me the UDP connection is established, and I can't see anything in the ARP table for an external IP; for an internal IP (in my own network) I can see an incomplete entry. Why?

#  netstat -ntup | grep -i 4.4.4.4
udp        0      0 192.168.1.203:42653     4.4.4.4:4000            ESTABLISHED 102014/nc           
# 
#  netstat -ntup | grep -i '192.168.1.2:'
udp        0      0 192.168.1.203:44576     192.168.1.2:4000        ESTABLISHED 102369/nc           
# 
#
# arp -a
? (192.168.1.2) at <incomplete> on wlp2s0
something.mynet (192.168.1.1) at xx:xx:xx:yy:yy:zz [ether] on wlp2s0
# 

# tcpdump -i wlp2s0 host 4.4.4.4
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on wlp2s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
23:35:45.081819 IP 192.168.1.203.50186 > 4.4.4.4.4000: UDP, length 1
23:35:45.081850 IP 192.168.1.203.50186 > 4.4.4.4.4000: UDP, length 1
23:35:46.082075 IP 192.168.1.203.50186 > 4.4.4.4.4000: UDP, length 1
23:35:47.082294 IP 192.168.1.203.50186 > 4.4.4.4.4000: UDP, length 1
23:35:48.082504 IP 192.168.1.203.50186 > 4.4.4.4.4000: UDP, length 1
^C
5 packets captured
5 packets received by filter
0 packets dropped by kernel
# 
  • UDP is stateless so we can't have states… so it is always going to be "established" (see the sketch after this list). Basic TCP/UDP.
  • When trying to open a UDP connection to an external IP, you need to "route": my laptop knows it needs to send the UDP traffic to the default gateway, so when getting to L2 the destination MAC address is not 4.4.4.4's but the default gateway's MAC. BASIC ROUTING!!!! For that reason you don't see 4.4.4.4 in the ARP table.
  • When trying to open a UDP connection to a local IP, my laptop knows it is in the same network, so it should be able to find the destination MAC address using ARP.
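
A minimal sketch of the first point: connect() on a UDP socket only records the peer address locally, it does not put anything on the wire, so there is nothing that can fail and the socket shows as "established".

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(('4.4.4.4', 4000))   # succeeds instantly: no packet, no handshake
print(s.getpeername())         # ('4.4.4.4', 4000) -> netstat shows ESTABLISHED
s.send(b'x')                   # only now a datagram (and any L2 lookup) goes out
s.close()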

The Phoenix Project

I wanted to read this book for some time. I thought it was going to be a technical book but it is a novel and felt like a thriller! An IT thriller, if you can believe it. While I was reading it, I felt quite tense at some points, like, "I have been there!". Although I am not a developer, I felt the pain mentioned in the book. That said, I spent many years in a good devops environment. When I started there, I didn't have a clue what devops meant, but I learnt it on the job. I wish the networks world could be more "devops", but as we nearly always rely on 3rd-party vendors to provide equipment, they always want to lock you into their product. Still, it is possible, but you need to have the drive (and time) and some support from your employer.

One of the things that surprised me about the devops methodology is that it is based on manufacturing. I read about Kaizen in the past, but now I can see the connection. One of the main references is the book The Goal.

And another very important point: none of this works if people are not on board. You can have the smartest people around but if people don't buy in, nothing is accomplished.

So I like the idea of quick iterations (return on investment is received by the company and customer sooner) where you get earlier feedback, interactions and communication between all teams, awareness for the business that IT is everywhere, constant testing/experimentation (chaos monkey, antifragility), kanban boards / flow models to visualize process and constraints (WIP), constant learning, etc.

It was interesting at one point in the book when the main characters were interviewing the top people in the company to gather info about what is important for them, what successful results mean and what bad days look like, and then map all that to IT processes. From there you can see what is clearly important and what is not, so you can focus on value.

Another thing I learned about is the types of work we do:

  • Business projects
  • Internal projects
  • Changes
  • Unplanned work

And that unplanned work is the killer for any attempt to have a process like a manufacturing plant.

As well, based on “The Goal”, there are a lot of mentions about the “Three Ways”:

  • Find your constraint: maximize flow -> reduce batch, reduce intervals, increase quality to detect failures before moving to next steps.
  • Exploit your constraint: fast and constant flow of feedback.
  • Subordinate your constraint: high-trust culture -> dynamic, disciplined and scientific approach to experiment and risks.

In summary, I enjoyed the book. It was engaging, easy to digest and I learned!

Terraform-Part1

After learning about kubernetes from kodekloud, I want to take a look at Terraform.

These are my notes that I am taking along the course.

1- Intro:

A- config mgmt: ansible, puppet, saltstack

Designed to install and manage sw

B- Server Templating: docker, packer, vagrant.

Pre-install sw and dependencies

vm or docker images

immutable infra

C- Provision tools: terraform, cloudformation

deploy immutable infra resources

servers, dbs, net components

multiple providers.

Terraform works with AWS, GCP, Azure and physical machines. There are multiple providers like cloudflare, paloalto, dns, infoblox, grafana, influxdb, mongodb, etc.

It uses declarative code: HCL = HashiCorp Configuration Language (*.tf files).

Phases: Init, plan and apply.

2- Install and Basics

I am going to use my laptop initially, so I will follow the official instructions using a precompiled binary. So I downloaded the zip file (terraform_0.14.3_linux_amd64.zip), unzipped it and moved the binary somewhere in my path. I decided to use /usr/bin and installed autocompletion.

/terraform/test1$ which terraform
 /usr/bin/terraform

/terraform/test1$ terraform version
 Terraform v0.14.3
 provider registry.terraform.io/hashicorp/local v2.0.0 

/terraform/test1$ terraform -install-autocomplete

HCL Basics:

<block> <parameters> {
  key1 = value1
  key2 = value2
 }

Examples:

// This one uses the resource "local_file". We call it "hello". It creates a file with specific content
$ vim local.tf
 resource "local_file" "hello" {
  filename = "/tmp/hello-terra.txt"
  content = "hello world1"
 }

Based on the above:
 block_name -> resource
 provider type -> local
 resource type -> file
 resource_name: hello
   arguments: filename and content


// The next ones use AWS provider types

$ vim aws-ec2.tf
 resource "aws_instance" "webserver" {
  ami = "ami-asdfasdf"
  instance_type = "t2.micro"
 }

$ vim aws-s3.tf
 resource "aws_s3_bucket" "data" {
   bucket = "webserver-bucket-org-2207"
   acl = "private"
 }

Deployment process:

 0- create *.tf file
 1- terraform init --> prepare env / install plugins, etc
 2- terraform plan --> steps to be done // review
 3- terraform apply -> execute steps from plan
 4- terraform show

Example using “local_file” resource:

/terraform/test1$ terraform init 
 Initializing the backend…
 Initializing provider plugins…
 Reusing previous version of hashicorp/local from the dependency lock file
 Installing hashicorp/local v2.0.0…
 Installed hashicorp/local v2.0.0 (signed by HashiCorp) 
 Terraform has been successfully initialized!
 You may now begin working with Terraform. Try running "terraform plan" to see
 any changes that are required for your infrastructure. All Terraform commands
 should now work.
 If you ever set or change modules or backend configuration for Terraform,
 rerun this command to reinitialize your working directory. If you forget, other
 commands will detect it and remind you to do so if necessary.
/terraform/test1$ 
/terraform/test1$ terraform plan 
 local_file.hello: Refreshing state… [id=c25325615b8492da77c2280a425a3aa82efda6d3]
 An execution plan has been generated and is shown below.
 Resource actions are indicated with the following symbols:
 create 
 Terraform will perform the following actions:
 # local_file.hello will be created
 resource "local_file" "hello" { content              = "hello world1"
 directory_permission = "0777"
 file_permission      = "0700"
 filename             = "/tmp/hello-terra.txt"
 id                   = (known after apply)
 } 
 Plan: 1 to add, 0 to change, 0 to destroy.
 
 Note: You didn't specify an "-out" parameter to save this plan, so Terraform
 can't guarantee that exactly these actions will be performed if
 "terraform apply" is subsequently run.
/terraform/test1$ 
/terraform/test1$ terraform apply 
 local_file.hello: Refreshing state… [id=c25325615b8492da77c2280a425a3aa82efda6d3]
 An execution plan has been generated and is shown below.
 Resource actions are indicated with the following symbols:
 create 
 Terraform will perform the following actions:
 # local_file.hello will be created
 resource "local_file" "hello" { content              = "hello world1"
 directory_permission = "0777"
 file_permission      = "0700"
 filename             = "/tmp/hello-terra.txt"
 id                   = (known after apply)
 } 
 Plan: 1 to add, 0 to change, 0 to destroy.
 Do you want to perform these actions?
   Terraform will perform the actions described above.
   Only 'yes' will be accepted to approve.
 Enter a value: yes
 local_file.hello: Creating…
 local_file.hello: Creation complete after 0s [id=c25325615b8492da77c2280a425a3aa82efda6d3]
 Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
/terraform/test1$ 
/terraform/test1$ cat /tmp/hello-terra.txt 
 hello world1

Update/Destroy:

 $ update tf file
 $ terraform apply   -> apply the changes
or
 $ terraform destroy -> shows the destroy plan and then you need to confirm

Providers:

https://registry.terraform.io/
  official: aws, gcp, local, etc
  verified (3rd party): bigip, heroku, digitalocean
  community: activedirectory, ucloud, netapp-gcps
 
$ terraform init -> show the providers installed

 plugin name format:
  * registry.terraform.io/hashicorp/local
           ^                ^         ^
       hostname      org namespace   type 
 
plugins installed in .terraform/plugins

https://registry.terraform.io/providers/hashicorp/local/latest/docs/resources/file#sensitive_content
 main.tf: resource definition
 variables.tf: variable declarations
 outputs.tf: outputs from resources
 provider.tf: providers definition

Variables:

filename
content
prefix
separator
length

* type is optional
 type: string    "tst"
       number    1
       bool      true/false
       any       whatever
       list      ["cat","dog"]
       map       pet1=cat
       object    mix of the above
       tuple     like a list of types
       set       (it is like a list but can't have duplicate values!) 

Examples:

vim variables.tf
// List
variable "prefix" {
  default = ["Mr", "Mrs", "Sir"]   **default is optional!!!
  type = list(string)
 }

// Map
 variable file-content {
  type = map(string)
  default = {
   "state1" = "test1"
   "state2" = "test2"
  }
 }

// Set
 variable "prefix" {
  default = ["10","11","12"]
  type = set(number)
 }

// Object
 variable "bella" {
 type = object({
   name = string
   age = number
   food = list(string)
   alive = bool
  })
 default = {
   name = "bella"
   age = 21
   food = ["pasta", "tuna"]
   alive = true
  }
 }

// Tuple
 variable kitty {
  type = tuple([string, number, bool])
  default = ["cat", 7, true]
 }

Using variables

vim main.tf
 resource "random_pet" "my-pet" {
  prefix = var.prefix[0]
 }
 resource local_file my-file {
  filename = "/tmp/test1.txt"
  content = var.file-content["state1"]
 }

Example using vars:

/terraform/vars$ cat variables.tf
variable "filename" {
  default = "/tmp/test-var.txt"
  type = string
  description = "xx"
 }
 variable "content" {
  default = "hello test var"
 }
/terraform/vars$ cat main.tf
resource "local_file" "test1" {
  filename = var.filename
  content = var.content
 }
/terraform/vars$ 
/terraform/vars$ terraform init 
 Initializing the backend…
 Initializing provider plugins…
 Finding latest version of hashicorp/local…
 Installing hashicorp/local v2.0.0…
 Installed hashicorp/local v2.0.0 (signed by HashiCorp) 
 Terraform has created a lock file .terraform.lock.hcl to record the provider
 selections it made above. Include this file in your version control repository
 so that Terraform can guarantee to make the same selections by default when
 you run "terraform init" in the future.
 Terraform has been successfully initialized!
 You may now begin working with Terraform. Try running "terraform plan" to see
 any changes that are required for your infrastructure. All Terraform commands
 should now work.
 If you ever set or change modules or backend configuration for Terraform,
 rerun this command to reinitialize your working directory. If you forget, other
 commands will detect it and remind you to do so if necessary.
/terraform/vars$ 
/terraform/vars$ terraform plan
 An execution plan has been generated and is shown below.
 Resource actions are indicated with the following symbols:
 create 
 Terraform will perform the following actions:
 # local_file.test1 will be created
 resource "local_file" "test1" { content              = "hello test var"
 directory_permission = "0777"
 file_permission      = "0777"
 filename             = "/tmp/test-var.txt"
 id                   = (known after apply)
 } 
 Plan: 1 to add, 0 to change, 0 to destroy.
 
 Note: You didn't specify an "-out" parameter to save this plan, so Terraform
 can't guarantee that exactly these actions will be performed if
 "terraform apply" is subsequently run.
/terraform/vars$ 
/terraform/vars$ terraform apply 
 An execution plan has been generated and is shown below.
 Resource actions are indicated with the following symbols:
 create 
 Terraform will perform the following actions:
 # local_file.test1 will be created
 resource "local_file" "test1" { content              = "hello test var"
 directory_permission = "0777"
 file_permission      = "0777"
 filename             = "/tmp/test-var.txt"
 id                   = (known after apply)
 } 
 Plan: 1 to add, 0 to change, 0 to destroy.
 Do you want to perform these actions?
   Terraform will perform the actions described above.
   Only 'yes' will be accepted to approve.
 Enter a value: yes
 local_file.test1: Creating…
 local_file.test1: Creation complete after 0s [id=9f5d7ee95aa30648a2fb6f8e523e0547b7ecb78e]
 Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
/terraform/vars$ 
/terraform/vars$ 
/terraform/vars$ cat /tmp/test-var.txt 
 hello test var

Pass var values:

 1- if there are no values for a var, when running "terraform apply" it will ask for the values interactively!
 2- cli params
    $ terraform apply -var "filename=/root/test.tst" -var "content=My Test"
 3- env vars  TF_VAR_xxx=xxx
    $ export TF_VAR_filename="/root/test.tst"
    $ terraform apply
 4- var files:
    autoloaded: terraform.tfvars, terraform.tfvars.json, *.auto.tfvars, *.auto.tfvars.json
    explicit: NAME.tfvars
    $ cat terraform.tfvars
      filename="/root/test.tst"
    $ terraform apply
    $ terraform apply -var-file NAME.tfvars

VAR PRECEDENCE: less -> more
 1 env vars
 2 terraform.tfvars
 3 *.auto.tfvars (alphabetic order)
 4 -var and -var-file (cli flags)    --> highest priority!!!! it overrides all the above options

Kubernetes-Docker-ASICs

This week I read that kubernetes is going to stop supporting Docker soon. I was quite surprised. I am not an expert, so it seems they have legit reasons. But I haven't read anything from the other side. I think it is going to be painful, so I need to try that in my lab and see how to do that migration. It will be nice to learn that.

On the other end, I read a blog entry about ASICs from Cloudflare. I think, without getting too technical, it is a good one. And I learned about the different types of ASICs from Juniper. In the last years, I have only used devices powered by Broadcom ASICs. One day, I would like to try those P4/Barefoot Tofino devices. And related to this, I remember this NANOG presentation about ASICs that is really good (and fun!).

install-kubeadm-vagrant-libvirt

While studying for CKA, I installed kubeadm using vagrant/virtualbox. Now I want to try the same, but using libvirt instead.

1- Install 3 VMs (1 master and 2 worker nodes). I have installed vagrant and libvirtd already. Take this vagrant file as the source.

2- I had to make two changes to that file

2.1- I want to use libvirtd, so I need to change the Ubuntu vm.box to one that supports it.

#config.vm.box = "ubuntu/bionic64"
config.vm.box = "generic/ubuntu1804"

2.2- Then I need to change the network interface

enp0s8 -> eth1

3- Create the VMs with vagrant.

$ ls -ltr
-rw-r--r-- 1 tomas tomas 3612 Nov 15 16:36 Vagrantfile

$ vagrant status
Current machine states:
kubemaster not created (libvirt)
kubenode01 not created (libvirt)
kubenode02 not created (libvirt)

$ vagrant up
...
An unexpected error occurred when executing the action on the
'kubenode01' machine. Please report this as a bug:
cannot load such file -- erubis
...

3.1 Ok, we have to troubleshoot vagrant on my laptop. I googled a bit and couldn't find anything related. I remembered that you can install plugins with vagrant, as once I had to update the vagrant-libvirt plugin. So this is kind of what I did.

$ vagrant version
Installed Version: 2.2.13
Latest Version: 2.2.13

$ vagrant plugin list
vagrant-libvirt (0.1.2, global)
Version Constraint: > 0

$ vagrant plugin update
Updating installed plugins…
Fetching fog-core-2.2.3.gem
Fetching nokogiri-1.10.10.gem
Building native extensions. This could take a while…
Building native extensions. This could take a while…
Fetching vagrant-libvirt-0.2.1.gem
Successfully uninstalled excon-0.75.0
Successfully uninstalled fog-core-2.2.0
Removing nokogiri
Successfully uninstalled nokogiri-1.10.9
Successfully uninstalled vagrant-libvirt-0.1.2
Updated 'vagrant-libvirt' to version '0.2.1'!

$ vagrant plugin install erubis

$ vagrant plugin update
Updating installed plugins…
Building native extensions. This could take a while…
Building native extensions. This could take a while…
Updated 'vagrant-libvirt' to version '0.2.1'!

$ vagrant plugin list
erubis (2.7.0, global)
Version Constraint: > 0
vagrant-libvirt (0.2.1, global)
Version Constraint: > 0

3.2. Now, I can start vagrant fine

$ vagrant up
....

$ vagrant status
Current machine states:
kubemaster running (libvirt)
kubenode01 running (libvirt)
kubenode02 running (libvirt)

4- Install kubeadm. I follow the official doc. It seems we have the pre-requisites. My laptop has 8GB RAM and 4 cpus. Our VMs are Ubuntu 16.04+.

4.1 Enable iptables in each VM:

$ vagrant ssh kubemaster

vagrant@kubemaster:~$ lsmod | grep br_net
vagrant@kubemaster:~$
vagrant@kubemaster:~$ sudo modprobe br_netfilter
vagrant@kubemaster:~$ lsmod | grep br_net
br_netfilter 24576 0
bridge 155648 1 br_netfilter
vagrant@kubemaster:~$
vagrant@kubemaster:~$ cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
vagrant@kubemaster:~$ sudo sysctl --system
...

5- Install runtime (docker). Following the official doc, we click on the link at the end of “Installing runtime”. We do this in each node:

vagrant@kubemaster:~$ sudo -i
root@kubemaster:~# sudo apt-get update && sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
...
root@kubemaster:~# curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key --keyring /etc/apt/trusted.gpg.d/docker.gpg add -
OK
root@kubemaster:~# sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \ 
$(lsb_release -cs) \
stable"
...
root@kubemaster:~# sudo apt-get update && sudo apt-get install -y \
containerd.io=1.2.13-2 \
docker-ce=5:19.03.11~3-0~ubuntu-$(lsb_release -cs) \
docker-ce-cli=5:19.03.11~3-0~ubuntu-$(lsb_release -cs)
....
root@kubemaster:~# cat <<EOF | sudo tee /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2"
}
EOF
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2"
}
root@kubemaster:~# sudo mkdir -p /etc/systemd/system/docker.service.d
root@kubemaster:~# sudo systemctl daemon-reload
root@kubemaster:~# sudo systemctl restart docker
root@kubemaster:~# sudo systemctl enable docker
Synchronizing state of docker.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable docker
root@kubemaster:~#
root@kubemaster:~#

5- Now we follow “Installing kubeadm, kubelet and kubectl” from main doc in each VM.

root@kubemaster:~#
root@kubemaster:~# sudo apt-get update && sudo apt-get install -y apt-transport-https curl
...
root@kubemaster:~# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
OK
root@kubemaster:~# cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
deb https://apt.kubernetes.io/ kubernetes-xenial main
root@kubemaster:~# sudo apt-get update
...
root@kubemaster:~# sudo apt-get install -y kubelet kubeadm kubectl
...
root@kubemaster:~# ip -4 a

We don't have to do anything in the next section "Configure cgroup driver…" as we are using docker. So from the bottom of the main page, we click on the next section for using kubeadm to create a cluster.

6- So we have our three VMs with kubeadm. Now we are going to create a cluster. The kubemaster VM will be the control-plane node. So following "Initializing your control-plane node": we don't need 1) (as we have only one control-node); for 2) we will install weave-net as the CNI in the next step, and we need to use a new network for it: 10.244.0.0/16; 3) we don't need it; and for 4) we will specify the master IP. So, only on kubemaster:

root@kubemaster:~# kubeadm init --pod-network-cidr 10.244.0.0/16 --apiserver-advertise-address=192.168.56.2
W1115 17:13:31.213357 9958 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.4
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Swap]: running with swap on is not supported. Please disable swap
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher

Oh, problem. It seems we need to disable swap on the VMs. Actually, we will do it on all VMs.

root@kubemaster:~# swapoff -a

Try kubeadm init again on the master:

root@kubemaster:~# kubeadm init --pod-network-cidr 10.244.0.0/16 --apiserver-advertise-address=192.168.56.2
W1115 17:15:00.378279 10376 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.4
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubemaster kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.56.2]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kubemaster localhost] and IPs [192.168.56.2 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kubemaster localhost] and IPs [192.168.56.2 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 25.543262 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.19" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node kubemaster as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node kubemaster as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: aeseji.kovc0rjt6giakn1v
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.56.2:6443 --token aeseji.kovc0rjt6giakn1v \
--discovery-token-ca-cert-hash sha256:c1b91ec9cebe065665c314bfe9a7ce9c0ef970d56ae762dae5ce308caacbd8cd
root@kubemaster:~#

7- We need to follow the output of kubeadm init in kubemaster. As well, pay attention, as the info for joining our worker nodes to the cluster is in there too ("kubeadm join ….").

root@kubemaster:~# exit
logout
vagrant@kubemaster:~$ mkdir -p $HOME/.kube
vagrant@kubemaster:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
vagrant@kubemaster:~$ sudo chown $(id -u):$(id -g) $HOME/.kube/config

We can test the status of the control-node. It is NotReady because it needs the network configuration.

vagrant@kubemaster:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubemaster NotReady master 2m9s v1.19.4

8- From the same page, now we need to follow "Installing a Pod network add-on". I don't know why but the documentation is not great about it. You need to dig into all the versions to find the steps to install weave-net. This is the link. So we install weave-net only on the kubemaster:

vagrant@kubemaster:~$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
serviceaccount/weave-net created
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
role.rbac.authorization.k8s.io/weave-net created
rolebinding.rbac.authorization.k8s.io/weave-net created
daemonset.apps/weave-net created
vagrant@kubemaster:~$
vagrant@kubemaster:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubemaster Ready master 4m32s v1.19.4

9- We can move on to the section "Joining your nodes". We need to apply the "kubeadm join…" command from the output of "kubeadm init" on the master node, only on the worker nodes.

root@kubenode02:~# kubeadm join 192.168.56.2:6443 --token aeseji.kovc0rjt6giakn1v --discovery-token-ca-cert-hash sha256:c1b91ec9cebe065665c314bfe9a7ce9c0ef970d56ae762dae5ce308caacbd8cd
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster…
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap…
This node has joined the cluster:
Certificate signing request was sent to apiserver and a response was received.
The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
root@kubenode02:~#

10- We need to wait a bit, but finally the worker nodes will come up as Ready if we check in the master/control-node:

vagrant@kubemaster:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubemaster Ready master 6m35s v1.19.4
kubenode01 Ready 2m13s v1.19.4
kubenode02 Ready 2m10s v1.19.4
vagrant@kubemaster:~$

11- Let's verify we have a working cluster by just creating a pod.

vagrant@kubemaster:~$ kubectl run ngix --image=nginx
pod/ngix created

vagrant@kubemaster:~$ kubectl get pod
NAME READY STATUS RESTARTS AGE
ngix 0/1 ContainerCreating 0 5s
vagrant@kubemaster:~$
vagrant@kubemaster:~$ kubectl get pod
NAME READY STATUS RESTARTS AGE
ngix 1/1 Running 0 83s
vagrant@kubemaster:~$

vagrant@kubemaster:~$ kubectl delete pod ngix
pod "ngix" deleted

vagrant@kubemaster:~$ kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-f9fd979d6-b9b92 1/1 Running 0 10m
coredns-f9fd979d6-t822r 1/1 Running 0 10m
etcd-kubemaster 1/1 Running 0 10m
kube-apiserver-kubemaster 1/1 Running 0 10m
kube-controller-manager-kubemaster 1/1 Running 2 10m
kube-proxy-jpb9p 1/1 Running 0 10m
kube-proxy-lkpv9 1/1 Running 0 6m13s
kube-proxy-sqd9v 1/1 Running 0 6m10s
kube-scheduler-kubemaster 1/1 Running 2 10m
weave-net-8rl49 2/2 Running 0 6m13s
weave-net-fkqdv 2/2 Running 0 6m10s
weave-net-q79pb 2/2 Running 0 7m48s
vagrant@kubemaster:~$

So, we have a working kubernetes cluster built with kubeadm using vagrant/libvirtd!

As a note, while building the VMs and installing software on them, my laptop hung a couple of times as the 3 VMs running at the same time take nearly all the RAM. But this is a good exercise to understand the requirements of kubeadm to build a cluster, and as well, it is a lab environment you can use while studying if the cloud environments are down or you don't have internet. Let's see if I manage to pass the CKA one day!!!

3VMs running
----
# top
top - 17:24:10 up 9 days, 18:18, 1 user, load average: 5.22, 5.09, 4.79
Tasks: 390 total, 1 running, 388 sleeping, 0 stopped, 1 zombie
%Cpu(s): 21.7 us, 19.5 sy, 0.0 ni, 56.5 id, 2.0 wa, 0.0 hi, 0.2 si, 0.0 st
MiB Mem : 7867.7 total, 263.0 free, 6798.7 used, 806.0 buff/cache
MiB Swap: 6964.0 total, 991.4 free, 5972.6 used. 409.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
329875 tomas 20 0 9268464 251068 83584 S 55.8 3.1 14:27.84 chrome
187962 tomas 20 0 1302500 105228 46528 S 36.9 1.3 170:58.40 chrome
331127 libvirt+ 20 0 4753296 1.3g 5972 S 35.5 17.5 7:13.00 qemu-system-x86
330979 libvirt+ 20 0 4551524 954212 5560 S 7.3 11.8 4:08.33 qemu-system-x86
5518 root 20 0 1884932 135616 8528 S 5.3 1.7 76:50.45 Xorg
330803 libvirt+ 20 0 4550504 905428 5584 S 5.3 11.2 4:12.68 qemu-system-x86
6070 tomas 9 -11 1180660 6844 4964 S 3.7 0.1 44:04.39 pulseaudio
333253 tomas 20 0 4708156 51400 15084 S 3.3 0.6 1:23.72 chrome
288344 tomas 20 0 2644572 56560 14968 S 1.7 0.7 9:03.78 Web Content
6227 tomas 20 0 139916 8316 4932 S 1.3 0.1 19:59.68 gkrellm

3VMS stopped
----
root@athens:/home/tomas# top
top - 18:40:09 up 9 days, 19:34, 1 user, load average: 0.56, 1.09, 1.30
Tasks: 379 total, 2 running, 376 sleeping, 0 stopped, 1 zombie
%Cpu(s): 4.5 us, 1.5 sy, 0.0 ni, 94.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7867.7 total, 3860.9 free, 3072.9 used, 933.9 buff/cache
MiB Swap: 6964.0 total, 4877.1 free, 2086.9 used. 4122.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
288344 tomas 20 0 2644572 97532 17100 S 6.2 1.2 11:05.35 Web Content
404910 root 20 0 12352 5016 4040 R 6.2 0.1 0:00.01 top
1 root 20 0 253060 7868 5512 S 0.0 0.1 0:47.82 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:02.99 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H
9 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
10 root 20 0 0 0 0 S 0.0 0.0 0:11.39 ksoftirqd/0
11 root 20 0 0 0 0 I 0.0 0.0 2:13.55 rcu_sched
root@athens:/home/tomas#

Work-Hard

I get mad whenever I hear "work hard" lately. What the f* does that mean? Do I need to stay at my desk for 16 hours every day? That is what I understand by working hard. I am subscribed to the SDN mailing list of IPSpace and this week the email was about this topic, related to network automation. My former CTO told me one day "work smarter, not harder". I am not very smart, but I try. And one key thing: focus.

gnmi-ssl-p2

I was already playing with gNMI and protobuf a couple of months ago. But this week I received a summary from the last NANOG80 meeting and there was a presentation about it. Great job from Colin!

So I decided to give it a go as the demo was based on docker and I already have my Arista lab in cEOS and vEOS as targets.

I started my 3node-ring cEOS lab with docker-topo

ceos-testing/topology master$ docker-topo --create 3-node-simple.yml
INFO:main:Version 2 requires sudo. Restarting script with sudo
[sudo] password for xxx:
INFO:main:
alias r01='docker exec -it 3node_r01 Cli'
alias r02='docker exec -it 3node_r02 Cli'
alias r03='docker exec -it 3node_r03 Cli'
INFO:main:All devices started successfully

Checked they were up:

$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4160cc354ba2 ceos-lab:4.23.3M "/sbin/init systemd.…" 7 minutes ago Up 7 minutes 0.0.0.0:2002->22/tcp, 0.0.0.0:9002->443/tcp 3node_r03
122f72fb25bd ceos-lab:4.23.3M "/sbin/init systemd.…" 7 minutes ago Up 7 minutes 0.0.0.0:2001->22/tcp, 0.0.0.0:9001->443/tcp 3node_r02
68cf8ca39130 ceos-lab:4.23.3M "/sbin/init systemd.…" 7 minutes ago Up 7 minutes 0.0.0.0:2000->22/tcp, 0.0.0.0:9000->443/tcp 3node_r01

And then, checked that I had gnmi config in r01:

!
management api gnmi
transport grpc GRPC
port 3333
!

I need to find the IP of r01 in "3node_net-0", as that is the one used for management. I have hit this issue so many times…

$ docker inspect 3node_r01
...
"Networks": {
 "3node_net-0": {
 "IPAMConfig": null, 
 "Links": null,
 "Aliases": [ "68cf8ca39130" ],
 "NetworkID": "d3f72e7473228488f668aa3ed65b6ea94e1c5c9553f93cf0f641c3d4af644e2e", "EndpointID": "bca584040e71a826ef25b8360d92881dad407ff976eff65a38722fd36e9fc873", "Gateway": "172.20.0.1", 
"IPAddress": "172.20.0.2",
....

Now, I cloned the repo and followed the instructions/video. Copied targets.json and updated it with my r01 device details:

~/storage/technology/gnmi-gateway release$ cat examples/gnmi-prometheus/targets.json 
{
  "request": {
    "default": {
      "subscribe": {
        "prefix": {
        },
        "subscription": [
          {
            "path": {
              "elem": [
                {
                  "name": "interfaces"
                }
              ]
            }
          }
        ]
      }
    }
  },
  "target": {
    "r01": {
      "addresses": [
        "172.20.0.2:3333"
      ],
      "credentials": {
        "username": "xxx",
        "password": "xxx"
      },
      "request": "default",
      "meta": {
        "NoTLS": "yes"
      }
    }
  }
}

Carrying on with the instructions, I built the gnmi-gateway docker image, created the docker bridge and ran the gnmi-gateway container built earlier.

go:1.14.6|py:3.7.3|tomas@athens:~/storage/technology/gnmi-gateway release$ docker run \
-it --rm \
-p 59100:59100 \
-v $(pwd)/examples/gnmi-prometheus/targets.json:/opt/gnmi-gateway/targets.json \
--name gnmi-gateway-01 \
--network gnmi-net \
gnmi-gateway:latest
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Starting GNMI Gateway."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Clustering is NOT enabled. No locking or cluster coordination will happen."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Starting connection manager."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Starting gNMI server on 0.0.0.0:9339."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Starting Prometheus exporter."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Initializing target r01 ([172.27.0.2:3333]) map[NoTLS:yes]."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Target r01: Connecting"}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Target r01: Subscribing"}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Starting Prometheus HTTP server."}
{"level":"info","time":"2020-11-07T16:54:38Z","message":"Target r01: Disconnected"}
E1107 16:54:38.382032 1 reconnect.go:114] client.Subscribe (target "r01") failed: client "gnmi" : client "gnmi" : Dialer(172.27.0.2:3333, 10s): context deadline exceeded; reconnecting in 552.330144ms
{"level":"info","time":"2020-11-07T16:54:48Z","message":"Target r01: Disconnected"}
E1107 16:54:48.935965 1 reconnect.go:114] client.Subscribe (target "r01") failed: client "gnmi" : client "gnmi" : Dialer(172.27.0.2:3333, 10s): context deadline exceeded; reconnecting in 1.080381816s
bash-4.2# tcpdump -i any tcp port 3333 -nnn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
17:07:57.621011 In 02:42:7c:61:10:40 ethertype IPv4 (0x0800), length 76: 172.27.0.1.43644 > 172.27.0.2.3333: Flags [S], seq 557316949, win 64240, options [mss 1460,sackOK,TS val 3219811744 ecr 0,nop,wscale 7], length 0
17:07:57.621069 Out 02:42:ac:1b:00:02 ethertype IPv4 (0x0800), length 76: 172.27.0.2.3333 > 172.27.0.1.43644: Flags [S.], seq 243944609, ack 557316950, win 65160, options [mss 1460,sackOK,TS val 1828853442 ecr 3219811744,nop,wscale 7], length 0
17:07:57.621124 In 02:42:7c:61:10:40 ethertype IPv4 (0x0800), length 68: 172.27.0.1.43644 > 172.27.0.2.3333: Flags [.], ack 1, win 502, options [nop,nop,TS val 3219811744 ecr 1828853442], length 0
17:07:57.621348 Out 02:42:ac:1b:00:02 ethertype IPv4 (0x0800), length 89: 172.27.0.2.3333 > 172.27.0.1.43644: Flags [P.], seq 1:22, ack 1, win 510, options [nop,nop,TS val 1828853442 ecr 3219811744], length 21
17:07:57.621409 In 02:42:7c:61:10:40 ethertype IPv4 (0x0800), length 68: 172.27.0.1.43644 > 172.27.0.2.3333: Flags [.], ack 22, win 502, options [nop,nop,TS val 3219811744 ecr 1828853442], length 0
17:07:57.621492 In 02:42:7c:61:10:40 ethertype IPv4 (0x0800), length 320: 172.27.0.1.43644 > 172.27.0.2.3333: Flags [P.], seq 1:253, ack 22, win 502, options [nop,nop,TS val 3219811744 ecr 1828853442], length 252
17:07:57.621509 Out 02:42:ac:1b:00:02 ethertype IPv4 (0x0800), length 68: 172.27.0.2.3333 > 172.27.0.1.43644: Flags [.], ack 253, win 509, options [nop,nop,TS val 1828853442 ecr 3219811744], length 0
17:07:57.621586 In 02:42:7c:61:10:40 ethertype IPv4 (0x0800), length 68: 172.27.0.1.43644 > 172.27.0.2.3333: Flags [F.], seq 253, ack 22, win 502, options [nop,nop,TS val 3219811744 ecr 1828853442], length 0
17:07:57.621904 Out 02:42:ac:1b:00:02 ethertype IPv4 (0x0800), length 68: 172.27.0.2.3333 > 172.27.0.1.43644: Flags [R.], seq 22, ack 254, win 509, options [nop,nop,TS val 1828853443 ecr 3219811744], length 0

Ok, the container is created and seems to be running, but gnmi-gateway can't connect to my cEOS r01…

First thing, I had to check iptables. It is not the first time that, when playing with docker and building different environments (vEOS vs gnmi-gateway) with different docker commands, iptables ends up not configured properly.

And it was the case again:

# iptables -t filter -S DOCKER-ISOLATION-STAGE-1
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -i br-43481af25965 ! -o br-43481af25965 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-94c1e813ad6f ! -o br-94c1e813ad6f -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-4bd17cfa19a8 ! -o br-4bd17cfa19a8 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-13ab2b6a0d1d ! -o br-13ab2b6a0d1d -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-121978ca0282 ! -o br-121978ca0282 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-00db5844bbb0 ! -o br-00db5844bbb0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN

So I moved the "ACCEPT" rule to the top of the chain, leaving the new docker bridge network for gnmi-gateway after "ACCEPT", and that solved it.

# iptables -t filter -D DOCKER-ISOLATION-STAGE-1 -j ACCEPT
# iptables -t filter -I DOCKER-ISOLATION-STAGE-1 -j ACCEPT
#
# iptables -t filter -S DOCKER-ISOLATION-STAGE-1
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i br-43481af25965 ! -o br-43481af25965 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-94c1e813ad6f ! -o br-94c1e813ad6f -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-4bd17cfa19a8 ! -o br-4bd17cfa19a8 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-13ab2b6a0d1d ! -o br-13ab2b6a0d1d -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-121978ca0282 ! -o br-121978ca0282 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-00db5844bbb0 ! -o br-00db5844bbb0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
#

So I restarted gnmi-gateway; still the same issue. Ok, I decided to check if the packets were actually hitting r01.

So at first sight, the TCP handshake is established but then there is a TCP RST…

So I double-checked that gnmi was running on my side:

r1#show management api gnmi 
Enabled:            Yes
Server:             running on port 3333, in MGMT VRF
SSL Profile:        none
QoS DSCP:           none
r1#

At that moment, I thought it was an issue in cEOS… checking logs I couldn't see any confirmation, but I decided to give it a go with vEOS, which is more feature-rich. So I turned up my GCP lab and followed the same steps with gnmi-gateway. I updated the targets.json with the details of one of my vEOS devices. And ran again:

~/gnmi/gnmi-gateway release$ sudo docker run -it --rm -p 59100:59100 -v $(pwd)/examples/gnmi-prometheus/targets.json:/opt/gnmi-gateway/targets.json --name gnmi-gateway-01 --network gnmi-net gnmi-gateway:latest
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting GNMI Gateway."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Clustering is NOT enabled. No locking or cluster coordination will happen."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting connection manager."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting gNMI server on 0.0.0.0:9339."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting Prometheus exporter."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Initializing target gcp-r1 ([192.168.249.4:3333]) map[NoTLS:yes]."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Target gcp-r1: Connecting"}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Target gcp-r1: Subscribing"}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting Prometheus HTTP server."}
{"level":"info","time":"2020-11-07T19:22:30Z","message":"Target gcp-r1: Disconnected"}
E1107 19:22:30.048410 1 reconnect.go:114] client.Subscribe (target "gcp-r1") failed: client "gnmi" : client "gnmi" : Dialer(192.168.249.4:3333, 10s): context deadline exceeded; reconnecting in 552.330144ms
{"level":"info","time":"2020-11-07T19:22:40Z","message":"Target gcp-r1: Disconnected"}
E1107 19:22:40.603141 1 reconnect.go:114] client.Subscribe (target "gcp-r1") failed: client "gnmi" : client "gnmi" : Dialer(192.168.249.4:3333, 10s): context deadline exceeded; reconnecting in 1.080381816s

Again, same issue. Let's see it from the vEOS perspective.

bash-4.2# tcpdump -i any tcp port 3333 -nnn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
18:52:31.874137 In 1e:3d:5b:13:d8:fe ethertype IPv4 (0x0800), length 76: 10.128.0.4.56546 > 192.168.249.4.3333: Flags [S], seq 4076065498, win 64240, options [mss 1460,sackOK,TS val 1752943121 ecr 0,nop,wscale 7], length 0
18:52:31.874579 Out 50:00:00:04:00:00 ethertype IPv4 (0x0800), length 76: 192.168.249.4.3333 > 10.128.0.4.56546: Flags [S.], seq 3922060793, ack 4076065499, win 28960, options [mss 1460,sackOK,TS val 433503 ecr 1752943121,nop,wscale 7], length 0
18:52:31.875882 In 1e:3d:5b:13:d8:fe ethertype IPv4 (0x0800), length 68: 10.128.0.4.56546 > 192.168.249.4.3333: Flags [.], ack 1, win 502, options [nop,nop,TS val 1752943123 ecr 433503], length 0
18:52:31.876284 In 1e:3d:5b:13:d8:fe ethertype IPv4 (0x0800), length 320: 10.128.0.4.56546 > 192.168.249.4.3333: Flags [P.], seq 1:253, ack 1, win 502, options [nop,nop,TS val 1752943124 ecr 433503], length 252
18:52:31.876379 Out 50:00:00:04:00:00 ethertype IPv4 (0x0800), length 68: 192.168.249.4.3333 > 10.128.0.4.56546: Flags [.], ack 253, win 235, options [nop,nop,TS val 433504 ecr 1752943124], length 0
18:52:31.929448 Out 50:00:00:04:00:00 ethertype IPv4 (0x0800), length 89: 192.168.249.4.3333 > 10.128.0.4.56546: Flags [P.], seq 1:22, ack 253, win 235, options [nop,nop,TS val 433517 ecr 1752943124], length 21
18:52:31.930028 In 1e:3d:5b:13:d8:fe ethertype IPv4 (0x0800), length 68: 10.128.0.4.56546 > 192.168.249.4.3333: Flags [.], ack 22, win 502, options [nop,nop,TS val 1752943178 ecr 433517], length 0
18:52:31.930090 In 1e:3d:5b:13:d8:fe ethertype IPv4 (0x0800), length 68: 10.128.0.4.56546 > 192.168.249.4.3333: Flags [F.], seq 253, ack 22, win 502, options [nop,nop,TS val 1752943178 ecr 433517], length 0
18:52:31.931603 Out 50:00:00:04:00:00 ethertype IPv4 (0x0800), length 68: 192.168.249.4.3333 > 10.128.0.4.56546: Flags [R.], seq 22, ack 254, win 235, options [nop,nop,TS val 433517 ecr 1752943178], length 0

So again in GCP, TCP is established but then reset (TCP RST). As vEOS was my last resort, I tried to dig into that TCP connection. I downloaded a pcap to analyze with Wireshark to get a better visual clue…

So, somehow, gnmi-gateway is trying to negotiate TLS!!! As per my understanding, my targets.json was configured with “NoTLS”: “yes”, so that should be avoided, shouldn’t it?

At that moment, I wanted to know how to identify TLS/SSL packets using tcpdump, as it is not always that easy to quickly get a pcap into Wireshark. I found the answer here:

bash-4.2# tcpdump -i any "tcp port 3333 and (tcp[((tcp[12] & 0xf0) >> 2)] = 0x16)"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
19:47:01.367197 In 1e:3d:5b:13:d8:fe (oui Unknown) ethertype IPv4 (0x0800), length 320: 10.128.0.4.50486 > 192.168.249.4.dec-notes: Flags [P.], seq 2715923852:2715924104, ack 2576249027, win 511, options [nop,nop,TS val 1194424180 ecr 1250876], length 252
19:47:02.405870 In 1e:3d:5b:13:d8:fe (oui Unknown) ethertype IPv4 (0x0800), length 320: 10.128.0.4.50488 > 192.168.249.4.dec-notes: Flags [P.], seq 680803294:680803546, ack 3839769659, win 511, options [nop,nop,TS val 1194425218 ecr 1251136], length 252
19:47:04.139458 In 1e:3d:5b:13:d8:fe (oui Unknown) ethertype IPv4 (0x0800), length 320: 10.128.0.4.50490 > 192.168.249.4.dec-notes: Flags [P.], seq 3963338234:3963338486, ack 1760248652, win 511, options [nop,nop,TS val 1194426952 ecr 1251569], length 252

Not something easy to remember 🙁
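For future reference, this is how I read that BPF expression (my own breakdown of the filter above):

# tcp[12] is the TCP header byte whose high nibble is the Data Offset (header length in 32-bit words), so:
#   (tcp[12] & 0xf0) >> 2  ==  (tcp[12] >> 4) * 4  ==  TCP header length in bytes
# tcp[header_length] is then the first TCP payload byte, and 0x16 (22) is the
# TLS record type "Handshake" (ClientHello, ServerHello, certificates, etc.)
tcpdump -i any "tcp port 3333 and (tcp[((tcp[12] & 0xf0) >> 2)] = 0x16)"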

Ok, I wanted to be sure that gNMI was functional in vEOS, and with a quick internet lookup I found the project gnmic! Great job by the author!

So I configured the tool and tested it against my vEOS. And it worked (without needing TLS):

~/gnmi/gnmi-gateway release$ gnmic -a 192.168.249.4:3333 -u xxx -p xxx --insecure --insecure get \
--path "/interfaces/interface[name=*]/subinterfaces/subinterface[index=*]/ipv4/addresses/address/config/ip"
Get Response:
[
{
"time": "1970-01-01T00:00:00Z",
"updates": [
{
"Path": "interfaces/interface[name=Management1]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=192.168.249.4]/config/ip",
"values": {
"interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "192.168.249.4"
}
},
{
"Path": "interfaces/interface[name=Ethernet2]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=10.0.13.1]/config/ip",
"values": {
"interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "10.0.13.1"
}
},
{
"Path": "interfaces/interface[name=Ethernet3]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=192.168.1.1]/config/ip",
"values": {
"interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "192.168.1.1"
}
},
{
"Path": "interfaces/interface[name=Ethernet1]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=10.0.12.1]/config/ip",
"values": {
"interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "10.0.12.1"
}
},
{
"Path": "interfaces/interface[name=Loopback1]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=10.0.0.1]/config/ip",
"values": {
"interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "10.0.0.1"
}
},
{
"Path": "interfaces/interface[name=Loopback2]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=192.168.0.1]/config/ip",
"values": {
"interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "192.168.0.1"
}
}
]
}
]
~/gnmi/gnmi-gateway release$
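Since gNMI is mostly about streaming telemetry, the natural next test with gnmic is a subscription. A quick sketch using the same address and flags as the get above (the path is just an example, not something I ran in this lab):

gnmic -a 192.168.249.4:3333 -u xxx -p xxx --insecure subscribe \
--path "/interfaces/interface[name=*]/state/counters"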

So I was fairly sure that my issue was the gnmi-gateway configuration. I tried to troubleshoot it: removed the NoTLS flag, used the debugging mode, built the code, read the Go code for Target (too complex for my Go knowledge 🙁 )

So in the end, I gave up and opened an issue with the gnmi-gateway author. And he answered super quickly with the solution!!! I had misunderstood the meaning of “NoTLS” 🙁

So I followed his instructions to configure TLS in my cEOS gNMI config:

security pki certificate generate self-signed r01.crt key r01.key generate rsa 2048 validity 30000 parameters common-name r01
!
management api gnmi
transport grpc GRPC
ssl profile SELFSIGNED
port 3333
!
...
!
management security
ssl profile SELFSIGNED
certificate r01.crt key r01.key
!
end

and all worked straightaway!

~/storage/technology/gnmi-gateway release$ docker run -it --rm -p 59100:59100 -v $(pwd)/examples/gnmi-prometheus/targets.json:/opt/gnmi-gateway/targets.json --name gnmi-gateway-01 --network gnmi-net gnmi-gateway:latest
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Starting GNMI Gateway."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Clustering is NOT enabled. No locking or cluster coordination will happen."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Starting connection manager."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Starting gNMI server on 0.0.0.0:9339."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Starting Prometheus exporter."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Initializing target r01 ([172.20.0.2:3333]) map[NoTLS:yes]."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Target r01: Connecting"}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Target r01: Subscribing"}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Target r01: Connected"}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Target r01: Synced"}
{"level":"info","time":"2020-11-08T09:39:16Z","message":"Starting Prometheus HTTP server."}
{"level":"info","time":"2020-11-08T09:39:45Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
{"level":"info","time":"2020-11-08T09:40:15Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}

So I can start Prometheus:

~/storage/technology/gnmi-gateway release$ docker run \
-it --rm \
-p 9090:9090 \
-v $(pwd)/examples/gnmi-prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
--name prometheus-01 \
--network gnmi-net \
prom/prometheus
Unable to find image 'prom/prometheus:latest' locally
latest: Pulling from prom/prometheus
76df9210b28c: Pull complete
559be8e06c14: Pull complete
66945137dd82: Pull complete
8cbce0960be4: Pull complete
f7bd1c252a58: Pull complete
6ad12224c517: Pull complete
ee9cd36fa25a: Pull complete
d73034c1b9c3: Pull complete
b7103b774752: Pull complete
2ba5d8ece07a: Pull complete
ab11729a0297: Pull complete
1549b85a3587: Pull complete
Digest: sha256:b899dbd1b9017b9a379f76ce5b40eead01a62762c4f2057eacef945c3c22d210
Status: Downloaded newer image for prom/prometheus:latest
level=info ts=2020-11-08T09:40:26.622Z caller=main.go:315 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-11-08T09:40:26.623Z caller=main.go:353 msg="Starting Prometheus" version="(version=2.22.1, branch=HEAD, revision=00f16d1ac3a4c94561e5133b821d8e4d9ef78ec2)"
level=info ts=2020-11-08T09:40:26.623Z caller=main.go:358 build_context="(go=go1.15.3, user=root@516b109b1732, date=20201105-14:02:25)"
level=info ts=2020-11-08T09:40:26.623Z caller=main.go:359 host_details="(Linux 5.9.0-1-amd64 #1 SMP Debian 5.9.1-1 (2020-10-17) x86_64 b0fadf4a4c80 (none))"
level=info ts=2020-11-08T09:40:26.623Z caller=main.go:360 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-11-08T09:40:26.623Z caller=main.go:361 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-11-08T09:40:26.641Z caller=main.go:712 msg="Starting TSDB …"
level=info ts=2020-11-08T09:40:26.641Z caller=web.go:516 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-11-08T09:40:26.668Z caller=head.go:642 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2020-11-08T09:40:26.669Z caller=head.go:656 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=103.51µs
level=info ts=2020-11-08T09:40:26.669Z caller=head.go:662 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2020-11-08T09:40:26.672Z caller=head.go:714 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2020-11-08T09:40:26.672Z caller=head.go:719 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=123.684µs wal_replay_duration=2.164743ms total_replay_duration=3.357021ms
level=info ts=2020-11-08T09:40:26.675Z caller=main.go:732 fs_type=2fc12fc1
level=info ts=2020-11-08T09:40:26.676Z caller=main.go:735 msg="TSDB started"
level=info ts=2020-11-08T09:40:26.676Z caller=main.go:861 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-11-08T09:40:26.684Z caller=main.go:892 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml totalDuration=7.601103ms remote_storage=22.929µs web_handler=623ns query_engine=1.64µs scrape=5.517391ms scrape_sd=359.447µs notify=18.349µs notify_sd=3.921µs rules=15.744µs
level=info ts=2020-11-08T09:40:26.685Z caller=main.go:684 msg="Server is ready to receive web requests."

Now we can open the Prometheus UI and verify that we are consuming data from cEOS r01.

Yeah! it is there.
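If you prefer a CLI check over the UI, the Prometheus HTTP API can confirm the target is being scraped (a quick sketch, assuming the 9090:9090 port mapping from the docker run above):

# scrape targets and their health
$ curl -s http://localhost:9090/api/v1/targets
# "up" should be 1 for the gnmi-gateway scrape job
$ curl -s 'http://localhost:9090/api/v1/query?query=up'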

So all working in the end. It was a nice experience. At the end of the day, I want to learn more about gNMI/protobuf, etc. The cool thing here is that you get both telemetry and configuration management of your devices. So gnmi-gateway (which is aimed more at a high-availability environment like Netflix's) and gnmic are great tools to get your head around it.

Another lab I want to try is this eos-gnmi-telemetry-grafana.

The to-do list always keeps growing.

Kubernetes Troubleshooting I

Restore ETCD

This is a process not well documented in the official docs, and I messed it up in my CKA exam:

1- Check the config of the etcd process. You will likely need some of these details for the restore process.

$ kubectl describe pod -n kube-system etcd-master
...
--name=master
--initial-cluster=master=https://127.0.0.1:2380
--initial-advertise-peer-urls=https://127.0.0.1:2380
...

2- Stop the api-server if not running kubeadm

$ service kube-apiserver stop

3- Check the help for all the restore options. Keep in mind you will (very likely) need to provide certs for auth.

$ ETCDCTL_API=3 etcdctl snapshot restore -h

4- Restore ETCD using a previous backup:

$ ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 snapshot restore FILE \
    --cacert xxx --cert xxx --key xxx \
    --data-dir /NEW/DIR \
    --initial-cluster-token TOKEN \
    --name master \
    --initial-cluster=master=https://127.0.0.1:2380 \
    --initial-advertise-peer-urls=https://127.0.0.1:2380

(TOKEN can be any word)

USE HTTPS!!!!
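(For reference, the FILE above comes from a backup taken earlier with the snapshot save subcommand; a sketch with hypothetical cert paths and output file:)

$ ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
    --cacert xxx --cert xxx --key xxx \
    snapshot save /backup/etcd-snapshot.db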

5- Add the new flags and update the volume paths in the ETCD config. If it is a static pod, check /etc/kubernetes/manifests on the master node.

--data-dir=/NEW/DIR
--initial-cluster-token TOKEN

++ update volumeMounts/volumes to the new path /NEW/DIR !!!!
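A quick sanity check of the edited manifest before the kubelet recreates the pod (a sketch, assuming a kubeadm layout where the static pod manifest is /etc/kubernetes/manifests/etcd.yaml):

/// confirm the flags and the volume paths now point to the restored data dir
$ grep -nE "data-dir|initial-cluster-token|mountPath|path:" /etc/kubernetes/manifests/etcd.yaml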

6- Restart services if not running kubeadm

$ systemctl daemon-reload
$ service etcd restart
$ service kube-apiserver start

7- Checks

/// if using kubeadm, docker instance for etcd should restart
$ docker ps -a | grep -i etcd

/// check etcd is running showing members:
$ ETCDCTL_API=3 etcdctl member list --cacert xxx --cert xx --key xxx

Sidecar logging

Based on this doc. The app container writes its logs to files, so you add a sidecar container that tails those files to its own stdout, making them visible via kubectl logs.

Container with a sidecar:

apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - name: count
    image: busybox
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$i: $(date)" >> /var/log/1.log;
        echo "$(date) INFO $i" >> /var/log/2.log;
        i=$((i+1));
        sleep 1;
      done
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: sidecar-1
    image: busybox
    args: [/bin/sh, -c, 'tail -n+1 -f /var/log/1.log']
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  volumes:
  - name: varlog
    emptyDir: {}

Now you can see the logs from “/var/log/1.log” through “sidecar-1”:

$ kubectl logs counter sidecar-1

CPU/Memory of a POD

Based on these links: link1 , link2, link3

If you want to use “kubectl top” you need to install “metrics-server”

$ kubectl top pod --all-namespaces

Keep in mind that “kubectl top” shows metrics for a given pod. That information is based on reports from cAdvisor, which collects real pod resource usage.

And as per link3, “kubectl top” is not the same as running “top” inside the container.
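A couple of related invocations worth remembering (assuming metrics-server is installed; the pod name is just the counter pod from the sidecar example above):

/// per-container breakdown inside a pod
$ kubectl top pod counter --containers

/// node-level view
$ kubectl top node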

Node NotReady

Based on this link:

$ kubectl get nodes
$ kubectl describe nodes XXX

$ ssh node
   -> check the kubelet logs:
      $ cat /var/log/kubelet.log
      $ journalctl -u kubelet     // or "systemctl status kubelet" if kubelet runs as a service