networks – Page 9

Linux+MPLS-Part1

In November 2020, I got an email from the FRR email list about using MPLS with FRR. And the answer that you could do already natively (and easily) MPLS in Linux dumbfound me. So I add in my to-do list, try MPLS in Linux as per the blog. So all credits to the author, that’s a great job.

So reading the blog, I learned that the kernel supported MPLS since 4.3 (I am using 5.10) and creating VRF support was challenging until Cumulus did it. Thanks! So since April 2017 there is full support for L3VPNs in Linux… I’m getting a bit late in the wagon.

Anyway, I want to test myself and see if I can make it work. I downloaded the repo from the author to start working on it.

So I am following the same steps as him and will start with a lab consisting of static LSP. This is the diagram:

Main differences in my lab are:

1- I use libvirt instead of VirtualBox

2- I am using debian10 buster64 as VM

This affect the Vagrant file and the script to configure the static LSP. The libvirt_ commands I am using in Vagrantfile are ignored as I am not able to name the interfaces as I want. As well, I had to change the IP addressing as I had collisions with .1. And debian/buster64 has specific interfaces names that I have to use.

So, now we can turn up the lab.

/mpls-linux/lab1-static-lsps$ vagrant up
 Bringing machine 'r1' up with 'libvirt' provider…
 Bringing machine 'r2' up with 'libvirt' provider…
 Bringing machine 'r3' up with 'libvirt' provider…
 ==> r2: Checking if box 'debian/buster64' version '10.4.0' is up to date…
 ==> r3: Checking if box 'debian/buster64' version '10.4.0' is up to date…
 ==> r1: Checking if box 'debian/buster64' version '10.4.0' is up to date…
 ==> r1: Creating image (snapshot of base box volume).
 ==> r2: Creating image (snapshot of base box volume).
 ==> r3: Creating image (snapshot of base box volume).
 ==> r2: Creating domain with the following settings…
 ==> r1: Creating domain with the following settings…
...
/mpls-linux/lab1-static-lsps master$ vagrant status
 Current machine states:
 r1                        running (libvirt)
 r2                        running (libvirt)
 r3                        running (libvirt)

So we can check R1. One important detail here, is how we can defined a static route to reach R3 loopback and it is encapsulated in MPLS with label 100.

/mpls-linux/lab1-static-lsps$ vagrant ssh r1
...
vagrant@R1:~$ lsmod | grep mpls
 mpls_iptunnel          16384  1
 mpls_router            36864  1 mpls_iptunnel
 ip_tunnel              24576  1 mpls_router
 vagrant@R1:~$ 
 vagrant@R1:~$ ip route
 default via 192.168.121.1 dev ens5 proto dhcp src 192.168.121.124 metric 1024 
 172.20.15.3  encap mpls  100 via 192.168.12.102 dev ens6 
 192.168.12.0/24 dev ens6 proto kernel scope link src 192.168.12.101 
 192.168.121.0/24 dev ens5 proto kernel scope link src 192.168.121.124 
 192.168.121.1 dev ens5 proto dhcp scope link src 192.168.121.124 metric 1024 
 vagrant@R1:~$ 
 vagrant@R1:~$ ip -4 a
 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
     inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
     inet 172.20.15.1/32 scope global lo
        valid_lft forever preferred_lft forever
 2: ens5:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     inet 192.168.121.124/24 brd 192.168.121.255 scope global dynamic ens5
        valid_lft 3204sec preferred_lft 3204sec
 3: ens6:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     inet 192.168.12.101/24 brd 192.168.12.255 scope global ens6
        valid_lft forever preferred_lft forever
 vagrant@R1:~$

Now check R2 as it is our P router between R1 and R3 as per diagram. Important bit here is “ip -M route show”. This shows the MPLS routing label that is based in labels. In the standard “ip route” you dont seen any reference to MPLS.

vagrant@R2:~$ ip -4 a
 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
     inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
     inet 172.20.15.2/32 scope global lo
        valid_lft forever preferred_lft forever
 2: ens5:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     inet 192.168.121.103/24 brd 192.168.121.255 scope global dynamic ens5
        valid_lft 2413sec preferred_lft 2413sec
 3: ens6:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     inet 192.168.12.102/24 brd 192.168.12.255 scope global ens6
        valid_lft forever preferred_lft forever
 4: ens7:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     inet 192.168.23.102/24 brd 192.168.23.255 scope global ens7
        valid_lft forever preferred_lft forever
 vagrant@R2:~$ ip route
 default via 192.168.121.1 dev ens5 proto dhcp src 192.168.121.103 metric 1024 
 192.168.12.0/24 dev ens6 proto kernel scope link src 192.168.12.102 
 192.168.23.0/24 dev ens7 proto kernel scope link src 192.168.23.102 
 192.168.121.0/24 dev ens5 proto kernel scope link src 192.168.121.103 
 192.168.121.1 dev ens5 proto dhcp scope link src 192.168.121.103 metric 1024 
 vagrant@R2:~$ 
 vagrant@R2:~$ lsmod | grep mpls
 mpls_router            36864  0
 ip_tunnel              24576  1 mpls_router
 vagrant@R2:~$ 
 vagrant@R2:~$ ip -M route show
 100 via inet 192.168.23.101 dev ens7 
 200 via inet 192.168.12.101 dev ens6 
 vagrant@R2:~$

So let’s see if pinging the loopback in R1 and R3 gets labelled traffic:

R1 to R3 (on R2)

root@R2:/home/vagrant# tcpdump -i ens6 -U -w - | tee mpls-r1tor3.pcap | tcpdump -r -
 reading from file -, link-type EN10MB (Ethernet)
 tcpdump: listening on ens6, link-type EN10MB (Ethernet), capture size 262144 bytes
 17:14:01.284942 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:de:61:f2.8001, length 35
 17:14:03.300756 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:de:61:f2.8001, length 35
 17:14:05.284915 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:de:61:f2.8001, length 35
 17:14:07.183328 MPLS (label 100, exp 0, [S], ttl 64) IP 192.168.12.101 > 172.20.15.3: ICMP echo request, id 1771, seq 1, length 64
 17:14:07.300556 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:de:61:f2.8001, length 35
 17:14:08.186983 MPLS (label 100, exp 0, [S], ttl 64) IP 192.168.12.101 > 172.20.15.3: ICMP echo request, id 1771, seq 2, length 64
 17:14:09.188867 MPLS (label 100, exp 0, [S], ttl 64) IP 192.168.12.101 > 172.20.15.3: ICMP echo request, id 1771, seq 3, length 64

I can see the labelled packet from R1 to R2 with label 100 as expected, but I dont see any “echo reply”…..

But ping is successful based on R1:

vagrant@R1:~$ ping 172.20.15.3
 PING 172.20.15.3 (172.20.15.3) 56(84) bytes of data.
 64 bytes from 172.20.15.3: icmp_seq=1 ttl=63 time=0.746 ms
 64 bytes from 172.20.15.3: icmp_seq=2 ttl=63 time=1.18 ms
 64 bytes from 172.20.15.3: icmp_seq=3 ttl=63 time=1.11 ms
 64 bytes from 172.20.15.3: icmp_seq=4 ttl=63 time=0.728 ms

Something is wrong. As per pic below, with tcpdump in all interfaces, R3 is seeing the echo request from a different source (not R1).

And if I ping using R1 loopback, I can’t see anything leaving R1 ens6 interface.

vagrant@R1:~$ ping 172.20.15.3 -I lo         
 PING 172.20.15.3 (172.20.15.3) from 172.20.15.1 lo: 56(84) bytes of data.
 ^C
 --- 172.20.15.3 ping statistics ---
 25 packets transmitted, 0 received, 100% packet loss, time 576ms

Based on the original blog post, this should work. The main difference here is I am using libvirt. Need to carry on investigating

This is my IP config, 23.1 is my laptop:

9: virbr3:  mtu 1500 qdisc noqueue state UP group default qlen 1000
     inet 192.168.121.1/24 brd 192.168.121.255 scope global virbr3
        valid_lft forever preferred_lft forever
 10: virbr8:  mtu 1500 qdisc noqueue state UP group default qlen 1000
     inet 192.168.12.1/24 brd 192.168.12.255 scope global virbr8
        valid_lft forever preferred_lft forever
 11: virbr9:  mtu 1500 qdisc noqueue state UP group default qlen 1000
     inet 192.168.23.1/24 brd 192.168.23.255 scope global virbr9
        valid_lft forever preferred_lft forever

NOTES:

How to scp files from vagrant box: link

$ vagrant plugin install vagrant-scp
$ vagrant scp r2:~/*.pcap .

How to ssh to a vagrant box without using “vagran ssh”: link

# save the config to a file 
vagrant ssh-config > vagrant-ssh 

# run ssh with the file
ssh -F vagrant-ssh default

# update your .gitignore for not tracking this file!!!!

How to write and read tcpdump at the same time:

# tcpdump -i ens7 -U -w - | tee mpls-r3tor1.pcap | tcpdump -r -

UPDATE:

Ok, I have tried again. I rebooted my laptop, rebuilt the VMs, etc. And now it works

9: virbr3:  mtu 1500 qdisc noqueue state UP group default qlen 1000
     inet 192.168.121.1/24 brd 192.168.121.255 scope global virbr3
        valid_lft forever preferred_lft forever
 10: virbr8:  mtu 1500 qdisc noqueue state UP group default qlen 1000
     inet 192.168.12.1/24 brd 192.168.12.255 scope global virbr8
        valid_lft forever preferred_lft forever
 11: virbr9:  mtu 1500 qdisc noqueue state UP group default qlen 1000
     inet 192.168.23.1/24 brd 192.168.23.255 scope global virbr9
        valid_lft forever preferred_lft forever
 root@athens:/boot# uname -a
 Linux athens 5.9.0-5-amd64 #1 SMP Debian 5.9.15-1 (2020-12-17) x86_64 GNU/Linux
 root@athens:/boot#

I can see now clearly, how the ICMP request packet is encapsulated with MPLS tag 100 from R1 to R2 (ens6 interface), then the label is popped in R2, and you can see the same ICMP request leaving R2 via ens7 to R3.

Then the ICMP reply is encapsulated with MPLS tag 200 in R3 to R2 (ens7) and again, the labels is popped in R2, and you see the packet again from R2 (ens6) to R1.

So this test is successful at the end although not sure what I have been doing wrong before.

3phase-PDU

This is a very interesting blog entry from Cloudflare about PDU deployments in DCs and the theory behind. Nowadays nearly everything is cloud oriented. But we forget they still need to be deployed physically. In a old job, going dense in physical setups, requires more power and 3-phase PDUs where a must. The blog is quite good explaining the reasons to move to 3-phase and the challenges to balance the power. When you buy “colo”, you buy space and power, and that is not cheap.

Kubernetes-Docker-ASICs

This week I read that kubernetes is going to stop support for Docker soon. I was quite surprised. I am not an expert so it seems they have legit reasons. But I haven’t read anything from the other side. I think it is going to be painful so I need to try that in my lab and see how to do that migration. It has to be nice to learn that.

In the other end, I read a blog entry about ASICs from Cloudflare. I think without getting too technical it is a good one. And I learn about the different type of ASICs from Juniper. In the last years, I have only used devices powered by Broadcom ASICs. One day, I would like to try that P4/Barefoot Tofino devices. And related to this, I remember this NANOG presentation about ASICs that is really good (and fun!).

gnmi-ssl-p2

I was already playing with gNMI and protobuf a couple of months ago. But this week I received a summary from the last NANOG80 meeting and there was a presentation about it. Great job from Colin!

So I decided to give it a go as the demo was based on docker and I have already my Arista lab in cEOS and vEOS as targets.

I started my 3node-ring cEOS lab with docker-topo

ceos-testing/topology master$ docker-topo --create 3-node-simple.yml
INFO:main:Version 2 requires sudo. Restarting script with sudo
[sudo] password for xxx:
INFO:main:
alias r01='docker exec -it 3node_r01 Cli'
alias r02='docker exec -it 3node_r02 Cli'
alias r03='docker exec -it 3node_r03 Cli'
INFO:main:All devices started successfully

Checked they were up:

$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4160cc354ba2 ceos-lab:4.23.3M "/sbin/init systemd.…" 7 minutes ago Up 7 minutes 0.0.0.0:2002->22/tcp, 0.0.0.0:9002->443/tcp 3node_r03
122f72fb25bd ceos-lab:4.23.3M "/sbin/init systemd.…" 7 minutes ago Up 7 minutes 0.0.0.0:2001->22/tcp, 0.0.0.0:9001->443/tcp 3node_r02
68cf8ca39130 ceos-lab:4.23.3M "/sbin/init systemd.…" 7 minutes ago Up 7 minutes 0.0.0.0:2000->22/tcp, 0.0.0.0:9000->443/tcp 3node_r01

And then, check I had gnmi config in r01:

!
management api gnmi
transport grpc GRPC
port 3333
!

Need to find the IP of r01 in “3node_net-0” as the one used for management. I have had so many times hit this issue,…

$ docker inspect 3node_r01
...
"Networks": {
 "3node_net-0": {
 "IPAMConfig": null, 
 "Links": null,
 "Aliases": [ "68cf8ca39130" ],
 "NetworkID": "d3f72e7473228488f668aa3ed65b6ea94e1c5c9553f93cf0f641c3d4af644e2e", "EndpointID": "bca584040e71a826ef25b8360d92881dad407ff976eff65a38722fd36e9fc873", "Gateway": "172.20.0.1", 
"IPAddress": "172.20.0.2",
....

Now, I cloned the repo and followed the instructions/video. Copied targets.jon and updated it with my r01 device details:

~/storage/technology/gnmi-gateway release$ cat examples/gnmi-prometheus/targets.json 
{
  "request": {
    "default": {
      "subscribe": {
        "prefix": {
        },
        "subscription": [
          {
            "path": {
              "elem": [
                {
                  "name": "interfaces"
                }
              ]
            }
          }
        ]
      }
    }
  },
  "target": {
    "r01": {
      "addresses": [
        "172.20.0.2:3333"
      ],
      "credentials": {
        "username": "xxx",
        "password": "xxx"
      },
      "request": "default",
      "meta": {
        "NoTLS": "yes"
      }
    }
  }
}

Carrying out with the instructions, build docker gnmi-gateway, docker bridge and run docker gnmi-gateway built earlier.

go:1.14.6|py:3.7.3|tomas@athens:~/storage/technology/gnmi-gateway release$ docker run \
-it --rm \
-p 59100:59100 \
-v $(pwd)/examples/gnmi-prometheus/targets.json:/opt/gnmi-gateway/targets.json \
--name gnmi-gateway-01 \
--network gnmi-net \
gnmi-gateway:latest
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Starting GNMI Gateway."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Clustering is NOT enabled. No locking or cluster coordination will happen."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Starting connection manager."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Starting gNMI server on 0.0.0.0:9339."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Starting Prometheus exporter."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Initializing target r01 ([172.27.0.2:3333]) map[NoTLS:yes]."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Target r01: Connecting"}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Target r01: Subscribing"}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Starting Prometheus HTTP server."}
{"level":"info","time":"2020-11-07T16:54:38Z","message":"Target r01: Disconnected"}
E1107 16:54:38.382032 1 reconnect.go:114] client.Subscribe (target "r01") failed: client "gnmi" : client "gnmi" : Dialer(172.27.0.2:3333, 10s): context deadline exceeded; reconnecting in 552.330144ms
{"level":"info","time":"2020-11-07T16:54:48Z","message":"Target r01: Disconnected"}
E1107 16:54:48.935965 1 reconnect.go:114] client.Subscribe (target "r01") failed: client "gnmi" : client "gnmi" : Dialer(172.27.0.2:3333, 10s): context deadline exceeded; reconnecting in 1.080381816s

bash-4.2# tcpdump -i any tcp port 3333 -nnn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
17:07:57.621011 In 02:42:7c:61:10:40 ethertype IPv4 (0x0800), length 76: 172.27.0.1.43644 > 172.27.0.2.3333: Flags [S], seq 557316949, win 64240, options [mss 1460,sackOK,TS val 3219811744 ecr 0,nop,wscale 7], length 0
17:07:57.621069 Out 02:42:ac:1b:00:02 ethertype IPv4 (0x0800), length 76: 172.27.0.2.3333 > 172.27.0.1.43644: Flags [S.], seq 243944609, ack 557316950, win 65160, options [mss 1460,sackOK,TS val 1828853442 ecr 3219811744,nop,wscale 7], length 0
17:07:57.621124 In 02:42:7c:61:10:40 ethertype IPv4 (0x0800), length 68: 172.27.0.1.43644 > 172.27.0.2.3333: Flags [.], ack 1, win 502, options [nop,nop,TS val 3219811744 ecr 1828853442], length 0
17:07:57.621348 Out 02:42:ac:1b:00:02 ethertype IPv4 (0x0800), length 89: 172.27.0.2.3333 > 172.27.0.1.43644: Flags [P.], seq 1:22, ack 1, win 510, options [nop,nop,TS val 1828853442 ecr 3219811744], length 21
17:07:57.621409 In 02:42:7c:61:10:40 ethertype IPv4 (0x0800), length 68: 172.27.0.1.43644 > 172.27.0.2.3333: Flags [.], ack 22, win 502, options [nop,nop,TS val 3219811744 ecr 1828853442], length 0
17:07:57.621492 In 02:42:7c:61:10:40 ethertype IPv4 (0x0800), length 320: 172.27.0.1.43644 > 172.27.0.2.3333: Flags [P.], seq 1:253, ack 22, win 502, options [nop,nop,TS val 3219811744 ecr 1828853442], length 252
17:07:57.621509 Out 02:42:ac:1b:00:02 ethertype IPv4 (0x0800), length 68: 172.27.0.2.3333 > 172.27.0.1.43644: Flags [.], ack 253, win 509, options [nop,nop,TS val 1828853442 ecr 3219811744], length 0
17:07:57.621586 In 02:42:7c:61:10:40 ethertype IPv4 (0x0800), length 68: 172.27.0.1.43644 > 172.27.0.2.3333: Flags [F.], seq 253, ack 22, win 502, options [nop,nop,TS val 3219811744 ecr 1828853442], length 0
17:07:57.621904 Out 02:42:ac:1b:00:02 ethertype IPv4 (0x0800), length 68: 172.27.0.2.3333 > 172.27.0.1.43644: Flags [R.], seq 22, ack 254, win 509, options [nop,nop,TS val 1828853443 ecr 3219811744], length 0

Ok, the container is created and seems running but the gnmi-gateway can’t connect to my cEOS r01….

First thing, I had to check iptables. It is not the first time that when playing with docker and building different environments (vEOS vs gnmi-gateway) with different docker commands, iptables may be not configured properly.

And it was the case again:

# iptables -t filter -S DOCKER-ISOLATION-STAGE-1
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -i br-43481af25965 ! -o br-43481af25965 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-94c1e813ad6f ! -o br-94c1e813ad6f -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-4bd17cfa19a8 ! -o br-4bd17cfa19a8 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-13ab2b6a0d1d ! -o br-13ab2b6a0d1d -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-121978ca0282 ! -o br-121978ca0282 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-00db5844bbb0 ! -o br-00db5844bbb0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN

So I moved the new docker bridge network for gnmi-gateway after “ACCEPT” and solved.

# iptables -t filter -D DOCKER-ISOLATION-STAGE-1 -j ACCEPT
# iptables -t filter -I DOCKER-ISOLATION-STAGE-1 -j ACCEPT
#
# iptables -t filter -S DOCKER-ISOLATION-STAGE-1
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i br-43481af25965 ! -o br-43481af25965 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-94c1e813ad6f ! -o br-94c1e813ad6f -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-4bd17cfa19a8 ! -o br-4bd17cfa19a8 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-13ab2b6a0d1d ! -o br-13ab2b6a0d1d -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-121978ca0282 ! -o br-121978ca0282 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-00db5844bbb0 ! -o br-00db5844bbb0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
#

So, restarted gnmi-gateway, still same issue. Ok, I decided to check if the packets were actually hitting r01.

So at first sight, the tcp handshake is established but then there is TCP RST….

So I double checked that gnmi was runnig in my side:

r1#show management api gnmi 
Enabled:            Yes
Server:             running on port 3333, in MGMT VRF
SSL Profile:        none
QoS DSCP:           none
r1#

At that moment, I thought that was an issue in cEOS… checking logs I couldnt see any confirmation but I decided to give it a go with vEOS that is more feature rich. So I turned up my GCP lab and followed the same steps with gnmi-gateway. I updated the targets.json with the details of one of my vEOS devices. And run again:

~/gnmi/gnmi-gateway release$ sudo docker run -it --rm -p 59100:59100 -v $(pwd)/examples/gnmi-prometheus/targets.json:/opt/gnmi-gateway/targets.json --name gnmi-gateway-01 --network gnmi-net gnmi-gateway:latest
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting GNMI Gateway."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Clustering is NOT enabled. No locking or cluster coordination will happen."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting connection manager."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting gNMI server on 0.0.0.0:9339."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting Prometheus exporter."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Initializing target gcp-r1 ([192.168.249.4:3333]) map[NoTLS:yes]."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Target gcp-r1: Connecting"}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Target gcp-r1: Subscribing"}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting Prometheus HTTP server."}
{"level":"info","time":"2020-11-07T19:22:30Z","message":"Target gcp-r1: Disconnected"}
E1107 19:22:30.048410 1 reconnect.go:114] client.Subscribe (target "gcp-r1") failed: client "gnmi" : client "gnmi" : Dialer(192.168.249.4:3333, 10s): context deadline exceeded; reconnecting in 552.330144ms
{"level":"info","time":"2020-11-07T19:22:40Z","message":"Target gcp-r1: Disconnected"}
E1107 19:22:40.603141 1 reconnect.go:114] client.Subscribe (target "gcp-r1") failed: client "gnmi" : client "gnmi" : Dialer(192.168.249.4:3333, 10s): context deadline exceeded; reconnecting in 1.080381816s

Again, same issue. Let’s see from vEOS perspective.

bash-4.2# tcpdump -i any tcp port 3333 -nnn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
18:52:31.874137 In 1e:3d:5b:13:d8:fe ethertype IPv4 (0x0800), length 76: 10.128.0.4.56546 > 192.168.249.4.3333: Flags [S], seq 4076065498, win 64240, options [mss 1460,sackOK,TS val 1752943121 ecr 0,nop,wscale 7], length 0
18:52:31.874579 Out 50:00:00:04:00:00 ethertype IPv4 (0x0800), length 76: 192.168.249.4.3333 > 10.128.0.4.56546: Flags [S.], seq 3922060793, ack 4076065499, win 28960, options [mss 1460,sackOK,TS val 433503 ecr 1752943121,nop,wscale 7], length 0
18:52:31.875882 In 1e:3d:5b:13:d8:fe ethertype IPv4 (0x0800), length 68: 10.128.0.4.56546 > 192.168.249.4.3333: Flags [.], ack 1, win 502, options [nop,nop,TS val 1752943123 ecr 433503], length 0
18:52:31.876284 In 1e:3d:5b:13:d8:fe ethertype IPv4 (0x0800), length 320: 10.128.0.4.56546 > 192.168.249.4.3333: Flags [P.], seq 1:253, ack 1, win 502, options [nop,nop,TS val 1752943124 ecr 433503], length 252
18:52:31.876379 Out 50:00:00:04:00:00 ethertype IPv4 (0x0800), length 68: 192.168.249.4.3333 > 10.128.0.4.56546: Flags [.], ack 253, win 235, options [nop,nop,TS val 433504 ecr 1752943124], length 0
18:52:31.929448 Out 50:00:00:04:00:00 ethertype IPv4 (0x0800), length 89: 192.168.249.4.3333 > 10.128.0.4.56546: Flags [P.], seq 1:22, ack 253, win 235, options [nop,nop,TS val 433517 ecr 1752943124], length 21
18:52:31.930028 In 1e:3d:5b:13:d8:fe ethertype IPv4 (0x0800), length 68: 10.128.0.4.56546 > 192.168.249.4.3333: Flags [.], ack 22, win 502, options [nop,nop,TS val 1752943178 ecr 433517], length 0
18:52:31.930090 In 1e:3d:5b:13:d8:fe ethertype IPv4 (0x0800), length 68: 10.128.0.4.56546 > 192.168.249.4.3333: Flags [F.], seq 253, ack 22, win 502, options [nop,nop,TS val 1752943178 ecr 433517], length 0
18:52:31.931603 Out 50:00:00:04:00:00 ethertype IPv4 (0x0800), length 68: 192.168.249.4.3333 > 10.128.0.4.56546: Flags [R.], seq 22, ack 254, win 235, options [nop,nop,TS val 433517 ecr 1752943178], length 0

So again in GCP, tcp is established but then TCP RST. As vEOS is my last resort, I tried to dig into that TCP connection. I downloaded a pcap to analyze with wireshark so get a better visual clue…

So, somehow, gnmi-gateway is trying to negotiate TLS!!! As per my understanding, my targets.json was configured with “NoTLS”: “yes” so that should be avoid, shouldn’t be?

At that moment, I wanted to know how to identfiy TLS/SSL packets using tcpdump as it is not always that easy to get quickly a pcap in wireshark. So I found the answer here:

bash-4.2# tcpdump -i any "tcp port 3333 and (tcp[((tcp[12] & 0xf0) >> 2)] = 0x16)"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
19:47:01.367197 In 1e:3d:5b:13:d8:fe (oui Unknown) ethertype IPv4 (0x0800), length 320: 10.128.0.4.50486 > 192.168.249.4.dec-notes: Flags [P.], seq 2715923852:2715924104, ack 2576249027, win 511, options [nop,nop,TS val 1194424180 ecr 1250876], length 252
19:47:02.405870 In 1e:3d:5b:13:d8:fe (oui Unknown) ethertype IPv4 (0x0800), length 320: 10.128.0.4.50488 > 192.168.249.4.dec-notes: Flags [P.], seq 680803294:680803546, ack 3839769659, win 511, options [nop,nop,TS val 1194425218 ecr 1251136], length 252
19:47:04.139458 In 1e:3d:5b:13:d8:fe (oui Unknown) ethertype IPv4 (0x0800), length 320: 10.128.0.4.50490 > 192.168.249.4.dec-notes: Flags [P.], seq 3963338234:3963338486, ack 1760248652, win 511, options [nop,nop,TS val 1194426952 ecr 1251569], length 252

Not something easy to remember 🙁

Ok, I wanted to be sure that gnmi was functional in vEOS and by a quick internet look up, I found this project gnmic! Great job by the author!

So I configured the tool and tested with my vEOS. And worked (without needing TLS)

~/gnmi/gnmi-gateway release$ gnmic -a 192.168.249.4:3333 -u xxx -p xxx --insecure --insecure get \
--path "/interfaces/interface[name=*]/subinterfaces/subinterface[index=*]/ipv4/addresses/address/config/ip"
Get Response:
[
{
"time": "1970-01-01T00:00:00Z",
"updates": [
{
"Path": "interfaces/interface[name=Management1]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=192.168.249.4]/config/ip",
"values": {
"interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "192.168.249.4"
}
},
{
"Path": "interfaces/interface[name=Ethernet2]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=10.0.13.1]/config/ip",
"values": {
"interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "10.0.13.1"
}
},
{
"Path": "interfaces/interface[name=Ethernet3]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=192.168.1.1]/config/ip",
"values": {
"interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "192.168.1.1"
}
},
{
"Path": "interfaces/interface[name=Ethernet1]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=10.0.12.1]/config/ip",
"values": {
"interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "10.0.12.1"
}
},
{
"Path": "interfaces/interface[name=Loopback1]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=10.0.0.1]/config/ip",
"values": {
"interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "10.0.0.1"
}
},
{
"Path": "interfaces/interface[name=Loopback2]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=192.168.0.1]/config/ip",
"values": {
"interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "192.168.0.1"
}
}
]
}
]
~/gnmi/gnmi-gateway release$

So, I kind of I was sure that my issue was configuring gnmi-gateway. I tried to troubleshoot it: removed the NoTLS, using the debugging mode, build the code, read the Go code for Target (too complex for my Goland knowledge 🙁 )

So at the end, I gave up and opened an issue with gnmi-gateway author. And he answered super quick with the solution!!! I misunderstood the meaning of “NoTLS” 🙁

So I followed his instructions to configure TLS in my gnmi cEOS config

security pki certificate generate self-signed r01.crt key r01.key generate rsa 2048 validity 30000 parameters common-name r01
!
management api gnmi
transport grpc GRPC
ssl profile SELFSIGNED
port 3333
!
...
!
management security
ssl profile SELFSIGNED
certificate r01.crt key r01.key
!
end

and all worked straightaway!

~/storage/technology/gnmi-gateway release$ docker run -it --rm -p 59100:59100 -v $(pwd)/examples/gnmi-prometheus/targets.json:/opt/gnmi-gateway/targets.json --name gnmi-gateway-01 --network gnmi-net gnmi-gateway:latest
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Starting GNMI Gateway."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Clustering is NOT enabled. No locking or cluster coordination will happen."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Starting connection manager."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Starting gNMI server on 0.0.0.0:9339."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Starting Prometheus exporter."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Initializing target r01 ([172.20.0.2:3333]) map[NoTLS:yes]."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Target r01: Connecting"}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Target r01: Subscribing"}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Target r01: Connected"}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Target r01: Synced"}
{"level":"info","time":"2020-11-08T09:39:16Z","message":"Starting Prometheus HTTP server."}
{"level":"info","time":"2020-11-08T09:39:45Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
{"level":"info","time":"2020-11-08T09:40:15Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}

So I can start prometheus

~/storage/technology/gnmi-gateway release$ docker run \
-it --rm \
-p 9090:9090 \
-v $(pwd)/examples/gnmi-prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
--name prometheus-01 \
--network gnmi-net \
prom/prometheus
Unable to find image 'prom/prometheus:latest' locally
latest: Pulling from prom/prometheus
76df9210b28c: Pull complete
559be8e06c14: Pull complete
66945137dd82: Pull complete
8cbce0960be4: Pull complete
f7bd1c252a58: Pull complete
6ad12224c517: Pull complete
ee9cd36fa25a: Pull complete
d73034c1b9c3: Pull complete
b7103b774752: Pull complete
2ba5d8ece07a: Pull complete
ab11729a0297: Pull complete
1549b85a3587: Pull complete
Digest: sha256:b899dbd1b9017b9a379f76ce5b40eead01a62762c4f2057eacef945c3c22d210
Status: Downloaded newer image for prom/prometheus:latest
level=info ts=2020-11-08T09:40:26.622Z caller=main.go:315 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-11-08T09:40:26.623Z caller=main.go:353 msg="Starting Prometheus" version="(version=2.22.1, branch=HEAD, revision=00f16d1ac3a4c94561e5133b821d8e4d9ef78ec2)"
level=info ts=2020-11-08T09:40:26.623Z caller=main.go:358 build_context="(go=go1.15.3, user=root@516b109b1732, date=20201105-14:02:25)"
level=info ts=2020-11-08T09:40:26.623Z caller=main.go:359 host_details="(Linux 5.9.0-1-amd64 #1 SMP Debian 5.9.1-1 (2020-10-17) x86_64 b0fadf4a4c80 (none))"
level=info ts=2020-11-08T09:40:26.623Z caller=main.go:360 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-11-08T09:40:26.623Z caller=main.go:361 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-11-08T09:40:26.641Z caller=main.go:712 msg="Starting TSDB …"
level=info ts=2020-11-08T09:40:26.641Z caller=web.go:516 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-11-08T09:40:26.668Z caller=head.go:642 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2020-11-08T09:40:26.669Z caller=head.go:656 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=103.51µs
level=info ts=2020-11-08T09:40:26.669Z caller=head.go:662 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2020-11-08T09:40:26.672Z caller=head.go:714 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2020-11-08T09:40:26.672Z caller=head.go:719 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=123.684µs wal_replay_duration=2.164743ms total_replay_duration=3.357021ms
level=info ts=2020-11-08T09:40:26.675Z caller=main.go:732 fs_type=2fc12fc1
level=info ts=2020-11-08T09:40:26.676Z caller=main.go:735 msg="TSDB started"
level=info ts=2020-11-08T09:40:26.676Z caller=main.go:861 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-11-08T09:40:26.684Z caller=main.go:892 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml totalDuration=7.601103ms remote_storage=22.929µs web_handler=623ns query_engine=1.64µs scrape=5.517391ms scrape_sd=359.447µs notify=18.349µs notify_sd=3.921µs rules=15.744µs
level=info ts=2020-11-08T09:40:26.685Z caller=main.go:684 msg="Server is ready to receive web requests."

Now we can open prometheus UI and verify if we are consuming data from cEOS r01.

Yeah! it is there.

So all working at then. It has a nice experience. At the end of the day, I want to know more about gNMI/protobuffer, etc. The cold thing here is you can get telemetry and configuration management of your devices. So using gnmi-gateway (that is more for a high availability env like Netflix) and gnmic are great tools to get your head around.

Other lab I want to try is this eos-gnmi-telemetry-grafana.

The to-do list always keeps growing.

cumulus-basics

Today finally I have managed to get a very basic cumulus setup. It is annoying because I tried several months ago and found some issues with libvirt (and I opened a ticket but didnt follow up) and gave up.

Now it works. I just want to use KVM-QEMU and Vagrant, that I have already installed in my system. So based on the link, I just created a folder and copied the vagrant file. Then “vagrant up” and wait.

/cumulus/1s2l$ vagrant up
Bringing machine 'spine01' up with 'libvirt' provider…
Bringing machine 'leaf01' up with 'libvirt' provider…
Bringing machine 'leaf02' up with 'libvirt' provider…
==> leaf01: Box 'CumulusCommunity/cumulus-vx' could not be found. Attempting to find and install…
leaf01: Box Provider: libvirt
leaf01: Box Version: 4.2.0
==> leaf01: Loading metadata for box 'CumulusCommunity/cumulus-vx'
leaf01: URL: https://vagrantcloud.com/CumulusCommunity/cumulus-vx
==> leaf01: Adding box 'CumulusCommunity/cumulus-vx' (v4.2.0) for provider: libvirt
leaf01: Downloading: https://vagrantcloud.com/CumulusCommunity/boxes/cumulus-vx/versions/4.2.0/providers/libvirt.box
Download redirected to host: d2cd9e7ca6hntp.cloudfront.net
==> leaf01: Successfully added box 'CumulusCommunity/cumulus-vx' (v4.2.0) for 'libvirt'!
==> spine01: Box 'CumulusCommunity/cumulus-vx' could not be found. Attempting to find and install…
spine01: Box Provider: libvirt
spine01: Box Version: 4.2.0
==> leaf01: Uploading base box image as volume into Libvirt storage…
==> spine01: Loading metadata for box 'CumulusCommunity/cumulus-vx'
spine01: URL: https://vagrantcloud.com/CumulusCommunity/cumulus-vx
Progress: 0%==> spine01: Adding box 'CumulusCommunity/cumulus-vx' (v4.2.0) for provider: libvirt
Progress: 0%==> leaf02: Box 'CumulusCommunity/cumulus-vx' could not be found. Attempting to find and install…
leaf02: Box Provider: libvirt
leaf02: Box Version: 4.2.0
Progress: 1%==> leaf02: Loading metadata for box 'CumulusCommunity/cumulus-vx'
leaf02: URL: https://vagrantcloud.com/CumulusCommunity/cumulus-vx
==> leaf02: Adding box 'CumulusCommunity/cumulus-vx' (v4.2.0) for provider: libvirt
==> leaf01: Creating image (snapshot of base box volume).
==> spine01: Creating image (snapshot of base box volume).
==> leaf02: Creating image (snapshot of base box volume).
==> leaf01: Creating domain with the following settings…
==> leaf01: -- Name: 1s2l_leaf01
==> leaf02: Creating domain with the following settings…
==> spine01: Creating domain with the following settings…
==> leaf02: -- Name: 1s2l_leaf02
==> spine01: -- Name: 1s2l_spine01
==> leaf01: -- Domain type: kvm
==> leaf02: -- Domain type: kvm
==> spine01: -- Domain type: kvm
==> leaf01: -- Cpus: 1
==> leaf02: -- Cpus: 1
==> spine01: -- Cpus: 1
==> leaf01: -- Feature: acpi
==> leaf02: -- Feature: acpi
==> spine01: -- Feature: acpi
==> leaf01: -- Feature: apic
==> leaf01: -- Feature: pae
==> leaf01: -- Memory: 768M
==> leaf02: -- Feature: apic
==> spine01: -- Feature: apic
==> spine01: -- Feature: pae
....
....

You can check the VMs are up:

/cumulus/1s2l$ vagrant status
Current machine states:
spine01 running (libvirt)
leaf01 running (libvirt)
leaf02 running (libvirt)
This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run vagrant status NAME.
/cumulus/1s2l$

And we can login and create some network interfaces as per documentation:

/cumulus/1s2l$ vagrant ssh leaf01
Linux leaf01 4.19.0-cl-1-amd64 #1 SMP Cumulus 4.19.94-1+cl4u5 (2020-07-10) x86_64
Welcome to Cumulus VX (TM)
Cumulus VX (TM) is a community supported virtual appliance designed for
experiencing, testing and prototyping Cumulus Networks' latest technology.
For any questions or technical support, visit our community site at:
http://community.cumulusnetworks.com
The registered trademark Linux (R) is used pursuant to a sublicense from LMI,
the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide
basis.
vagrant@leaf01:mgmt:~$ net add interface swp1,swp2,swp3
vagrant@leaf01:mgmt:~$ net commit
--- /etc/network/interfaces 2020-07-15 01:15:58.000000000 +0000
+++ /run/nclu/ifupdown2/interfaces.tmp 2020-10-31 14:12:30.826000000 +0000
@@ -5,15 +5,24 @@
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
iface eth0 inet dhcp
vrf mgmt
+auto swp1
+iface swp1
+
+auto swp2
+iface swp2
+
+auto swp3
+iface swp3
+
auto mgmt
iface mgmt
address 127.0.0.1/8
address ::1/128
vrf-table auto
net add/del commands since the last "net commit"
User Timestamp Command
------- -------------------------- --------------------------------
vagrant 2020-10-31 14:12:27.070219 net add interface swp1,swp2,swp3
vagrant@leaf01:mgmt:~$

And after configuring the interfaces in the three VMs, we have LLDP working:

/cumulus/1s2l$ vagrant ssh leaf01
Linux leaf01 4.19.0-cl-1-amd64 #1 SMP Cumulus 4.19.94-1+cl4u5 (2020-07-10) x86_64
Welcome to Cumulus VX (TM)
Cumulus VX (TM) is a community supported virtual appliance designed for
experiencing, testing and prototyping Cumulus Networks' latest technology.
For any questions or technical support, visit our community site at:
http://community.cumulusnetworks.com
The registered trademark Linux (R) is used pursuant to a sublicense from LMI,
the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide
basis.
Last login: Sat Oct 31 14:12:04 2020 from 10.255.1.1
vagrant@leaf01:mgmt:~$
vagrant@leaf01:mgmt:~$
vagrant@leaf01:mgmt:~$
vagrant@leaf01:mgmt:~$ net show lldp
LocalPort Speed Mode RemoteHost RemotePort
--------- ----- ------- ---------- ----------
swp1 1G Default spine01 swp1
swp2 1G Default leaf02 swp2
swp3 1G Default leaf02 swp3
vagrant@leaf01:mgmt:~$
vagrant@leaf01:mgmt:~$ net show system
Hostname……… leaf01
Build………… Cumulus Linux 4.2.0
Uptime……….. 0:06:09.180000
Model………… Cumulus VX
Memory……….. 669MB
Vendor Name…… Cumulus Networks
Part Number…… 4.2.0
Base MAC Address. 52:54:00:17:87:07
Serial Number…. 52:54:00:17:87:07
Product Name….. VX
vagrant@leaf01:mgmt:~$ exit

So I am happy because now I have something to play with and try to build an MPLS lab with cumulus. At some point I would like to try some quaga/frr lab.

I am pretty sure that in the past, I didnt have to type my password every single time I run a vagrant command….

Ok, we can shutdown the VMs and start the work for the next time:

/cumulus/1s2l$ vagrant halt spine01 leaf01 leaf02
==> leaf02: Halting domain…
==> leaf01: Halting domain…
==> spine01: Halting domain…
/cumulus/1s2l$
/cumulus/1s2l$
/cumulus/1s2l$ vagrant status
Current machine states:
spine01 shutoff (libvirt)
leaf01 shutoff (libvirt)
leaf02 shutoff (libvirt)
This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run vagrant status NAME.
/cumulus/1s2l$

BGP-StockMarket-EGB

I was reading through my backlog and noticed too close by incidents. A BGP hijack on 30th September from Telstra and Tokyo Stock Exchange outage on 2nd Oct. At the end of the day, small mistakes/errors (on purpose or not) can cause massive impact (depending on your point of view). For BGP, RPKI is the security framework to make sure the advertised routes belong to the real owners. Yeah, quick summary. But at the end of the day, not all Internet providers are using RPKI, and even if you use it, you can make mistakes. This is better than nothing. For the exchanges, thinking that a piece of hardware can cause a stop to a 6 trillion $ market is crazy. And it seems is just a 350 servers system. That tells me that you dont need the biggest system to hold the biggest value and you will always hit a problem no matter how safe/resilience is your design/implementation/etc. Likely I am making this up and I need to review the book, but one of the conclusions I took from it, via Godel, it doesn’t matter how many statements you use to declare your (software) system, you can always find a weakness (false statement).

Evolved-Indiana

This week I realised that Juniper JunOS was moving to Linux…. called Evolved. I guess they will still be supporting FreeBSD version but long term will be Linux. I am quite surprised as this was really announced early 2020, always late joining the party. So all big boys are running linux at some level: Cisco has done it sometime ago with nx-os, Brocade/Extrene did it too with SLX (based on Ubuntu) and obviously Arista with EOS (based on Fedora). So the trend of more “open” network OS will be on the raise.

And as well, I finished “Indiana Jones and the Temple of Doom” book. Indiana Jones films are among my favourites… although this was was always considered the “worse” (I erased from my mind the “fourth”) I have really enjoyed the book. It was like watching the movie at slow pace and didnt care that I knew the plot. I will get the other books likely.

NTS

From a new Cloudflare post, I learned that NTS is a standard. To be honest, I can’t remember there was work for making NTP secure. In the last years I have seen development in PTP for time sync in financial systems but nothing else. So it is nice to see this happening. We only need to encrypt BGP and we are done in the internet.. oh wait. Dreaming is free.

So I am trying to install and configure NTS in my system following these links: link1 link2

I have just installed ntpsec via debian packages system and that’s it, ntpsec is running…

# apt install ntpsec
...
# service ntpsec status
● ntpsec.service - Network Time Service
Loaded: loaded (/lib/systemd/system/ntpsec.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2020-10-04 20:35:58 BST; 6min ago
Docs: man:ntpd(8)
Main PID: 292116 (ntpd)
Tasks: 1 (limit: 9354)
Memory: 10.2M
CGroup: /system.slice/ntpsec.service
└─292116 /usr/sbin/ntpd -p /run/ntpd.pid -c /etc/ntpsec/ntp.conf -g -N -u ntpsec:ntpsec
Oct 04 20:36:02 athens ntpd[292116]: DNS: dns_check: processing 3.debian.pool.ntp.org, 8, 101
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 81.128.218.110
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 81.128.218.110
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 139.162.219.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 139.162.219.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 62.3.77.2
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 62.3.77.2
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 213.130.44.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 213.130.44.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: dns_take_status: 3.debian.pool.ntp.org=>good, 8
#

Checking the default config, there is nothing configured to use NTS so I made some changes based on the links above:

# vim /etc/ntpsec/ntp.conf
...


# Public NTP servers supporting Network Time Security:
server time.cloudflare.com:1234 nts

# Example 2: NTS-secured NTP (default NTS-KE port (123); using certificate pool of the operating system)
server ntp1.glypnod.com iburst minpoll 3 maxpoll 6 nts

#Via https://www.netnod.se/time-and-frequency/how-to-use-nts
server nts.ntp.se:3443 nts iburst
server nts.sth1.ntp.se:3443 nts iburst
server nts.sth2.ntp.se:3443 nts iburst

After restart, still not seeing NTS in sync 🙁

# service ntpsec restart
...
# ntpq -puw
remote refid st t when poll reach delay offset jitter
time.cloudflare.com .NTS. 16 0 - 64 0 0ns 0ns 119ns
ntp1.glypnod.com .NTS. 16 5 - 32 0 0ns 0ns 119ns
2a01:3f7:2:202::202 .NTS. 16 1 - 64 0 0ns 0ns 119ns
2a01:3f7:2:52::11 .NTS. 16 1 - 64 0 0ns 0ns 119ns
2a01:3f7:2:62::11 .NTS. 16 1 - 64 0 0ns 0ns 119ns
0.debian.pool.ntp.org .POOL. 16 p - 256 0 0ns 0ns 119ns
1.debian.pool.ntp.org .POOL. 16 p - 256 0 0ns 0ns 119ns
2.debian.pool.ntp.org .POOL. 16 p - 256 0 0ns 0ns 119ns
3.debian.pool.ntp.org .POOL. 16 p - 64 0 0ns 0ns 119ns
-229.191.57.185.no-ptr.as201971.net .GPS. 1 u 25 64 177 65.754ms 26.539ms 7.7279ms
+ns3.turbodns.co.uk 85.199.214.99 2 u 23 64 177 12.200ms 2.5267ms 1.5544ms
+time.cloudflare.com 10.21.8.19 3 u 25 64 177 5.0848ms 2.6248ms 2.6293ms
-ntp1.wirehive.net 202.70.69.81 2 u 21 64 177 9.6036ms 2.3986ms 1.9814ms
+ns4.turbodns.co.uk 195.195.221.100 2 u 21 64 177 10.896ms 2.9528ms 1.5288ms
-lond-web-1.speedwelshpool.com 194.58.204.148 2 u 23 64 177 5.6202ms 5.8218ms 3.2582ms
-time.shf.uk.as44574.net 85.199.214.98 2 u 29 64 77 9.0190ms 4.9419ms 2.5810ms
lux.22pf.org .INIT. 16 u - 64 0 0ns 0ns 119ns
ns1.thorcom.net .INIT. 16 u - 64 0 0ns 0ns 119ns
time.cloudflare.com .INIT. 16 u - 64 0 0ns 0ns 119ns
time.rdg.uk.as44574.net .INIT. 16 u - 64 0 0ns 0ns 119ns
-herm4.doylem.co.uk 185.203.69.150 2 u 19 64 177 15.024ms 9.5098ms 3.2011ms
-213.251.53.217 193.62.22.74 2 u 17 64 177 5.7211ms 1.4122ms 2.1895ms
*babbage.betadome.net 85.199.214.99 2 u 20 64 177 4.8614ms 4.1187ms 2.5533ms
#
#
# ntpq -c nts
NTS client sends: 56
NTS client recvs good: 0
NTS client recvs w error: 0
NTS server recvs good: 0
NTS server recvs w error: 0
NTS server sends: 0
NTS make cookies: 0
NTS decode cookies: 0
NTS decode cookies old: 0
NTS decode cookies too old: 0
NTS decode cookies error: 0
NTS KE probes good: 8
NTS KE probes_bad: 0
NTS KE serves good: 0
NTS KE serves_bad: 0
#

I ran tcpdump filtering on TCP ports 1234 (cloudflare) and 3443 (netnod), and I can see my system trying to negotiate NTS with Cloudflare and NetNod but both sessions are TCP RST 🙁

Need to carry on researching…

SR and TI-LFA

Segment Routing (SR) and Topology Independent Loop Free Alternates (TI-LFA)

Intro

As part of having a MPLS SR lab, I wanted to test FRR (Fast Rerouting) solutions. Arista provides support for FRR TI-LFA based on this link. Unfortunately, if you are not a customer you can’t see that 🙁

But there are other links where you can read about TI-LFA. The two from juniper confuses me when calculating P/Q groups in pre-converge time…

https://blogs.juniper.net/en-us/industry-solutions-and-trends/segment-routing-sr-and-topology-independent-loop-free-alternates-ti-lfa

https://storage.googleapis.com/site-media-prod/meetings/NANOG79/2196/20200530_Bonica_The_Evolution_Of_v1.pdf

The documents above explain the evolution from Loop Free Alternates (LFA) to Remote LFA (RLFA) and finally to TI-LFA.

TI-LFA overcomes the limitations of RLFA using SR paths as repair tunnels.

As well, I have tried to read IETF draft and I didn’t understand things better 🙁

And I doubt I am going to improve it here 🙂

As well, Cisco has good presentations (longer and denser) about SR and TI-LFA.

https://www.ciscolive.com/c/dam/r/ciscolive/us/docs/2016/pdf/BRKRST-3020.pdf

https://www.segment-routing.net/tutorials/2016-09-27-topology-independent-lfa-ti-lfa/

Juniper docs mention always “pre-convergence” but Cisco uses “post-convergence”. I think “post” it is more clear.

EOS TI-LFA Limitations

Backup paths are not computed for prefix segments that do not have a host mask (/32 for v4 and /128 for v6).
When TI-LFA is configured, the number of anycast segments generated by a node cannot exceed 10.
Computing TI-LFA backup paths for proxy node segments is not supported.
Backup paths are not computed for node segments corresponding to multi-homed prefixes. The multi-homing could be the result of them being anycast node segments, loopback interfaces on different routers advertising SIDs for the same prefix, node segments leaked between levels and thus being seen as originated from multiple L1-L2 routers.
Backup paths are only computed for segments that are non-ECMP.
Only IS-IS interfaces that are using the point-to-point network type are eligible for protection.
The backup paths are only computed with respect to link/node failure constraints. SRLG constraint is not yet supported.
Link/node protection only supported in the default VRF owing to the lack of non-default VRF support for IS-IS segment-routing.
Backup paths are computed in the same IS-IS level topology as the primary path.
Even with IS-IS GR configured, ASU2, SSO, agent restart are not hitless events for IS-IS SR LFIB routes or tunnels being
protected by backup paths.

LAB

Based on this, I built a lab using 4.24.1.1F 64 bits on EVE-NG. All links have default ISIS cost of 10 (loopbacks are 1) and we have TI-LFA node-protection enabled globally.

The config are quite simple. This is l1r9. The only change is the IP addressing. The links in the diagram show the third octet of the link address range.

!
service routing protocols model multi-agent
!
hostname l1r9
!
spanning-tree mode mstp
!
aaa authorization exec default local
!
no aaa root
!
vrf instance MGMT
!
interface Ethernet1
no switchport
ip address 10.0.10.2/30
isis enable CORE
isis network point-to-point
!
interface Ethernet2
no switchport
ip address 10.0.11.2/30
isis enable CORE
isis network point-to-point
!
interface Ethernet3
no switchport
ip address 10.0.12.1/30
isis enable CORE
isis network point-to-point
!
interface Ethernet4
no switchport
ip address 10.0.13.1/30
isis enable CORE
isis network point-to-point
!
interface Loopback1
description CORE Loopback
ip address 10.0.0.9/32
node-segment ipv4 index 9
isis enable CORE
isis metric 1
!
interface Management1
vrf MGMT
ip address 192.168.249.18/24
!
ip routing
ip routing vrf MGMT
!
ip route vrf MGMT 0.0.0.0/0 192.168.249.1
!
mpls ip
!
mpls label range isis-sr 800000 65536
!
router isis CORE
net 49.0000.0001.0010.0000.0000.0009.00
is-type level-2
log-adjacency-changes
timers local-convergence-delay protected-prefixes
set-overload-bit on-startup wait-for-bgp
!
address-family ipv4 unicast
bfd all-interfaces
fast-reroute ti-lfa mode node-protection
!
segment-routing mpls
router-id 10.0.0.9
no shutdown
adjacency-segment allocation sr-peers backup-eligible
!
management api http-commands
protocol unix-socket
no shutdown
!
vrf MGMT
no shutdown
!

Using this script (using nornir/napalm), I gather the output of all these commands from all routers:

"show isis segment-routing prefix-segments" -> shows if protection is enabled for these segments

"show isis segment-routing adjacency-segments" -> shows is protection is enabled for these segments

"show isis interface" -> shows state of protection configured

"show isis ti-lfa path" -> shows the repair path with the list of all the system IDs from the P-node to the Q-node for every destination/constraint tuple. You will see that even though node protection is configured a link protecting LFA is computed too. This is to fallback to link protecting LFAs whenever the node protecting LFA becomes unavailable.

"show isis ti-lfa tunnel" -> The TI-LFA repair tunnels are just internal constructs that are shared by multiple LFIB routes that compute similar repair paths. This command displays TI-LFA repair tunnels with the primary and backup via information.

"show isis segment-routing tunnel" -> command displays all the IS-IS SR tunnels. The field ‘ TI-LFA tunnel index ’ shows the index of the TI-LFA tunnel protecting the SR tunnel. The same TI-LFA tunnel that protects the LFIB route also protects the corresponding IS-IS SR tunnel.

"show tunnel fib" -> displays tunnels programmed in the tunnel FIB also includes the TI-LFA tunnels along with protected IS-IS SR tunnels.

"show mpls lfib route" -> displays the backup information along with the primary vias for all node/adjacency segments that have TI-LFA backup paths computed.

"show ip route" -> When services like LDP pseudowires, BGP LU, L2 EVPN or L3 MPLS VPN use IS-IS SR tunnels as an underlay, they are automatically protected by TI-LFA tunnels that protect the IS-IS SR tunnels. The ‘show ip route’ command displays the hierarchy of the overlay-underlay-TI-LFA tunnels like below.

This is the output of l1r3 in the initial state (no failures):

/////////////////////////////////////////////////////////////////////////
///                               Device: l1r3                         //      /////////////////////////////////////////////////////////////////////////

command = show isis segment-routing prefix-segments


System ID: 0000.0000.0003			Instance: 'CORE'
SR supported Data-plane: MPLS			SR Router ID: 10.0.0.3

Node: 11     Proxy-Node: 0      Prefix: 0       Total Segments: 11

Flag Descriptions: R: Re-advertised, N: Node Segment, P: no-PHP
                   E: Explicit-NULL, V: Value, L: Local
Segment status codes: * - Self originated Prefix, L1 - level 1, L2 - level 2
  Prefix                      SID Type       Flags                   System ID       Level Protection
  ------------------------- ----- ---------- ----------------------- --------------- ----- ----------
  10.0.0.1/32                   1 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0001  L2    node      
  10.0.0.2/32                   2 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0002  L2    node      
* 10.0.0.3/32                   3 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0003  L2    unprotected
  10.0.0.4/32                   4 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0004  L2    node      
  10.0.0.5/32                   5 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0005  L2    node      
  10.0.0.6/32                   6 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0006  L2    node      
  10.0.0.7/32                   7 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0007  L2    node      
  10.0.0.8/32                   8 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0008  L2    node      
  10.0.0.9/32                   9 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0009  L2    node      
  10.0.0.10/32                 10 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0010  L2    node      
  10.0.0.11/32                 11 Node       R:0 N:1 P:0 E:0 V:0 L:0 0000.0000.0011  L2    node      

================================================================================

command = show isis segment-routing adjacency-segments


System ID: l1r3			Instance: CORE
SR supported Data-plane: MPLS			SR Router ID: 10.0.0.3
Adj-SID allocation mode: SR-adjacencies
Adj-SID allocation pool: Base: 100000     Size: 16384
Adjacency Segment Count: 4
Flag Descriptions: F: Ipv6 address family, B: Backup, V: Value
                   L: Local, S: Set

Segment Status codes: L1 - Level-1 adjacency, L2 - Level-2 adjacency, P2P - Point-to-Point adjacency, LAN - Broadcast adjacency

Locally Originated Adjacency Segments
Adj IP Address  Local Intf     SID   SID Source                 Flags     Type  
--------------- ----------- ------- ------------ --------------------- -------- 
      10.0.1.1         Et1  100000      Dynamic   F:0 B:1 V:1 L:1 S:0   P2P L2  
      10.0.2.1         Et2  100001      Dynamic   F:0 B:1 V:1 L:1 S:0   P2P L2  
      10.0.5.2         Et4  100002      Dynamic   F:0 B:1 V:1 L:1 S:0   P2P L2  
      10.0.3.2         Et3  100003      Dynamic   F:0 B:1 V:1 L:1 S:0   P2P L2  

Protection 
---------- 
      node 
      node 
      node 
      node 


================================================================================

command = show isis interface


IS-IS Instance: CORE VRF: default

  Interface Loopback1:
    Index: 12 SNPA: 0:0:0:0:0:0
    MTU: 65532 Type: loopback
    Area Proxy Boundary is Disabled
    Node segment Index IPv4: 3
    BFD IPv4 is Enabled
    BFD IPv6 is Disabled
    Hello Padding is Enabled
    Level 2:
      Metric: 1 (Passive Interface)
      Authentication mode: None
      TI-LFA protection is disabled for IPv4
      TI-LFA protection is disabled for IPv6
  Interface Ethernet1:
    Index: 13 SNPA: P2P
    MTU: 1497 Type: point-to-point
    Area Proxy Boundary is Disabled
    BFD IPv4 is Enabled
    BFD IPv6 is Disabled
    Hello Padding is Enabled
    Level 2:
      Metric: 10, Number of adjacencies: 1
      Link-ID: 0D
      Authentication mode: None
      TI-LFA node protection is enabled for the following IPv4 segments: node segments, adjacency segments
      TI-LFA protection is disabled for IPv6
  Interface Ethernet2:
    Index: 14 SNPA: P2P
    MTU: 1497 Type: point-to-point
    Area Proxy Boundary is Disabled
    BFD IPv4 is Enabled
    BFD IPv6 is Disabled
    Hello Padding is Enabled
    Level 2:
      Metric: 10, Number of adjacencies: 1
      Link-ID: 0E
      Authentication mode: None
      TI-LFA node protection is enabled for the following IPv4 segments: node segments, adjacency segments
      TI-LFA protection is disabled for IPv6
  Interface Ethernet3:
    Index: 15 SNPA: P2P
    MTU: 1497 Type: point-to-point
    Area Proxy Boundary is Disabled
    BFD IPv4 is Enabled
    BFD IPv6 is Disabled
    Hello Padding is Enabled
    Level 2:
      Metric: 10, Number of adjacencies: 1
      Link-ID: 0F
      Authentication mode: None
      TI-LFA node protection is enabled for the following IPv4 segments: node segments, adjacency segments
      TI-LFA protection is disabled for IPv6
  Interface Ethernet4:
    Index: 16 SNPA: P2P
    MTU: 1497 Type: point-to-point
    Area Proxy Boundary is Disabled
    BFD IPv4 is Enabled
    BFD IPv6 is Disabled
    Hello Padding is Enabled
    Level 2:
      Metric: 10, Number of adjacencies: 1
      Link-ID: 10
      Authentication mode: None
      TI-LFA node protection is enabled for the following IPv4 segments: node segments, adjacency segments
      TI-LFA protection is disabled for IPv6

================================================================================

command = show isis ti-lfa path

TI-LFA paths for IPv4 address family
   Topo-id: Level-2
   Destination       Constraint                     Path           
----------------- --------------------------------- -------------- 
   l1r2              exclude node 0000.0000.0002    Path not found 
                     exclude Ethernet2              l1r6           
   l1r8              exclude Ethernet4              l1r4           
                     exclude node 0000.0000.0007    l1r4           
   l1r9              exclude Ethernet4              l1r4           
                     exclude node 0000.0000.0007    l1r4           
   l1r11             exclude Ethernet4              l1r4           
                     exclude node 0000.0000.0007    l1r4           
   l1r10             exclude Ethernet3              l1r7           
                     exclude node 0000.0000.0004    l1r7           
   l1r1              exclude node 0000.0000.0001    Path not found 
                     exclude Ethernet1              Path not found 
   l1r6              exclude Ethernet4              l1r2           
                     exclude node 0000.0000.0007    l1r2           
   l1r7              exclude node 0000.0000.0007    Path not found 
                     exclude Ethernet4              l1r10          
   l1r4              exclude Ethernet3              l1r9           
                     exclude node 0000.0000.0004    Path not found 
   l1r5              exclude Ethernet2              l1r7           
                     exclude node 0000.0000.0002    l1r7           


================================================================================

command = show isis ti-lfa tunnel

Tunnel Index 2
   via 10.0.5.2, 'Ethernet4'
      label stack 3
   backup via 10.0.3.2, 'Ethernet3'
      label stack 3
Tunnel Index 4
   via 10.0.3.2, 'Ethernet3'
      label stack 3
   backup via 10.0.5.2, 'Ethernet4'
      label stack 3
Tunnel Index 6
   via 10.0.3.2, 'Ethernet3'
      label stack 3
   backup via 10.0.5.2, 'Ethernet4'
      label stack 800009 800004
Tunnel Index 7
   via 10.0.5.2, 'Ethernet4'
      label stack 3
   backup via 10.0.3.2, 'Ethernet3'
      label stack 800010 800007
Tunnel Index 8
   via 10.0.2.1, 'Ethernet2'
      label stack 3
   backup via 10.0.5.2, 'Ethernet4'
      label stack 800006 800002
Tunnel Index 9
   via 10.0.5.2, 'Ethernet4'
      label stack 3
   backup via 10.0.2.1, 'Ethernet2'
      label stack 3
Tunnel Index 10
   via 10.0.2.1, 'Ethernet2'
      label stack 3
   backup via 10.0.5.2, 'Ethernet4'
      label stack 3

================================================================================

command = show isis segment-routing tunnel

 Index    Endpoint         Nexthop      Interface     Labels       TI-LFA       
                                                                   tunnel index 
-------- --------------- ------------ ------------- -------------- ------------ 
 1        10.0.0.1/32      10.0.1.1     Ethernet1     [ 3 ]        -            
 2        10.0.0.2/32      10.0.2.1     Ethernet2     [ 3 ]        8            
 3        10.0.0.7/32      10.0.5.2     Ethernet4     [ 3 ]        7            
 4        10.0.0.4/32      10.0.3.2     Ethernet3     [ 3 ]        6            
 5        10.0.0.9/32      10.0.5.2     Ethernet4     [ 800009 ]   2            
 6        10.0.0.10/32     10.0.3.2     Ethernet3     [ 800010 ]   4            
 7        10.0.0.11/32     10.0.5.2     Ethernet4     [ 800011 ]   2            
 8        10.0.0.8/32      10.0.5.2     Ethernet4     [ 800008 ]   2            
 9        10.0.0.6/32      10.0.5.2     Ethernet4     [ 800006 ]   9            
 10       10.0.0.5/32      10.0.2.1     Ethernet2     [ 800005 ]   10           


================================================================================

command = show tunnel fib


Type 'IS-IS SR', index 1, endpoint 10.0.0.1/32, forwarding None
   via 10.0.1.1, 'Ethernet1' label 3

Type 'IS-IS SR', index 2, endpoint 10.0.0.2/32, forwarding None
   via TI-LFA tunnel index 8 label 3
      via 10.0.2.1, 'Ethernet2' label 3
      backup via 10.0.5.2, 'Ethernet4' label 800006 800002

Type 'IS-IS SR', index 3, endpoint 10.0.0.7/32, forwarding None
   via TI-LFA tunnel index 7 label 3
      via 10.0.5.2, 'Ethernet4' label 3
      backup via 10.0.3.2, 'Ethernet3' label 800010 800007

Type 'IS-IS SR', index 4, endpoint 10.0.0.4/32, forwarding None
   via TI-LFA tunnel index 6 label 3
      via 10.0.3.2, 'Ethernet3' label 3
      backup via 10.0.5.2, 'Ethernet4' label 800009 800004

Type 'IS-IS SR', index 5, endpoint 10.0.0.9/32, forwarding None
   via TI-LFA tunnel index 2 label 800009
      via 10.0.5.2, 'Ethernet4' label 3
      backup via 10.0.3.2, 'Ethernet3' label 3

Type 'IS-IS SR', index 6, endpoint 10.0.0.10/32, forwarding None
   via TI-LFA tunnel index 4 label 800010
      via 10.0.3.2, 'Ethernet3' label 3
      backup via 10.0.5.2, 'Ethernet4' label 3

Type 'IS-IS SR', index 7, endpoint 10.0.0.11/32, forwarding None
   via TI-LFA tunnel index 2 label 800011
      via 10.0.5.2, 'Ethernet4' label 3
      backup via 10.0.3.2, 'Ethernet3' label 3

Type 'IS-IS SR', index 8, endpoint 10.0.0.8/32, forwarding None
   via TI-LFA tunnel index 2 label 800008
      via 10.0.5.2, 'Ethernet4' label 3
      backup via 10.0.3.2, 'Ethernet3' label 3

Type 'IS-IS SR', index 9, endpoint 10.0.0.6/32, forwarding None
   via TI-LFA tunnel index 9 label 800006
      via 10.0.5.2, 'Ethernet4' label 3
      backup via 10.0.2.1, 'Ethernet2' label 3

Type 'IS-IS SR', index 10, endpoint 10.0.0.5/32, forwarding None
   via TI-LFA tunnel index 10 label 800005
      via 10.0.2.1, 'Ethernet2' label 3
      backup via 10.0.5.2, 'Ethernet4' label 3

Type 'TI-LFA', index 2, forwarding None
   via 10.0.5.2, 'Ethernet4' label 3
   backup via 10.0.3.2, 'Ethernet3' label 3

Type 'TI-LFA', index 4, forwarding None
   via 10.0.3.2, 'Ethernet3' label 3
   backup via 10.0.5.2, 'Ethernet4' label 3

Type 'TI-LFA', index 6, forwarding None
   via 10.0.3.2, 'Ethernet3' label 3
   backup via 10.0.5.2, 'Ethernet4' label 800009 800004

Type 'TI-LFA', index 7, forwarding None
   via 10.0.5.2, 'Ethernet4' label 3
   backup via 10.0.3.2, 'Ethernet3' label 800010 800007

Type 'TI-LFA', index 8, forwarding None
   via 10.0.2.1, 'Ethernet2' label 3
   backup via 10.0.5.2, 'Ethernet4' label 800006 800002

Type 'TI-LFA', index 9, forwarding None
   via 10.0.5.2, 'Ethernet4' label 3
   backup via 10.0.2.1, 'Ethernet2' label 3

Type 'TI-LFA', index 10, forwarding None
   via 10.0.2.1, 'Ethernet2' label 3
   backup via 10.0.5.2, 'Ethernet4' label 3

================================================================================

command = show mpls lfib route

MPLS forwarding table (Label [metric] Vias) - 14 routes 
MPLS next-hop resolution allow default route: False
Via Type Codes:
          M - MPLS via, P - Pseudowire via,
          I - IP lookup via, V - VLAN via,
          VA - EVPN VLAN aware via, ES - EVPN ethernet segment via,
          VF - EVPN VLAN flood via, AF - EVPN VLAN aware flood via,
          NG - Nexthop group via
Source Codes:
          G - gRIBI, S - Static MPLS route,
          B2 - BGP L2 EVPN, B3 - BGP L3 VPN,
          R - RSVP, LP - LDP pseudowire,
          L - LDP, M - MLDP,
          IP - IS-IS SR prefix segment, IA - IS-IS SR adjacency segment,
          IL - IS-IS SR segment to LDP, LI - LDP to IS-IS SR segment,
          BL - BGP LU, ST - SR TE policy,
          DE - Debug LFIB

 IA  100000   [1]
                via M, 10.0.1.1, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                 interface Ethernet1
 IA  100001   [1]
                via TI-LFA tunnel index 8, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.2.1, Ethernet2, label imp-null(3)
                    backup via 10.0.5.2, Ethernet4, label 800006 800002
 IA  100002   [1]
                via TI-LFA tunnel index 7, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.5.2, Ethernet4, label imp-null(3)
                    backup via 10.0.3.2, Ethernet3, label 800010 800007
 IA  100003   [1]
                via TI-LFA tunnel index 6, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.3.2, Ethernet3, label imp-null(3)
                    backup via 10.0.5.2, Ethernet4, label 800009 800004
 IP  800001   [1], 10.0.0.1/32
                via M, 10.0.1.1, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                 interface Ethernet1
 IP  800002   [1], 10.0.0.2/32
                via TI-LFA tunnel index 8, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.2.1, Ethernet2, label imp-null(3)
                    backup via 10.0.5.2, Ethernet4, label 800006 800002
 IP  800004   [1], 10.0.0.4/32
                via TI-LFA tunnel index 6, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.3.2, Ethernet3, label imp-null(3)
                    backup via 10.0.5.2, Ethernet4, label 800009 800004
 IP  800005   [1], 10.0.0.5/32
                via TI-LFA tunnel index 10, swap 800005 
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.2.1, Ethernet2, label imp-null(3)
                    backup via 10.0.5.2, Ethernet4, label imp-null(3)
 IP  800006   [1], 10.0.0.6/32
                via TI-LFA tunnel index 9, swap 800006 
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.5.2, Ethernet4, label imp-null(3)
                    backup via 10.0.2.1, Ethernet2, label imp-null(3)
 IP  800007   [1], 10.0.0.7/32
                via TI-LFA tunnel index 7, pop
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.5.2, Ethernet4, label imp-null(3)
                    backup via 10.0.3.2, Ethernet3, label 800010 800007
 IP  800008   [1], 10.0.0.8/32
                via TI-LFA tunnel index 2, swap 800008 
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.5.2, Ethernet4, label imp-null(3)
                    backup via 10.0.3.2, Ethernet3, label imp-null(3)
 IP  800009   [1], 10.0.0.9/32
                via TI-LFA tunnel index 2, swap 800009 
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.5.2, Ethernet4, label imp-null(3)
                    backup via 10.0.3.2, Ethernet3, label imp-null(3)
 IP  800010   [1], 10.0.0.10/32
                via TI-LFA tunnel index 4, swap 800010 
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.3.2, Ethernet3, label imp-null(3)
                    backup via 10.0.5.2, Ethernet4, label imp-null(3)
 IP  800011   [1], 10.0.0.11/32
                via TI-LFA tunnel index 2, swap 800011 
                 payload autoDecide, ttlMode uniform, apply egress-acl
                    via 10.0.5.2, Ethernet4, label imp-null(3)
                    backup via 10.0.3.2, Ethernet3, label imp-null(3)

================================================================================

command = show ip route


VRF: default
Codes: C - connected, S - static, K - kernel, 
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
       R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
       O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
       NG - Nexthop Group Static Route, V - VXLAN Control Service,
       DH - DHCP client installed default route, M - Martian,
       DP - Dynamic Policy Route, L - VRF Leaked,
       RC - Route Cache Route

Gateway of last resort is not set

 I L2     10.0.0.1/32 [115/11] via 10.0.1.1, Ethernet1
 I L2     10.0.0.2/32 [115/11] via 10.0.2.1, Ethernet2
 C        10.0.0.3/32 is directly connected, Loopback1
 I L2     10.0.0.4/32 [115/11] via 10.0.3.2, Ethernet3
 I L2     10.0.0.5/32 [115/21] via 10.0.2.1, Ethernet2
 I L2     10.0.0.6/32 [115/21] via 10.0.5.2, Ethernet4
 I L2     10.0.0.7/32 [115/11] via 10.0.5.2, Ethernet4
 I L2     10.0.0.8/32 [115/31] via 10.0.5.2, Ethernet4
 I L2     10.0.0.9/32 [115/21] via 10.0.5.2, Ethernet4
 I L2     10.0.0.10/32 [115/21] via 10.0.3.2, Ethernet3
 I L2     10.0.0.11/32 [115/31] via 10.0.5.2, Ethernet4
 C        10.0.1.0/30 is directly connected, Ethernet1
 C        10.0.2.0/30 is directly connected, Ethernet2
 C        10.0.3.0/30 is directly connected, Ethernet3
 I L2     10.0.4.0/30 [115/20] via 10.0.2.1, Ethernet2
 C        10.0.5.0/30 is directly connected, Ethernet4
 I L2     10.0.6.0/30 [115/20] via 10.0.3.2, Ethernet3
 I L2     10.0.7.0/30 [115/30] via 10.0.2.1, Ethernet2
                               via 10.0.5.2, Ethernet4
 I L2     10.0.8.0/30 [115/20] via 10.0.5.2, Ethernet4
 I L2     10.0.9.0/30 [115/30] via 10.0.5.2, Ethernet4
 I L2     10.0.10.0/30 [115/20] via 10.0.5.2, Ethernet4
 I L2     10.0.11.0/30 [115/30] via 10.0.5.2, Ethernet4
 I L2     10.0.12.0/30 [115/30] via 10.0.3.2, Ethernet3
                                via 10.0.5.2, Ethernet4
 I L2     10.0.13.0/30 [115/30] via 10.0.5.2, Ethernet4


================================================================================

In l1r3 we can see:

show isis segment-routing prefix-segments: all prefix segments are under “node” protection (apart from itself – 10.0.0.3/32)
show isis segment-routing adjacency-segments: all adjacent segments are under “node” protection.
show isis interface: All isis enabled interfaces (apart from loopback1) have TI-LFA node protection enabled for ipv4.
show isis ti-lfa path: Here we can see link and node protection to all possible destinations in our ISIS domain (all P routers in our BGP-Free core). When node protection is not possible, link protection is calculated. The exception is l1r1 because it has only one link into the networks, so if that is lost, there is no backup at all.
show isis ti-lfa tunnel: This can be confusing. These are the TI-LFA tunnels, the first two lines refer to the path they are protecting. The last two lines are really the tunnel configuration. Another interesting thing here is the label stack for some backup tunnels (index 6, 7, 8). This a way to avoid a loop. The index is used in the next command.
show isis segment-routing tunnel: Here we see the current SR tunnels and the corresponding backup (index that refers to above command). Label [3] is the implicit null label. Paying attention to the endpoint “10.0.0.2/32” (as per fig2 below). We can see the primary path is via eth2. The backup is via tunnel index 8 (via eth4 – l1r7). If you check the path to “10.0.0.2/32 – 800002” from l1r7 (output after fig2) you can see it is pointing back to l1r3 and we would have a loop! For this reason the backup tunnel index 8 in l1r3 has a label stack to avoid this loop (800006 800002). Once l1r7 received this packet and checks the segment labels, it sends the packet to 800006 via eth2 (l1r6) and then l1r6 uses 8000002 to reach finally l1r2 (via l1r5).

l1r7# show isis segment-routing tunnel
Index Endpoint Nexthop Interface Labels TI-LFA
tunnel index

1 10.0.0.9/32 10.0.10.2 Ethernet3 [ 3 ] 3
2 10.0.0.6/32 10.0.8.1 Ethernet2 [ 3 ] 1
3 10.0.0.3/32 10.0.5.1 Ethernet1 [ 3 ] 2
4 10.0.0.10/32 10.0.10.2 Ethernet3 [ 800010 ] 7
5 10.0.0.11/32 10.0.10.2 Ethernet3 [ 800011 ] 4
6 10.0.0.4/32 10.0.5.1 Ethernet1 [ 800004 ] 11
7 10.0.0.8/32 10.0.8.1 Ethernet2 [ 800008 ] -
- 10.0.10.2 Ethernet3 [ 800008 ] -
8 10.0.0.2/32 10.0.5.1 Ethernet1 [ 800002 ] 9
9 10.0.0.5/32 10.0.8.1 Ethernet2 [ 800005 ] 8
10 10.0.0.1/32 10.0.5.1 Ethernet1 [ 800001 ] 10
l1r7#
l1r7#show mpls lfib route 800006
...
IP 800006 [1], 10.0.0.6/32
via TI-LFA tunnel index 1, pop
payload autoDecide, ttlMode uniform, apply egress-acl
via 10.0.8.1, Ethernet2, label imp-null(3)
backup via 10.0.10.2, Ethernet3, label 800008 800006
l1r7#
l1r7#show mpls lfib route 800002
...
IP 800002 [1], 10.0.0.2/32
via TI-LFA tunnel index 9, swap 800002
payload autoDecide, ttlMode uniform, apply egress-acl
via 10.0.5.1, Ethernet1, label imp-null(3)
backup via 10.0.8.1, Ethernet2, label imp-null(3)

show tunnel fib: you can see all “IS-IS SR” and “TI-LFA” tunnels defined. It is like a merge of “show isis segment-routing tunnel” and “show isis ti-lfa tunnel”.
show mpls lfib route: You can see the programmed labels and TI-LFA. I’ve got confused when I see “imp-null” and the I see some pop/swap for the same entry…
show ip route: nothing really interesting without L3VPNS

Testing

Ok, you need to generate traffic that is labelled to really test TI-LFA and with enough packet rate to see if you are close to the 50ms recovery promissed.

So I have had to make some changes:

create a L3VPN CUST-A (evpn) in l1r3 and l1r9, so they are PEs
l1r1 and l1r11 are CPE in VRF CUST-A

All other devices have no changes

We need to test with and without TI-LFA enabled. The test I have do is to ping from l1r1 to l1r11 and dropping the link l1r3-l1r7, while l1r3 has enabled/disabled TI-LFA.

Routing changes with TI-LFA enabled


BEFORE DROPPING LINK
======

l1r3#show ip route vrf CUST-A

 B I      10.0.13.0/30 [200/0] via 10.0.0.9/32, IS-IS SR tunnel index 5, label 116384
                                  via TI-LFA tunnel index 4, label 800009
                                     via 10.0.5.2, Ethernet4, label imp-null(3)
                                     backup via 10.0.3.2, Ethernet3, label imp-null(3)
 C        192.168.0.3/32 is directly connected, Loopback2
 B I      192.168.0.9/32 [200/0] via 10.0.0.9/32, IS-IS SR tunnel index 5, label 116384
                                    via TI-LFA tunnel index 4, label 800009
                                       via 10.0.5.2, Ethernet4, label imp-null(3)
                                       backup via 10.0.3.2, Ethernet3, label imp-null(3)

AFTER DROPPING LINK
======

l1r3#show ip route vrf CUST-A

 B I      10.0.13.0/30 [200/0] via 10.0.0.9/32, IS-IS SR tunnel index 5, label 116384
                                  via TI-LFA tunnel index 11, label 800009
                                     via 10.0.3.2, Ethernet3, label imp-null(3)
                                     backup via 10.0.2.1, Ethernet2, label 800005
 C        192.168.0.3/32 is directly connected, Loopback2
 B I      192.168.0.9/32 [200/0] via 10.0.0.9/32, IS-IS SR tunnel index 5, label 116384
                                    via TI-LFA tunnel index 11, label 800009
                                       via 10.0.3.2, Ethernet3, label imp-null(3)

Ping results

TI-LFA enabled in L1R3  TEST1
=========================

bash-4.2# ping -f 10.0.13.2
PING 10.0.13.2 (10.0.13.2) 56(84) bytes of data.
..................^C                                                                                                      
--- 10.0.13.2 ping statistics ---
1351 packets transmitted, 1333 received, 1% packet loss, time 21035ms
rtt min/avg/max/mdev = 21.081/348.764/1722.587/487.280 ms, pipe 109, ipg/ewma 15.582/67.643 ms
bash-4.2# 


NO TI-LFA enabled in L1R3  TEST1
=========================

bash-4.2# ping -f 10.0.13.2
PING 10.0.13.2 (10.0.13.2) 56(84) bytes of data.
.............................................E...................................................................................^C            
--- 10.0.13.2 ping statistics ---
2274 packets transmitted, 2172 received, +1 errors, 4% packet loss, time 36147ms
rtt min/avg/max/mdev = 20.965/88.300/542.279/86.227 ms, pipe 34, ipg/ewma 15.903/73.403 ms
bash-4.2#

Summary Testing

With TI-LFA enabled in l1r3, we have lost 18 packets (around 280ms)

Without TI-LFA in l1r3, we have lost 102 packets (around 1621ms =~ 1.6s)

Keeping in mind this lab is based in VMs (veos) running in another VM (eve-ng) is not bad result.

It seems far from the 50ms, but still shows the improvement of enabling TI-LFA

Optical in Networking: 101

This is a very good presentation about optical stuff from NANOG 70 (2017). And I noticed there is an updated version from NANOG 77 (2019). I watched the 2017 (2h) and there is something always bites me: db vs dbm

A bit more info about dB vs dBm: here and here

Close to the end, there are some common questions about optical that he provides answers. I liked the ones about “looking at the lasers can make you blind” and the point that is worth cleaning your fibers. A bit about cleaning here.