Again, I am following the author's post but adapting it to my environment, using libvirt instead of VirtualBox and Debian 10 as the VM. All my data is here.
This is the diagram adapted to my lab:
After updating the Vagrantfile and the provisioning script, I ran "vagrant up". The six VMs don't take long to boot, which is a good thing.
The provisioning script mainly configures PE1 and PE2. Here it is in a bit more detail:
# enabling ipv4 forwarding (routing)
sudo sysctl net.ipv4.ip_forward=1
# add loopback (not used in lab3)
sudo ip addr add 172.20.5.$self/32 dev lo
# remove the IP on the PE1-PE2 link as we will set up a trunk with two VLANs
sudo ip addr del 192.168.66.10$self/24 dev ens8
# creating two vlans 10 (ce1,ce3) and 20 (ce2, ce4)
sudo ip link add link ens8 name vlan10 type vlan id 10
sudo ip link add link ens8 name vlan20 type vlan id 20
# assign IP to each vlan
sudo ip addr add 172.30.10.10$self/24 dev vlan10
sudo ip addr add 172.30.20.10$self/24 dev vlan20
# bring up each VLAN interface (they are down by default)
sudo ip link set vlan10 up
sudo ip link set vlan20 up
# create two routing tables with a null route
sudo ip route add blackhole 0.0.0.0/0 table 10
sudo ip route add blackhole 0.0.0.0/0 table 20
# create two VRFs and assign one table (created above) to each one
sudo ip link add name vrf_cust1 type vrf table 10
sudo ip link add name vrf_cust2 type vrf table 20
# assign interfaces to the VRFs (example below is PE1)
sudo ip link set ens6 master vrf_cust1    # interface to CE1
sudo ip link set vlan10 master vrf_cust1  # interface to PE2-vlan10
sudo ip link set ens7 master vrf_cust2    # interface to CE2
sudo ip link set vlan20 master vrf_cust2  # interface to PE2-vlan20
# turn up VRFs
sudo ip link set vrf_cust1 up
sudo ip link set vrf_cust2 up
# add static route in each VRF routing table to reach the opposite CE
sudo ip route add 192.168.$route1.0/24 via 172.30.10.10$neighbor table 10
sudo ip route add 192.168.$route2.0/24 via 172.30.20.10$neighbor table 20
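Once the script has run, a quick sanity check I find useful (assuming the VRF names and table numbers above) is to look at the per-VRF routing tables; both forms should show the blackhole default, the connected routes of the member interfaces and the static route to the opposite CE:
# per-VRF routing tables (table 10 = vrf_cust1, table 20 = vrf_cust2)
ip route show vrf vrf_cust1
ip route show table 10
ip route show vrf vrf_cust2
ip route show table 20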
Check the status of the VRFs in PE1:
vagrant@PE1:/vagrant$ ip link show type vrf
8: vrf_cust1: mtu 65536 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether c6:b8:f2:3b:53:ed brd ff:ff:ff:ff:ff:ff
9: vrf_cust2: mtu 65536 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 62:1c:1d:0a:68:3d brd ff:ff:ff:ff:ff:ff
vagrant@PE1:/vagrant$
vagrant@PE1:/vagrant$ ip link show vrf vrf_cust1
3: ens6: mtu 1500 qdisc pfifo_fast master vrf_cust1 state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:6f:16:1e brd ff:ff:ff:ff:ff:ff
6: vlan10@ens8: mtu 1500 qdisc noqueue master vrf_cust1 state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:33:ab:0b brd ff:ff:ff:ff:ff:ff
vagrant@PE1:/vagrant$
So let’s test if we can ping from CE1 to CE3:
Ok, it fails. I noticed that PE1 sees the packet from CE1… but the source IP is not the expected one (.11.1 is the host, i.e. my laptop). The packet reaches PE2 with the same wrong source IP and then CE3. CE3 sends the ICMP reply back to .11.1, so it never reaches CE1.
The positive thing is that VRF lite seems to work.
I double-checked all IPs, routing, etc. A duplicated MAC between CE1 and my laptop, maybe? I installed "net-tools" to get the "arp" command and check the ARP table contents in CE1. Checking the ARP requests in Wireshark, all was good.
Somehow, the host was getting involved. Keep in mind that this is a simulated network: the host has access to all "links" in the lab. Libvirt creates a bridge (switch) for each link and adds a vnet (port) for each VM that uses it:
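For reference, this is roughly how you can see those bridges and ports from the host (the virbr name is an example; yours will differ):
# list the libvirt networks and their bridges
virsh net-list --all
# show which vnet ports hang off a given bridge
ip link show master virbr8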
".1" is always the host, but it was clear my routing was correct in all devices. I remembered I had some issues during the summer when I was playing with containers/Docker and doing some routing… so I checked iptables.
I don't have iptables rules in the VMs… but as stated earlier, the host is connected to all "links" used between the VMs; there is no real point-to-point link.
# iptables -t nat -vnL --line-numbers
...
Chain LIBVIRT_PRT (1 references)
num pkts bytes target prot opt in out source destination
1 11 580 RETURN all -- * * 192.168.11.0/24 224.0.0.0/24
2 0 0 RETURN all -- * * 192.168.11.0/24 255.255.255.255
3 0 0 MASQUERADE tcp -- * * 192.168.11.0/24 !192.168.11.0/24 masq ports: 1024-65535
4 40 7876 MASQUERADE udp -- * * 192.168.11.0/24 !192.168.11.0/24 masq ports: 1024-65535
5 16 1344 MASQUERADE all -- * * 192.168.11.0/24 !192.168.11.0/24
6 15 796 RETURN all -- * * 192.168.24.0/24 224.0.0.0/24
7 0 0 RETURN all -- * * 192.168.24.0/24 255.255.255.255
8 0 0 MASQUERADE tcp -- * * 192.168.24.0/24 !192.168.24.0/24 masq ports: 1024-65535
9 49 9552 MASQUERADE udp -- * * 192.168.24.0/24 !192.168.24.0/24 masq ports: 1024-65535
10 0 0 MASQUERADE all -- * * 192.168.24.0/24 !192.168.24.0/24
# iptables-save -t nat
# Generated by iptables-save v1.8.7 on Sun Feb 7 12:06:09 2021
*nat
:PREROUTING ACCEPT [365:28580]
:INPUT ACCEPT [143:14556]
:OUTPUT ACCEPT [1617:160046]
:POSTROUTING ACCEPT [1390:101803]
:DOCKER - [0:0]
:LIBVIRT_PRT - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 172.18.0.0/16 ! -o br-4bd17cfa19a8 -j MASQUERADE
-A POSTROUTING -s 172.19.0.0/16 ! -o br-43481af25965 -j MASQUERADE
-A POSTROUTING -j LIBVIRT_PRT
-A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 -d 255.255.255.255/32 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A DOCKER -i br-4bd17cfa19a8 -j RETURN
-A DOCKER -i br-43481af25965 -j RETURN
-A LIBVIRT_PRT -s 192.168.11.0/24 -d 224.0.0.0/24 -j RETURN
-A LIBVIRT_PRT -s 192.168.11.0/24 -d 255.255.255.255/32 -j RETURN
-A LIBVIRT_PRT -s 192.168.11.0/24 ! -d 192.168.11.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A LIBVIRT_PRT -s 192.168.11.0/24 ! -d 192.168.11.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A LIBVIRT_PRT -s 192.168.11.0/24 ! -d 192.168.11.0/24 -j MASQUERADE
-A LIBVIRT_PRT -s 192.168.24.0/24 -d 224.0.0.0/24 -j RETURN
-A LIBVIRT_PRT -s 192.168.24.0/24 -d 255.255.255.255/32 -j RETURN
-A LIBVIRT_PRT -s 192.168.24.0/24 ! -d 192.168.24.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A LIBVIRT_PRT -s 192.168.24.0/24 ! -d 192.168.24.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A LIBVIRT_PRT -s 192.168.24.0/24 ! -d 192.168.24.0/24 -j MASQUERADE
Ok, it seems the traffic from 192.168.11.0/24 to 192.168.23.0/24 is NAT-ed (MASQUERADE in iptables). So it makes sense that I see the traffic as .11.1 in PE1. Let's remove that:
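I am not reproducing the exact commands I used, but one way to stop the masquerading between the lab subnets is along these lines (rule numbers come from the listing above and shift after each delete, hence deleting the highest first; also note libvirt may re-add its rules when a network is restarted):
# option 1: delete the catch-all MASQUERADE rules, enough for the ICMP test
sudo iptables -t nat -D LIBVIRT_PRT 10
sudo iptables -t nat -D LIBVIRT_PRT 5
# option 2: make lab-to-lab traffic bypass the NAT rules altogether
sudo iptables -t nat -I LIBVIRT_PRT 1 -s 192.168.11.0/24 -d 192.168.0.0/16 -j RETURN
sudo iptables -t nat -I LIBVIRT_PRT 1 -s 192.168.24.0/24 -d 192.168.0.0/16 -j RETURN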
This is a continuation of the first part. This time we want to establish dynamic LSPs, so we will use LDP for label exchange and ISIS as the IGP.
Again, I am following the author's post but adapting it to my environment. The latest stable FRR is 7.5. All my data is here.
So once the routers R1, R2 and R3 are configured and FRR is reloaded (very important: restart doesn't do the trick), ISIS and LDP will come up; you just need to be a bit patient.
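I am not pasting my full FRR configs here, but a minimal sketch of what R2 needs looks roughly like this (the area tag "ISIS" matches the output below; the NET and other values are assumptions, so adapt them to your lab). Remember isisd and ldpd also have to be enabled in /etc/frr/daemons before reloading FRR.
interface ens6
 ip router isis ISIS
interface ens7
 ip router isis ISIS
interface lo
 ip router isis ISIS
!
router isis ISIS
 net 49.0001.0000.0000.0002.00
 is-type level-2-only
!
mpls ldp
 router-id 172.20.15.2
 address-family ipv4
  discovery transport-address 172.20.15.2
  interface ens6
  interface ens7
 exit-address-family
!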
Checking on R2, we can see ISIS adjacencies and LDP sessions established to both R1 and R3. So this is a very good sign.
R2# show isis neighbor
Area ISIS:
System Id Interface L State Holdtime SNPA
R1 ens6 2 Up 30 2020.2020.2020
R3 ens7 2 Up 28 2020.2020.2020
R2#
R2# show mpls ldp neighbor
AF ID State Remote Address Uptime
ipv4 172.20.15.1 OPERATIONAL 172.20.15.1 00:27:44
ipv4 172.20.15.3 OPERATIONAL 172.20.15.3 00:27:47
R2#
Let's check that the routing table is programmed as expected. R2 is learning the R1 and R3 loopbacks via ISIS, and they are reachable via MPLS (using implicit-null because R2 is doing Penultimate Hop Popping – PHP) based on the LDP bindings.
R2# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
K>* 0.0.0.0/0 [0/1024] via 192.168.121.1, ens5, src 192.168.121.90, 00:12:42
I>* 172.20.15.1/32 [115/20] via 192.168.12.101, ens6, label implicit-null, weight 1, 00:01:26
C>* 172.20.15.2/32 is directly connected, lo, 00:12:42
I>* 172.20.15.3/32 [115/20] via 192.168.23.101, ens7, label implicit-null, weight 1, 00:01:26
I 192.168.12.0/24 [115/20] via 192.168.12.101, ens6 inactive, weight 1, 00:01:26
C>* 192.168.12.0/24 is directly connected, ens6, 00:12:42
I 192.168.23.0/24 [115/20] via 192.168.23.101, ens7 inactive, weight 1, 00:01:26
C>* 192.168.23.0/24 is directly connected, ens7, 00:12:42
C>* 192.168.121.0/24 is directly connected, ens5, 00:12:42
K>* 192.168.121.1/32 [0/1024] is directly connected, ens5, 00:12:42
R2#
R2# show mpls ldp binding
AF Destination Nexthop Local Label Remote Label In Use
ipv4 172.20.15.1/32 172.20.15.1 16 imp-null yes
ipv4 172.20.15.1/32 172.20.15.3 16 18 no
ipv4 172.20.15.2/32 172.20.15.1 imp-null 16 no
ipv4 172.20.15.2/32 172.20.15.3 imp-null 16 no
ipv4 172.20.15.3/32 172.20.15.1 17 18 no
ipv4 172.20.15.3/32 172.20.15.3 17 imp-null yes
ipv4 192.168.12.0/24 172.20.15.1 imp-null imp-null no
ipv4 192.168.12.0/24 172.20.15.3 imp-null 17 no
ipv4 192.168.23.0/24 172.20.15.1 imp-null 17 no
ipv4 192.168.23.0/24 172.20.15.3 imp-null imp-null no
ipv4 192.168.121.0/24 172.20.15.1 imp-null imp-null no
ipv4 192.168.121.0/24 172.20.15.3 imp-null imp-null no
R2#
Now, let’s do the ping test and see if MPLS is actually used.
I can see clearly on the left-hand side that R2-ens6 (link to R1) is receiving the ICMP request as an MPLS packet (label 17) and the ICMP reply is sent back to R1 without a label (as expected with PHP). On R2-ens7 (link to R3) we see R2 sending the ICMP request without a label (again expected due to PHP) and the ICMP reply from R3 arriving at R2 with label 16.
I have to say that I had to try twice until things worked as expected. In my first attempt, somehow R1 was not sending the ICMP request to R2 encapsulated as an MPLS packet; the routing table was still programmed with plain ISIS routes (no labels), although ISIS, LDP and the LDP bindings were correct.
NOTES:
1- vagrant-nfs: I was wondering how to connect the VMs with my laptop to share files easily. It turns out that by default the folder holding your Vagrantfile is automatically exported via NFS as /vagrant in the VMs. Super handy. Just in case, a bit of documentation. My Vagrant version is 2.2.14.
2- For loading the FRR config, I had to "lowercase" the VM hostname to match the FRR config file name. Based on this link, it is quite easy with bash's "${X,,}" expansion.
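Something along these lines in the provisioning script does the trick (the file paths are illustrative, not necessarily the ones I used):
# ${HOSTNAME,,} expands the hostname in lowercase (bash 4+)
sudo cp /vagrant/"${HOSTNAME,,}"-frr.conf /etc/frr/frr.conf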
In November 2020, I got an email from the FRR mailing list about using MPLS with FRR. The answer, that you can already do MPLS natively (and easily) in Linux, dumbfounded me. So I added it to my to-do list: try MPLS in Linux as per the blog. All credit to the author, that's a great job.
Reading the blog, I learned that the kernel has supported MPLS since 4.3 (I am using 5.10) and that VRF support was challenging until Cumulus implemented it. Thanks! So since April 2017 there has been full support for L3VPNs in Linux… I'm getting on the wagon a bit late.
Anyway, I wanted to test it myself and see if I could make it work. I downloaded the author's repo to start working on it.
So I am following the same steps and will start with a lab consisting of static LSPs. This is the diagram:
Main differences in my lab are:
1- I use libvirt instead of VirtualBox
2- I am using debian10 buster64 as VM
This affects the Vagrantfile and the script that configures the static LSPs. The libvirt_ options I use in the Vagrantfile for interface naming are ignored, so I am not able to name the interfaces as I want. As well, I had to change the IP addressing as I had collisions with .1. And debian/buster64 has specific interface names that I have to use.
So, now we can turn up the lab.
/mpls-linux/lab1-static-lsps$ vagrant up
Bringing machine 'r1' up with 'libvirt' provider…
Bringing machine 'r2' up with 'libvirt' provider…
Bringing machine 'r3' up with 'libvirt' provider…
==> r2: Checking if box 'debian/buster64' version '10.4.0' is up to date…
==> r3: Checking if box 'debian/buster64' version '10.4.0' is up to date…
==> r1: Checking if box 'debian/buster64' version '10.4.0' is up to date…
==> r1: Creating image (snapshot of base box volume).
==> r2: Creating image (snapshot of base box volume).
==> r3: Creating image (snapshot of base box volume).
==> r2: Creating domain with the following settings…
==> r1: Creating domain with the following settings…
...
/mpls-linux/lab1-static-lsps master$ vagrant status
Current machine states:
r1 running (libvirt)
r2 running (libvirt)
r3 running (libvirt)
So we can check R1. One important detail here is how we define a static route to reach the R3 loopback that is encapsulated in MPLS with label 100.
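The provisioning script sets this up with something along these lines (a sketch, not the author's exact script; the addresses and label match the outputs below):
# load the MPLS kernel modules (mpls_iptunnel is needed for "encap mpls" routes)
sudo modprobe mpls_router
sudo modprobe mpls_iptunnel
# allow labels to be installed in the kernel MPLS table
sudo sysctl -w net.mpls.platform_labels=100000
# static route to the R3 loopback, pushing label 100 towards R2
sudo ip route add 172.20.15.3/32 encap mpls 100 via 192.168.12.102 dev ens6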
/mpls-linux/lab1-static-lsps$ vagrant ssh r1
...
vagrant@R1:~$ lsmod | grep mpls
mpls_iptunnel 16384 1
mpls_router 36864 1 mpls_iptunnel
ip_tunnel 24576 1 mpls_router
vagrant@R1:~$
vagrant@R1:~$ ip route
default via 192.168.121.1 dev ens5 proto dhcp src 192.168.121.124 metric 1024
172.20.15.3 encap mpls 100 via 192.168.12.102 dev ens6
192.168.12.0/24 dev ens6 proto kernel scope link src 192.168.12.101
192.168.121.0/24 dev ens5 proto kernel scope link src 192.168.121.124
192.168.121.1 dev ens5 proto dhcp scope link src 192.168.121.124 metric 1024
vagrant@R1:~$
vagrant@R1:~$ ip -4 a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 172.20.15.1/32 scope global lo
valid_lft forever preferred_lft forever
2: ens5: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.121.124/24 brd 192.168.121.255 scope global dynamic ens5
valid_lft 3204sec preferred_lft 3204sec
3: ens6: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.12.101/24 brd 192.168.12.255 scope global ens6
valid_lft forever preferred_lft forever
vagrant@R1:~$
Now check R2, as it is our P router between R1 and R3 as per the diagram. The important bit here is "ip -M route show": this shows the MPLS routing table, which is keyed by label. In the standard "ip route" output you don't see any reference to MPLS.
vagrant@R2:~$ ip -4 a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 172.20.15.2/32 scope global lo
valid_lft forever preferred_lft forever
2: ens5: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.121.103/24 brd 192.168.121.255 scope global dynamic ens5
valid_lft 2413sec preferred_lft 2413sec
3: ens6: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.12.102/24 brd 192.168.12.255 scope global ens6
valid_lft forever preferred_lft forever
4: ens7: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 192.168.23.102/24 brd 192.168.23.255 scope global ens7
valid_lft forever preferred_lft forever
vagrant@R2:~$ ip route
default via 192.168.121.1 dev ens5 proto dhcp src 192.168.121.103 metric 1024
192.168.12.0/24 dev ens6 proto kernel scope link src 192.168.12.102
192.168.23.0/24 dev ens7 proto kernel scope link src 192.168.23.102
192.168.121.0/24 dev ens5 proto kernel scope link src 192.168.121.103
192.168.121.1 dev ens5 proto dhcp scope link src 192.168.121.103 metric 1024
vagrant@R2:~$
vagrant@R2:~$ lsmod | grep mpls
mpls_router 36864 0
ip_tunnel 24576 1 mpls_router
vagrant@R2:~$
vagrant@R2:~$ ip -M route show
100 via inet 192.168.23.101 dev ens7
200 via inet 192.168.12.101 dev ens6
vagrant@R2:~$
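For reference, those MPLS entries on R2 are created with something like this (again a sketch; the per-interface "input" sysctls are the standard kernel knobs that allow labelled packets to be accepted):
sudo modprobe mpls_router
sudo sysctl -w net.mpls.platform_labels=100000
# accept MPLS packets on both core-facing interfaces
sudo sysctl -w net.mpls.conf.ens6.input=1
sudo sysctl -w net.mpls.conf.ens7.input=1
# label 100 (from R1) is popped and forwarded to R3; label 200 (from R3) to R1
sudo ip -f mpls route add 100 via inet 192.168.23.101 dev ens7
sudo ip -f mpls route add 200 via inet 192.168.12.101 dev ens6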
So let’s see if pinging the loopback in R1 and R3 gets labelled traffic:
I can see the labelled packet from R1 to R2 with label 100 as expected, but I don't see any "echo reply"…
But ping is successful based on R1:
vagrant@R1:~$ ping 172.20.15.3
PING 172.20.15.3 (172.20.15.3) 56(84) bytes of data.
64 bytes from 172.20.15.3: icmp_seq=1 ttl=63 time=0.746 ms
64 bytes from 172.20.15.3: icmp_seq=2 ttl=63 time=1.18 ms
64 bytes from 172.20.15.3: icmp_seq=3 ttl=63 time=1.11 ms
64 bytes from 172.20.15.3: icmp_seq=4 ttl=63 time=0.728 ms
Something is wrong. As per the pic below, with tcpdump on all interfaces, R3 sees the echo request coming from a different source (not R1).
And if I ping using the R1 loopback as source, I can't see anything leaving R1's ens6 interface.
vagrant@R1:~$ ping 172.20.15.3 -I lo
PING 172.20.15.3 (172.20.15.3) from 172.20.15.1 lo: 56(84) bytes of data.
^C
--- 172.20.15.3 ping statistics ---
25 packets transmitted, 0 received, 100% packet loss, time 576ms
Based on the original blog post, this should work. The main difference here is that I am using libvirt. I need to carry on investigating.
This is my IP config, 23.1 is my laptop:
9: virbr3: mtu 1500 qdisc noqueue state UP group default qlen 1000
inet 192.168.121.1/24 brd 192.168.121.255 scope global virbr3
valid_lft forever preferred_lft forever
10: virbr8: mtu 1500 qdisc noqueue state UP group default qlen 1000
inet 192.168.12.1/24 brd 192.168.12.255 scope global virbr8
valid_lft forever preferred_lft forever
11: virbr9: mtu 1500 qdisc noqueue state UP group default qlen 1000
inet 192.168.23.1/24 brd 192.168.23.255 scope global virbr9
valid_lft forever preferred_lft forever
How to ssh to a vagrant box without using "vagrant ssh": link
# save the config to a file
vagrant ssh-config > vagrant-ssh
# run ssh with the file
ssh -F vagrant-ssh default
# update your .gitignore so this file is not tracked!
Ok, I have tried again. I rebooted my laptop, rebuilt the VMs, etc. And now it works.
9: virbr3: mtu 1500 qdisc noqueue state UP group default qlen 1000
inet 192.168.121.1/24 brd 192.168.121.255 scope global virbr3
valid_lft forever preferred_lft forever
10: virbr8: mtu 1500 qdisc noqueue state UP group default qlen 1000
inet 192.168.12.1/24 brd 192.168.12.255 scope global virbr8
valid_lft forever preferred_lft forever
11: virbr9: mtu 1500 qdisc noqueue state UP group default qlen 1000
inet 192.168.23.1/24 brd 192.168.23.255 scope global virbr9
valid_lft forever preferred_lft forever
root@athens:/boot# uname -a
Linux athens 5.9.0-5-amd64 #1 SMP Debian 5.9.15-1 (2020-12-17) x86_64 GNU/Linux
root@athens:/boot#
I can now see clearly how the ICMP request packet is encapsulated with MPLS label 100 from R1 to R2 (ens6 interface), then the label is popped at R2, and the same ICMP request leaves R2 via ens7 towards R3.
Then the ICMP reply is encapsulated with MPLS label 200 from R3 to R2 (ens7) and, again, the label is popped at R2, and you see the plain packet from R2 (ens6) to R1.
So this test is successful in the end, although I am not sure what I was doing wrong before.
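For reference, this is roughly the kind of capture I ran on the host bridges to watch the labels (the virbr names are the ones from my host config above; "mpls" is a standard pcap filter keyword):
# R1-R2 segment: the ICMP request should show up as MPLS (label 100)
sudo tcpdump -n -e -i virbr8 'mpls or icmp'
# R2-R3 segment: request is plain ICMP after the pop, reply comes back as MPLS (label 200)
sudo tcpdump -n -e -i virbr9 'mpls or icmp'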
This is a very interesting blog entry from Cloudflare about PDU deployments in DCs and the theory behind them. Nowadays nearly everything is cloud oriented, but we forget that clouds still need to be deployed physically. In an old job, going dense in physical setups required more power, and 3-phase PDUs were a must. The blog is quite good at explaining the reasons to move to 3-phase and the challenges of balancing the power. When you buy "colo", you buy space and power, and that is not cheap.
This week I read that Kubernetes is going to drop support for Docker soon. I was quite surprised. I am not an expert, and it seems they have legit reasons, but I haven't read anything from the other side. I think it is going to be painful, so I need to try it in my lab and see how to do that migration. It will be nice to learn.
On the other end, I read a blog entry about ASICs from Cloudflare. Without getting too technical, I think it is a good one, and I learned about the different types of ASICs from Juniper. In recent years I have only used devices powered by Broadcom ASICs. One day I would like to try those P4/Barefoot Tofino devices. And related to this, I remember this NANOG presentation about ASICs that is really good (and fun!).
I was already playing with gNMI and protobuf a couple of months ago, but this week I received a summary of the last NANOG80 meeting and there was a presentation about it. Great job from Colin!
So I decided to give it a go, as the demo was based on Docker and I already have my Arista lab with cEOS and vEOS as targets.
Ok, the container is created and seems to be running, but gnmi-gateway can't connect to my cEOS r01…
First thing, I had to check iptables. It is not the first time that, when playing with Docker and building different environments (vEOS vs gnmi-gateway) with different docker commands, iptables ends up not configured properly.
And it was the case again:
# iptables -t filter -S DOCKER-ISOLATION-STAGE-1
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -i br-43481af25965 ! -o br-43481af25965 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-94c1e813ad6f ! -o br-94c1e813ad6f -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-4bd17cfa19a8 ! -o br-4bd17cfa19a8 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-13ab2b6a0d1d ! -o br-13ab2b6a0d1d -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-121978ca0282 ! -o br-121978ca0282 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-00db5844bbb0 ! -o br-00db5844bbb0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
So I moved the "ACCEPT" rule back to the top of the chain, so the rule for the new gnmi-gateway Docker bridge comes after it, and that sorted out the iptables side:
# iptables -t filter -D DOCKER-ISOLATION-STAGE-1 -j ACCEPT
# iptables -t filter -I DOCKER-ISOLATION-STAGE-1 -j ACCEPT
#
# iptables -t filter -S DOCKER-ISOLATION-STAGE-1
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i br-43481af25965 ! -o br-43481af25965 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-94c1e813ad6f ! -o br-94c1e813ad6f -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-4bd17cfa19a8 ! -o br-4bd17cfa19a8 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-13ab2b6a0d1d ! -o br-13ab2b6a0d1d -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-121978ca0282 ! -o br-121978ca0282 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-00db5844bbb0 ! -o br-00db5844bbb0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
#
So, I restarted gnmi-gateway: still the same issue. Ok, I decided to check whether the packets were actually hitting r01.
At first sight, the TCP handshake is established but then there is a TCP RST…
So I double-checked that gNMI was running on my side:
r1#show management api gnmi
Enabled: Yes
Server: running on port 3333, in MGMT VRF
SSL Profile: none
QoS DSCP: none
r1#
At that moment, I thought it was an issue with cEOS… checking logs I couldn't see any confirmation, but I decided to give it a go with vEOS, which is more feature rich. So I turned up my GCP lab and followed the same steps with gnmi-gateway. I updated targets.json with the details of one of my vEOS devices and ran it again:
~/gnmi/gnmi-gateway release$ sudo docker run -it --rm -p 59100:59100 -v $(pwd)/examples/gnmi-prometheus/targets.json:/opt/gnmi-gateway/targets.json --name gnmi-gateway-01 --network gnmi-net gnmi-gateway:latest
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting GNMI Gateway."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Clustering is NOT enabled. No locking or cluster coordination will happen."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting connection manager."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting gNMI server on 0.0.0.0:9339."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting Prometheus exporter."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Initializing target gcp-r1 ([192.168.249.4:3333]) map[NoTLS:yes]."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Target gcp-r1: Connecting"}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Target gcp-r1: Subscribing"}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting Prometheus HTTP server."}
{"level":"info","time":"2020-11-07T19:22:30Z","message":"Target gcp-r1: Disconnected"}
E1107 19:22:30.048410 1 reconnect.go:114] client.Subscribe (target "gcp-r1") failed: client "gnmi" : client "gnmi" : Dialer(192.168.249.4:3333, 10s): context deadline exceeded; reconnecting in 552.330144ms
{"level":"info","time":"2020-11-07T19:22:40Z","message":"Target gcp-r1: Disconnected"}
E1107 19:22:40.603141 1 reconnect.go:114] client.Subscribe (target "gcp-r1") failed: client "gnmi" : client "gnmi" : Dialer(192.168.249.4:3333, 10s): context deadline exceeded; reconnecting in 1.080381816s
Again, the same issue. Let's see it from the vEOS perspective.
So again in GCP, TCP is established but then reset. As vEOS was my last resort, I tried to dig into that TCP connection. I downloaded a pcap to analyze with Wireshark and get a better visual clue…
So, somehow, gnmi-gateway is trying to negotiate TLS! As per my understanding, my targets.json was configured with "NoTLS": "yes", so that should be avoided, shouldn't it?
At that moment, I wanted to know how to identify TLS/SSL packets using tcpdump, as it is not always easy to quickly get a pcap into Wireshark. I found the answer here:
bash-4.2# tcpdump -i any "tcp port 3333 and (tcp[((tcp[12] & 0xf0) >> 2)] = 0x16)"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
19:47:01.367197 In 1e:3d:5b:13:d8:fe (oui Unknown) ethertype IPv4 (0x0800), length 320: 10.128.0.4.50486 > 192.168.249.4.dec-notes: Flags [P.], seq 2715923852:2715924104, ack 2576249027, win 511, options [nop,nop,TS val 1194424180 ecr 1250876], length 252
19:47:02.405870 In 1e:3d:5b:13:d8:fe (oui Unknown) ethertype IPv4 (0x0800), length 320: 10.128.0.4.50488 > 192.168.249.4.dec-notes: Flags [P.], seq 680803294:680803546, ack 3839769659, win 511, options [nop,nop,TS val 1194425218 ecr 1251136], length 252
19:47:04.139458 In 1e:3d:5b:13:d8:fe (oui Unknown) ethertype IPv4 (0x0800), length 320: 10.128.0.4.50490 > 192.168.249.4.dec-notes: Flags [P.], seq 3963338234:3963338486, ack 1760248652, win 511, options [nop,nop,TS val 1194426952 ecr 1251569], length 252
Not something easy to remember 🙁 (the filter checks that the first byte of the TCP payload is 0x16, the TLS handshake record type).
Ok, I wanted to be sure that gNMI was functional in vEOS and, with a quick internet lookup, I found the gnmic project! Great job by the author!
So I configured the tool and tested it against my vEOS. And it worked (without needing TLS).
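For reference, the kind of gnmic command I used looked roughly like this (the address is the vEOS target from the logs above; the credentials are placeholders):
# capabilities check against the vEOS gNMI endpoint, plain gRPC (no TLS)
gnmic -a 192.168.249.4:3333 -u admin -p admin --insecure capabilities
# and a simple get
gnmic -a 192.168.249.4:3333 -u admin -p admin --insecure get --path /system/state/hostname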
So I was fairly sure that my issue was in configuring gnmi-gateway. I tried to troubleshoot it: removed NoTLS, used the debugging mode, built the code, read the Go code for Target (too complex for my Go knowledge 🙁).
So in the end, I gave up and opened an issue with the gnmi-gateway author. And he answered super quickly with the solution! I had misunderstood the meaning of "NoTLS" 🙁
So I followed his instructions and configured TLS in my cEOS gNMI config.
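From memory, the cEOS side ended up with a certificate applied to the gNMI transport, roughly like this; treat the exact commands, profile and file names as an assumption (they depend on the EOS version and on how you generate/import the certificate):
management security
   ssl profile gnmi
      certificate gnmi.crt key gnmi.key
!
management api gnmi
   transport grpc default
      ssl profile gnmi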
Now we can open the Prometheus UI and verify that we are consuming data from cEOS r01.
Yeah! it is there.
So all working in the end. It was a nice experience. At the end of the day, I want to know more about gNMI/protobuf, etc. The cool thing here is that you can get both telemetry and configuration management for your devices. So gnmi-gateway (which is more for a high-availability environment like Netflix's) and gnmic are great tools to get your head around.
Today I finally managed to get a very basic Cumulus setup. It is annoying because I tried several months ago, found some issues with libvirt (I opened a ticket but didn't follow up), and gave up.
Now it works. I just want to use KVM/QEMU and Vagrant, which I already have installed on my system. So based on the link, I just created a folder and copied the Vagrantfile. Then "vagrant up" and wait.
/cumulus/1s2l$ vagrant up
Bringing machine 'spine01' up with 'libvirt' provider…
Bringing machine 'leaf01' up with 'libvirt' provider…
Bringing machine 'leaf02' up with 'libvirt' provider…
==> leaf01: Box 'CumulusCommunity/cumulus-vx' could not be found. Attempting to find and install…
leaf01: Box Provider: libvirt
leaf01: Box Version: 4.2.0
==> leaf01: Loading metadata for box 'CumulusCommunity/cumulus-vx'
leaf01: URL: https://vagrantcloud.com/CumulusCommunity/cumulus-vx
==> leaf01: Adding box 'CumulusCommunity/cumulus-vx' (v4.2.0) for provider: libvirt
leaf01: Downloading: https://vagrantcloud.com/CumulusCommunity/boxes/cumulus-vx/versions/4.2.0/providers/libvirt.box
Download redirected to host: d2cd9e7ca6hntp.cloudfront.net
==> leaf01: Successfully added box 'CumulusCommunity/cumulus-vx' (v4.2.0) for 'libvirt'!
==> spine01: Box 'CumulusCommunity/cumulus-vx' could not be found. Attempting to find and install…
spine01: Box Provider: libvirt
spine01: Box Version: 4.2.0
==> leaf01: Uploading base box image as volume into Libvirt storage…
==> spine01: Loading metadata for box 'CumulusCommunity/cumulus-vx'
spine01: URL: https://vagrantcloud.com/CumulusCommunity/cumulus-vx
Progress: 0%==> spine01: Adding box 'CumulusCommunity/cumulus-vx' (v4.2.0) for provider: libvirt
Progress: 0%==> leaf02: Box 'CumulusCommunity/cumulus-vx' could not be found. Attempting to find and install…
leaf02: Box Provider: libvirt
leaf02: Box Version: 4.2.0
Progress: 1%==> leaf02: Loading metadata for box 'CumulusCommunity/cumulus-vx'
leaf02: URL: https://vagrantcloud.com/CumulusCommunity/cumulus-vx
==> leaf02: Adding box 'CumulusCommunity/cumulus-vx' (v4.2.0) for provider: libvirt
==> leaf01: Creating image (snapshot of base box volume).
==> spine01: Creating image (snapshot of base box volume).
==> leaf02: Creating image (snapshot of base box volume).
==> leaf01: Creating domain with the following settings…
==> leaf01: -- Name: 1s2l_leaf01
==> leaf02: Creating domain with the following settings…
==> spine01: Creating domain with the following settings…
==> leaf02: -- Name: 1s2l_leaf02
==> spine01: -- Name: 1s2l_spine01
==> leaf01: -- Domain type: kvm
==> leaf02: -- Domain type: kvm
==> spine01: -- Domain type: kvm
==> leaf01: -- Cpus: 1
==> leaf02: -- Cpus: 1
==> spine01: -- Cpus: 1
==> leaf01: -- Feature: acpi
==> leaf02: -- Feature: acpi
==> spine01: -- Feature: acpi
==> leaf01: -- Feature: apic
==> leaf01: -- Feature: pae
==> leaf01: -- Memory: 768M
==> leaf02: -- Feature: apic
==> spine01: -- Feature: apic
==> spine01: -- Feature: pae
....
....
You can check the VMs are up:
/cumulus/1s2l$ vagrant status
Current machine states:
spine01 running (libvirt)
leaf01 running (libvirt)
leaf02 running (libvirt)
This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run vagrant status NAME.
/cumulus/1s2l$
And we can log in and create some network interfaces as per the documentation:
/cumulus/1s2l$ vagrant ssh leaf01
Linux leaf01 4.19.0-cl-1-amd64 #1 SMP Cumulus 4.19.94-1+cl4u5 (2020-07-10) x86_64
Welcome to Cumulus VX (TM)
Cumulus VX (TM) is a community supported virtual appliance designed for
experiencing, testing and prototyping Cumulus Networks' latest technology.
For any questions or technical support, visit our community site at:
http://community.cumulusnetworks.com
The registered trademark Linux (R) is used pursuant to a sublicense from LMI,
the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide
basis.
vagrant@leaf01:mgmt:~$ net add interface swp1,swp2,swp3
vagrant@leaf01:mgmt:~$ net commit
--- /etc/network/interfaces 2020-07-15 01:15:58.000000000 +0000
+++ /run/nclu/ifupdown2/interfaces.tmp 2020-10-31 14:12:30.826000000 +0000
@@ -5,15 +5,24 @@
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
iface eth0 inet dhcp
vrf mgmt
+auto swp1
+iface swp1
+
+auto swp2
+iface swp2
+
+auto swp3
+iface swp3
+
auto mgmt
iface mgmt
address 127.0.0.1/8
address ::1/128
vrf-table auto
net add/del commands since the last "net commit"
User Timestamp Command
------- -------------------------- --------------------------------
vagrant 2020-10-31 14:12:27.070219 net add interface swp1,swp2,swp3
vagrant@leaf01:mgmt:~$
And after configuring the interfaces in the three VMs, we have LLDP working:
/cumulus/1s2l$ vagrant ssh leaf01
Linux leaf01 4.19.0-cl-1-amd64 #1 SMP Cumulus 4.19.94-1+cl4u5 (2020-07-10) x86_64
Welcome to Cumulus VX (TM)
Cumulus VX (TM) is a community supported virtual appliance designed for
experiencing, testing and prototyping Cumulus Networks' latest technology.
For any questions or technical support, visit our community site at:
http://community.cumulusnetworks.com
The registered trademark Linux (R) is used pursuant to a sublicense from LMI,
the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide
basis.
Last login: Sat Oct 31 14:12:04 2020 from 10.255.1.1
vagrant@leaf01:mgmt:~$
vagrant@leaf01:mgmt:~$
vagrant@leaf01:mgmt:~$
vagrant@leaf01:mgmt:~$ net show lldp
LocalPort Speed Mode RemoteHost RemotePort
--------- ----- ------- ---------- ----------
swp1 1G Default spine01 swp1
swp2 1G Default leaf02 swp2
swp3 1G Default leaf02 swp3
vagrant@leaf01:mgmt:~$
vagrant@leaf01:mgmt:~$ net show system
Hostname……… leaf01
Build………… Cumulus Linux 4.2.0
Uptime……….. 0:06:09.180000
Model………… Cumulus VX
Memory……….. 669MB
Vendor Name…… Cumulus Networks
Part Number…… 4.2.0
Base MAC Address. 52:54:00:17:87:07
Serial Number…. 52:54:00:17:87:07
Product Name….. VX
vagrant@leaf01:mgmt:~$ exit
So I am happy because now I have something to play with and can try to build an MPLS lab with Cumulus. At some point I would like to try a Quagga/FRR lab.
I am pretty sure that in the past I didn't have to type my password every single time I ran a vagrant command…
Ok, we can shut down the VMs and resume the work next time:
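I have not verified it here, but this is usually the libvirt polkit prompt; adding your user to the libvirt group (and logging in again) normally makes it go away:
sudo usermod -aG libvirt $USER
# log out and back in (or run "newgrp libvirt") so the new group membership applies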
/cumulus/1s2l$ vagrant halt spine01 leaf01 leaf02
==> leaf02: Halting domain…
==> leaf01: Halting domain…
==> spine01: Halting domain…
/cumulus/1s2l$
/cumulus/1s2l$
/cumulus/1s2l$ vagrant status
Current machine states:
spine01 shutoff (libvirt)
leaf01 shutoff (libvirt)
leaf02 shutoff (libvirt)
This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run vagrant status NAME.
/cumulus/1s2l$
I was reading through my backlog and noticed two close-by incidents: a BGP hijack by Telstra on 30th September and the Tokyo Stock Exchange outage on 2nd October. At the end of the day, small mistakes/errors (on purpose or not) can cause massive impact (depending on your point of view). For BGP, RPKI is the security framework to make sure the advertised routes belong to their real owners. Yeah, quick summary. But not all Internet providers are using RPKI, and even if you use it, you can make mistakes; still, it is better than nothing. For the exchange, thinking that a piece of hardware can bring a $6 trillion market to a stop is crazy. And it seems to be just a 350-server system. That tells me that you don't need the biggest system to hold the biggest value, and you will always hit a problem no matter how safe/resilient your design/implementation is. Likely I am making this up and I need to review the book, but one of the conclusions I took from it, via Gödel, is that no matter how many statements you use to specify your (software) system, you can always find a weakness (a false statement).
This week I realised that Juniper Junos is moving to Linux… it is called Evolved. I guess they will still support the FreeBSD version, but long term it will be Linux. I am quite surprised, as this was actually announced in early 2020; I am always late joining the party. So all the big boys are running Linux at some level: Cisco did it some time ago with NX-OS, Brocade/Extreme did it too with SLX (based on Ubuntu), and obviously Arista with EOS (based on Fedora). So the trend of more "open" network OSes will be on the rise.
As well, I finished the "Indiana Jones and the Temple of Doom" book. Indiana Jones films are among my favourites… although this one was always considered the "worst" (I have erased the "fourth" from my mind), I really enjoyed the book. It was like watching the movie at a slow pace, and I didn't care that I knew the plot. I will likely get the other books.
From a new Cloudflare post, I learned that NTS is now a standard. To be honest, I couldn't remember there being work on making NTP secure. In recent years I have seen development in PTP for time sync in financial systems, but nothing else. So it is nice to see this happening. We only need to encrypt BGP and we are done with the Internet… oh wait. Dreaming is free.
So I am trying to install and configure NTS on my system following these links: link1, link2
I just installed ntpsec via the Debian package system and that's it, ntpsec is running…
# apt install ntpsec
...
# service ntpsec status
● ntpsec.service - Network Time Service
Loaded: loaded (/lib/systemd/system/ntpsec.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2020-10-04 20:35:58 BST; 6min ago
Docs: man:ntpd(8)
Main PID: 292116 (ntpd)
Tasks: 1 (limit: 9354)
Memory: 10.2M
CGroup: /system.slice/ntpsec.service
└─292116 /usr/sbin/ntpd -p /run/ntpd.pid -c /etc/ntpsec/ntp.conf -g -N -u ntpsec:ntpsec
Oct 04 20:36:02 athens ntpd[292116]: DNS: dns_check: processing 3.debian.pool.ntp.org, 8, 101
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 81.128.218.110
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 81.128.218.110
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 139.162.219.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 139.162.219.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 62.3.77.2
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 62.3.77.2
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 213.130.44.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 213.130.44.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: dns_take_status: 3.debian.pool.ntp.org=>good, 8
#
Checking the default config, there is nothing configured to use NTS so I made some changes based on the links above:
# vim /etc/ntpsec/ntp.conf
...
# Public NTP servers supporting Network Time Security:
server time.cloudflare.com:1234 nts
# Example 2: NTS-secured NTP (default NTS-KE port (123); using certificate pool of the operating system)
server ntp1.glypnod.com iburst minpoll 3 maxpoll 6 nts
#Via https://www.netnod.se/time-and-frequency/how-to-use-nts
server nts.ntp.se:3443 nts iburst
server nts.sth1.ntp.se:3443 nts iburst
server nts.sth2.ntp.se:3443 nts iburst
After restarting, I am still not seeing the NTS peers in sync 🙁
# service ntpsec restart
...
# ntpq -puw
remote refid st t when poll reach delay offset jitter
time.cloudflare.com .NTS. 16 0 - 64 0 0ns 0ns 119ns
ntp1.glypnod.com .NTS. 16 5 - 32 0 0ns 0ns 119ns
2a01:3f7:2:202::202 .NTS. 16 1 - 64 0 0ns 0ns 119ns
2a01:3f7:2:52::11 .NTS. 16 1 - 64 0 0ns 0ns 119ns
2a01:3f7:2:62::11 .NTS. 16 1 - 64 0 0ns 0ns 119ns
0.debian.pool.ntp.org .POOL. 16 p - 256 0 0ns 0ns 119ns
1.debian.pool.ntp.org .POOL. 16 p - 256 0 0ns 0ns 119ns
2.debian.pool.ntp.org .POOL. 16 p - 256 0 0ns 0ns 119ns
3.debian.pool.ntp.org .POOL. 16 p - 64 0 0ns 0ns 119ns
-229.191.57.185.no-ptr.as201971.net .GPS. 1 u 25 64 177 65.754ms 26.539ms 7.7279ms
+ns3.turbodns.co.uk 85.199.214.99 2 u 23 64 177 12.200ms 2.5267ms 1.5544ms
+time.cloudflare.com 10.21.8.19 3 u 25 64 177 5.0848ms 2.6248ms 2.6293ms
-ntp1.wirehive.net 202.70.69.81 2 u 21 64 177 9.6036ms 2.3986ms 1.9814ms
+ns4.turbodns.co.uk 195.195.221.100 2 u 21 64 177 10.896ms 2.9528ms 1.5288ms
-lond-web-1.speedwelshpool.com 194.58.204.148 2 u 23 64 177 5.6202ms 5.8218ms 3.2582ms
-time.shf.uk.as44574.net 85.199.214.98 2 u 29 64 77 9.0190ms 4.9419ms 2.5810ms
lux.22pf.org .INIT. 16 u - 64 0 0ns 0ns 119ns
ns1.thorcom.net .INIT. 16 u - 64 0 0ns 0ns 119ns
time.cloudflare.com .INIT. 16 u - 64 0 0ns 0ns 119ns
time.rdg.uk.as44574.net .INIT. 16 u - 64 0 0ns 0ns 119ns
-herm4.doylem.co.uk 185.203.69.150 2 u 19 64 177 15.024ms 9.5098ms 3.2011ms
-213.251.53.217 193.62.22.74 2 u 17 64 177 5.7211ms 1.4122ms 2.1895ms
*babbage.betadome.net 85.199.214.99 2 u 20 64 177 4.8614ms 4.1187ms 2.5533ms
#
#
# ntpq -c nts
NTS client sends: 56
NTS client recvs good: 0
NTS client recvs w error: 0
NTS server recvs good: 0
NTS server recvs w error: 0
NTS server sends: 0
NTS make cookies: 0
NTS decode cookies: 0
NTS decode cookies old: 0
NTS decode cookies too old: 0
NTS decode cookies error: 0
NTS KE probes good: 8
NTS KE probes_bad: 0
NTS KE serves good: 0
NTS KE serves_bad: 0
#
I ran tcpdump filtering on TCP ports 1234 (Cloudflare) and 3443 (Netnod), and I can see my system trying to negotiate NTS-KE with Cloudflare and Netnod, but both sessions end in a TCP RST 🙁
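This is roughly the capture I used for that ("any" captures on all interfaces; adjust the ports to the NTS-KE servers you configured):
# watch the NTS-KE attempts towards Cloudflare (1234) and Netnod (3443)
sudo tcpdump -n -i any 'tcp port 1234 or tcp port 3443'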