Containerlab

Some months ago I read about containerlab (and here). It looked like a very simple way to build labs quickly, easily and multi-vendor. In the past I have used GNS3 and docker-topo for my labs, but somehow I liked the documentation and the idea of trying to mix cEOS with FRR images.

As I have grown more comfortable with Arista over the years and I had some images on my laptop, I installed the software (no major issues following the instructions for Debian) and tried the example for a cEOS lab.

It didn't work. The containers started, but I didn't get to the Arista CLI, just a bash shell, and I couldn't see anything running on them… I remembered some Arista-specific processes, but none were there. In the following weeks I tried newer cEOS images, but no luck: always stuck at the same point. In the end, I never had enough time (or put in the effort and interest) to troubleshoot the problem properly.

For too many months, I haven't had the chance (I can write a post with excuses) to do much tech self-learning (I can write a book of all the things I would like to learn); it was easier to cook or read.

But finally, this week, talking with a colleague at work, he mentioned containerlab was great and that he used it. I commented that I had tried and failed. With that, I finally found a bit of interest and time today to give it another go.

Firstly, I made sure I was running the latest containerlab version and that my cEOS was recent enough (4.26.0F), and got back to basics: check T-H-E logs!

One thing I noticed after paying attention to the startup logs was a warning about lack of memory on my laptop. So I closed several applications and tried again. My lab still looked stuck at the same point:

go:1.16.3|py:3.7.3|tomas@athens:~/storage/technology/containerlabs/ceos$ sudo containerlab deploy --topo ceos-lab1.yaml 
INFO[0000] Parsing & checking topology file: ceos-lab1.yaml 
INFO[0000] Creating lab directory: /home/tomas/storage/technology/containerlabs/ceos/clab-ceos 
INFO[0000] Creating docker network: Name='clab', IPv4Subnet='172.20.20.0/24', IPv6Subnet='2001:172:20:20::/64', MTU='1500' 
INFO[0000] config file '/home/tomas/storage/technology/containerlabs/ceos/clab-ceos/ceos1/flash/startup-config' for node 'ceos1' already exists and will not be generated/reset 
INFO[0000] Creating container: ceos1                    
INFO[0000] config file '/home/tomas/storage/technology/containerlabs/ceos/clab-ceos/ceos2/flash/startup-config' for node 'ceos2' already exists and will not be generated/reset 
INFO[0000] Creating container: ceos2                    
INFO[0003] Creating virtual wire: ceos1:eth1 <--> ceos2:eth1 
INFO[0003] Running postdeploy actions for Arista cEOS 'ceos2' node 
INFO[0003] Running postdeploy actions for Arista cEOS 'ceos1' node 
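
For reference, the ceos-lab1.yaml behind that deploy looks roughly like this (a sketch: the lab name, node names, image tag and link are inferred from the logs above):

$ cat ceos-lab1.yaml
name: ceos
topology:
  nodes:
    ceos1:
      kind: ceos
      image: ceos-lab:4.26.0F
    ceos2:
      kind: ceos
      image: ceos-lab:4.26.0F
  links:
    - endpoints: ["ceos1:eth1", "ceos2:eth1"]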

I did a bit of searching about containerlab and cEOS; for example, I found this blog where the author successfully started a lab with cEOS, and I could compare his logs with mine!

So it was clear my containers were stuck. I searched for that message: "Running postdeploy actions for Arista cEOS".

I didn't see anything promising, just links back to the main containerlab cEOS page. I read it again and noticed something at the bottom of the page regarding a known issue…. So I checked if it applied to me (although I doubted it, as it looked CentOS-specific…) and indeed it applied to me too!

$ docker logs clab-ceos-ceos2
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted

So I started looking for info about what cgroups are: link1, link2

First I wanted to check which cgroup version I was running. From this link, I could see that, based on my kernel version, I should have cgroup2:

$ grep cgroup /proc/filesystems
nodev	cgroup
nodev	cgroup2

$ ls /sys/fs/cgroup/memory/
cgroup.clone_children           memory.kmem.tcp.limit_in_bytes      memory.stat
cgroup.event_control            memory.kmem.tcp.max_usage_in_bytes  memory.swappiness
cgroup.procs                    memory.kmem.tcp.usage_in_bytes      memory.usage_in_bytes
cgroup.sane_behavior            memory.kmem.usage_in_bytes          memory.use_hierarchy
dev-hugepages.mount             memory.limit_in_bytes               notify_on_release
dev-mqueue.mount                memory.max_usage_in_bytes           proc-fs-nfsd.mount
docker                          memory.memsw.failcnt                proc-sys-fs-binfmt_misc.mount
machine.slice                   memory.memsw.limit_in_bytes         release_agent
memory.failcnt                  memory.memsw.max_usage_in_bytes     sys-fs-fuse-connections.mount
memory.force_empty              memory.memsw.usage_in_bytes         sys-kernel-config.mount
memory.kmem.failcnt             memory.move_charge_at_immigrate     sys-kernel-debug.mount
memory.kmem.limit_in_bytes      memory.numa_stat                    sys-kernel-tracing.mount
memory.kmem.max_usage_in_bytes  memory.oom_control                  system.slice
memory.kmem.slabinfo            memory.pressure_level               tasks
memory.kmem.tcp.failcnt         memory.soft_limit_in_bytes          user.slice

Seeing "cgroup.*" files under "/sys/fs/cgroup/memory", and following the reference above, I concluded I was running cgroup2.
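
A more direct check I found later: look at the filesystem type mounted on /sys/fs/cgroup. "cgroup2fs" means pure cgroup v2, while "tmpfs" means v1/hybrid (a quick sketch):

$ stat -fc %T /sys/fs/cgroup/
cgroup2fs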

So how could I change to cgroup1 for docker only?

It seems I couldn't change that for just one application, because it is a parameter that you pass to the kernel at boot time.

Along the way, I learned from this blog that there is something called podman that can replace docker.

So in the end, searching for how to change the cgroup version in Debian, I used this link:

$ cat /etc/default/grub
...
# systemd.unified_cgroup_hierarchy=0 enables cgroupv1 that is needed for containerlabs to run ceos.... 
# https://github.com/srl-labs/containerlab/issues/467
# https://mbien.dev/blog/entry/java-in-rootless-containers-with
GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=0"
....
$ sudo grub-mkconfig -o /boot/grub/grub.cfg
....
$ sudo reboot

Good thing the laptop rebooted fine! That was a relief 🙂
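
To confirm the kernel actually booted with the new parameter, the running command line can be checked (a sketch of the expected output):

$ cat /proc/cmdline
... quiet systemd.unified_cgroup_hierarchy=0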

Then I checked if the change made any difference. It failed at first, but only because containerlab couldn't connect to docker… somehow docker had died. I restarted docker and tried containerlab again…

$ sudo containerlab deploy --topo ceos-lab1.yaml 
INFO[0000] Parsing & checking topology file: ceos-lab1.yaml 
INFO[0000] Creating lab directory: /home/xxx/storage/technology/containerlabs/ceos/clab-ceos 
INFO[0000] Creating docker network: Name='clab', IPv4Subnet='172.20.20.0/24', IPv6Subnet='2001:172:20:20::/64', MTU='1500' 
INFO[0000] config file '/home/xxx/storage/technology/containerlabs/ceos/clab-ceos/ceos1/flash/startup-config' for node 'ceos1' already exists and will not be generated/reset 
INFO[0000] Creating container: ceos1                    
INFO[0000] config file '/home/xxx/storage/technology/containerlabs/ceos/clab-ceos/ceos2/flash/startup-config' for node 'ceos2' already exists and will not be generated/reset 
INFO[0000] Creating container: ceos2                    
INFO[0003] Creating virtual wire: ceos1:eth1 <--> ceos2:eth1 
INFO[0003] Running postdeploy actions for Arista cEOS 'ceos2' node 
INFO[0003] Running postdeploy actions for Arista cEOS 'ceos1' node 
INFO[0145] Adding containerlab host entries to /etc/hosts file 
+---+-----------------+--------------+------------------+------+-------+---------+----------------+----------------------+
| # |      Name       | Container ID |      Image       | Kind | Group |  State  |  IPv4 Address  |     IPv6 Address     |
+---+-----------------+--------------+------------------+------+-------+---------+----------------+----------------------+
| 1 | clab-ceos-ceos1 | 2807cd2f689f | ceos-lab:4.26.0F | ceos |       | running | 172.20.20.2/24 | 2001:172:20:20::2/64 |
| 2 | clab-ceos-ceos2 | e5d2aa4578b5 | ceos-lab:4.26.0F | ceos |       | running | 172.20.20.3/24 | 2001:172:20:20::3/64 |
+---+-----------------+--------------+------------------+------+-------+---------+----------------+----------------------+
$ sudo clab graph -t ceos-lab1.yaml 
INFO[0000] Parsing & checking topology file: ceos-lab1.yaml 
INFO[0000] Listening on :50080...       

After a bit, it seemed to work! And I learned about the "graph" option to draw your topology.

I checked the cEOS container logs:

$ docker logs clab-ceos-ceos1
....
[  OK  ] Started SYSV: Eos system init scrip...uns after POST, before ProcMgr).
         Starting Power-On Self Test...
         Starting EOS Warmup Service...
[  OK  ] Started Power-On Self Test.
[  OK  ] Reached target EOS regular mode.
[  OK  ] Started EOS Diagnostic Mode.
[     *] A start job is running for EOS Warmup Service (2min 9s / no limit)Reloading.
$ 
$ docker exec -it clab-ceos-ceos1 Cli
ceos1>
ceos1>enable 
ceos1#show version 
 cEOSLab
Hardware version: 
Serial number: 
Hardware MAC address: 001c.7389.2099
System MAC address: 001c.7389.2099

Software image version: 4.26.0F-21792469.4260F (engineering build)
Architecture: i686
Internal build version: 4.26.0F-21792469.4260F
Internal build ID: c5b41f65-54cd-44b1-b576-b5c48700ee19

cEOS tools version: 1.1
Kernel version: 5.10.0-8-amd64

Uptime: 0 minutes
Total memory: 8049260 kB
Free memory: 2469328 kB

ceos1#
ceos1#show interfaces description 
Interface                      Status         Protocol           Description
Et1                            up             up                 
Ma0                            up             up                 
ceos1#show running-config interfaces ethernet 1
interface Ethernet1
ceos1#

Yes! Finally working!

So now I have no excuses: time to keep learning new things!

BTW, these are the different versions I am using at the moment:

$ uname -a
Linux athens 5.10.0-8-amd64 #1 SMP Debian 5.10.46-4 (2021-08-03) x86_64 GNU/Linux 

$ docker -v
Docker version 20.10.5+dfsg1, build 55c4c88

$ containerlab version

                           _                   _       _     
                 _        (_)                 | |     | |    
 ____ ___  ____ | |_  ____ _ ____   ____  ____| | ____| | _  
/ ___) _ \|  _ \|  _)/ _  | |  _ \ / _  )/ ___) |/ _  | || \ 
( (__| |_|| | | | |_( ( | | | | | ( (/ /| |   | ( ( | | |_) )
\____)___/|_| |_|\___)_||_|_|_| |_|\____)_|   |_|\_||_|____/ 

    version: 0.17.0
     commit: eba1b82
       date: 2021-08-25T09:31:53Z
     source: https://github.com/srl-labs/containerlab
 rel. notes: https://containerlab.srlinux.dev/rn/0.17/

My concern is: how will running cgroup1 affect other applications, like kubernetes?

BTW, I have the same issue with containerlab as with docker-topo: when I press "Alt+Home(left arrow)", my laptop leaves X-Windows and drops to the tty!!!

BGP AIGP and BGP churn

I have read about BGP churn before; I look up the meaning and then forget it until the next time.

BGP churn = the rate of routing updates that BGP routers must process.

Beyond that, I can't really find a formal definition.

As well, I didn't know what the actual meaning of "churn" was: it is a machine for producing butter, and also the rate at which customers stop using a product.


At work I read something about AIGP from a different department. I searched and found that it is a new BGP optional non-transitive path attribute, and that the BGP decision process is updated to take it into account. And this is from 2015!!! So I am 6 years behind…. And there is RFC 7311.

Note: BGP routers that do not support an optional non-transitive attribute (e.g. AIGP) must delete it and must not pass it to other BGP peers.

So it seems a good feature if you run a big AS and want BGP to take the IGP cost into account.

Linux+MPLS-Part4

Finally, I am trying to set up an MPLS L3VPN.

Again, I am following the author's post but adapting it to my environment, using libvirt instead of VirtualBox and Debian 10 as the VM OS. All my data is here.

This is the diagram for the lab:

The difference from lab3 and lab2: we have P1, which is a pure P router, only handling labels; it doesn't run any BGP.

This time each device's FRR config is generated automatically via gen_frr_config.py (in lab2 all the config was manual).

Again the environment is configured via the Vagrantfile + l3vpn_provisioning script. This is a mix of lab2 (install FRR), lab3 (define VRFs) and lab1 (configure MPLS at the Linux level).
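
The lab1 part (MPLS at the Linux level) boils down to loading the MPLS modules and enabling label processing on the core-facing interface. A sketch for PE1, with values matching the sysctl output shown further down:

# load the MPLS kernel modules
sudo modprobe mpls_router
sudo modprobe mpls_iptunnel
# size the MPLS label table and accept labelled packets on the core link
sudo sysctl net.mpls.platform_labels=100000
sudo sysctl net.mpls.conf.ens8.input=1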

So after some tuning everything is installed and routing looks correct (although, I don't know why, I have to reload FRR to get the properly generated BGP config in PE1 and PE2; P1 is fine).
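
For reference, the PE-side BGP/L3VPN piece of the generated config looks roughly like this (a sketch of FRR's VPN syntax; the AS number and neighbor come from the outputs below, while the RD/RT values are my assumptions):

router bgp 65010
 neighbor 172.20.5.2 remote-as 65010
 neighbor 172.20.5.2 update-source lo
 address-family ipv4 vpn
  neighbor 172.20.5.2 activate
 exit-address-family
!
router bgp 65010 vrf vrf_cust1
 address-family ipv4 unicast
  redistribute connected
  label vpn export auto
  rd vpn export 65010:1
  rt vpn both 65010:1
  export vpn
  import vpn
 exit-address-family
!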

So let’s see PE1:

IGP (IS-IS) is up:

PE1# show isis neighbor 
 Area ISIS:
   System Id           Interface   L  State        Holdtime SNPA
   P1                  ens8        2  Up            30       2020.2020.2020
 PE1# 
 PE1# exit
 root@PE1:/home/vagrant# 

BGP is up to PE2 and we can see routes received in AF IPv4VPN:

PE1# 
 PE1# show bgp summary 
 IPv4 Unicast Summary:
 BGP router identifier 172.20.5.1, local AS number 65010 vrf-id 0
 BGP table version 0
 RIB entries 0, using 0 bytes of memory
 Peers 1, using 21 KiB of memory
 Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
 172.20.5.2      4      65010       111       105        0    0    0 01:39:14            0        0
 Total number of neighbors 1
 IPv4 VPN Summary:
 BGP router identifier 172.20.5.1, local AS number 65010 vrf-id 0
 BGP table version 0
 RIB entries 11, using 2112 bytes of memory
 Peers 1, using 21 KiB of memory
 Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
 172.20.5.2      4      65010       111       105        0    0    0 01:39:14            2        2
 Total number of neighbors 1
 PE1# 

Checking the routing tables, we can see prefixes in both VRFs, plus the labels needed, so that's good.

PE1# show ip route vrf all 
 Codes: K - kernel route, C - connected, S - static, R - RIP,
        O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
        T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
        F - PBR, f - OpenFabric,
        > - selected route, * - FIB route, q - queued, r - rejected, b - backup
 VRF default:
 C>* 172.20.5.1/32 is directly connected, lo, 02:19:16
 I>* 172.20.5.2/32 [115/30] via 192.168.66.102, ens8, label 17, weight 1, 02:16:10
 I>* 172.20.5.5/32 [115/20] via 192.168.66.102, ens8, label implicit-null, weight 1, 02:18:34
 I   192.168.66.0/24 [115/20] via 192.168.66.102, ens8 inactive, weight 1, 02:18:34
 C>* 192.168.66.0/24 is directly connected, ens8, 02:19:16
 I>* 192.168.77.0/24 [115/20] via 192.168.66.102, ens8, label implicit-null, weight 1, 02:18:34
 C>* 192.168.121.0/24 is directly connected, ens5, 02:19:16
 K>* 192.168.121.1/32 [0/1024] is directly connected, ens5, 02:19:16
 VRF vrf_cust1:
 C>* 192.168.11.0/24 is directly connected, ens6, 02:19:05
 B>  192.168.23.0/24 [200/0] via 172.20.5.2 (vrf default) (recursive), label 80, weight 1, 02:13:32
 via 192.168.66.102, ens8 (vrf default), label 17/80, weight 1, 02:13:32 
 VRF vrf_cust2:
 C>* 192.168.12.0/24 is directly connected, ens7, 02:19:05
 B>  192.168.24.0/24 [200/0] via 172.20.5.2 (vrf default) (recursive), label 81, weight 1, 02:13:32
 via 192.168.66.102, ens8 (vrf default), label 17/81, weight 1, 02:13:32
 PE1#  

Now check LDP and the MPLS labels. Everything looks sane. We have LDP labels for P1 (17) and PE2 (18), and labels for each VRF.

PE1# show mpls table 
  Inbound Label  Type  Nexthop         Outbound Label  
 
 16             LDP   192.168.66.102  implicit-null   
  17             LDP   192.168.66.102  implicit-null   
  18             LDP   192.168.66.102  17              
  80             BGP   vrf_cust1       -               
  81             BGP   vrf_cust2       -               
 PE1# 
 PE1# show mpls ldp neighbor 
 AF   ID              State       Remote Address    Uptime
 ipv4 172.20.5.5      OPERATIONAL 172.20.5.5      02:20:20
 PE1# 
 PE1# 
 PE1# show mpls ldp binding  
 AF   Destination          Nexthop         Local Label Remote Label  In Use
 ipv4 172.20.5.1/32        172.20.5.5      imp-null    16                no
 ipv4 172.20.5.2/32        172.20.5.5      18          17               yes
 ipv4 172.20.5.5/32        172.20.5.5      16          imp-null         yes
 ipv4 192.168.11.0/24      0.0.0.0         imp-null    -                 no
 ipv4 192.168.12.0/24      0.0.0.0         imp-null    -                 no
 ipv4 192.168.66.0/24      172.20.5.5      imp-null    imp-null          no
 ipv4 192.168.77.0/24      172.20.5.5      17          imp-null         yes
 ipv4 192.168.121.0/24     172.20.5.5      imp-null    imp-null          no
 PE1# 

A similar view appears on PE2.

Now P1, which is our P router. Here we only care about LDP and ISIS:

P1# 
 P1# show mpls table 
  Inbound Label  Type  Nexthop         Outbound Label  
 
 16             LDP   192.168.66.101  implicit-null   
  17             LDP   192.168.77.101  implicit-null   
 P1# show mpls ldp neighbor 
 AF   ID              State       Remote Address    Uptime
 ipv4 172.20.5.1      OPERATIONAL 172.20.5.1      02:23:55
 ipv4 172.20.5.2      OPERATIONAL 172.20.5.2      02:21:01
 P1# 
 P1# show isis neighbor 
 Area ISIS:
   System Id           Interface   L  State        Holdtime SNPA
   PE1                 ens6        2  Up            28       2020.2020.2020
   PE2                 ens7        2  Up            29       2020.2020.2020
 P1# 
 P1# show ip route
 Codes: K - kernel route, C - connected, S - static, R - RIP,
        O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
        T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
        F - PBR, f - OpenFabric,
        > - selected route, * - FIB route, q - queued, r - rejected, b - backup
 K>* 0.0.0.0/0 [0/1024] via 192.168.121.1, ens5, src 192.168.121.253, 02:24:45
 I>* 172.20.5.1/32 [115/20] via 192.168.66.101, ens6, label implicit-null, weight 1, 02:24:04
 I>* 172.20.5.2/32 [115/20] via 192.168.77.101, ens7, label implicit-null, weight 1, 02:21:39
 C>* 172.20.5.5/32 is directly connected, lo, 02:24:45
 I   192.168.66.0/24 [115/20] via 192.168.66.101, ens6 inactive, weight 1, 02:24:04
 C>* 192.168.66.0/24 is directly connected, ens6, 02:24:45
 I   192.168.77.0/24 [115/20] via 192.168.77.101, ens7 inactive, weight 1, 02:21:39
 C>* 192.168.77.0/24 is directly connected, ens7, 02:24:45
 C>* 192.168.121.0/24 is directly connected, ens5, 02:24:45
 K>* 192.168.121.1/32 [0/1024] is directly connected, ens5, 02:24:45
 P1# 

So, as usual, let's test connectivity. I will ping from CE1 (connected to PE1) to CE3 (connected to PE2); both belong to the same VRF, vrf_cust1.

First of all, I had to modify iptables on my host to avoid unwanted NAT (iptables masquerade) between CE1 and CE3.

# iptables -t nat -vnL LIBVIRT_PRT --line-numbers
 Chain LIBVIRT_PRT (1 references)
 num   pkts bytes target     prot opt in     out     source               destination         
 1       15  1451 RETURN     all  --  *      *       192.168.77.0/24      224.0.0.0/24        
 2        0     0 RETURN     all  --  *      *       192.168.77.0/24      255.255.255.255     
 3        0     0 MASQUERADE  tcp  --  *      *       192.168.77.0/24     !192.168.77.0/24      masq ports: 1024-65535
 4       18  3476 MASQUERADE  udp  --  *      *       192.168.77.0/24     !192.168.77.0/24      masq ports: 1024-65535
 5        0     0 MASQUERADE  all  --  *      *       192.168.77.0/24     !192.168.77.0/24     
 6       13  1754 RETURN     all  --  *      *       192.168.122.0/24     224.0.0.0/24        
 7        0     0 RETURN     all  --  *      *       192.168.122.0/24     255.255.255.255     
 8        0     0 MASQUERADE  tcp  --  *      *       192.168.122.0/24    !192.168.122.0/24     masq ports: 1024-65535
 9        0     0 MASQUERADE  udp  --  *      *       192.168.122.0/24    !192.168.122.0/24     masq ports: 1024-65535
 10       0     0 MASQUERADE  all  --  *      *       192.168.122.0/24    !192.168.122.0/24    
 11      24  2301 RETURN     all  --  *      *       192.168.11.0/24      224.0.0.0/24        
 12       0     0 RETURN     all  --  *      *       192.168.11.0/24      255.255.255.255     
 13       0     0 MASQUERADE  tcp  --  *      *       192.168.11.0/24     !192.168.11.0/24      masq ports: 1024-65535
 14      23  4476 MASQUERADE  udp  --  *      *       192.168.11.0/24     !192.168.11.0/24      masq ports: 1024-65535
 15       1    84 MASQUERADE  all  --  *      *       192.168.11.0/24     !192.168.11.0/24     
 16      29  2541 RETURN     all  --  *      *       192.168.121.0/24     224.0.0.0/24        
 17       0     0 RETURN     all  --  *      *       192.168.121.0/24     255.255.255.255     
 18      36  2160 MASQUERADE  tcp  --  *      *       192.168.121.0/24    !192.168.121.0/24     masq ports: 1024-65535
 19      65  7792 MASQUERADE  udp  --  *      *       192.168.121.0/24    !192.168.121.0/24     masq ports: 1024-65535
 20       0     0 MASQUERADE  all  --  *      *       192.168.121.0/24    !192.168.121.0/24    
 21      20  2119 RETURN     all  --  *      *       192.168.24.0/24      224.0.0.0/24        
 22       0     0 RETURN     all  --  *      *       192.168.24.0/24      255.255.255.255     
 23       0     0 MASQUERADE  tcp  --  *      *       192.168.24.0/24     !192.168.24.0/24      masq ports: 1024-65535
 24      21  4076 MASQUERADE  udp  --  *      *       192.168.24.0/24     !192.168.24.0/24      masq ports: 1024-65535
 25       0     0 MASQUERADE  all  --  *      *       192.168.24.0/24     !192.168.24.0/24     
 26      20  2119 RETURN     all  --  *      *       192.168.23.0/24      224.0.0.0/24        
 27       0     0 RETURN     all  --  *      *       192.168.23.0/24      255.255.255.255     
 28       1    60 MASQUERADE  tcp  --  *      *       192.168.23.0/24     !192.168.23.0/24      masq ports: 1024-65535
 29      20  3876 MASQUERADE  udp  --  *      *       192.168.23.0/24     !192.168.23.0/24      masq ports: 1024-65535
 30       1    84 MASQUERADE  all  --  *      *       192.168.23.0/24     !192.168.23.0/24     
 31      25  2389 RETURN     all  --  *      *       192.168.66.0/24      224.0.0.0/24        
 32       0     0 RETURN     all  --  *      *       192.168.66.0/24      255.255.255.255     
 33       0     0 MASQUERADE  tcp  --  *      *       192.168.66.0/24     !192.168.66.0/24      masq ports: 1024-65535
 34      23  4476 MASQUERADE  udp  --  *      *       192.168.66.0/24     !192.168.66.0/24      masq ports: 1024-65535
 35       0     0 MASQUERADE  all  --  *      *       192.168.66.0/24     !192.168.66.0/24     
 36      24  2298 RETURN     all  --  *      *       192.168.12.0/24      224.0.0.0/24        
 37       0     0 RETURN     all  --  *      *       192.168.12.0/24      255.255.255.255     
 38       0     0 MASQUERADE  tcp  --  *      *       192.168.12.0/24     !192.168.12.0/24      masq ports: 1024-65535
 39      23  4476 MASQUERADE  udp  --  *      *       192.168.12.0/24     !192.168.12.0/24      masq ports: 1024-65535
 40       0     0 MASQUERADE  all  --  *      *       192.168.12.0/24     !192.168.12.0/24     
#


# iptables -t nat -I LIBVIRT_PRT 13 -s 192.168.11.0/24 -d 192.168.23.0/24 -j RETURN
# iptables -t nat -I LIBVIRT_PRT 29 -s 192.168.23.0/24 -d 192.168.11.0/24 -j RETURN

Ok, start pinging from CE1 to CE3:

vagrant@CE1:~$ ping 192.168.23.102
 PING 192.168.23.102 (192.168.23.102) 56(84) bytes of data.

No good. Let's check what the next hop, PE1, is doing. It seems it is sending the traffic double-encapsulated towards P1, as expected:

root@PE1:/home/vagrant# tcpdump -i ens8
...
20:29:16.648325 MPLS (label 17, exp 0, ttl 63) (label 80, exp 0, [S], ttl 63) IP 192.168.11.102 > 192.168.23.102: ICMP echo request, id 2298, seq 2627, length 64
20:29:17.672287 MPLS (label 17, exp 0, ttl 63) (label 80, exp 0, [S], ttl 63) IP 192.168.11.102 > 192.168.23.102: ICMP echo request, id 2298, seq 2628, length 64
...

Let's check the next hop, P1. I can see it is sending the traffic to PE2 doing PHP, i.e. removing the top (LDP) label and leaving only the BGP label:

root@PE2:/home/vagrant# tcpdump -i ens8
...
20:29:16.648176 MPLS (label 80, exp 0, [S], ttl 63) IP 192.168.11.102 > 192.168.23.102: ICMP echo request, id 2298, seq 2627, length 64
20:29:17.671968 MPLS (label 80, exp 0, [S], ttl 63) IP 192.168.11.102 > 192.168.23.102: ICMP echo request, id 2298, seq 2628, length 64
...

But then PE2 is not sending anything to CE3. I can't see anything on the link:

root@CE3:/home/vagrant# tcpdump -i ens6
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
 listening on ens6, link-type EN10MB (Ethernet), capture size 262144 bytes
 20:32:03.174796 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:e2:cb:54.8001, length 35
 20:32:05.158761 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:e2:cb:54.8001, length 35
 20:32:07.174742 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:e2:cb:54.8001, length 35

I have double-checked the configs. All routing and config look sane on PE2:

vagrant@PE2:~$ ip route
 default via 192.168.121.1 dev ens5 proto dhcp src 192.168.121.31 metric 1024 
 172.20.5.1  encap mpls  16 via 192.168.77.102 dev ens8 proto isis metric 20 
 172.20.5.5 via 192.168.77.102 dev ens8 proto isis metric 20 
 192.168.66.0/24 via 192.168.77.102 dev ens8 proto isis metric 20 
 192.168.77.0/24 dev ens8 proto kernel scope link src 192.168.77.101 
 192.168.121.0/24 dev ens5 proto kernel scope link src 192.168.121.31 
 192.168.121.1 dev ens5 proto dhcp scope link src 192.168.121.31 metric 1024 
 vagrant@PE2:~$ 
 vagrant@PE2:~$ ip -4 a
 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
     inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
     inet 172.20.5.2/32 scope global lo
        valid_lft forever preferred_lft forever
 2: ens5:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     inet 192.168.121.31/24 brd 192.168.121.255 scope global dynamic ens5
        valid_lft 2524sec preferred_lft 2524sec
 3: ens6:  mtu 1500 qdisc pfifo_fast master vrf_cust1 state UP group default qlen 1000
     inet 192.168.23.101/24 brd 192.168.23.255 scope global ens6
        valid_lft forever preferred_lft forever
 4: ens7:  mtu 1500 qdisc pfifo_fast master vrf_cust2 state UP group default qlen 1000
     inet 192.168.24.101/24 brd 192.168.24.255 scope global ens7
        valid_lft forever preferred_lft forever
 5: ens8:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     inet 192.168.77.101/24 brd 192.168.77.255 scope global ens8
        valid_lft forever preferred_lft forever
 vagrant@PE2:~$ 
 vagrant@PE2:~$ 
 vagrant@PE2:~$ 
 vagrant@PE2:~$ 
 vagrant@PE2:~$ ip -M route
 16 as to 16 via inet 192.168.77.102 dev ens8 proto ldp 
 17 via inet 192.168.77.102 dev ens8 proto ldp 
 18 via inet 192.168.77.102 dev ens8 proto ldp 
 vagrant@PE2:~$ 
 vagrant@PE2:~$ ip route show table 10
 blackhole default 
 192.168.11.0/24  encap mpls  16/80 via 192.168.77.102 dev ens8 proto bgp metric 20 
 broadcast 192.168.23.0 dev ens6 proto kernel scope link src 192.168.23.101 
 192.168.23.0/24 dev ens6 proto kernel scope link src 192.168.23.101 
 local 192.168.23.101 dev ens6 proto kernel scope host src 192.168.23.101 
 broadcast 192.168.23.255 dev ens6 proto kernel scope link src 192.168.23.101 
 vagrant@PE2:~$ 
 vagrant@PE2:~$                       
 vagrant@PE2:~$ ip vrf      
 Name              Table
 vrf_cust1           10
 vrf_cust2           20
 vagrant@PE2:~$ 

root@PE2:/home/vagrant# sysctl -a | grep mpls
 net.mpls.conf.ens5.input = 0
 net.mpls.conf.ens6.input = 0
 net.mpls.conf.ens7.input = 0
 net.mpls.conf.ens8.input = 1
 net.mpls.conf.lo.input = 0
 net.mpls.conf.vrf_cust1.input = 0
 net.mpls.conf.vrf_cust2.input = 0
 net.mpls.default_ttl = 255
 net.mpls.ip_ttl_propagate = 1
 net.mpls.platform_labels = 100000
root@PE2:/home/vagrant# 
root@PE2:/home/vagrant# lsmod | grep mpls
 mpls_iptunnel          16384  3
 mpls_router            36864  1 mpls_iptunnel
 ip_tunnel              24576  1 mpls_router
root@PE2:/home/vagrant# 

So I have been a bit puzzled about this issue for the last couple of weeks. I was thinking that iptables was fooling me again and dropping the traffic somehow, but as far as I can see, PE2 is simply not sending anything, and I don't really know how to troubleshoot FRR in this case. I have asked for help on the FRR list; let's see how it goes. I think I am doing something wrong, because I am not doing anything new.

Linux+MPLS-Part3


Continuation of the second part; this time we want to test VRF-lite.

Again, I am following the author's post but adapting it to my environment, using libvirt instead of VirtualBox and Debian 10 as the VM OS. All my data is here.

This is the diagram adapted to my lab:

After updating the Vagrantfile and the provisioning script, I "vagrant up". The 6 VMs don't take long to boot, which is a good thing.

The provisioning script mainly configures PE1 and PE2. This is a bit more detail:

    # enabling ipv4 forwarding (routing)
    sudo sysctl net.ipv4.ip_forward=1

    # add loopback (not used in lab3)
    sudo ip addr add 172.20.5.$self/32 dev lo

    # removing ip in link between pe1-pe2 as we will setup a trunk with two vlans.
    sudo ip addr del 192.168.66.10$self/24 dev ens8

    # creating two vlans 10 (ce1,ce3) and 20 (ce2, ce4)
    sudo ip link add link ens8 name vlan10 type vlan id 10
    sudo ip link add link ens8 name vlan20 type vlan id 20

    # assign IP to each vlan
    sudo ip addr add 172.30.10.10$self/24 dev vlan10
    sudo ip addr add 172.30.20.10$self/24 dev vlan20

    # turn up each vlan as by default are down
    sudo ip link set vlan10 up
    sudo ip link set vlan20 up

    # create two routing tables with a null route
    sudo ip route add blackhole 0.0.0.0/0 table 10
    sudo ip route add blackhole 0.0.0.0/0 table 20

    # create two VRFs and assign one table (created above) to each one
    sudo ip link add name vrf_cust1 type vrf table 10
    sudo ip link add name vrf_cust2 type vrf table 20

    # assign interfaces to the VRFs (ie. on PE1)
    sudo ip link set ens6 master vrf_cust1     # interface to CE1
    sudo ip link set vlan10 master vrf_cust1   # interface to PE2-vlan10

    sudo ip link set ens7 master vrf_cust2     # interface to CE2
    sudo ip link set vlan20 master vrf_cust2   # interface to PE2-vlan20

    # turn up VRFs
    sudo ip link set vrf_cust1 up
    sudo ip link set vrf_cust2 up

    # add static route in each VRF routing table to reach the opposite CE
    sudo ip route add 192.168.$route1.0/24 via 172.30.10.10$neighbor table 10
    sudo ip route add 192.168.$route2.0/24 via 172.30.20.10$neighbor table 20

Check the status of the VRFs in PE1:

vagrant@PE1:/vagrant$ ip link show type vrf
 8: vrf_cust1:  mtu 65536 qdisc noqueue state UP mode DEFAULT group default qlen 1000
     link/ether c6:b8:f2:3b:53:ed brd ff:ff:ff:ff:ff:ff
 9: vrf_cust2:  mtu 65536 qdisc noqueue state UP mode DEFAULT group default qlen 1000
     link/ether 62:1c:1d:0a:68:3d brd ff:ff:ff:ff:ff:ff
 vagrant@PE1:/vagrant$ 
 vagrant@PE1:/vagrant$ ip link show vrf vrf_cust1
 3: ens6:  mtu 1500 qdisc pfifo_fast master vrf_cust1 state UP mode DEFAULT group default qlen 1000
     link/ether 52:54:00:6f:16:1e brd ff:ff:ff:ff:ff:ff
 6: vlan10@ens8:  mtu 1500 qdisc noqueue master vrf_cust1 state UP mode DEFAULT group default qlen 1000
     link/ether 52:54:00:33:ab:0b brd ff:ff:ff:ff:ff:ff
 vagrant@PE1:/vagrant$ 

So let’s test if we can ping from CE1 to CE3:

Ok, it fails. I noticed that PE1 sees the packet from CE1… but the source IP is not the expected one (11.1 is the host, i.e. my laptop). The packet then reaches PE2 with the same wrong source IP, and then CE3. On CE3 the ICMP reply is sent to 11.1, so it never reaches CE1.

The positive thing is that VRF lite seems to work.

I double-checked all IPs, routing, etc. A duplicated MAC between CE1 and my laptop maybe??? I installed "net-tools" to get the "arp" command and check the ARP table contents in CE1. Checking the ARP requests in wireshark, all was good.

Somehow, the host was getting involved…. Keeping in mind that this is a simulated network, the host has access to all the "links" in the lab. Libvirt creates a bridge (switch) for each link and adds a vnet (port) to it for each VM that uses it:

# brctl show 
 bridge name    bridge id       STP enabled interfaces
 virbr10        8000.525400b747b0   yes     vnet27
                                            vnet30
 virbr11        8000.5254006e5a56   yes     vnet23
                                            vnet31
 virbr12        8000.525400dd521a   yes     vnet19
                                            vnet21
 virbr3         8000.525400a38db1   yes     vnet16
                                            vnet18
                                            vnet20
                                            vnet24
                                            vnet26
                                            vnet28
 virbr8        8000.525400de61f2   yes     vnet17
                                           vnet22
 virbr9        8000.525400e2cb54   yes     vnet25
                                           vnet29

".1" is always the host, but it was clear my routing was correct on all devices. I remembered that I had some issues during the summer when I was playing with containers/docker and doing some routing… so I checked iptables….

I didn't have iptables in the VMs… but as stated earlier, the host is connected to all the "links" used between the VMs. There is no real point-to-point link.

# iptables -t nat -vnL --line-numbers
...
Chain LIBVIRT_PRT (1 references)
num   pkts bytes target     prot opt in     out     source               destination         
1       11   580 RETURN     all  --  *      *       192.168.11.0/24      224.0.0.0/24        
2        0     0 RETURN     all  --  *      *       192.168.11.0/24      255.255.255.255     
3        0     0 MASQUERADE  tcp  --  *      *       192.168.11.0/24     !192.168.11.0/24      masq ports: 1024-65535
4       40  7876 MASQUERADE  udp  --  *      *       192.168.11.0/24     !192.168.11.0/24      masq ports: 1024-65535
5       16  1344 MASQUERADE  all  --  *      *       192.168.11.0/24     !192.168.11.0/24     
6       15   796 RETURN     all  --  *      *       192.168.24.0/24      224.0.0.0/24        
7        0     0 RETURN     all  --  *      *       192.168.24.0/24      255.255.255.255     
8        0     0 MASQUERADE  tcp  --  *      *       192.168.24.0/24     !192.168.24.0/24      masq ports: 1024-65535
9       49  9552 MASQUERADE  udp  --  *      *       192.168.24.0/24     !192.168.24.0/24      masq ports: 1024-65535
10       0     0 MASQUERADE  all  --  *      *       192.168.24.0/24     !192.168.24.0/24     



# iptables-save -t nat
# Generated by iptables-save v1.8.7 on Sun Feb  7 12:06:09 2021
*nat
:PREROUTING ACCEPT [365:28580]
:INPUT ACCEPT [143:14556]
:OUTPUT ACCEPT [1617:160046]
:POSTROUTING ACCEPT [1390:101803]
:DOCKER - [0:0]
:LIBVIRT_PRT - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 172.18.0.0/16 ! -o br-4bd17cfa19a8 -j MASQUERADE
-A POSTROUTING -s 172.19.0.0/16 ! -o br-43481af25965 -j MASQUERADE
-A POSTROUTING -j LIBVIRT_PRT
-A POSTROUTING -s 192.168.122.0/24 -d 224.0.0.0/24 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 -d 255.255.255.255/32 -j RETURN
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A DOCKER -i br-4bd17cfa19a8 -j RETURN
-A DOCKER -i br-43481af25965 -j RETURN
-A LIBVIRT_PRT -s 192.168.11.0/24 -d 224.0.0.0/24 -j RETURN
-A LIBVIRT_PRT -s 192.168.11.0/24 -d 255.255.255.255/32 -j RETURN
-A LIBVIRT_PRT -s 192.168.11.0/24 ! -d 192.168.11.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A LIBVIRT_PRT -s 192.168.11.0/24 ! -d 192.168.11.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A LIBVIRT_PRT -s 192.168.11.0/24 ! -d 192.168.11.0/24 -j MASQUERADE
-A LIBVIRT_PRT -s 192.168.24.0/24 -d 224.0.0.0/24 -j RETURN
-A LIBVIRT_PRT -s 192.168.24.0/24 -d 255.255.255.255/32 -j RETURN
-A LIBVIRT_PRT -s 192.168.24.0/24 ! -d 192.168.24.0/24 -p tcp -j MASQUERADE --to-ports 1024-65535
-A LIBVIRT_PRT -s 192.168.24.0/24 ! -d 192.168.24.0/24 -p udp -j MASQUERADE --to-ports 1024-65535
-A LIBVIRT_PRT -s 192.168.24.0/24 ! -d 192.168.24.0/24 -j MASQUERADE

Ok, it seems the traffic from 192.168.11.0 to 192.168.23.0 is NAT-ed (masqueraded by iptables). So it makes sense that I see the traffic sourced from 11.1 in PE1. Let's remove that:

# iptables -t nat -D LIBVIRT_PRT -s 192.168.11.0/24 ! -d 192.168.11.0/24 -j MASQUERADE

Test again pinging from CE1 to CE3:

So it works properly; we can see the correct IPs at every hop: PE1, PE2 and CE3.

So it seems this is built-in behaviour in libvirt. I need to find out how to "fix" this behaviour so it survives every "vagrant up".
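
One idea to try (untested, so just a sketch): vagrant-libvirt supports a forward mode per private network, and "none" should create the libvirt network without the MASQUERADE rules. Something like this in the Vagrantfile (the ce1 block name is hypothetical):

  # assumption: libvirt__forward_mode "none" builds an isolated network, no NAT
  ce1.vm.network :private_network,
    ip: "192.168.11.102",
    libvirt__forward_mode: "none"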

Linux+MPLS-Part2

Continuation of the first part; this time we want to establish dynamic LSPs, so we will use LDP for label exchange and ISIS as the IGP.

Again, I am following the author's post but adapting it to my environment. The latest stable FRR is 7.5. All my data is here.

So once the routers R1, R2 and R3 are configured and FRR is reloaded (very important: restart doesn't do the trick), ISIS and LDP will come up; you just need to be a bit patient.

Checking on R2, we can see ISIS and LDP established to both R1 and R3. So this is a very good sign.

R2# show isis neighbor 
 Area ISIS:
   System Id           Interface   L  State        Holdtime SNPA
   R1                  ens6        2  Up            30       2020.2020.2020
   R3                  ens7        2  Up            28       2020.2020.2020
 R2# 
 R2# show mpls ldp neighbor 
 AF   ID              State       Remote Address    Uptime
 ipv4 172.20.15.1     OPERATIONAL 172.20.15.1     00:27:44
 ipv4 172.20.15.3     OPERATIONAL 172.20.15.3     00:27:47
 R2# 

Let's check that the routing table is programmed as expected. R2 is learning R1's and R3's loopbacks via ISIS, and they are reachable via MPLS (using implicit-null, because R2 is doing Penultimate Hop Popping – PHP) based on the LDP bindings.

R2# show ip route
 Codes: K - kernel route, C - connected, S - static, R - RIP,
        O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
        T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
        F - PBR, f - OpenFabric,
        > - selected route, * - FIB route, q - queued, r - rejected, b - backup
 K>* 0.0.0.0/0 [0/1024] via 192.168.121.1, ens5, src 192.168.121.90, 00:12:42
 I>* 172.20.15.1/32 [115/20] via 192.168.12.101, ens6, label implicit-null, weight 1, 00:01:26
 C>* 172.20.15.2/32 is directly connected, lo, 00:12:42
 I>* 172.20.15.3/32 [115/20] via 192.168.23.101, ens7, label implicit-null, weight 1, 00:01:26
 I   192.168.12.0/24 [115/20] via 192.168.12.101, ens6 inactive, weight 1, 00:01:26
 C>* 192.168.12.0/24 is directly connected, ens6, 00:12:42
 I   192.168.23.0/24 [115/20] via 192.168.23.101, ens7 inactive, weight 1, 00:01:26
 C>* 192.168.23.0/24 is directly connected, ens7, 00:12:42
 C>* 192.168.121.0/24 is directly connected, ens5, 00:12:42
 K>* 192.168.121.1/32 [0/1024] is directly connected, ens5, 00:12:42
 R2# 
R2# show mpls ldp binding 
 AF   Destination          Nexthop         Local Label Remote Label  In Use
 ipv4 172.20.15.1/32       172.20.15.1     16          imp-null         yes
 ipv4 172.20.15.1/32       172.20.15.3     16          18                no
 ipv4 172.20.15.2/32       172.20.15.1     imp-null    16                no
 ipv4 172.20.15.2/32       172.20.15.3     imp-null    16                no
 ipv4 172.20.15.3/32       172.20.15.1     17          18                no
 ipv4 172.20.15.3/32       172.20.15.3     17          imp-null         yes
 ipv4 192.168.12.0/24      172.20.15.1     imp-null    imp-null          no
 ipv4 192.168.12.0/24      172.20.15.3     imp-null    17                no
 ipv4 192.168.23.0/24      172.20.15.1     imp-null    17                no
 ipv4 192.168.23.0/24      172.20.15.3     imp-null    imp-null          no
 ipv4 192.168.121.0/24     172.20.15.1     imp-null    imp-null          no
 ipv4 192.168.121.0/24     172.20.15.3     imp-null    imp-null          no
 R2# 

Now, let’s do the ping test and see if MPLS is actually used.

I can clearly see on the left-hand side that R2-ens6 (link to R1) is receiving the ICMP request as an MPLS packet (label 17), and the ICMP reply is sent back to R1 without a label (as expected with PHP). On R2-ens7 (link to R3) we see R2 sending the ICMP request without a label (again expected due to PHP), and the ICMP reply from R3 arriving with label 16 at R2.

I have to say that I had to try twice until things worked as expected. In my first attempt, somehow, R1 was not sending the ICMP request to R2 encapsulated as an MPLS packet; the routing table was still programmed for plain ISIS, although ISIS, LDP and the LDP bindings were correct.

NOTES:

1- vagrant-nfs: I was thinking about how to connect the VMs with my laptop for sharing files easily. It seems that by default the folder holding your Vagrantfile is automatically exported via NFS as /vagrant in the VMs. Super handy. Just in case, a bit of documentation. My vagrant version is 2.2.14.

2- For loading the FRR config, I had to "lowercase" the VM hostname to match the FRR config file name. Based on this link, it is quite easy: "${X,,}".
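
A quick demo of that bash expansion:

$ X=R1
$ echo "${X,,}"
r1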

Linux+MPLS-Part1

In November 2020, I got an email from the FRR email list about using MPLS with FRR. The answer, that you can already do MPLS natively (and easily) in Linux, dumbfounded me. So I added it to my to-do list: try MPLS in Linux as per the blog. All credit to the author; that's a great job.

Reading the blog, I learned that the kernel has supported MPLS since 4.3 (I am using 5.10) and that VRF support was challenging until Cumulus implemented it. Thanks! So since April 2017 there is full support for L3VPNs in Linux… I'm getting on the wagon a bit late.

Anyway, I want to test it myself and see if I can make it work. I downloaded the repo from the author to start working on it.

So I am following the same steps as him and will start with a lab consisting of static LSPs. This is the diagram:

Main differences in my lab are:

1- I use libvirt instead of VirtualBox

2- I am using debian10 buster64 as VM

This affects the Vagrantfile and the script that configures the static LSPs. The libvirt_ options I use in the Vagrantfile are ignored, as I am not able to name the interfaces the way I want. As well, I had to change the IP addressing, as I had collisions with .1. And debian/buster64 has specific interface names that I have to use.

So, now we can turn up the lab.

/mpls-linux/lab1-static-lsps$ vagrant up
 Bringing machine 'r1' up with 'libvirt' provider…
 Bringing machine 'r2' up with 'libvirt' provider…
 Bringing machine 'r3' up with 'libvirt' provider…
 ==> r2: Checking if box 'debian/buster64' version '10.4.0' is up to date…
 ==> r3: Checking if box 'debian/buster64' version '10.4.0' is up to date…
 ==> r1: Checking if box 'debian/buster64' version '10.4.0' is up to date…
 ==> r1: Creating image (snapshot of base box volume).
 ==> r2: Creating image (snapshot of base box volume).
 ==> r3: Creating image (snapshot of base box volume).
 ==> r2: Creating domain with the following settings…
 ==> r1: Creating domain with the following settings…
...
/mpls-linux/lab1-static-lsps master$ vagrant status
 Current machine states:
 r1                        running (libvirt)
 r2                        running (libvirt)
 r3                        running (libvirt)

So we can check R1. One important detail here is how we define a static route to reach R3's loopback, encapsulated in MPLS with label 100.

/mpls-linux/lab1-static-lsps$ vagrant ssh r1
...
vagrant@R1:~$ lsmod | grep mpls
 mpls_iptunnel          16384  1
 mpls_router            36864  1 mpls_iptunnel
 ip_tunnel              24576  1 mpls_router
 vagrant@R1:~$ 
 vagrant@R1:~$ ip route
 default via 192.168.121.1 dev ens5 proto dhcp src 192.168.121.124 metric 1024 
 172.20.15.3  encap mpls  100 via 192.168.12.102 dev ens6 
 192.168.12.0/24 dev ens6 proto kernel scope link src 192.168.12.101 
 192.168.121.0/24 dev ens5 proto kernel scope link src 192.168.121.124 
 192.168.121.1 dev ens5 proto dhcp scope link src 192.168.121.124 metric 1024 
 vagrant@R1:~$ 
 vagrant@R1:~$ ip -4 a
 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
     inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
     inet 172.20.15.1/32 scope global lo
        valid_lft forever preferred_lft forever
 2: ens5:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     inet 192.168.121.124/24 brd 192.168.121.255 scope global dynamic ens5
        valid_lft 3204sec preferred_lft 3204sec
 3: ens6:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     inet 192.168.12.101/24 brd 192.168.12.255 scope global ens6
        valid_lft forever preferred_lft forever
 vagrant@R1:~$ 
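
That encap route would have been installed by the provisioning script with something like this (a sketch reconstructed from the "ip route" output above):

# load the MPLS encap module and push label 100 towards R2
sudo modprobe mpls_iptunnel
sudo ip route add 172.20.15.3/32 encap mpls 100 via 192.168.12.102 dev ens6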

Now check R2, which is our P router between R1 and R3 as per the diagram. The important bit here is "ip -M route show": it displays the MPLS routing table, which is keyed by label. In the standard "ip route" output you don't see any reference to MPLS.

vagrant@R2:~$ ip -4 a
 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
     inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
     inet 172.20.15.2/32 scope global lo
        valid_lft forever preferred_lft forever
 2: ens5:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     inet 192.168.121.103/24 brd 192.168.121.255 scope global dynamic ens5
        valid_lft 2413sec preferred_lft 2413sec
 3: ens6:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     inet 192.168.12.102/24 brd 192.168.12.255 scope global ens6
        valid_lft forever preferred_lft forever
 4: ens7:  mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     inet 192.168.23.102/24 brd 192.168.23.255 scope global ens7
        valid_lft forever preferred_lft forever
 vagrant@R2:~$ ip route
 default via 192.168.121.1 dev ens5 proto dhcp src 192.168.121.103 metric 1024 
 192.168.12.0/24 dev ens6 proto kernel scope link src 192.168.12.102 
 192.168.23.0/24 dev ens7 proto kernel scope link src 192.168.23.102 
 192.168.121.0/24 dev ens5 proto kernel scope link src 192.168.121.103 
 192.168.121.1 dev ens5 proto dhcp scope link src 192.168.121.103 metric 1024 
 vagrant@R2:~$ 
 vagrant@R2:~$ lsmod | grep mpls
 mpls_router            36864  0
 ip_tunnel              24576  1 mpls_router
 vagrant@R2:~$ 
 vagrant@R2:~$ ip -M route show
 100 via inet 192.168.23.101 dev ens7 
 200 via inet 192.168.12.101 dev ens6 
 vagrant@R2:~$ 
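
And the two label-switching entries on R2 would have been programmed roughly like this (a sketch matching the "ip -M route show" output above; it assumes platform_labels and the per-interface mpls input sysctls are already set by the provisioning script):

# swap/forward label 100 towards R3 and label 200 towards R1
sudo ip -M route add 100 via inet 192.168.23.101 dev ens7
sudo ip -M route add 200 via inet 192.168.12.101 dev ens6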

So let's see if pinging the loopbacks of R1 and R3 generates labelled traffic:

R1 to R3 (on R2)

root@R2:/home/vagrant# tcpdump -i ens6 -U -w - | tee mpls-r1tor3.pcap | tcpdump -r -
 reading from file -, link-type EN10MB (Ethernet)
 tcpdump: listening on ens6, link-type EN10MB (Ethernet), capture size 262144 bytes
 17:14:01.284942 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:de:61:f2.8001, length 35
 17:14:03.300756 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:de:61:f2.8001, length 35
 17:14:05.284915 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:de:61:f2.8001, length 35
 17:14:07.183328 MPLS (label 100, exp 0, [S], ttl 64) IP 192.168.12.101 > 172.20.15.3: ICMP echo request, id 1771, seq 1, length 64
 17:14:07.300556 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:de:61:f2.8001, length 35
 17:14:08.186983 MPLS (label 100, exp 0, [S], ttl 64) IP 192.168.12.101 > 172.20.15.3: ICMP echo request, id 1771, seq 2, length 64
 17:14:09.188867 MPLS (label 100, exp 0, [S], ttl 64) IP 192.168.12.101 > 172.20.15.3: ICMP echo request, id 1771, seq 3, length 64

I can see the labelled packets from R1 to R2 with label 100 as expected, but I don't see any "echo reply"…..

But ping is successful based on R1:

vagrant@R1:~$ ping 172.20.15.3
 PING 172.20.15.3 (172.20.15.3) 56(84) bytes of data.
 64 bytes from 172.20.15.3: icmp_seq=1 ttl=63 time=0.746 ms
 64 bytes from 172.20.15.3: icmp_seq=2 ttl=63 time=1.18 ms
 64 bytes from 172.20.15.3: icmp_seq=3 ttl=63 time=1.11 ms
 64 bytes from 172.20.15.3: icmp_seq=4 ttl=63 time=0.728 ms

Something is wrong. As per the pic below, with tcpdump on all interfaces, R3 is seeing the echo request from a different source (not R1).

And if I ping using R1 loopback, I can’t see anything leaving R1 ens6 interface.

vagrant@R1:~$ ping 172.20.15.3 -I lo         
 PING 172.20.15.3 (172.20.15.3) from 172.20.15.1 lo: 56(84) bytes of data.
 ^C
 --- 172.20.15.3 ping statistics ---
 25 packets transmitted, 0 received, 100% packet loss, time 576ms

Based on the original blog post, this should work. The main difference here is that I am using libvirt. I need to carry on investigating.

This is my IP config, 23.1 is my laptop:

9: virbr3:  mtu 1500 qdisc noqueue state UP group default qlen 1000
     inet 192.168.121.1/24 brd 192.168.121.255 scope global virbr3
        valid_lft forever preferred_lft forever
 10: virbr8:  mtu 1500 qdisc noqueue state UP group default qlen 1000
     inet 192.168.12.1/24 brd 192.168.12.255 scope global virbr8
        valid_lft forever preferred_lft forever
 11: virbr9:  mtu 1500 qdisc noqueue state UP group default qlen 1000
     inet 192.168.23.1/24 brd 192.168.23.255 scope global virbr9
        valid_lft forever preferred_lft forever

NOTES:

How to scp files from vagrant box: link

$ vagrant plugin install vagrant-scp
$ vagrant scp r2:~/*.pcap .

How to ssh to a vagrant box without using "vagrant ssh": link

# save the config to a file 
vagrant ssh-config > vagrant-ssh 

# run ssh with the file
ssh -F vagrant-ssh default

# update your .gitignore for not tracking this file!!!!

How to write and read tcpdump at the same time:

# tcpdump -i ens7 -U -w - | tee mpls-r3tor1.pcap | tcpdump -r -

UPDATE:

Ok, I have tried again. I rebooted my laptop, rebuilt the VMs, etc. And now it works:

9: virbr3:  mtu 1500 qdisc noqueue state UP group default qlen 1000
     inet 192.168.121.1/24 brd 192.168.121.255 scope global virbr3
        valid_lft forever preferred_lft forever
 10: virbr8:  mtu 1500 qdisc noqueue state UP group default qlen 1000
     inet 192.168.12.1/24 brd 192.168.12.255 scope global virbr8
        valid_lft forever preferred_lft forever
 11: virbr9:  mtu 1500 qdisc noqueue state UP group default qlen 1000
     inet 192.168.23.1/24 brd 192.168.23.255 scope global virbr9
        valid_lft forever preferred_lft forever
 root@athens:/boot# uname -a
 Linux athens 5.9.0-5-amd64 #1 SMP Debian 5.9.15-1 (2020-12-17) x86_64 GNU/Linux
 root@athens:/boot# 

I can now clearly see how the ICMP request is encapsulated with MPLS tag 100 from R1 to R2 (ens6 interface), then the label is popped at R2, and you can see the same ICMP request leaving R2 via ens7 towards R3.

Then the ICMP reply is encapsulated with MPLS tag 200 from R3 to R2 (ens7), and again the label is popped at R2, so you see the packet once more from R2 (ens6) to R1.

So the test is successful in the end, although I am not sure what I was doing wrong before.

3phase-PDU

This is a very interesting blog entry from Cloudflare about PDU deployments in DCs and the theory behind them. Nowadays nearly everything is cloud-oriented, but we forget the clouds still need to be deployed physically. In an old job, going dense in physical setups required more power, and 3-phase PDUs were a must. The blog is quite good at explaining the reasons to move to 3-phase and the challenges of balancing the power. When you buy "colo", you buy space and power, and that is not cheap.
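
As a rough worked example of why 3-phase matters (assuming a North American 208V/30A circuit and the usual 80% continuous-load derating; the numbers are mine, not from the blog):

single-phase: 120V x 30A x 0.8              ≈ 2.9 kW per circuit
three-phase:  √3 (1.73) x 208V x 30A x 0.8  ≈ 8.6 kW per circuit

So one 3-phase feed carries roughly three times the power of a single-phase one, which is exactly what dense racks need.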

Kubernetes-Docker-ASICs

This week I read that kubernetes is going to stop supporting Docker soon. I was quite surprised. I am not an expert, so it seems they have legit reasons. But I haven't read anything from the other side. I think it is going to be painful, so I need to try it in my lab and see how to do that migration. It should be nice to learn.

On the other end, I read a blog entry about ASICs from Cloudflare. Without getting too technical, I think it is a good one. And I learned about the different types of ASICs from Juniper. In the last years, I have only used devices powered by Broadcom ASICs. One day, I would like to try those P4/Barefoot Tofino devices. And related to this, I remember this NANOG presentation about ASICs that is really good (and fun!).

gnmi-ssl-p2

I was already playing with gNMI and protobuf a couple of months ago. But this week I received a summary of the last NANOG80 meeting, and there was a presentation about it. Great job from Colin!

So I decided to give it a go, as the demo was based on docker and I already have my Arista lab in cEOS and vEOS as targets.

I started my 3node-ring cEOS lab with docker-topo

ceos-testing/topology master$ docker-topo --create 3-node-simple.yml
INFO:main:Version 2 requires sudo. Restarting script with sudo
[sudo] password for xxx:
INFO:main:
alias r01='docker exec -it 3node_r01 Cli'
alias r02='docker exec -it 3node_r02 Cli'
alias r03='docker exec -it 3node_r03 Cli'
INFO:main:All devices started successfully

Checked they were up:

$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4160cc354ba2 ceos-lab:4.23.3M "/sbin/init systemd.…" 7 minutes ago Up 7 minutes 0.0.0.0:2002->22/tcp, 0.0.0.0:9002->443/tcp 3node_r03
122f72fb25bd ceos-lab:4.23.3M "/sbin/init systemd.…" 7 minutes ago Up 7 minutes 0.0.0.0:2001->22/tcp, 0.0.0.0:9001->443/tcp 3node_r02
68cf8ca39130 ceos-lab:4.23.3M "/sbin/init systemd.…" 7 minutes ago Up 7 minutes 0.0.0.0:2000->22/tcp, 0.0.0.0:9000->443/tcp 3node_r01

And then I checked that I had gNMI config in r01:

!
management api gnmi
transport grpc GRPC
port 3333
!

Next, I needed to find the IP of r01 on "3node_net-0", as that is the network used for management. I have hit this issue so many times…

$ docker inspect 3node_r01
...
"Networks": {
 "3node_net-0": {
 "IPAMConfig": null, 
 "Links": null,
 "Aliases": [ "68cf8ca39130" ],
 "NetworkID": "d3f72e7473228488f668aa3ed65b6ea94e1c5c9553f93cf0f641c3d4af644e2e", "EndpointID": "bca584040e71a826ef25b8360d92881dad407ff976eff65a38722fd36e9fc873", "Gateway": "172.20.0.1", 
"IPAddress": "172.20.0.2",
....
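
BTW, a quicker way to pull just that IP is docker inspect's template support (using the network name from above):

$ docker inspect -f '{{ (index .NetworkSettings.Networks "3node_net-0").IPAddress }}' 3node_r01
172.20.0.2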

Now, I cloned the repo and followed the instructions/video. I copied targets.json and updated it with my r01 device details:

~/storage/technology/gnmi-gateway release$ cat examples/gnmi-prometheus/targets.json 
{
  "request": {
    "default": {
      "subscribe": {
        "prefix": {
        },
        "subscription": [
          {
            "path": {
              "elem": [
                {
                  "name": "interfaces"
                }
              ]
            }
          }
        ]
      }
    }
  },
  "target": {
    "r01": {
      "addresses": [
        "172.20.0.2:3333"
      ],
      "credentials": {
        "username": "xxx",
        "password": "xxx"
      },
      "request": "default",
      "meta": {
        "NoTLS": "yes"
      }
    }
  }
}

Carrying on with the instructions: build the gnmi-gateway docker image, create the docker bridge, and run the gnmi-gateway container built earlier.
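
The first two steps were roughly like this (a sketch; the bridge name and image tag are the ones consumed by the run command below):

$ docker network create gnmi-net
$ docker build -t gnmi-gateway:latest .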

go:1.14.6|py:3.7.3|tomas@athens:~/storage/technology/gnmi-gateway release$ docker run \
-it --rm \
-p 59100:59100 \
-v $(pwd)/examples/gnmi-prometheus/targets.json:/opt/gnmi-gateway/targets.json \
--name gnmi-gateway-01 \
--network gnmi-net \
gnmi-gateway:latest
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Starting GNMI Gateway."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Clustering is NOT enabled. No locking or cluster coordination will happen."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Starting connection manager."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Starting gNMI server on 0.0.0.0:9339."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Starting Prometheus exporter."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Initializing target r01 ([172.27.0.2:3333]) map[NoTLS:yes]."}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Target r01: Connecting"}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Target r01: Subscribing"}
{"level":"info","time":"2020-11-07T16:54:28Z","message":"Starting Prometheus HTTP server."}
{"level":"info","time":"2020-11-07T16:54:38Z","message":"Target r01: Disconnected"}
E1107 16:54:38.382032 1 reconnect.go:114] client.Subscribe (target "r01") failed: client "gnmi" : client "gnmi" : Dialer(172.27.0.2:3333, 10s): context deadline exceeded; reconnecting in 552.330144ms
{"level":"info","time":"2020-11-07T16:54:48Z","message":"Target r01: Disconnected"}
E1107 16:54:48.935965 1 reconnect.go:114] client.Subscribe (target "r01") failed: client "gnmi" : client "gnmi" : Dialer(172.27.0.2:3333, 10s): context deadline exceeded; reconnecting in 1.080381816s
Meanwhile, a tcpdump from r01’s bash on the gnmi port showed this:

bash-4.2# tcpdump -i any tcp port 3333 -nnn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
17:07:57.621011 In 02:42:7c:61:10:40 ethertype IPv4 (0x0800), length 76: 172.27.0.1.43644 > 172.27.0.2.3333: Flags [S], seq 557316949, win 64240, options [mss 1460,sackOK,TS val 3219811744 ecr 0,nop,wscale 7], length 0
17:07:57.621069 Out 02:42:ac:1b:00:02 ethertype IPv4 (0x0800), length 76: 172.27.0.2.3333 > 172.27.0.1.43644: Flags [S.], seq 243944609, ack 557316950, win 65160, options [mss 1460,sackOK,TS val 1828853442 ecr 3219811744,nop,wscale 7], length 0
17:07:57.621124 In 02:42:7c:61:10:40 ethertype IPv4 (0x0800), length 68: 172.27.0.1.43644 > 172.27.0.2.3333: Flags [.], ack 1, win 502, options [nop,nop,TS val 3219811744 ecr 1828853442], length 0
17:07:57.621348 Out 02:42:ac:1b:00:02 ethertype IPv4 (0x0800), length 89: 172.27.0.2.3333 > 172.27.0.1.43644: Flags [P.], seq 1:22, ack 1, win 510, options [nop,nop,TS val 1828853442 ecr 3219811744], length 21
17:07:57.621409 In 02:42:7c:61:10:40 ethertype IPv4 (0x0800), length 68: 172.27.0.1.43644 > 172.27.0.2.3333: Flags [.], ack 22, win 502, options [nop,nop,TS val 3219811744 ecr 1828853442], length 0
17:07:57.621492 In 02:42:7c:61:10:40 ethertype IPv4 (0x0800), length 320: 172.27.0.1.43644 > 172.27.0.2.3333: Flags [P.], seq 1:253, ack 22, win 502, options [nop,nop,TS val 3219811744 ecr 1828853442], length 252
17:07:57.621509 Out 02:42:ac:1b:00:02 ethertype IPv4 (0x0800), length 68: 172.27.0.2.3333 > 172.27.0.1.43644: Flags [.], ack 253, win 509, options [nop,nop,TS val 1828853442 ecr 3219811744], length 0
17:07:57.621586 In 02:42:7c:61:10:40 ethertype IPv4 (0x0800), length 68: 172.27.0.1.43644 > 172.27.0.2.3333: Flags [F.], seq 253, ack 22, win 502, options [nop,nop,TS val 3219811744 ecr 1828853442], length 0
17:07:57.621904 Out 02:42:ac:1b:00:02 ethertype IPv4 (0x0800), length 68: 172.27.0.2.3333 > 172.27.0.1.43644: Flags [R.], seq 22, ack 254, win 509, options [nop,nop,TS val 1828853443 ecr 3219811744], length 0

Ok, the container is created and seems to be running, but gnmi-gateway can’t connect to my cEOS r01….

First thing, I checked iptables. It is not the first time that, when playing with docker and building different environments (cEOS vs gnmi-gateway) with different docker commands, iptables ends up misconfigured.

And it was the case again:

# iptables -t filter -S DOCKER-ISOLATION-STAGE-1
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -i br-43481af25965 ! -o br-43481af25965 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-94c1e813ad6f ! -o br-94c1e813ad6f -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-4bd17cfa19a8 ! -o br-4bd17cfa19a8 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-13ab2b6a0d1d ! -o br-13ab2b6a0d1d -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-121978ca0282 ! -o br-121978ca0282 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-00db5844bbb0 ! -o br-00db5844bbb0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN

The isolation rule for the new gnmi-gateway docker bridge had landed above the “ACCEPT”, so I moved the “ACCEPT” back to the top of the chain, leaving the new bridge rule after it:

# iptables -t filter -D DOCKER-ISOLATION-STAGE-1 -j ACCEPT
# iptables -t filter -I DOCKER-ISOLATION-STAGE-1 -j ACCEPT
#
# iptables -t filter -S DOCKER-ISOLATION-STAGE-1
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i br-43481af25965 ! -o br-43481af25965 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-94c1e813ad6f ! -o br-94c1e813ad6f -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-4bd17cfa19a8 ! -o br-4bd17cfa19a8 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-13ab2b6a0d1d ! -o br-13ab2b6a0d1d -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-121978ca0282 ! -o br-121978ca0282 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-00db5844bbb0 ! -o br-00db5844bbb0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
#

So I restarted gnmi-gateway: still the same issue. Ok, I went back to check whether the packets were actually hitting r01, which is the tcpdump capture shown above.

At first sight, the TCP handshake is established but then there is a TCP RST….

So I double-checked that gnmi was running on my side:

r1#show management api gnmi 
Enabled:            Yes
Server:             running on port 3333, in MGMT VRF
SSL Profile:        none
QoS DSCP:           none
r1#

At that moment, I thought it was an issue in cEOS… checking the logs I couldn’t see any confirmation, but I decided to give it a go with vEOS, which is more feature-rich. So I turned up my GCP lab and followed the same steps with gnmi-gateway: I updated targets.json with the details of one of my vEOS devices and ran it again:

~/gnmi/gnmi-gateway release$ sudo docker run -it --rm -p 59100:59100 -v $(pwd)/examples/gnmi-prometheus/targets.json:/opt/gnmi-gateway/targets.json --name gnmi-gateway-01 --network gnmi-net gnmi-gateway:latest
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting GNMI Gateway."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Clustering is NOT enabled. No locking or cluster coordination will happen."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting connection manager."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting gNMI server on 0.0.0.0:9339."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting Prometheus exporter."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Initializing target gcp-r1 ([192.168.249.4:3333]) map[NoTLS:yes]."}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Target gcp-r1: Connecting"}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Target gcp-r1: Subscribing"}
{"level":"info","time":"2020-11-07T19:22:20Z","message":"Starting Prometheus HTTP server."}
{"level":"info","time":"2020-11-07T19:22:30Z","message":"Target gcp-r1: Disconnected"}
E1107 19:22:30.048410 1 reconnect.go:114] client.Subscribe (target "gcp-r1") failed: client "gnmi" : client "gnmi" : Dialer(192.168.249.4:3333, 10s): context deadline exceeded; reconnecting in 552.330144ms
{"level":"info","time":"2020-11-07T19:22:40Z","message":"Target gcp-r1: Disconnected"}
E1107 19:22:40.603141 1 reconnect.go:114] client.Subscribe (target "gcp-r1") failed: client "gnmi" : client "gnmi" : Dialer(192.168.249.4:3333, 10s): context deadline exceeded; reconnecting in 1.080381816s

Again, same issue. Let’s see it from the vEOS perspective:

bash-4.2# tcpdump -i any tcp port 3333 -nnn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
18:52:31.874137 In 1e:3d:5b:13:d8:fe ethertype IPv4 (0x0800), length 76: 10.128.0.4.56546 > 192.168.249.4.3333: Flags [S], seq 4076065498, win 64240, options [mss 1460,sackOK,TS val 1752943121 ecr 0,nop,wscale 7], length 0
18:52:31.874579 Out 50:00:00:04:00:00 ethertype IPv4 (0x0800), length 76: 192.168.249.4.3333 > 10.128.0.4.56546: Flags [S.], seq 3922060793, ack 4076065499, win 28960, options [mss 1460,sackOK,TS val 433503 ecr 1752943121,nop,wscale 7], length 0
18:52:31.875882 In 1e:3d:5b:13:d8:fe ethertype IPv4 (0x0800), length 68: 10.128.0.4.56546 > 192.168.249.4.3333: Flags [.], ack 1, win 502, options [nop,nop,TS val 1752943123 ecr 433503], length 0
18:52:31.876284 In 1e:3d:5b:13:d8:fe ethertype IPv4 (0x0800), length 320: 10.128.0.4.56546 > 192.168.249.4.3333: Flags [P.], seq 1:253, ack 1, win 502, options [nop,nop,TS val 1752943124 ecr 433503], length 252
18:52:31.876379 Out 50:00:00:04:00:00 ethertype IPv4 (0x0800), length 68: 192.168.249.4.3333 > 10.128.0.4.56546: Flags [.], ack 253, win 235, options [nop,nop,TS val 433504 ecr 1752943124], length 0
18:52:31.929448 Out 50:00:00:04:00:00 ethertype IPv4 (0x0800), length 89: 192.168.249.4.3333 > 10.128.0.4.56546: Flags [P.], seq 1:22, ack 253, win 235, options [nop,nop,TS val 433517 ecr 1752943124], length 21
18:52:31.930028 In 1e:3d:5b:13:d8:fe ethertype IPv4 (0x0800), length 68: 10.128.0.4.56546 > 192.168.249.4.3333: Flags [.], ack 22, win 502, options [nop,nop,TS val 1752943178 ecr 433517], length 0
18:52:31.930090 In 1e:3d:5b:13:d8:fe ethertype IPv4 (0x0800), length 68: 10.128.0.4.56546 > 192.168.249.4.3333: Flags [F.], seq 253, ack 22, win 502, options [nop,nop,TS val 1752943178 ecr 433517], length 0
18:52:31.931603 Out 50:00:00:04:00:00 ethertype IPv4 (0x0800), length 68: 192.168.249.4.3333 > 10.128.0.4.56546: Flags [R.], seq 22, ack 254, win 235, options [nop,nop,TS val 433517 ecr 1752943178], length 0

So again in GCP, TCP is established but then there is a TCP RST. As vEOS was my last resort, I tried to dig into that TCP connection. I downloaded a pcap to analyze with wireshark and get a better visual clue…
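
For reference, getting the pcap was just tcpdump writing to a file that I then copied over to my laptop (the file name is illustrative):

bash-4.2# tcpdump -i any tcp port 3333 -w /tmp/gnmi-3333.pcap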

So, somehow, gnmi-gateway is trying to negotiate TLS!!! As per my understanding, my targets.json was configured with “NoTLS”: “yes”, so that should be avoided, shouldn’t it?

At that moment, I wanted to know how to identify TLS/SSL packets using tcpdump, as it is not always that easy to quickly get a pcap into wireshark. I found the answer here:

bash-4.2# tcpdump -i any "tcp port 3333 and (tcp[((tcp[12] & 0xf0) >> 2)] = 0x16)"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
19:47:01.367197 In 1e:3d:5b:13:d8:fe (oui Unknown) ethertype IPv4 (0x0800), length 320: 10.128.0.4.50486 > 192.168.249.4.dec-notes: Flags [P.], seq 2715923852:2715924104, ack 2576249027, win 511, options [nop,nop,TS val 1194424180 ecr 1250876], length 252
19:47:02.405870 In 1e:3d:5b:13:d8:fe (oui Unknown) ethertype IPv4 (0x0800), length 320: 10.128.0.4.50488 > 192.168.249.4.dec-notes: Flags [P.], seq 680803294:680803546, ack 3839769659, win 511, options [nop,nop,TS val 1194425218 ecr 1251136], length 252
19:47:04.139458 In 1e:3d:5b:13:d8:fe (oui Unknown) ethertype IPv4 (0x0800), length 320: 10.128.0.4.50490 > 192.168.249.4.dec-notes: Flags [P.], seq 3963338234:3963338486, ack 1760248652, win 511, options [nop,nop,TS val 1194426952 ecr 1251569], length 252

Not something easy to remember 🙁 (the expression uses the data offset in byte 12 of the TCP header to skip to the payload and match a first byte of 0x16, the TLS handshake content type).
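
If tshark happens to be installed, I guess this would be easier to remember (untested in this lab; older versions use "ssl" instead of "tls" as the display filter):

bash-4.2# tshark -i any -f "tcp port 3333" -Y tls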

Ok, I wanted to be sure that gnmi was functional in vEOS, and with a quick internet lookup I found this project, gnmic! Great job by the author!

So I configured the tool and tested it against my vEOS. And it worked (without needing TLS):

~/gnmi/gnmi-gateway release$ gnmic -a 192.168.249.4:3333 -u xxx -p xxx --insecure get \
--path "/interfaces/interface[name=*]/subinterfaces/subinterface[index=*]/ipv4/addresses/address/config/ip"
Get Response:
[
  {
    "time": "1970-01-01T00:00:00Z",
    "updates": [
      {
        "Path": "interfaces/interface[name=Management1]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=192.168.249.4]/config/ip",
        "values": {
          "interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "192.168.249.4"
        }
      },
      {
        "Path": "interfaces/interface[name=Ethernet2]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=10.0.13.1]/config/ip",
        "values": {
          "interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "10.0.13.1"
        }
      },
      {
        "Path": "interfaces/interface[name=Ethernet3]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=192.168.1.1]/config/ip",
        "values": {
          "interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "192.168.1.1"
        }
      },
      {
        "Path": "interfaces/interface[name=Ethernet1]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=10.0.12.1]/config/ip",
        "values": {
          "interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "10.0.12.1"
        }
      },
      {
        "Path": "interfaces/interface[name=Loopback1]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=10.0.0.1]/config/ip",
        "values": {
          "interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "10.0.0.1"
        }
      },
      {
        "Path": "interfaces/interface[name=Loopback2]/subinterfaces/subinterface[index=0]/ipv4/addresses/address[ip=192.168.0.1]/config/ip",
        "values": {
          "interfaces/interface/subinterfaces/subinterface/ipv4/addresses/address/config/ip": "192.168.0.1"
        }
      }
    ]
  }
]
~/gnmi/gnmi-gateway release$
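
gnmic can also subscribe, which is closer to what gnmi-gateway is doing under the hood, so a sanity check along these lines should stream updates too (same flags; the path is just an example):

~/gnmi/gnmi-gateway release$ gnmic -a 192.168.249.4:3333 -u xxx -p xxx --insecure subscribe \
--path "/interfaces/interface[name=*]/state/counters"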

So, by then I was fairly sure that my issue was in the gnmi-gateway configuration. I tried to troubleshoot it: removed NoTLS, used the debugging mode, built the code, read the Go code for Target (too complex for my Go knowledge 🙁).

So at the end, I gave up and opened an issue with the gnmi-gateway author. And he answered super quickly with the solution!!! I had misunderstood the meaning of “NoTLS” 🙁 Judging by the captures and the final logs below (the target connects with map[NoTLS:yes] once a self-signed certificate is in place), it seems it just skips certificate verification rather than disabling TLS altogether, so the device still needs TLS configured.

So I followed his instructions to configure TLS in my cEOS gnmi config:

security pki certificate generate self-signed r01.crt key r01.key generate rsa 2048 validity 30000 parameters common-name r01
!
management api gnmi
   transport grpc GRPC
      ssl profile SELFSIGNED
      port 3333
!
...
!
management security
   ssl profile SELFSIGNED
      certificate r01.crt key r01.key
!
end
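
A quick way to confirm the profile is applied is the same show command as before; I would expect something like this now:

r01#show management api gnmi
Enabled:            Yes
Server:             running on port 3333, in MGMT VRF
SSL Profile:        SELFSIGNED
QoS DSCP:           none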

And with that, gnmi-gateway worked straight away!

~/storage/technology/gnmi-gateway release$ docker run -it --rm -p 59100:59100 -v $(pwd)/examples/gnmi-prometheus/targets.json:/opt/gnmi-gateway/targets.json --name gnmi-gateway-01 --network gnmi-net gnmi-gateway:latest
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Starting GNMI Gateway."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Clustering is NOT enabled. No locking or cluster coordination will happen."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Starting connection manager."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Starting gNMI server on 0.0.0.0:9339."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Starting Prometheus exporter."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Initializing target r01 ([172.20.0.2:3333]) map[NoTLS:yes]."}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Target r01: Connecting"}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Target r01: Subscribing"}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Target r01: Connected"}
{"level":"info","time":"2020-11-08T09:39:15Z","message":"Target r01: Synced"}
{"level":"info","time":"2020-11-08T09:39:16Z","message":"Starting Prometheus HTTP server."}
{"level":"info","time":"2020-11-08T09:39:45Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
{"level":"info","time":"2020-11-08T09:40:15Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}

So I can start prometheus:

~/storage/technology/gnmi-gateway release$ docker run \
-it --rm \
-p 9090:9090 \
-v $(pwd)/examples/gnmi-prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
--name prometheus-01 \
--network gnmi-net \
prom/prometheus
Unable to find image 'prom/prometheus:latest' locally
latest: Pulling from prom/prometheus
76df9210b28c: Pull complete
559be8e06c14: Pull complete
66945137dd82: Pull complete
8cbce0960be4: Pull complete
f7bd1c252a58: Pull complete
6ad12224c517: Pull complete
ee9cd36fa25a: Pull complete
d73034c1b9c3: Pull complete
b7103b774752: Pull complete
2ba5d8ece07a: Pull complete
ab11729a0297: Pull complete
1549b85a3587: Pull complete
Digest: sha256:b899dbd1b9017b9a379f76ce5b40eead01a62762c4f2057eacef945c3c22d210
Status: Downloaded newer image for prom/prometheus:latest
level=info ts=2020-11-08T09:40:26.622Z caller=main.go:315 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-11-08T09:40:26.623Z caller=main.go:353 msg="Starting Prometheus" version="(version=2.22.1, branch=HEAD, revision=00f16d1ac3a4c94561e5133b821d8e4d9ef78ec2)"
level=info ts=2020-11-08T09:40:26.623Z caller=main.go:358 build_context="(go=go1.15.3, user=root@516b109b1732, date=20201105-14:02:25)"
level=info ts=2020-11-08T09:40:26.623Z caller=main.go:359 host_details="(Linux 5.9.0-1-amd64 #1 SMP Debian 5.9.1-1 (2020-10-17) x86_64 b0fadf4a4c80 (none))"
level=info ts=2020-11-08T09:40:26.623Z caller=main.go:360 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-11-08T09:40:26.623Z caller=main.go:361 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-11-08T09:40:26.641Z caller=main.go:712 msg="Starting TSDB …"
level=info ts=2020-11-08T09:40:26.641Z caller=web.go:516 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-11-08T09:40:26.668Z caller=head.go:642 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2020-11-08T09:40:26.669Z caller=head.go:656 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=103.51µs
level=info ts=2020-11-08T09:40:26.669Z caller=head.go:662 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2020-11-08T09:40:26.672Z caller=head.go:714 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2020-11-08T09:40:26.672Z caller=head.go:719 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=123.684µs wal_replay_duration=2.164743ms total_replay_duration=3.357021ms
level=info ts=2020-11-08T09:40:26.675Z caller=main.go:732 fs_type=2fc12fc1
level=info ts=2020-11-08T09:40:26.676Z caller=main.go:735 msg="TSDB started"
level=info ts=2020-11-08T09:40:26.676Z caller=main.go:861 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-11-08T09:40:26.684Z caller=main.go:892 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml totalDuration=7.601103ms remote_storage=22.929µs web_handler=623ns query_engine=1.64µs scrape=5.517391ms scrape_sd=359.447µs notify=18.349µs notify_sd=3.921µs rules=15.744µs
level=info ts=2020-11-08T09:40:26.685Z caller=main.go:684 msg="Server is ready to receive web requests."

Now we can open the prometheus UI and verify that we are consuming data from cEOS r01.

Yeah! It is there.
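
You can also check the raw metrics straight from the gateway’s Prometheus exporter (assuming the standard /metrics path on the port published earlier):

$ curl -s http://localhost:59100/metrics | head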

So, all working in the end. It was a nice experience. At the end of the day, I want to know more about gNMI/protobuf, etc. The cool thing here is that you get both telemetry and configuration management of your devices over the same interface. So gnmi-gateway (which is aimed at a high-availability environment like Netflix’s) and gnmic are great tools to get your head around.

Another lab I want to try is this eos-gnmi-telemetry-grafana.

The to-do list always keeps growing.

Cumulus Basics

Today I finally managed to get a very basic Cumulus setup running. It is annoying because I tried several months ago, found some issues with libvirt (I opened a ticket but didn’t follow up) and gave up.

Now it works. I just wanted to use KVM/QEMU and Vagrant, which I already have installed on my system. So, based on the link, I just created a folder and copied the Vagrantfile. Then “vagrant up” and wait.

/cumulus/1s2l$ vagrant up
Bringing machine 'spine01' up with 'libvirt' provider…
Bringing machine 'leaf01' up with 'libvirt' provider…
Bringing machine 'leaf02' up with 'libvirt' provider…
==> leaf01: Box 'CumulusCommunity/cumulus-vx' could not be found. Attempting to find and install…
leaf01: Box Provider: libvirt
leaf01: Box Version: 4.2.0
==> leaf01: Loading metadata for box 'CumulusCommunity/cumulus-vx'
leaf01: URL: https://vagrantcloud.com/CumulusCommunity/cumulus-vx
==> leaf01: Adding box 'CumulusCommunity/cumulus-vx' (v4.2.0) for provider: libvirt
leaf01: Downloading: https://vagrantcloud.com/CumulusCommunity/boxes/cumulus-vx/versions/4.2.0/providers/libvirt.box
Download redirected to host: d2cd9e7ca6hntp.cloudfront.net
==> leaf01: Successfully added box 'CumulusCommunity/cumulus-vx' (v4.2.0) for 'libvirt'!
==> spine01: Box 'CumulusCommunity/cumulus-vx' could not be found. Attempting to find and install…
spine01: Box Provider: libvirt
spine01: Box Version: 4.2.0
==> leaf01: Uploading base box image as volume into Libvirt storage…
==> spine01: Loading metadata for box 'CumulusCommunity/cumulus-vx'
spine01: URL: https://vagrantcloud.com/CumulusCommunity/cumulus-vx
Progress: 0%==> spine01: Adding box 'CumulusCommunity/cumulus-vx' (v4.2.0) for provider: libvirt
Progress: 0%==> leaf02: Box 'CumulusCommunity/cumulus-vx' could not be found. Attempting to find and install…
leaf02: Box Provider: libvirt
leaf02: Box Version: 4.2.0
Progress: 1%==> leaf02: Loading metadata for box 'CumulusCommunity/cumulus-vx'
leaf02: URL: https://vagrantcloud.com/CumulusCommunity/cumulus-vx
==> leaf02: Adding box 'CumulusCommunity/cumulus-vx' (v4.2.0) for provider: libvirt
==> leaf01: Creating image (snapshot of base box volume).
==> spine01: Creating image (snapshot of base box volume).
==> leaf02: Creating image (snapshot of base box volume).
==> leaf01: Creating domain with the following settings…
==> leaf01: -- Name: 1s2l_leaf01
==> leaf02: Creating domain with the following settings…
==> spine01: Creating domain with the following settings…
==> leaf02: -- Name: 1s2l_leaf02
==> spine01: -- Name: 1s2l_spine01
==> leaf01: -- Domain type: kvm
==> leaf02: -- Domain type: kvm
==> spine01: -- Domain type: kvm
==> leaf01: -- Cpus: 1
==> leaf02: -- Cpus: 1
==> spine01: -- Cpus: 1
==> leaf01: -- Feature: acpi
==> leaf02: -- Feature: acpi
==> spine01: -- Feature: acpi
==> leaf01: -- Feature: apic
==> leaf01: -- Feature: pae
==> leaf01: -- Memory: 768M
==> leaf02: -- Feature: apic
==> spine01: -- Feature: apic
==> spine01: -- Feature: pae
....
....

You can check the VMs are up:

/cumulus/1s2l$ vagrant status
Current machine states:
spine01 running (libvirt)
leaf01 running (libvirt)
leaf02 running (libvirt)
This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run vagrant status NAME.
/cumulus/1s2l$

And we can log in and create some network interfaces as per the documentation:

/cumulus/1s2l$ vagrant ssh leaf01
Linux leaf01 4.19.0-cl-1-amd64 #1 SMP Cumulus 4.19.94-1+cl4u5 (2020-07-10) x86_64
Welcome to Cumulus VX (TM)
Cumulus VX (TM) is a community supported virtual appliance designed for
experiencing, testing and prototyping Cumulus Networks' latest technology.
For any questions or technical support, visit our community site at:
http://community.cumulusnetworks.com
The registered trademark Linux (R) is used pursuant to a sublicense from LMI,
the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide
basis.
vagrant@leaf01:mgmt:~$ net add interface swp1,swp2,swp3
vagrant@leaf01:mgmt:~$ net commit
--- /etc/network/interfaces 2020-07-15 01:15:58.000000000 +0000
+++ /run/nclu/ifupdown2/interfaces.tmp 2020-10-31 14:12:30.826000000 +0000
@@ -5,15 +5,24 @@
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
iface eth0 inet dhcp
vrf mgmt
+auto swp1
+iface swp1
+
+auto swp2
+iface swp2
+
+auto swp3
+iface swp3
+
auto mgmt
iface mgmt
address 127.0.0.1/8
address ::1/128
vrf-table auto
net add/del commands since the last "net commit"
User Timestamp Command
------- -------------------------- --------------------------------
vagrant 2020-10-31 14:12:27.070219 net add interface swp1,swp2,swp3
vagrant@leaf01:mgmt:~$

And after configuring the interfaces in the three VMs, we have LLDP working:

/cumulus/1s2l$ vagrant ssh leaf01
Linux leaf01 4.19.0-cl-1-amd64 #1 SMP Cumulus 4.19.94-1+cl4u5 (2020-07-10) x86_64
Welcome to Cumulus VX (TM)
Cumulus VX (TM) is a community supported virtual appliance designed for
experiencing, testing and prototyping Cumulus Networks' latest technology.
For any questions or technical support, visit our community site at:
http://community.cumulusnetworks.com
The registered trademark Linux (R) is used pursuant to a sublicense from LMI,
the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide
basis.
Last login: Sat Oct 31 14:12:04 2020 from 10.255.1.1
vagrant@leaf01:mgmt:~$
vagrant@leaf01:mgmt:~$
vagrant@leaf01:mgmt:~$
vagrant@leaf01:mgmt:~$ net show lldp
LocalPort Speed Mode RemoteHost RemotePort
--------- ----- ------- ---------- ----------
swp1 1G Default spine01 swp1
swp2 1G Default leaf02 swp2
swp3 1G Default leaf02 swp3
vagrant@leaf01:mgmt:~$
vagrant@leaf01:mgmt:~$ net show system
Hostname......... leaf01
Build............ Cumulus Linux 4.2.0
Uptime........... 0:06:09.180000
Model............ Cumulus VX
Memory........... 669MB
Vendor Name...... Cumulus Networks
Part Number...... 4.2.0
Base MAC Address. 52:54:00:17:87:07
Serial Number.... 52:54:00:17:87:07
Product Name..... VX
vagrant@leaf01:mgmt:~$ exit

So I am happy because now I have something to play with and can try to build an MPLS lab with Cumulus. At some point I would also like to try a quagga/FRR lab.

I am pretty sure that in the past I didn’t have to type my password every single time I ran a vagrant command….
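
If I remember correctly, those prompts come from polkit guarding libvirt access, so adding your user to the libvirt group (the group name may be libvirt or libvirtd depending on the distro) should avoid them after logging out and back in:

$ sudo usermod -aG libvirt $USER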

Ok, we can shut down the VMs and save the work for next time:

/cumulus/1s2l$ vagrant halt spine01 leaf01 leaf02
==> leaf02: Halting domain…
==> leaf01: Halting domain…
==> spine01: Halting domain…
/cumulus/1s2l$
/cumulus/1s2l$
/cumulus/1s2l$ vagrant status
Current machine states:
spine01 shutoff (libvirt)
leaf01 shutoff (libvirt)
leaf02 shutoff (libvirt)
This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run vagrant status NAME.
/cumulus/1s2l$