MPLS Segment Routing – Arista Lab

We have been able to create some nice MPLS labs using GNS3 and Cisco IOS. In my current employer, we use Arista so I wanted to create a lab environment with Arista kit to simulate a MPLS Segment Routing network. Keeping in mind that I try to run everything on my laptop, using GNS3 + Arista is not an option. You need to use the Arista vEOS image in GNS3 and it demands 2GB RAM per device and 1 CPU. In the past, I think I just managed to start two vEOS VMs before my laptop gave up. But Arista offers a version of EOS for containers.

So, what’s the difference between a virtual machine (VM) and a container? Well, searching the internet is going to give you many all answers. In my very simplify way:

  • VM: needs an hypervisor to simulate hardware. It uses kernel and user space. It has a full OS. So it is like simulation a whole server/pc (imagine a standalone house)
  • Container: runs in user space. Set of processes that are isolated from the rest of the system. Containers provide a way to virtualize an OS so that multiple workloads can run on a single OS instance (imagine an apartment in a building)

You just need to register in Arista web page to download a cEOS image.

Regarding MPLS Segment Routing (or SPRING for Juniper) it is an evolution of the standard MPLS, that was originally developed to improve the routing performance in core networks: avoid to make a routing look-ups per packet in core devices was very expensive in 80/90s (my very simplify way). MPLS started to being deployed around end 90s and became a defacto technology in all service providers. More info here.

Segment Routing is still based in labels, but adds improvements as it doesnt need a protocol for label exchange (one less thing to worry about). As well, it is based in “source routing” as the sources chooses the path and encodes it in the packet.

There are many sources in the internet that can explain MPLS SR better than me like all these:

https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/seg_routing/configuration/xe-3s/segrt-xe-3s-book/intro-seg-routing.html

https://www.segment-routing.net/tutorials/

As we are going to use Arista, I based my learning in these presentations:

https://ripe77.ripe.net/presentations/16-20181015-SegmentRouting.pdf

https://www.netnod.se/sites/default/files/2018-03/Peter%20Lundqvist_Arista_8.pdf

And reading more Arista docs.

All the code and how to build the lab is here:

https://github.com/thomarite/ceos-testing

So what we need and what we are going to use in this lab:

  • IPv4 (yeah, I should start working in IPv6…)
  • IGP: we use ISIS
  • Label Distribution: ISIS-SR
  • BGP: using loobacks as best practices and using IGP for building a full-mesh
  • L3/2VPN: EVPN
  • All devices are PE

So let’s build the basic IP connectivity for r01:

!
hostname r01
!
interface Ethernet1
no switchport
ip address 10.0.10.1/30
!
interface Ethernet2
no switchport
ip address 10.0.12.1/30
!
interface Loopback1
description CORE Loopback
ip address 10.0.0.1/32
!
ip routing
!

Now let’s build our IGP with ISIS. We are going to use our Lo1 IP as network ID for each router. As well, we will keep it simple and define all routers as ISIS L2. We dont need anything fancy. We just want ISIS to build our iBGP peering. We will enable ISIS in the core interfaces (in this simple lab, all links and loopbacks)

!
router isis CORE
net 49.0000.0001.0010.0000.0000.0001.00  <-- BASED IN Lo1 !!!
is-type level-2
log-adjacency-changes
set-overload-bit on-startup wait-for-bgp timeout 180
!
interface Ethernet1
no switchport
ip address 10.0.10.1/30
isis enable CORE
isis metric 40
isis network point-to-point
!
interface Ethernet2
no switchport
ip address 10.0.12.1/30
isis enable CORE
isis metric 50
isis network point-to-point
!
interface Loopback1
description CORE Loopback
ip address 10.0.0.1/32
isis enable CORE
isis metric 1
!

It is seems there is a bug in the cEOS I am using as “show isis neighbors” fails but the routing is actually correct. Let’s see from r22:

r22#show ip route
VRF: default
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route, V - VXLAN Control Service,
DH - DHCP client installed default route, M - Martian,
DP - Dynamic Policy Route, L - VRF Leaked
Gateway of last resort is not set
I L2 10.0.0.1/32 [115/131] via 10.0.10.9, Ethernet1
I L2 10.0.0.2/32 [115/91] via 10.0.10.9, Ethernet1
I L2 10.0.0.3/32 [115/91] via 10.0.23.1, Ethernet2
I L2 10.0.0.4/32 [115/51] via 10.0.23.1, Ethernet2
I L2 10.0.0.5/32 [115/41] via 10.0.10.9, Ethernet1
C 10.0.0.6/32 is directly connected, Loopback1
I L2 10.0.10.0/30 [115/130] via 10.0.10.9, Ethernet1
I L2 10.0.10.4/30 [115/90] via 10.0.23.1, Ethernet2
C 10.0.10.8/30 is directly connected, Ethernet1
I L2 10.0.12.0/30 [115/140] via 10.0.23.1, Ethernet2
I L2 10.0.13.0/30 [115/90] via 10.0.10.9, Ethernet1
C 10.0.23.0/30 is directly connected, Ethernet2
r22#
r22# show logging
...
Log Buffer:
May 24 16:18:22 r22 SuperServer: %SYS-5-SYSTEM_RESTARTED: System restarted
May 24 16:24:29 r22 ConfigAgent: %SYS-5-CONFIG_E: Enter configuration mode from console by root on vty4 (UnknownIpAddr)
May 24 16:24:29 r22 ConfigAgent: %SYS-5-CONFIG_I: Configured from console by root on vty4 (UnknownIpAddr)
May 24 16:24:29 r22 ConfigAgent: %SYS-5-CONFIG_STARTUP: Startup config saved from system:/running-config by root on vty4 (UnknownIpAddr).
May 24 16:24:39 r22 Isis: %ISIS-4-ISIS_ADJCHG: L2 Neighbor State Change for SystemID 0000.0000.0004 on eth2 to UP
May 24 16:24:42 r22 Isis: %ISIS-4-ISIS_ADJCHG: L2 Neighbor State Change for SystemID 0000.0000.0005 on eth1 to UP
May 24 16:26:34 r22 ConfigAgent: %SYS-5-CONFIG_STARTUP: Startup config saved from system:/running-config by root on vty4 (UnknownIpAddr).
r22#
r22#show isis neighbors
% Internal error
% To see the details of this error, run the command 'show error 2'

Let’s build BGP, from r01 is like this:

!
router bgp 100
router-id 10.0.0.1
graceful-restart restart-time 300
graceful-restart
maximum-paths 2
neighbor AS100-CORE peer group
neighbor AS100-CORE remote-as 100
neighbor AS100-CORE next-hop-self
neighbor AS100-CORE update-source Loopback1
neighbor AS100-CORE timers 2 6
neighbor AS100-CORE additional-paths receive
neighbor AS100-CORE additional-paths send any
neighbor AS100-CORE password 7 Nmg+xbfVkywN7BBIllK5yw==
neighbor AS100-CORE send-community standard extended
neighbor AS100-CORE maximum-routes 0
neighbor 10.0.0.2 peer group AS100-CORE
neighbor 10.0.0.2 description R02
neighbor 10.0.0.3 peer group AS100-CORE
neighbor 10.0.0.3 description R11
neighbor 10.0.0.4 peer group AS100-CORE
neighbor 10.0.0.4 description R12
neighbor 10.0.0.5 peer group AS100-CORE
neighbor 10.0.0.5 description R21
neighbor 10.0.0.6 peer group AS100-CORE
neighbor 10.0.0.6 description R22
!

So once we have configured BGP in all routers, we should see a full mesh between all routers. This is from r22:

r22#show ip bgp summary
BGP summary information for VRF default
Router identifier 10.0.0.6, local AS number 100
Neighbor Status Codes: m - Under maintenance
Description Neighbor V AS MsgRcvd MsgSent InQ OutQ Up/Down State PfxRcd PfxAcc
R01 10.0.0.1 4 100 7 7 0 0 00:00:05 Estab 0 0
R02 10.0.0.2 4 100 7 7 0 0 00:00:05 Estab 0 0
R11 10.0.0.3 4 100 7 7 0 0 00:00:05 Estab 0 0
R12 10.0.0.4 4 100 6 7 0 0 00:00:04 Estab 0 0
R21 10.0.0.5 4 100 6 7 0 0 00:00:04 Estab 0 0
r22#

Now, enable MPLS and SR extension in ISIS:

!
mpls ip
!
mpls label range isis-sr 800000 65536
!
router isis CORE
  segment-routing mpls
    router-id 10.0.0.1  <-- based on Lo1 in each router
    no shutdown
!
interface Loopback1
  description CORE Loopback
  node-segment ipv4 index 1  <-- this has to be different in each node!!!
!

And you should see 5 ISIS-SR tunnels from each router. From r22:

r22#show isis segment-routing tunnel
Index Endpoint Nexthop Interface Labels TI-LFA
tunnel index

1 10.0.0.2/32 10.0.10.9 Ethernet1 [ 800002 ] -
2 10.0.0.3/32 10.0.23.1 Ethernet2 [ 800003 ] -
3 10.0.0.4/32 10.0.23.1 Ethernet2 [ 3 ] -
4 10.0.0.5/32 10.0.10.9 Ethernet1 [ 3 ] -
5 10.0.0.1/32 10.0.10.9 Ethernet1 [ 800001 ] -
r22#

As you can see above, the labels are based on the base index (800000) defined in the “mpls label range” command and the “node-segment index” defined in the loopback interface. So the label that identifies uniquely r01 is 800000 + 1 = 800001. The label “3” means you are a Penultime-Hop-P router and you remove the label to save a label look-up in the egress router.

Now, let’s configure EVPN for L2/L3VPN deployment in our MPLS network. From r01 should be:

!
service routing protocols model multi-agent --> you will have to reboot
!
router bgp 100
!
address-family evpn
neighbor default encapsulation mpls next-hop-self source-interface Loopback1
neighbor 10.0.0.2 activate
neighbor 10.0.0.3 activate
neighbor 10.0.0.4 activate
neighbor 10.0.0.5 activate
neighbor 10.0.0.6 activate
!

So once this is configured in all routers, we should see again a full mesh of EVPN BGP peers. From r12 this time:

r12#show bgp evpn summary
BGP summary information for VRF default
Router identifier 10.0.0.4, local AS number 100
Neighbor Status Codes: m - Under maintenance
Description Neighbor V AS MsgRcvd MsgSent InQ OutQ Up/Down State PfxRcd PfxAcc
R01 10.0.0.1 4 100 1254 1251 0 0 00:03:27 Estab 1 1
R02 10.0.0.2 4 100 1111 1107 0 0 00:03:27 Estab 1 1
R11 10.0.0.3 4 100 961 962 0 0 00:03:27 Estab 1 1
R21 10.0.0.5 4 100 884 888 0 0 00:03:27 Estab 1 1
R22 10.0.0.6 4 100 814 811 0 0 00:03:27 Estab 1 1
r12#

Now, let’s create a L3VPN with CUST-A vrf. We define it in all routers. For r01 should be:

!
vrf instance CUST-A
rd 100:1
!
interface Loopback2
vrf CUST-A
ip address 192.168.0.1/32   <-- each device has a unique one
!
ip routing vrf CUST-A
!
router bgp 100
!
vrf CUST-A
rd 100:1
route-target import evpn 100:1
route-target export evpn 100:1
network 192.168.0.1/32

Let’s see if the routing works from r12

r12#
r12#show bgp evpn
BGP routing table information for VRF default
Router identifier 10.0.0.4, local AS number 100
Route status codes: s - suppressed, * - valid, > - active, # - not installed, E - ECMP head, e - ECMP
S - Stale, c - Contributing to ECMP, b - backup
% - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop
Network Next Hop Metric LocPref Weight Path 
RD: 100:1 ip-prefix 192.168.0.1/32 10.0.0.1 - 100 0 i 
RD: 100:1 ip-prefix 192.168.0.2/32 10.0.0.2 - 100 0 i 
RD: 100:1 ip-prefix 192.168.0.3/32 10.0.0.3 - 100 0 i 
RD: 100:1 ip-prefix 192.168.0.5/32 10.0.0.5 - 100 0 i 
RD: 100:1 ip-prefix 192.168.0.6/32 10.0.0.6 - 100 0 i
r12#
r12#show ip route vrf CUST-A
VRF: CUST-A
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route, V - VXLAN Control Service,
DH - DHCP client installed default route, M - Martian,
DP - Dynamic Policy Route, L - VRF Leaked
Gateway of last resort is not set
B I 192.168.0.1/32 [200/0] via 10.0.0.1/32, IS-IS SR tunnel index 5, label 116384
via 10.0.10.5, Ethernet1, label 800001
B I 192.168.0.2/32 [200/0] via 10.0.0.2/32, IS-IS SR tunnel index 2, label 116384
via 10.0.10.5, Ethernet1, label 800002
B I 192.168.0.3/32 [200/0] via 10.0.0.3/32, IS-IS SR tunnel index 3, label 100000
via 10.0.10.5, Ethernet1, label imp-null(3)
C 192.168.0.4/32 is directly connected, Loopback2
B I 192.168.0.5/32 [200/0] via 10.0.0.5/32, IS-IS SR tunnel index 4, label 116384
via 10.0.23.2, Ethernet2, label 800005
B I 192.168.0.6/32 [200/0] via 10.0.0.6/32, IS-IS SR tunnel index 1, label 116384
via 10.0.23.2, Ethernet2, label imp-null(3)
r12#

So, all looks good. EVPN table shows all the prefixes for rd 100:1 and the routing table for CUST-A shows all Lo2 defined in each router.

BTW, I am not able to ping inside the VRF, I think it is something related to the broadcast of ARP:

UPDATE: Arista confirms that cEOS-lab doesn’t support MPLS dataplane. I need to use vEOS (vagrant). So that means I dont think my laptop has enough resources to build this lab in vEOS πŸ™

r01#ping vrf CUST-A ip 192.168.0.6 interface loopback 2
PING 192.168.0.6 (192.168.0.6) from 192.168.0.1 lo2: 72(100) bytes of data.
--- 192.168.0.6 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 40ms
r01#

-- from other session in r01 --

r01#bash
bash-4.2# ip netns exec ns-CUST-A tcpdump -i lo2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo2, link-type EN10MB (Ethernet), capture size 262144 bytes
^C12:46:03.324918 02:00:00:00:00:00 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.6 tell 192.168.0.1, length 28
12:46:04.348750 02:00:00:00:00:00 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.6 tell 192.168.0.1, length 28
12:46:05.376723 02:00:00:00:00:00 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.6 tell 192.168.0.1, length 28
3 packets captured
3 packets received by filter
0 packets dropped by kernel
bash-4.2#

-- from other session in r22, we dont see anything --

r22#bash
bash-4.2# ip netns exec ns-CUST-A tcpdump -i lo2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo2, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
bash-4.2#

New Approach for Datacenter Networks and Stacks for Low Latency

In a irc channel this week, one guy posted a link about visualization latency in a data center switching network .

And it was really good video for understanding how congestion happens inside the switch infrastructure and a very original idea to overcome this problem!

I tried to get a bit more info about the video and ended in the page of that paper:

http://www.cs.ucl.ac.uk/news/article/sigcomm_best_paper_award_for_mark_handley/

And see if there was any implementation:

https://github.com/nets-cs-pub-ro/NDP/wiki

I am not a researcher but the idea is quite original and it seems you dont need to re-invent the wheel. In the github repo even there is an example in P4. P4 is going to be big, and Barefoot has already commercial solutions about it with their tofino chip. Let’s see what Intel does with it…

Based on a continuation paper, it seems there is no much traction from the big cloud providers, and it surprises me, these guys have the muscle to make this kind of things. I always heard that hardware is very expensive to built and software is not. So there are few player willing to invest in new ideas. Everytime you hear about unicorn companies, nearly all of them are software companies.

And another paper says it needs more tuning/debugging.

I don’t know if it will successful in the future but I think it was interesting watching the video and reading about the concept.

Lemon Polenta Cake

I had a couple of lemon in the fridge so I wanted to used them in a cake. I had a recipe that I wanted to try for some time so this was my chance.

Ingredients:

  • 250g ground almonds
  • 100g fine polenta / cornmeal
  • 50g coarse polenta
  • 1 teaspoon baking powder
  • 250g (good) butter + a bit for greasing
  • zest and juice of 3 (good) lemons (medium size)
  • 4 free range / organic medium eggs

Process

  • Preheat oven at 180C. Grease a 20cm round cake tin
  • Mix the almonds, both kinds of polenta and the baking powder together in a bowl and put aside.
  • Put the butter, lemon zest and sugar into another bowl and cream together. I used a wooden spoon. Get ready to sweat a bit. Once everything is mixed, start adding an egg at each time, mix well, and add another.
  • Add the initial bow with the polenta and almond to the mixture. Mix well until everything is combined. Add the lemon juice. Mix well.
  • Poor everything on the tin and smooth the top. Bake for 50-55 minutes. Check the top is brown.

It is a quick, moist and easy cake. And it is tasty!

First step into OpenBSD

This week Job Snijders advertised the latest version of openbsd. I have been always a dreamer of being a hacker (like the movies) and the best guys when I was in Uni were Linux users. I had no idea what Linux/Unix/BSD was at that time. At the end (by the 4th year in Uni) I managed to install Linux in my windows PC without destroying anything. And fortunately, I have been using it since then. Learning more and still fortunately, in the last 6 years, using it everyday at work too.

Still very very far away from being a hacker though πŸ™‚

In this time, I have read a bit about the BSD vs Linux threads about licenses, security, etc. And actually I was always keen to learn a bit. In Motorola, I had to use Solaris (even managed to get a certification!).

So this week, I tried to setup a VM in my debian laptop for using OpenBSD 6.7

I found and followed this link, so all credits for the author.

First I downloaded openbsd 6.7 (install67.iso) from here. There are many mirrors around the world. Prepare the file:

/var/lib/libvirt/images# ls -ltr
total 1386064
-rwxr--r-- 1 libvirt-qemu libvirt-qemu 996671488 Apr 6 2018 debian-VAGRANTSLASH-stretch64_vagrant_box_image_9.1.0.img
-rwxr--r-- 1 libvirt-qemu libvirt-qemu 950796288 Nov 29 23:17 centos-VAGRANTSLASH-7_vagrant_box_image_1905.1.img
-rw-r--r-- 1 ss ss 470118400 May 21 23:23 install67.iso
root@athens:/var/lib/libvirt/images# chown libvirt-qemu.libvirt-qemu install67.iso
/var/lib/libvirt/images# mv install67.iso openbsd67.iso

Now start the installation:

/var/lib/libvirt/images# virt-install \
--name=openbsd \
--virt-type=kvm \
--memory=2048,maxmemory=4096 \
--vcpus=2,maxvcpus=2 \
--cpu host \
--os-variant=openbsd5.8 \
--cdrom=/var/lib/libvirt/images/openbsd67.iso \
--network=bridge=virbr0,model=virtio \
--graphics=vnc \
--disk path=/var/lib/libvirt/images/openbsd67.qcow2,size=40,bus=virtio,format=qcow2

Starting install…
Allocating 'openbsd67.qcow2' | 40 GB 00:00:01

Something that confused my was that I was installing openbsd6.7 but the os-variant in the command must be obenbsd5.8. Anything else, fails.

In my setup, I have virt-viewer installed so it opened up and finished the installation using that.

I was surprised how quick was everything and didnt find any problem:

Once I logged in, I felt useless πŸ™‚ I used a bit the shell and tested I could ssh from my host pc to the openbsd vm.

So now, I can find a book of openbsd for dummies and get going!

So close virt-viewer and stop the VM:

/var/lib/libvirt/images# virsh
virsh # list
Id Name State
2 openbsd running
virsh #
virsh #
virsh # destroy openbsd
Domain openbsd destroyed
virsh # list
Id Name State
virsh #
virsh # list --all
Id Name State
openbsd shut off
virsh #

Test we can start up again:

# virsh
Welcome to virsh, the virtualisation interactive terminal.
Type: 'help' for help with commands
'quit' to quit
virsh # list --all
Id Name State
openbsd shut off
virsh # start openbsd
Domain openbsd started
virsh # list --all
Id Name State
3 openbsd running
virsh # exit
#
# virt-viewer

Deep Work

I have just finished “Deep Work” from Cal Newport. For a long time I have believed that multitasking is the best thing to be productive but with the years passing by I realised that like a computer, context switching is very expensive on me. You can’t really concentrate in some demanding thing and then try to be on top of small things and interrupted by everybody. I am happy that I am not use social media but still at work I am easily distracted by people demands, emails, etc.

And I am pretty sure that It is not just me. Most people suffer this. And to be honest, I want to improve, I want to make a more meaningful job with my time. And life (like Winifred Gallagher) Cal’s examples (himself and others) are really good. I liked quite a lot the one regarding Daniel Kilov and how to memorize a deck of cards. I think this is a good exercise to execute deep concentration in small chunks of time, that is actually the most probably outcome in (most people) normal day.

You can do it. But you need to work hard for it. The society, working environment and yourself are not going to make it easy.

I think with the lockdown period, it is a good moment to put these techniques in practice.

I need to pay less attention to the emails and slack. I dont have to be the quickest answering something… (that is so good for your ego….) I need to really prioritize my working hours and tasks to focus on with a time frame. I need to make myself accountable, stop blaming somebody else. And communicate, make my peers that I will focus in things I will not answer immediately (if you are my CTO, maybe not πŸ™‚ And as a manager, make my team members better: make them to take more ownership so they can deal with problems by themselves. And schedule times of the day to check emails and/or attend meetings.

One thing I have done, it is to put a pink sticker close to my screen saying “ONE THING AT EACH TIME”. I did this before reading the book as a reminder from a speech at work of a brilliant guy in his last day. And he said he learned that sentence from our CTO. That got burned in my mind. I have used it mainly from troubleshooting. It has been a critical tool that I have applied successfully many times since then. But I can be applied to more things as “Deep Work”

All very nice words. Let’s make it happen.

Okinawa

I have remembering my Karate for nearly the last two months. It has been a quite satisfying choice and has brought some good and old feelings.

As I have been adding katas to my set, I wanted to write and find some info in the web about the origin of Karate (and see if it matches my memory) and the main kata stiles I learned in my time. Not sure if it is still the same though.

I think for my black belt exam, apart from performing some katas in front of my teacher, I had to answer some questions about Karate history.

I dont know why, I can still remember bodhidharma, and Indian Buddhist monk, as the person considered for starting martial arts in Asia. Then it spread to southern China and then to Okinawa. There, it developed while some King forbade weapons so people needed other ways for self-defence. And finally, get to Japan after some conquering. Yeah, very short summary. Surely a better version here.

From my time, we had three stiles of “superior” katas: shuri-te, naha-te and tomari-te. I can’t forget the big picture of Gichin Funakoshi (Father of modern karate) in our tatami.

We used to consider naha-te katas the ones with a lot of “breathing” and “slow moves”. Katas with short and quick moves, were tomari-te katas.

And by chance, I was lucky to visit Okinawa (just for a weekend) when I was working in a project in Tokyo. It was a dream come true. Although I was expecting some spirituality there, I was lucky to attend a very important festival and laugh when some towns in the map where actually kata names!

Definitely, it is a very different place compared with Japan main islands.

Kaizen

I have finished reading this book about Kaizen. Many years ago I heard the term Kaizen for the superior productivity in Japan, mainly from Toyota as the world’s number one car producer. Somehow, I bought this used copy to learn about it.

First, I noticed the book was printed in 1986… I realised I rarely read “technical” stuff so “old”.

The first surprise was that it seems the concept of quality control was actually brought by USA to post WWII Japan. The two main people were W E Deming and J M Juran.

It is interesting that Japan was very eager to learn the productivity secrets from USA and at the end, they created their on version.

I like the focus in people. They need to be engaged and feel part of something. At the end of the day, everybody has to push together to get to great results. As well, it seems key the achievement of small changes and not massive ones. They set for long-term goals, mainly for customer satisfaction that is not just the person who buys the product. It is not all about profits. It seems the profits will come as a by-product (reduce cost, increase customer satisfaction, more sells -> more profits). So the vision is product-oriented instead of result-oriented.

The point that “if you dont have problems, how you can improve?!” is so true.

And one slogan to measure how good is your product: “would you buy what you are producing?”

I can see many concepts are already in place in technology. The “Kambam” board, the constant search for small improvements, etc. If you think devops culture is something really modern, doesn’t look like that.

In general, the approach is quite different from the Western world and has been successful. But the book mentions that you need the innovation side for keep improving. So again, as life, you need balance.

And at the end, Kaizen becomes like a way of life. Or it is like I see it.

I am curious how the author would see Kaizen and Japan nowadays.

SRv6

This year, in my employer, I completed the migration to a MPLS SR Arista core network from a Brocade MPLS LDP one. Our backbone is still pure IPv4 so anything IPv6 is not going to be added. But this week, via an APNIC blog post I read about SRv6. And it looks quite interesting. So I went to the first post to go a bit deeper about what SRv6 is. Based on the statements of the blog, really big networks are already using this technology and quite a lot of support from the open source community too. I missed Arista in that list though.

So I tried to find some “real” proof of this SRv6 is some pcap files to see the format and get a bit better view. I could find at lest a source with some. The examples are not like the ones mentioned in the APNIC blog post but just for taking a look, it is enough:

So I can see inside the IPv6 header, the SRv6 Header as defined in the rfc.

I dont really understand the second IPv6 header (Dst: b::2). From the first IPv6 header, the destination “f1::” has to be the first instruction SID1. I can see how it mentions it contains a SRH (Next Header: 43). And inside the routing header, we can see it is SR type (Type: 4). I assume that Address[0] and Address[1] are SID2 and SID3.

Would be cool to lab a SRv6 scenario.

MS Paint

I think the only thing I miss from Microsoft Windows, it is “paint”. I can’t find anything in Linux that is simple and reliable. I have used pinta, gimp and others but they break or they are too pro.

So, if you have internet connection. This is your friend

And if you want something a bit more pro like Visio, then this one

I recommend both a lot!

I know you have GIMP, but I just a simple tool to draw red squares!

Troubleshooting a DCHP Relay connection

Today I have had “fun” troubleshooting an issue that looked easy at first sight. A colleague was trying to PXE boot some server from a network that we haven’t used for a while.

When the server boots up, asks for an IP via DHCP. As we have a centralized DHCP server infrastructure, we have configured DHCP relay in the firewall facing that server to send that request to the DHCP server.

First, let’s take a look at how DHCP relay works. This is a very good link. And this diagram from the mentioned link it is really useful:

One think I learned is the reply (DCHP Offer) doesnt have to use as destination IP the same IP it received as source in DHCP Discover. In the picture, it is packet 2a.

Checking in our environment, we confirm that:

Our server is in 10.94.240.x network. Our firewall is acting as DHCP relay, and send the DHCP Discovery (unicast) to our VIP DHCP Server IP.

The DHCP offer, uses as source the physical IP of the DHCP server and destination is the DHCP relay IP (so it is 10.94.240.1 – the firewall IP in 10.94.240.x network)

Ok, so everything looks fine? No really. The server receives the query, it answers… but we dont see a DCHP Request/ACK.

BTW, keep in mind that DHCP is UDP….

So, we need to see where the packets are lost.

This is a high level path flow between the client and server:

So we need to check this connection is three different firewall vendors….

The initial troubleshooting was just using the GUI tools from Palo/Fortigate. We couldn see anything…. but the server was constantly receiving DHCP Discover and sending DHCP Offer… I dont get it:

# tcpdump -i X udp port 67 or 66 -nn

14:58:06.969462 IP 10.81.25.1.67 > 10.81.251.47.67: BOOTP/DHCP, Request from 6c:2b:59:c1:32:73, length 300
14:58:06.969564 IP 10.81.251.201.67 > 10.94.240.1.67: BOOTP/DHCP, Reply, length 300

14:58:28.329048 IP 10.81.25.1.67 > 10.81.251.47.67: BOOTP/DHCP, Request from 6c:2b:59:c1:32:73, length 300
14:58:28.329157 IP 10.81.251.201.67 > 10.94.240.1.67: BOOTP/DHCP, Reply, length 300

Initially it took me a while to see the request/reply because I was assuming the dhcp request had source 10.94.240.1. So I was seeing only the Reply but not the Request. That was when I went to clarify my head about DHCP Relay and found the link.

So ok, we have the DHCP Request/Reply, but absolutely nothing in the Palo. Is the palo dropping the packets or is forwarding? No idea. The GUI says nothing, I took a packet capture and couldnt see that traffic neither…

Doesnt makes sense.

Let’s get back to basic.

Did I mention DHCP is UDP? So how a next generation firewall (like paloalto) with all the fancy features enable (we have nearly all of them enable…) treats a UDP connection? UDP is stateless… but the firewall is statefull… the firewall creates a flow with the first packet so it can track, any new packet is considered part of that flow. But why we dont see the flows? We actually have only one flow. The firewall has created that session and offloaded to hardware. So you dont see anything else in the control-plane / GUI. The GUI only shows the end of a connection/flow. And as our flow DHCP Relay hasnt’ terminated (it is UDP) and the firewall keeps receiving packets, it is considered life (the firewall doesnt really know what is going on). So for that reason we dont see the connection in the PaloUI. Ok, I got to that point after a while…. I need to proof that the packet from the server is reaching the firewall and it is leaving it too.

How can I do that? Well, I need to delete that flow so the firewall considers a new connection and the tcpdump can see the packets.

This is the a good link from paloalto to take captures. So I found my connection and the cleared it:

palo(active)> show session all filter destination 10.94.240.1

ID Application State Type Flag Src[Sport]/Zone/Proto (translated IP[Port])
Vsys Dst[Dport]/Zone (translated IP[Port])
135493 dhcp ACTIVE FLOW 10.81.251.201[67]/ZONE1/17 (10.81.251.201[67])
vsys1 10.94.240.1[67]/ZONE2 (10.94.240.1[67])
palo(active)>
palo(active)> clear session id 135493

And now, my packet capture in paloalto confirms that it is sending the packet to the next firewall (checking the destination MAC) !!!

Ok, so we confirm the first firewall in the return path was fine…. next one, it is fortigate.

BTW, we were checked and assumed that the routing is fine in all routers, firewalls, etc. Sometimes is not the case… so when things dont follow your thoughts, get back to the very basics….

We have exactly the same issue as in PaloAlto. I can’t see anything in the logs about receiving a dhcp offer from palo and forwarding it to the last firewall Cisco.

And again, we apply the same reasoning. We have an UDP connection, we have a next-generation firewall (with fancy ASIC). And one more thing, in this fortigate firewall, we allow intra-zone traffic, so it is not going to show anyway in the GUI monitor…

So we confirm that we have a flow and cleared it

forti # diag debug flow filter
vf: any
proto: any
Host addr: any
Host saddr: any
host daddr: 10.94.240.1-10.94.240.1
port: any
sport: any
dport: any
co1fw02 #
co1fw02 # diag sys session list
session info: proto=17 proto_state=00 duration=2243 expire=170 timeout=0 flags=00000000 sockflag=00000000 sockport=0 av_idx=0 use=5
origin-shaper=
reply-shaper=
per_ip_shaper=
class_id=0 ha_id=0 policy_dir=0 tunnel=/ vlan_cos=8/8
state=may_dirty npu synced
statistic(bytes/packets/allow_err): org=86840/254/1 reply=0/0/0 tuples=2
tx speed(Bps/kbps): 36/0 rx speed(Bps/kbps): 0/0
orgin->sink: org pre->post, reply pre->post dev=39->35/35->39 gwy=10.81.25.1/0.0.0.0
hook=pre dir=org act=noop 10.81.251.201:67->10.94.240.1:67(0.0.0.0:0)
hook=post dir=reply act=noop 10.94.240.1:67->10.81.251.201:67(0.0.0.0:0)
misc=0 policy_id=4294967295 auth_info=0 chk_client_info=0 vd=0
serial=141b05fb tos=ff/ff app_list=0 app=0 url_cat=0
rpdb_link_id = 00000000
dd_type=0 dd_mode=0
npu_state=0x001000
npu info: flag=0x81/0x00, offload=6/0, ips_offload=0/0, epid=8/0, ipid=8/0, vlan=0x00f5/0x0000
vlifid=0/0, vtag_in=0x0000/0x0000 in_npu=0/0, out_npu=0/0, fwd_en=0/0, qid=0/0
no_ofld_reason:
total session 1
forti #
forti # diag sys session clear

In other session, I have a packet capture in the expected egress interface:

forti # diagnose sniffer packet Zone3 'host 10.94.240.1'
interfaces=[Zone3]
filters=[host 10.94.240.1]
301.555231 10.81.251.201.67 -> 10.94.240.1.67: udp 300
316.545677 10.81.251.201.67 -> 10.94.240.1.67: udp 300

Fantastic, we have confirmation that the second firewall receives and forwards the DHCP Reply!!!

Ok, now the last stop, Cisco ASA. This is an old firewall, I think it could be my father or Darth Vader.

I dont have the fancy tools for packet capture like Palo/Fortigate…. so I went to the basic “debug” commands and “packet-tracer”.

First, this was the dhcp config in Cisco:

vader/pri/act# show run | i dhcp
dhcprelay server 10.81.251.47 EGRESS
dhcprelay enable SERVERS-ZONE
dhcprelay timeout 60

And, the ACL allows all IP traffic in those interfaces… and couldnt see any deny in the logs.

So, I enabled all debugging things I could find for dhcp:

vader/pri/act# show debug
debug dhcpc detail enabled at level 1
debug dhcpc error enabled at level 1
debug dhcpc packet enabled at level 1
debug dhcpd packet enabled at level 1
debug dhcpd event enabled at level 1
debug dhcpd ddns enabled at level 1
debug dhcprelay error enabled at level 1
debug dhcprelay packet enabled at level 1
debug dhcprelay event enabled at level 200
vader/pri/act# DHCPD: Relay msg received, fip=ANY, fport=0 on SERVERS-ZONE interface
DHCPRA: relay binding found for client f48e.38c7.1b6e.
DHCPD: setting giaddr to 10.94.240.1.
dhcpd_forward_request: request from f48e.38c7.1b6e forwarded to 10.81.251.47.
DHCPD: Relay msg received, fip=ANY, fport=0 on SERVERS-ZONE interface
DHCPRA: relay binding found for client 6c2b.59c1.3273.
DHCPD: setting giaddr to 10.94.240.1.
dhcpd_forward_request: request from 6c2b.59c1.3273 forwarded to 10.81.251.47.
vader/pri/act#

So, the debugging doesnt says anything regarding the packet coming back from Fortigate… Not looking good I am afraid. I wasnt running out of ideas about debug commands. I coudn’t increase an log level neither….

Let’s give a go to packet tracer… doesnt looks good:

vader/pri/act# packet-tracer input EGRESS udp 10.81.251.201 67 10.94.240.1 67
Phase: 1
Type: ACCESS-LIST
Subtype:
Result: ALLOW
Config:
Implicit Rule
Additional Information:
MAC Access list
Phase: 2
Type: ACCESS-LIST
Subtype:
Result: DROP
Config:
Implicit Rule
Additional Information:
Result:
input-interface: EGRESS
input-status: up
input-line-status: up
Action: drop
Drop-reason: (acl-drop) Flow is denied by configured rule

So, we are sure our ACL is totally open but the firewall is dropping the packet coming from fortigate. Why? How to fix it?

Ok, get back to basics. Focus in Cisco config. It uses as DHCP relay server, 10.81.251.47 (VIP). But the DHCP reply is coming from the physical IP 10.81.251.201….. maybe Cisco doesnt like that…. Let’s try to add the physical IPs as a new DHCP server:

vader/pri/act# sri dhcp
dhcprelay server 10.81.251.47 EGRESS
dhcprelay server 10.81.251.201 EGRESS
dhcprelay server 10.81.251.202 EGRESS

Let’s check packet tracer again:

vader/pri/act# packet-tracer input EGRESS udp 10.81.251.201 67 10.94.240.1 67
Phase: 1
Type: ACCESS-LIST
Subtype:
Result: ALLOW
Config:
Implicit Rule
Additional Information:
MAC Access list
Phase: 2
Type: ACCESS-LIST
Subtype:
Result: ALLOW
Config:
Implicit Rule
Additional Information:
Phase: 3
Type: IP-OPTIONS
Subtype:
Result: ALLOW
Config:
Additional Information:
Phase: 4
Type:
Subtype:
Result: ALLOW
Config:
Additional Information:
Phase: 5
Type:
Subtype:
Result: ALLOW
Config:
Additional Information:
Phase: 6
Type: VPN
Subtype: ipsec-tunnel-flow
Result: ALLOW
Config:
Additional Information:
Phase: 7
Type: FLOW-CREATION
Subtype:
Result: ALLOW
Config:
Additional Information:
New flow created with id 340328245, packet dispatched to next module
Result:
input-interface: EGRESS
input-status: up
input-line-status: up
Action: allow
vader/pri/act#

Good, that’s a good sign finally!!!

I think I nearly cried after seeing this in the dhcp logs in our server:

May 12 16:16:27 dhcp1 dhcpd[2561]: DHCPDISCOVER from f4:8e:38:c7:1b:6e via 10.94.240.1
May 12 16:16:28 dhcp1 dhcpd[2561]: DHCPOFFER on 10.94.240.50 to f4:8e:38:c7:1b:6e (cmc-111) via 10.94.240.1
May 12 16:16:28 dhcp1 dhcpd[2561]: Wrote 0 class decls to leases file.
May 12 16:16:28 dhcp1 dhcpd[2561]: Wrote 0 deleted host decls to leases file.
May 12 16:16:28 dhcp1 dhcpd[2561]: Wrote 0 new dynamic host decls to leases file.
May 12 16:16:28 dhcp1 dhcpd[2561]: Wrote 1 leases to leases file.
May 12 16:16:28 dhcp1 dhcpd[2561]: DHCPREQUEST for 10.94.240.50 (10.81.251.202) from f4:8e:38:c7:1b:6e (cmc-111) via 10.94.240.1
May 12 16:16:28 dhcp1 dhcpd[2561]: DHCPACK on 10.94.240.50 to f4:8e:38:c7:1b:6e (cmc-111) via 10.94.240.1

So at the end, finally fixed…. it took too many hours.

Notes:

  • DHCP Realy: It is not that obvious the flow regarding IPs.
  • UDP and firewalls, debugging it is a bit more challenging.
  • Cisco ASA dhcprelay server IPs…. VIPs and non-VIPs please.

All this would be easier/quicker with TCP πŸ˜›