BGP-Free Core

This week I have been following a discussion on NANOG about LDPv6 (there are a lot of emails, but it is VERY interesting) and I realized that I didn’t recognize the term “BGP-Free Core”. So I searched for it. It turns out it isn’t an obscure subject and, funnily enough, I have used that design in my MPLS labs in GNS3… So what is a BGP-free core? These are the links I read:

https://blog.ipspace.net/2012/01/bgp-free-service-provider-core-in.html

And this is my favourite.

As in my basic MPLS lab, we only use BGP between the PEs. The P routers only run the IGP and LDP; they don’t have to know anything about VRFs.

For that reason, you need to increase the MTU on your links (4 bytes per MPLS label), and link usage increases because of the extra overhead.
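To put numbers on the 4 bytes per label, here is a minimal sketch in Python. The function names are made up for illustration, and the label depths are assumptions (an L3VPN packet typically carries a transport label plus a VPN label, so a depth of 2).

```python
MPLS_LABEL_BYTES = 4  # one MPLS label stack entry is 32 bits


def required_link_mtu(payload_mtu: int, label_depth: int) -> int:
    """Link MTU needed to carry payload_mtu bytes of IP plus label_depth labels."""
    if payload_mtu <= 0 or label_depth < 0:
        raise ValueError("payload_mtu must be positive and label_depth non-negative")
    return payload_mtu + label_depth * MPLS_LABEL_BYTES


def overhead_percent(payload_mtu: int, label_depth: int) -> float:
    """Extra bytes on the wire, as a percentage of a full-size payload."""
    return 100.0 * label_depth * MPLS_LABEL_BYTES / payload_mtu


if __name__ == "__main__":
    for depth in (1, 2, 3):
        mtu = required_link_mtu(1500, depth)
        pct = overhead_percent(1500, depth)
        print(f"{depth} label(s): link MTU {mtu}, overhead {pct:.2f}%")
```

So for customers to keep a full 1500-byte IP MTU with two labels, the core links need at least 1508 bytes.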

So it is important to know stuff, but also to know what that stuff is called 😛

Docker MTU + Docker tcpdump

I am troubleshooting an issue in a Docker setup with some Arista cEOS containers where I can’t ping inside a VRF. At first I thought it was an MTU issue because, when you use MPLS, there is an extra label in the L2 frame.

…But my pings weren’t that big.

I still wanted to increase the MTU, because that’s the expected thing to do on your WAN links if you run MPLS and want your users in different VRFs to be able to use the full 1500 bytes.

After some searching, it seems you can change the default value using the config file, as per this link:

$ ip link show docker0
9: docker0: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:be:73:8c:d3 brd ff:ff:ff:ff:ff:ff
$ cat /etc/docker/daemon.json
{
"data-root": "/home/somebody/storage/docker",
"mtu": 1600
}
$ sudo service docker restart
..
$ ip link show docker0
9: docker0: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:fb:c0:cf:a2 brd ff:ff:ff:ff:ff:ff

After restarting Docker, the bridge still had MTU 1500. According to another link, I actually need to start a container so the bridge comes up with the new value:

$ docker run -d busybox top
...
9: docker0: mtu 1600 qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:fb:c0:cf:a2 brd ff:ff:ff:ff:ff:ff

Funny thing: once I started my lab again (using docker-topo), I still got MTU 1500!!!

I will have to dig a bit into why docker-topo doesn’t take the MTU of 1600 from the Docker config file.

Solution: docker-topo creates user-defined bridges, so it needs to be told that the MTU is different. The “mtu: 1600” setting in the Docker config only applies to the default bridge, so when you start the busybox container, it is attached to the default bridge and you see 1600.

The other thing I was curious about was whether I could tcpdump the networks created by Docker.

Yes, you can!

# docker network ls

# ifconfig 

# tcpdump -i br-xxxx 

MPLS Segment Routing – Arista Lab

We have been able to create some nice MPLS labs using GNS3 and Cisco IOS. At my current employer we use Arista, so I wanted to create a lab environment with Arista kit to simulate an MPLS Segment Routing network. Keeping in mind that I try to run everything on my laptop, using GNS3 + Arista is not an option: you need to use the Arista vEOS image in GNS3, and it demands 2GB of RAM and 1 CPU per device. In the past, I think I just managed to start two vEOS VMs before my laptop gave up. But Arista also offers a version of EOS for containers (cEOS).

So, what’s the difference between a virtual machine (VM) and a container? Well, searching the internet is going to give you all sorts of answers. In my very simplified way:

  • VM: needs a hypervisor to simulate hardware. It uses kernel and user space. It has a full OS. So it is like simulating a whole server/PC (imagine a standalone house)
  • Container: runs in user space. It is a set of processes that are isolated from the rest of the system. Containers provide a way to virtualize an OS so that multiple workloads can run on a single OS instance (imagine an apartment in a building)

You just need to register on the Arista website to download a cEOS image.

MPLS Segment Routing (or SPRING, in Juniper terms) is an evolution of standard MPLS, which was originally developed to improve routing performance in core networks: doing a routing look-up per packet in core devices was very expensive in the 80s/90s, and MPLS avoids it (in my very simplified way). MPLS started being deployed around the end of the 90s and became a de facto technology at all service providers. More info here.

Segment Routing is still based on labels, but it adds improvements: it doesn’t need a separate protocol for label exchange (one less thing to worry about), because the IGP distributes the labels. It is also based on “source routing”: the source chooses the path and encodes it in the packet.

There are many sources on the internet that explain MPLS SR better than I can, like these:

https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/seg_routing/configuration/xe-3s/segrt-xe-3s-book/intro-seg-routing.html

https://www.segment-routing.net/tutorials/

As we are going to use Arista, I based my learning on these presentations:

https://ripe77.ripe.net/presentations/16-20181015-SegmentRouting.pdf

https://www.netnod.se/sites/default/files/2018-03/Peter%20Lundqvist_Arista_8.pdf

And reading more Arista docs.

All the code and how to build the lab is here:

https://github.com/thomarite/ceos-testing

So this is what we need and what we are going to use in this lab:

  • IPv4 (yeah, I should start working on IPv6…)
  • IGP: we use ISIS
  • Label Distribution: ISIS-SR
  • BGP: peering between loopbacks, as per best practice, and relying on the IGP for reachability to build the full mesh
  • L3/2VPN: EVPN
  • All devices are PE

So let’s build the basic IP connectivity for r01:

!
hostname r01
!
interface Ethernet1
no switchport
ip address 10.0.10.1/30
!
interface Ethernet2
no switchport
ip address 10.0.12.1/30
!
interface Loopback1
description CORE Loopback
ip address 10.0.0.1/32
!
ip routing
!

Now let’s build our IGP with ISIS. We are going to use our Lo1 IP as the network ID for each router. We will also keep it simple and define all routers as ISIS L2. We don’t need anything fancy; we just want ISIS to provide the reachability for our iBGP peering. We will enable ISIS on the core interfaces (in this simple lab, all links and loopbacks):

!
router isis CORE
net 49.0000.0001.0010.0000.0000.0001.00  <-- BASED ON Lo1 !!!
is-type level-2
log-adjacency-changes
set-overload-bit on-startup wait-for-bgp timeout 180
!
interface Ethernet1
no switchport
ip address 10.0.10.1/30
isis enable CORE
isis metric 40
isis network point-to-point
!
interface Ethernet2
no switchport
ip address 10.0.12.1/30
isis enable CORE
isis metric 50
isis network point-to-point
!
interface Loopback1
description CORE Loopback
ip address 10.0.0.1/32
isis enable CORE
isis metric 1
!
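The NET above looks like it is built by zero-padding each octet of Lo1 to four digits and appending it to a fixed "49.0000.0001" prefix. A minimal sketch of that derivation, assuming this reading of the scheme is right (the function names and the default prefix argument are mine, not from the lab repo):

```python
import ipaddress


def net_from_loopback(loopback: str, prefix: str = "49.0000.0001") -> str:
    """Build an IS-IS NET by padding each IPv4 octet to 4 digits (assumed scheme)."""
    addr = ipaddress.IPv4Address(loopback)  # raises ValueError on a bad address
    padded = ".".join(octet.zfill(4) for octet in str(addr).split("."))
    return f"{prefix}.{padded}.00"  # trailing 00 is the NSEL


def system_id(net: str) -> str:
    """The system ID is the last 6 bytes before the NSEL, i.e. three 4-digit groups."""
    groups = net.split(".")
    return ".".join(groups[-4:-1])


if __name__ == "__main__":
    for lo1 in ("10.0.0.1", "10.0.0.4", "10.0.0.6"):
        net = net_from_loopback(lo1)
        print(lo1, "->", net, "system ID", system_id(net))
```

With this scheme the first octet of the loopback ends up in the area part of the NET, which is harmless here because every router is level-2 only and every loopback starts with 10.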

It seems there is a bug in the cEOS version I am using, as “show isis neighbors” fails, but the routing is actually correct. Let’s see from r22:

r22#show ip route
VRF: default
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route, V - VXLAN Control Service,
DH - DHCP client installed default route, M - Martian,
DP - Dynamic Policy Route, L - VRF Leaked
Gateway of last resort is not set
I L2 10.0.0.1/32 [115/131] via 10.0.10.9, Ethernet1
I L2 10.0.0.2/32 [115/91] via 10.0.10.9, Ethernet1
I L2 10.0.0.3/32 [115/91] via 10.0.23.1, Ethernet2
I L2 10.0.0.4/32 [115/51] via 10.0.23.1, Ethernet2
I L2 10.0.0.5/32 [115/41] via 10.0.10.9, Ethernet1
C 10.0.0.6/32 is directly connected, Loopback1
I L2 10.0.10.0/30 [115/130] via 10.0.10.9, Ethernet1
I L2 10.0.10.4/30 [115/90] via 10.0.23.1, Ethernet2
C 10.0.10.8/30 is directly connected, Ethernet1
I L2 10.0.12.0/30 [115/140] via 10.0.23.1, Ethernet2
I L2 10.0.13.0/30 [115/90] via 10.0.10.9, Ethernet1
C 10.0.23.0/30 is directly connected, Ethernet2
r22#
r22# show logging
...
Log Buffer:
May 24 16:18:22 r22 SuperServer: %SYS-5-SYSTEM_RESTARTED: System restarted
May 24 16:24:29 r22 ConfigAgent: %SYS-5-CONFIG_E: Enter configuration mode from console by root on vty4 (UnknownIpAddr)
May 24 16:24:29 r22 ConfigAgent: %SYS-5-CONFIG_I: Configured from console by root on vty4 (UnknownIpAddr)
May 24 16:24:29 r22 ConfigAgent: %SYS-5-CONFIG_STARTUP: Startup config saved from system:/running-config by root on vty4 (UnknownIpAddr).
May 24 16:24:39 r22 Isis: %ISIS-4-ISIS_ADJCHG: L2 Neighbor State Change for SystemID 0000.0000.0004 on eth2 to UP
May 24 16:24:42 r22 Isis: %ISIS-4-ISIS_ADJCHG: L2 Neighbor State Change for SystemID 0000.0000.0005 on eth1 to UP
May 24 16:26:34 r22 ConfigAgent: %SYS-5-CONFIG_STARTUP: Startup config saved from system:/running-config by root on vty4 (UnknownIpAddr).
r22#
r22#show isis neighbors
% Internal error
% To see the details of this error, run the command 'show error 2'

Let’s build BGP. On r01 it looks like this:

!
router bgp 100
router-id 10.0.0.1
graceful-restart restart-time 300
graceful-restart
maximum-paths 2
neighbor AS100-CORE peer group
neighbor AS100-CORE remote-as 100
neighbor AS100-CORE next-hop-self
neighbor AS100-CORE update-source Loopback1
neighbor AS100-CORE timers 2 6
neighbor AS100-CORE additional-paths receive
neighbor AS100-CORE additional-paths send any
neighbor AS100-CORE password 7 Nmg+xbfVkywN7BBIllK5yw==
neighbor AS100-CORE send-community standard extended
neighbor AS100-CORE maximum-routes 0
neighbor 10.0.0.2 peer group AS100-CORE
neighbor 10.0.0.2 description R02
neighbor 10.0.0.3 peer group AS100-CORE
neighbor 10.0.0.3 description R11
neighbor 10.0.0.4 peer group AS100-CORE
neighbor 10.0.0.4 description R12
neighbor 10.0.0.5 peer group AS100-CORE
neighbor 10.0.0.5 description R21
neighbor 10.0.0.6 peer group AS100-CORE
neighbor 10.0.0.6 description R22
!

So once we have configured BGP on all routers, we should see a full mesh between them. This is from r22:

r22#show ip bgp summary
BGP summary information for VRF default
Router identifier 10.0.0.6, local AS number 100
Neighbor Status Codes: m - Under maintenance
Description Neighbor V AS MsgRcvd MsgSent InQ OutQ Up/Down State PfxRcd PfxAcc
R01 10.0.0.1 4 100 7 7 0 0 00:00:05 Estab 0 0
R02 10.0.0.2 4 100 7 7 0 0 00:00:05 Estab 0 0
R11 10.0.0.3 4 100 7 7 0 0 00:00:05 Estab 0 0
R12 10.0.0.4 4 100 6 7 0 0 00:00:04 Estab 0 0
R21 10.0.0.5 4 100 6 7 0 0 00:00:04 Estab 0 0
r22#
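Typing the neighbor statements by hand on six routers gets old quickly. A small Python sketch that generates them from a loopback table; the router-to-loopback mapping is taken from the outputs above, while the helper names are made up for illustration:

```python
# Loopback1 addresses per router, as seen in the BGP summaries in this post.
LOOPBACKS = {
    "r01": "10.0.0.1",
    "r02": "10.0.0.2",
    "r11": "10.0.0.3",
    "r12": "10.0.0.4",
    "r21": "10.0.0.5",
    "r22": "10.0.0.6",
}


def full_mesh_sessions(routers: int) -> int:
    """Number of iBGP sessions in a full mesh of `routers` speakers."""
    return routers * (routers - 1) // 2


def neighbor_lines(local: str, loopbacks: dict, peer_group: str = "AS100-CORE") -> list:
    """Return the per-neighbor config lines for one router (every peer except itself)."""
    lines = []
    for name, ip in loopbacks.items():
        if name == local:
            continue
        lines.append(f"neighbor {ip} peer group {peer_group}")
        lines.append(f"neighbor {ip} description {name.upper()}")
    return lines


if __name__ == "__main__":
    print(f"sessions in the mesh: {full_mesh_sessions(len(LOOPBACKS))}")
    print("\n".join(neighbor_lines("r01", LOOPBACKS)))
```

Six routers means 15 sessions in total and 5 neighbors per router, which matches the five established peers in the summary from r22.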

Now, enable MPLS and the SR extensions in ISIS:

!
mpls ip
!
mpls label range isis-sr 800000 65536
!
router isis CORE
  segment-routing mpls
    router-id 10.0.0.1  <-- based on Lo1 in each router
    no shutdown
!
interface Loopback1
  description CORE Loopback
  node-segment ipv4 index 1  <-- this has to be different in each node!!!
!

And you should see 5 ISIS-SR tunnels from each router. From r22:

r22#show isis segment-routing tunnel
Index Endpoint Nexthop Interface Labels TI-LFA
tunnel index

1 10.0.0.2/32 10.0.10.9 Ethernet1 [ 800002 ] -
2 10.0.0.3/32 10.0.23.1 Ethernet2 [ 800003 ] -
3 10.0.0.4/32 10.0.23.1 Ethernet2 [ 3 ] -
4 10.0.0.5/32 10.0.10.9 Ethernet1 [ 3 ] -
5 10.0.0.1/32 10.0.10.9 Ethernet1 [ 800001 ] -
r22#

As you can see above, the labels are based on the base value (800000) defined in the “mpls label range” command and the “node-segment index” defined in the loopback interface. So the label that uniquely identifies r01 is 800000 + 1 = 800001. The label “3” (implicit null) means you are the penultimate-hop router for that destination, so you pop the label (penultimate hop popping) to save a label look-up in the egress router.
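The label arithmetic is simple enough to sketch in Python. The base and size come from the “mpls label range isis-sr 800000 65536” line above; the function name is just for illustration:

```python
SR_BASE = 800000   # first label of the isis-sr range
SR_SIZE = 65536    # number of labels in the range
IMPLICIT_NULL = 3  # reserved label that signals penultimate hop popping


def node_sid_label(index: int, base: int = SR_BASE, size: int = SR_SIZE) -> int:
    """Label for a node segment: base of the SR range plus the node index."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} does not fit in a range of {size} labels")
    return base + index


if __name__ == "__main__":
    for router, index in (("r01", 1), ("r02", 2), ("r11", 3)):
        print(router, node_sid_label(index))
```

This is also why the index has to be unique per node: two routers with the same index would end up with the same label.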

Now, let’s configure EVPN for the L2/L3VPN deployment in our MPLS network. On r01 it should be:

!
service routing protocols model multi-agent --> you will have to reboot
!
router bgp 100
!
address-family evpn
neighbor default encapsulation mpls next-hop-self source-interface Loopback1
neighbor 10.0.0.2 activate
neighbor 10.0.0.3 activate
neighbor 10.0.0.4 activate
neighbor 10.0.0.5 activate
neighbor 10.0.0.6 activate
!

So once this is configured on all routers, we should again see a full mesh of EVPN BGP peers. From r12 this time:

r12#show bgp evpn summary
BGP summary information for VRF default
Router identifier 10.0.0.4, local AS number 100
Neighbor Status Codes: m - Under maintenance
Description Neighbor V AS MsgRcvd MsgSent InQ OutQ Up/Down State PfxRcd PfxAcc
R01 10.0.0.1 4 100 1254 1251 0 0 00:03:27 Estab 1 1
R02 10.0.0.2 4 100 1111 1107 0 0 00:03:27 Estab 1 1
R11 10.0.0.3 4 100 961 962 0 0 00:03:27 Estab 1 1
R21 10.0.0.5 4 100 884 888 0 0 00:03:27 Estab 1 1
R22 10.0.0.6 4 100 814 811 0 0 00:03:27 Estab 1 1
r12#

Now, let’s create an L3VPN with a CUST-A VRF. We define it on all routers. For r01 it should be:

!
vrf instance CUST-A
rd 100:1
!
interface Loopback2
vrf CUST-A
ip address 192.168.0.1/32   <-- each device has a unique one
!
ip routing vrf CUST-A
!
router bgp 100
!
vrf CUST-A
rd 100:1
route-target import evpn 100:1
route-target export evpn 100:1
network 192.168.0.1/32

Let’s see if the routing works from r12:

r12#
r12#show bgp evpn
BGP routing table information for VRF default
Router identifier 10.0.0.4, local AS number 100
Route status codes: s - suppressed, * - valid, > - active, # - not installed, E - ECMP head, e - ECMP
S - Stale, c - Contributing to ECMP, b - backup
% - Pending BGP convergence
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop
Network Next Hop Metric LocPref Weight Path 
RD: 100:1 ip-prefix 192.168.0.1/32 10.0.0.1 - 100 0 i 
RD: 100:1 ip-prefix 192.168.0.2/32 10.0.0.2 - 100 0 i 
RD: 100:1 ip-prefix 192.168.0.3/32 10.0.0.3 - 100 0 i 
RD: 100:1 ip-prefix 192.168.0.5/32 10.0.0.5 - 100 0 i 
RD: 100:1 ip-prefix 192.168.0.6/32 10.0.0.6 - 100 0 i
r12#
r12#show ip route vrf CUST-A
VRF: CUST-A
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B - BGP, B I - iBGP, B E - eBGP,
R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route, V - VXLAN Control Service,
DH - DHCP client installed default route, M - Martian,
DP - Dynamic Policy Route, L - VRF Leaked
Gateway of last resort is not set
B I 192.168.0.1/32 [200/0] via 10.0.0.1/32, IS-IS SR tunnel index 5, label 116384
via 10.0.10.5, Ethernet1, label 800001
B I 192.168.0.2/32 [200/0] via 10.0.0.2/32, IS-IS SR tunnel index 2, label 116384
via 10.0.10.5, Ethernet1, label 800002
B I 192.168.0.3/32 [200/0] via 10.0.0.3/32, IS-IS SR tunnel index 3, label 100000
via 10.0.10.5, Ethernet1, label imp-null(3)
C 192.168.0.4/32 is directly connected, Loopback2
B I 192.168.0.5/32 [200/0] via 10.0.0.5/32, IS-IS SR tunnel index 4, label 116384
via 10.0.23.2, Ethernet2, label 800005
B I 192.168.0.6/32 [200/0] via 10.0.0.6/32, IS-IS SR tunnel index 1, label 116384
via 10.0.23.2, Ethernet2, label imp-null(3)
r12#

So, all looks good. The EVPN table shows all the prefixes for RD 100:1 and the routing table for CUST-A shows the Lo2 defined in each router.

BTW, I am not able to ping inside the VRF. I think it is something related to the ARP broadcasts:

UPDATE: Arista confirms that cEOS-lab doesn’t support the MPLS data plane, so I need to use vEOS (Vagrant). I don’t think my laptop has enough resources to build this lab in vEOS 🙁

r01#ping vrf CUST-A ip 192.168.0.6 interface loopback 2
PING 192.168.0.6 (192.168.0.6) from 192.168.0.1 lo2: 72(100) bytes of data.
--- 192.168.0.6 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 40ms
r01#

-- from another session on r01 --

r01#bash
bash-4.2# ip netns exec ns-CUST-A tcpdump -i lo2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo2, link-type EN10MB (Ethernet), capture size 262144 bytes
^C12:46:03.324918 02:00:00:00:00:00 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.6 tell 192.168.0.1, length 28
12:46:04.348750 02:00:00:00:00:00 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.6 tell 192.168.0.1, length 28
12:46:05.376723 02:00:00:00:00:00 (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.6 tell 192.168.0.1, length 28
3 packets captured
3 packets received by filter
0 packets dropped by kernel
bash-4.2#

-- from another session on r22, we don’t see anything --

r22#bash
bash-4.2# ip netns exec ns-CUST-A tcpdump -i lo2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo2, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
bash-4.2#

New Approach for Datacenter Networks and Stacks for Low Latency

In an IRC channel this week, someone posted a link to a video visualizing latency in a data center switching network.

It was a really good video for understanding how congestion happens inside the switch infrastructure, and it presents a very original idea to overcome this problem!

I tried to get a bit more info about the video and ended up on the page of the paper behind it:

http://www.cs.ucl.ac.uk/news/article/sigcomm_best_paper_award_for_mark_handley/

And I checked whether there was any implementation:

https://github.com/nets-cs-pub-ro/NDP/wiki

I am not a researcher, but the idea is quite original and it seems you don’t need to re-invent the wheel. The GitHub repo even has an example in P4. P4 is going to be big, and Barefoot already has commercial solutions for it with their Tofino chip. Let’s see what Intel does with it…

Based on a follow-up paper, it seems there is not much traction from the big cloud providers, which surprises me, as these guys have the muscle to do this kind of thing. I have always heard that hardware is very expensive to build and software is not, so there are few players willing to invest in new ideas. Every time you hear about unicorn companies, nearly all of them are software companies.

And another paper says it needs more tuning/debugging.

I don’t know if it will be successful in the future, but I think it was interesting watching the video and reading about the concept.

SRv6

This year, at my employer, I completed the migration from a Brocade MPLS LDP core network to an Arista MPLS SR one. Our backbone is still pure IPv4, so nothing IPv6-based is going to be added. But this week, via an APNIC blog post, I read about SRv6, and it looks quite interesting. So I went to the first post to go a bit deeper into what SRv6 is. According to the blog, really big networks are already using this technology and there is quite a lot of support from the open source community too. I missed Arista in that list, though.

So I tried to find some “real” proof of SRv6 in the form of pcap files, to see the format and get a better view. I could find at least one source with some. The examples are not like the ones mentioned in the APNIC blog post, but they are enough for taking a look:

So I can see, inside the IPv6 header, the Segment Routing Header (SRH) as defined in the RFC.

I don’t really understand the second IPv6 header (Dst: b::2). From the first IPv6 header, the destination “f1::” has to be the first instruction, SID1. I can see that it announces an SRH (Next Header: 43). And inside the routing header, we can see it is the SR type (Type: 4). I assume that Address[0] and Address[1] are SID2 and SID3.
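To get a feel for the header layout without a pcap at hand, here is a minimal Python sketch that builds and parses an SRH following the RFC 8754 format (8 fixed bytes, then a list of 128-bit segments). The SIDs "f3::" and "f2::" are hypothetical values for illustration, not taken from the capture, and the helper names are mine:

```python
import ipaddress
import struct

SRH_ROUTING_TYPE = 4   # Routing Type for the Segment Routing Header
FIXED_FMT = "!BBBBBBH"  # next hdr, hdr ext len, type, segs left, last entry, flags, tag


def build_srh(next_header: int, segments_left: int, segments: list) -> bytes:
    """Pack an SRH. `segments` is in header order: index 0 is the LAST segment of the path."""
    if not segments:
        raise ValueError("at least one segment is required")
    hdr_ext_len = 2 * len(segments)  # 8-octet units, not counting the first 8 octets
    fixed = struct.pack(FIXED_FMT, next_header, hdr_ext_len, SRH_ROUTING_TYPE,
                        segments_left, len(segments) - 1, 0, 0)
    return fixed + b"".join(ipaddress.IPv6Address(s).packed for s in segments)


def parse_srh(data: bytes) -> dict:
    """Unpack an SRH into a dict; raises ValueError if it is not routing type 4."""
    next_header, hdr_ext_len, rtype, segs_left, last_entry, flags, tag = struct.unpack(
        FIXED_FMT, data[:8])
    if rtype != SRH_ROUTING_TYPE:
        raise ValueError(f"routing type {rtype} is not an SRH")
    if len(data) < 8 + 8 * hdr_ext_len:
        raise ValueError("truncated header")
    segments = [str(ipaddress.IPv6Address(data[8 + 16 * i:24 + 16 * i]))
                for i in range(last_entry + 1)]
    return {"next_header": next_header, "segments_left": segs_left,
            "segments": segments, "active_segment": segments[segs_left]}


if __name__ == "__main__":
    raw = build_srh(next_header=41, segments_left=1, segments=["f3::", "f2::"])
    print(len(raw), parse_srh(raw))
```

Two details from the RFC are worth keeping in mind when reading a capture: the segment list is stored in reverse order (Segment List[0] is the last segment of the path), and Segments Left is the index of the currently active segment.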

It would be cool to lab an SRv6 scenario.

Troubleshooting a DHCP Relay connection

Today I have had “fun” troubleshooting an issue that looked easy at first sight. A colleague was trying to PXE boot some server from a network that we haven’t used for a while.

When the server boots up, it asks for an IP via DHCP. As we have a centralized DHCP server infrastructure, we have configured DHCP relay on the firewall facing that server to forward the request to the DHCP server.

First, let’s take a look at how DHCP relay works. This is a very good link. And this diagram from that link is really useful:

One thing I learned is that the reply (DHCP Offer) doesn’t have to use as destination IP the same IP it received as source in the DHCP Discover. In the picture, it is packet 2a.

Checking in our environment, we confirm that:

Our server is in the 10.94.240.x network. Our firewall is acting as DHCP relay and sends the DHCP Discover (unicast) to the VIP of our DHCP servers.

The DHCP Offer uses the physical IP of the DHCP server as source, and the destination is the DHCP relay IP (so it is 10.94.240.1, the firewall IP in the 10.94.240.x network).
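The addressing is easier to follow as a toy model. This Python sketch only tracks the IP source, destination and giaddr of each packet; it is not a DHCP implementation. The addresses are the ones from this post, and I am assuming that 10.81.25.1 is the relay’s own address on the side facing the DHCP server:

```python
from dataclasses import dataclass


@dataclass
class Packet:
    kind: str    # "DISCOVER" or "OFFER"
    src: str     # IP source address
    dst: str     # IP destination address
    giaddr: str  # relay agent address carried inside the DHCP payload


def relay_discover(relay_egress_ip: str, relay_client_side_ip: str, server_vip: str) -> Packet:
    """The relay unicasts the client's broadcast to the server and stamps giaddr."""
    return Packet("DISCOVER", src=relay_egress_ip, dst=server_vip,
                  giaddr=relay_client_side_ip)


def server_offer(discover: Packet, server_physical_ip: str) -> Packet:
    """The server answers to giaddr, not to the IP source of the request."""
    return Packet("OFFER", src=server_physical_ip, dst=discover.giaddr,
                  giaddr=discover.giaddr)


if __name__ == "__main__":
    discover = relay_discover("10.81.25.1", "10.94.240.1", "10.81.251.47")
    offer = server_offer(discover, "10.81.251.201")
    print(discover)
    print(offer)
```

So the request and the reply are not mirror images of each other: the reply goes to giaddr, and here it also comes from a different address (the physical IP) than the one the request was sent to (the VIP).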

Ok, so everything looks fine? Not really. The server receives the query and answers… but we don’t see a DHCP Request/ACK.

BTW, keep in mind that DHCP is UDP….

So, we need to see where the packets are lost.

This is a high level path flow between the client and server:

So we need to check this connection in three different firewall vendors….

The initial troubleshooting was just using the GUI tools from Palo/Fortigate. We couldn’t see anything…. but the server was constantly receiving DHCP Discover and sending DHCP Offer… I don’t get it:

# tcpdump -i X udp port 67 or 68 -nn

14:58:06.969462 IP 10.81.25.1.67 > 10.81.251.47.67: BOOTP/DHCP, Request from 6c:2b:59:c1:32:73, length 300
14:58:06.969564 IP 10.81.251.201.67 > 10.94.240.1.67: BOOTP/DHCP, Reply, length 300

14:58:28.329048 IP 10.81.25.1.67 > 10.81.251.47.67: BOOTP/DHCP, Request from 6c:2b:59:c1:32:73, length 300
14:58:28.329157 IP 10.81.251.201.67 > 10.94.240.1.67: BOOTP/DHCP, Reply, length 300

Initially it took me a while to see the request/reply pair because I was assuming the DHCP request had source 10.94.240.1, so I was only seeing the Reply but not the Request. That was when I went to clear up my understanding of DHCP relay and found the link.

So ok, we have the DHCP Request/Reply, but absolutely nothing in the Palo. Is the Palo dropping the packets or forwarding them? No idea. The GUI says nothing, and I took a packet capture and couldn’t see that traffic either…

It doesn’t make sense.

Let’s get back to basics.

Did I mention DHCP is UDP? So how does a next-generation firewall (like Palo Alto) with all the fancy features enabled (we have nearly all of them enabled…) treat a UDP connection? UDP is stateless… but the firewall is stateful… the firewall creates a flow with the first packet so it can track it, and any new packet is considered part of that flow. But why don’t we see the flows? We actually have only one flow. The firewall has created that session and offloaded it to hardware, so you don’t see anything else in the control plane / GUI. The GUI only shows the end of a connection/flow. And as our DHCP relay flow hasn’t terminated (it is UDP) and the firewall keeps receiving packets, it is considered alive (the firewall doesn’t really know what is going on). For that reason we don’t see the connection in the Palo UI. Ok, I got to that point after a while…. I need to prove that the packet from the server is reaching the firewall and that it is leaving it too.

How can I do that? Well, I need to delete that flow so the firewall treats the next packet as a new connection and the packet capture can see the packets.
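The reasoning can be sketched with a toy session table in Python. This is only a model of the behaviour described here (first packet handled in the slow path and visible, later packets of the same flow offloaded and invisible); it is not how any vendor actually implements it, and the class and method names are made up:

```python
class FlowTable:
    """Toy stateful firewall: only the first packet of a flow hits the slow path."""

    def __init__(self):
        self.sessions = {}       # 5-tuple -> packet count
        self.slow_path_log = []  # what the control plane / capture gets to see

    def process(self, flow: tuple) -> str:
        if flow in self.sessions:
            self.sessions[flow] += 1  # offloaded: forwarded without being logged
            return "fast-path"
        self.sessions[flow] = 1       # new session: inspected and logged once
        self.slow_path_log.append(flow)
        return "slow-path"

    def clear(self, flow: tuple) -> None:
        """Equivalent of clearing the session by hand on the firewall."""
        self.sessions.pop(flow, None)


if __name__ == "__main__":
    offer = ("udp", "10.81.251.201", 67, "10.94.240.1", 67)
    fw = FlowTable()
    print([fw.process(offer) for _ in range(3)])  # only the first one is visible
    fw.clear(offer)
    print(fw.process(offer))                      # visible again after the clear
```

Because the DHCP server keeps sending offers on the same 5-tuple, the session never ages out, so clearing it by hand is the only way to push a packet back through the visible path.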

This is a good link from Palo Alto about taking captures. So I found my connection and then cleared it:

palo(active)> show session all filter destination 10.94.240.1

ID Application State Type Flag Src[Sport]/Zone/Proto (translated IP[Port])
Vsys Dst[Dport]/Zone (translated IP[Port])
135493 dhcp ACTIVE FLOW 10.81.251.201[67]/ZONE1/17 (10.81.251.201[67])
vsys1 10.94.240.1[67]/ZONE2 (10.94.240.1[67])
palo(active)>
palo(active)> clear session id 135493

And now, my packet capture in paloalto confirms that it is sending the packet to the next firewall (checking the destination MAC) !!!

Ok, so we confirmed the first firewall in the return path was fine…. the next one is the Fortigate.

BTW, we had checked and assumed that the routing was fine in all routers, firewalls, etc. Sometimes that is not the case… so when things don’t follow your expectations, get back to the very basics….

We have exactly the same issue as in Palo Alto. I can’t see anything in the logs about receiving a DHCP Offer from the Palo and forwarding it to the last firewall (Cisco).

And again, we apply the same reasoning. We have a UDP connection and a next-generation firewall (with a fancy ASIC). And one more thing: in this Fortigate firewall we allow intra-zone traffic, so it is not going to show in the GUI monitor anyway…

So we confirmed that we have a flow and cleared it:

forti # diag debug flow filter
vf: any
proto: any
Host addr: any
Host saddr: any
host daddr: 10.94.240.1-10.94.240.1
port: any
sport: any
dport: any
co1fw02 #
co1fw02 # diag sys session list
session info: proto=17 proto_state=00 duration=2243 expire=170 timeout=0 flags=00000000 sockflag=00000000 sockport=0 av_idx=0 use=5
origin-shaper=
reply-shaper=
per_ip_shaper=
class_id=0 ha_id=0 policy_dir=0 tunnel=/ vlan_cos=8/8
state=may_dirty npu synced
statistic(bytes/packets/allow_err): org=86840/254/1 reply=0/0/0 tuples=2
tx speed(Bps/kbps): 36/0 rx speed(Bps/kbps): 0/0
orgin->sink: org pre->post, reply pre->post dev=39->35/35->39 gwy=10.81.25.1/0.0.0.0
hook=pre dir=org act=noop 10.81.251.201:67->10.94.240.1:67(0.0.0.0:0)
hook=post dir=reply act=noop 10.94.240.1:67->10.81.251.201:67(0.0.0.0:0)
misc=0 policy_id=4294967295 auth_info=0 chk_client_info=0 vd=0
serial=141b05fb tos=ff/ff app_list=0 app=0 url_cat=0
rpdb_link_id = 00000000
dd_type=0 dd_mode=0
npu_state=0x001000
npu info: flag=0x81/0x00, offload=6/0, ips_offload=0/0, epid=8/0, ipid=8/0, vlan=0x00f5/0x0000
vlifid=0/0, vtag_in=0x0000/0x0000 in_npu=0/0, out_npu=0/0, fwd_en=0/0, qid=0/0
no_ofld_reason:
total session 1
forti #
forti # diag sys session clear

In another session, I have a packet capture on the expected egress interface:

forti # diagnose sniffer packet Zone3 'host 10.94.240.1'
interfaces=[Zone3]
filters=[host 10.94.240.1]
301.555231 10.81.251.201.67 -> 10.94.240.1.67: udp 300
316.545677 10.81.251.201.67 -> 10.94.240.1.67: udp 300

Fantastic, we have confirmation that the second firewall receives and forwards the DHCP Reply!!!

Ok, now the last stop, Cisco ASA. This is an old firewall, I think it could be my father or Darth Vader.

I don’t have fancy packet capture tools like on the Palo/Fortigate…. so I went to the basic “debug” commands and “packet-tracer”.

First, this was the dhcp config in Cisco:

vader/pri/act# show run | i dhcp
dhcprelay server 10.81.251.47 EGRESS
dhcprelay enable SERVERS-ZONE
dhcprelay timeout 60

And the ACL allows all IP traffic on those interfaces… and I couldn’t see any deny in the logs.

So, I enabled all the DHCP debugging I could find:

vader/pri/act# show debug
debug dhcpc detail enabled at level 1
debug dhcpc error enabled at level 1
debug dhcpc packet enabled at level 1
debug dhcpd packet enabled at level 1
debug dhcpd event enabled at level 1
debug dhcpd ddns enabled at level 1
debug dhcprelay error enabled at level 1
debug dhcprelay packet enabled at level 1
debug dhcprelay event enabled at level 200
vader/pri/act# DHCPD: Relay msg received, fip=ANY, fport=0 on SERVERS-ZONE interface
DHCPRA: relay binding found for client f48e.38c7.1b6e.
DHCPD: setting giaddr to 10.94.240.1.
dhcpd_forward_request: request from f48e.38c7.1b6e forwarded to 10.81.251.47.
DHCPD: Relay msg received, fip=ANY, fport=0 on SERVERS-ZONE interface
DHCPRA: relay binding found for client 6c2b.59c1.3273.
DHCPD: setting giaddr to 10.94.240.1.
dhcpd_forward_request: request from 6c2b.59c1.3273 forwarded to 10.81.251.47.
vader/pri/act#

So, the debugging doesn’t say anything regarding the packet coming back from the Fortigate… Not looking good, I am afraid. I was running out of ideas for debug commands, and I couldn’t increase any log level either….

Let’s give packet-tracer a go… it doesn’t look good:

vader/pri/act# packet-tracer input EGRESS udp 10.81.251.201 67 10.94.240.1 67
Phase: 1
Type: ACCESS-LIST
Subtype:
Result: ALLOW
Config:
Implicit Rule
Additional Information:
MAC Access list
Phase: 2
Type: ACCESS-LIST
Subtype:
Result: DROP
Config:
Implicit Rule
Additional Information:
Result:
input-interface: EGRESS
input-status: up
input-line-status: up
Action: drop
Drop-reason: (acl-drop) Flow is denied by configured rule

So, we are sure our ACL is totally open but the firewall is dropping the packet coming from fortigate. Why? How to fix it?

Ok, back to basics. Let’s focus on the Cisco config. It uses 10.81.251.47 (the VIP) as DHCP relay server, but the DHCP reply is coming from the physical IP 10.81.251.201….. maybe Cisco doesn’t like that…. Let’s try to add the physical IPs as new DHCP servers:

vader/pri/act# sri dhcp
dhcprelay server 10.81.251.47 EGRESS
dhcprelay server 10.81.251.201 EGRESS
dhcprelay server 10.81.251.202 EGRESS
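My working theory, as a toy Python check: the relay only accepts replies whose source is one of the configured “dhcprelay server” addresses. This is just a model of what I observed, not a description of how the ASA is implemented, and the function name is made up:

```python
def relay_accepts_reply(reply_src: str, configured_servers: set) -> bool:
    """Assumed behaviour: a reply is accepted only if it comes from a configured server."""
    return reply_src in configured_servers


if __name__ == "__main__":
    before = {"10.81.251.47"}                                  # VIP only
    after = {"10.81.251.47", "10.81.251.201", "10.81.251.202"}  # VIP plus physical IPs
    reply_src = "10.81.251.201"                                # where the Offer really comes from
    print("before:", relay_accepts_reply(reply_src, before))
    print("after: ", relay_accepts_reply(reply_src, after))
```

If the theory holds, the same packet-tracer test that was dropped before should now be allowed.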

Let’s check packet tracer again:

vader/pri/act# packet-tracer input EGRESS udp 10.81.251.201 67 10.94.240.1 67
Phase: 1
Type: ACCESS-LIST
Subtype:
Result: ALLOW
Config:
Implicit Rule
Additional Information:
MAC Access list
Phase: 2
Type: ACCESS-LIST
Subtype:
Result: ALLOW
Config:
Implicit Rule
Additional Information:
Phase: 3
Type: IP-OPTIONS
Subtype:
Result: ALLOW
Config:
Additional Information:
Phase: 4
Type:
Subtype:
Result: ALLOW
Config:
Additional Information:
Phase: 5
Type:
Subtype:
Result: ALLOW
Config:
Additional Information:
Phase: 6
Type: VPN
Subtype: ipsec-tunnel-flow
Result: ALLOW
Config:
Additional Information:
Phase: 7
Type: FLOW-CREATION
Subtype:
Result: ALLOW
Config:
Additional Information:
New flow created with id 340328245, packet dispatched to next module
Result:
input-interface: EGRESS
input-status: up
input-line-status: up
Action: allow
vader/pri/act#

Good, that’s a good sign finally!!!

I think I nearly cried after seeing this in the dhcp logs in our server:

May 12 16:16:27 dhcp1 dhcpd[2561]: DHCPDISCOVER from f4:8e:38:c7:1b:6e via 10.94.240.1
May 12 16:16:28 dhcp1 dhcpd[2561]: DHCPOFFER on 10.94.240.50 to f4:8e:38:c7:1b:6e (cmc-111) via 10.94.240.1
May 12 16:16:28 dhcp1 dhcpd[2561]: Wrote 0 class decls to leases file.
May 12 16:16:28 dhcp1 dhcpd[2561]: Wrote 0 deleted host decls to leases file.
May 12 16:16:28 dhcp1 dhcpd[2561]: Wrote 0 new dynamic host decls to leases file.
May 12 16:16:28 dhcp1 dhcpd[2561]: Wrote 1 leases to leases file.
May 12 16:16:28 dhcp1 dhcpd[2561]: DHCPREQUEST for 10.94.240.50 (10.81.251.202) from f4:8e:38:c7:1b:6e (cmc-111) via 10.94.240.1
May 12 16:16:28 dhcp1 dhcpd[2561]: DHCPACK on 10.94.240.50 to f4:8e:38:c7:1b:6e (cmc-111) via 10.94.240.1

So in the end, it was finally fixed…. it took too many hours.

Notes:

  • DHCP Relay: the flow is not that obvious regarding the IPs used.
  • UDP and firewalls: debugging is a bit more challenging.
  • Cisco ASA dhcprelay server IPs…. VIPs and non-VIPs, please.

All this would be easier/quicker with TCP 😛

TCP Congestion Control and Recovery

I have been reading this new post from Cloudflare about their congestion control implementations for QUIC.

Reading the article, I wanted to check the TCP CCAs (Congestion Control Algorithms) available on my laptop (Debian 10 Testing).

So I searched a bit and found a couple of useful links like this:

For checking your current TCP CCA:

# sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = cubic

$ cat /proc/sys/net/ipv4/tcp_congestion_control
cubic

For checking the available TCP CCAs:

# sysctl net.ipv4.tcp_available_congestion_control
net.ipv4.tcp_available_congestion_control = reno cubic

You can also see the CCA per connection via “ss”:

$ ss -ti
...
tcp   ESTAB      0       0                                     192.168.1.158:60238                     169.54.204.232:https       
	 cubic wscale:7,7 rto:320 rtt:116.813/2.428 ato:40 mss:1448 pmtu:1500 rcvmss:1448 advmss:1448 cwnd:10 bytes_sent:4366 bytes_acked:4367 bytes_received:7038 segs_out:98 segs_in:183 data_segs_out:91 data_segs_in:93 send 991.7Kbps lastsnd:1260 lastrcv:1260 lastack:1140 pacing_rate 2.0Mbps delivery_rate 102.2Kbps delivered:92 app_limited busy:10632ms rcv_space:14480 rcv_ssthresh:64088 minrtt:113.391
...

If you want to change your TCP CCA, this is a good link:

Check the modules installed:

$ ls -la /lib/modules/$(uname -r)/kernel/net/ipv4

Check the kernel config:

$ grep TCP_CONG /boot/config-$(uname -r)
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
CONFIG_TCP_CONG_NV=m
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=m
CONFIG_TCP_CONG_ILLINOIS=m
CONFIG_TCP_CONG_DCTCP=m
CONFIG_TCP_CONG_CDG=m
CONFIG_TCP_CONG_BBR=m
CONFIG_DEFAULT_TCP_CONG="cubic"

We can see that “cubic” is the default TCP CCA and we have for example BBR available as a module.
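
The same check can be scripted. This is a small sketch that splits the CONFIG_TCP_CONG_* options into built-in (=y) and module (=m) algorithms. The sample text is a subset of the grep output above and the function name is made up for this example.

```python
PREFIX = "CONFIG_TCP_CONG_"


def tcp_cong_options(config_text):
    """Map each CONFIG_TCP_CONG_<NAME> option to its value ('y' or 'm')."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX) or "=" not in line:
            continue  # skips CONFIG_DEFAULT_TCP_CONG and unrelated lines
        key, value = line.split("=", 1)
        options[key[len(PREFIX):].lower()] = value
    return options


# Subset of the grep output shown above.
sample = """\
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_BBR=m
CONFIG_DEFAULT_TCP_CONG="cubic"
"""
opts = tcp_cong_options(sample)
opts.pop("advanced", None)  # ADVANCED is a menu switch, not an algorithm

builtin = sorted(name for name, value in opts.items() if value == "y")
modules = sorted(name for name, value in opts.items() if value == "m")
print(builtin, modules)
```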

So let’s change to BBR (rfc, github, blog) based on this link:

Check the kernel supports BBR:

$ cat /boot/config-$(uname -r) | grep 'CONFIG_TCP_CONG_BBR'
CONFIG_TCP_CONG_BBR=m
$ cat /boot/config-$(uname -r) | grep 'CONFIG_NET_SCH_FQ'
CONFIG_NET_SCH_FQ_CODEL=m
CONFIG_NET_SCH_FQ=m

Enable TCP BBR:

# vi /etc/sysctl.conf
net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr

Apply the changes:

# sysctl --system

And check:

$ cat /proc/sys/net/ipv4/tcp_congestion_control
bbr

So we have moved from CUBIC to BBR. Let’s see how the experience is over the following days.

GNS3: Load-Balancing with Route Reflectors in a MPLS L3VPN network

I once read about how to do load-balancing when using Route-Reflectors (RR) in a MPLS L3VPN network. It is an interesting topic because an RR only reflects the best path for each prefix to its clients. So how do we make the RR send more than one?

So I built a GNS3 lab to work on this subject:

https://github.com/thomarite/mpls-rr

This is our scenario:

  • We have one customer vrf “CUST-A” with three locations: TY, LD and NY.
  • We are using BGP for PE-CE routing. Each site will use a different private ASN. Our SP is ASN 100.
  • TY has two connections to our SP so we want to make use of both of them.
  • We have a RR, SP2, that is in line. So every PE needs an iBGP session to SP2.
  • Our SP IGP is OSPF.
  • The goal is that all the other PEs connected to CUST-A sites can load-balance to the TY site prefixes 192.168.11.0/24 and 192.168.12.0/24 using TY-SP1 and TY-SP3.

We start by building the whole network as standard. This is very similar to what we did in our first lab:

This is RR SP2 config:

!
ip vrf CUST-A
 rd 100:1 
 route-target export 1:100
 route-target import 1:100
!
interface Loopback0
 ip address 10.0.2.1 255.255.255.255
!         
interface GigabitEthernet1/0
 description to SP1-PE
 ip address 10.0.12.2 255.255.255.0
 negotiation auto
 mpls ip
!
interface GigabitEthernet2/0
 description to SP3-PE
 ip address 10.0.23.2 255.255.255.0
 negotiation auto
 mpls ip
!
interface FastEthernet3/0
 description TO-LD-SP4
 ip address 10.0.24.2 255.255.255.0
 duplex auto
 speed auto
 mpls ip
!
router ospf 1
 log-adjacency-changes
 network 10.0.2.0 0.0.0.255 area 0
 network 10.0.12.0 0.0.0.255 area 0
 network 10.0.23.0 0.0.0.255 area 0
 network 10.0.24.0 0.0.0.255 area 0
!
router bgp 100
 no synchronization
 bgp log-neighbor-changes
 neighbor 10.0.1.1 remote-as 100
 neighbor 10.0.1.1 update-source Loopback0
 neighbor 10.0.1.1 route-reflector-client
 neighbor 10.0.3.1 remote-as 100
 neighbor 10.0.3.1 update-source Loopback0
 neighbor 10.0.3.1 route-reflector-client
 neighbor 10.0.4.1 remote-as 100
 neighbor 10.0.4.1 update-source Loopback0
 neighbor 10.0.4.1 route-reflector-client
 neighbor 10.0.5.1 remote-as 100
 neighbor 10.0.5.1 update-source Loopback0
 neighbor 10.0.5.1 route-reflector-client
 no auto-summary
 !
 address-family vpnv4
  neighbor 10.0.1.1 activate
  neighbor 10.0.1.1 send-community both
  neighbor 10.0.1.1 route-reflector-client
  neighbor 10.0.3.1 activate
  neighbor 10.0.3.1 send-community both
  neighbor 10.0.3.1 route-reflector-client
  neighbor 10.0.4.1 activate
  neighbor 10.0.4.1 send-community both
  neighbor 10.0.4.1 route-reflector-client
  neighbor 10.0.5.1 activate
  neighbor 10.0.5.1 send-community both
  neighbor 10.0.5.1 route-reflector-client
 exit-address-family
 !
 address-family ipv4 vrf CUST-A
  no synchronization
 exit-address-family
!
!
mpls ldp router-id Loopback0 force

The configs for the SP PEs follow the same pattern. This is TY-SP1:

!
ip vrf CUST-A
 rd 100:1 
 route-target export 1:100
 route-target import 1:100
!
interface Loopback0
 ip address 10.0.1.1 255.255.255.255
!
interface FastEthernet0/0
 description to HQ
 ip vrf forwarding CUST-A
 ip address 172.16.100.254 255.255.255.0
 duplex half
!
interface GigabitEthernet1/0
 description to SP2-P
 ip address 10.0.12.1 255.255.255.0
 negotiation auto
 mpls ip
!
router ospf 1
 log-adjacency-changes
 network 10.0.1.0 0.0.0.255 area 0
 network 10.0.12.0 0.0.0.255 area 0
!
router bgp 100
 no synchronization
 bgp log-neighbor-changes
 neighbor 10.0.2.1 remote-as 100
 neighbor 10.0.2.1 update-source Loopback0
 no auto-summary
 !
 address-family vpnv4
  neighbor 10.0.2.1 activate
  neighbor 10.0.2.1 send-community both
 exit-address-family
 !
 address-family ipv4 vrf CUST-A
  neighbor 172.16.100.1 remote-as 65001
  neighbor 172.16.100.1 activate
  neighbor 172.16.100.1 soft-reconfiguration inbound
  no synchronization
 exit-address-family
!
mpls ldp router-id Loopback0 force
!

Let’s see if LD-CE1 can ping our TY-C1

LD-CE1#traceroute 192.168.12.1 source 172.16.30.1 

Type escape sequence to abort.
Tracing the route to 192.168.12.1

  1 172.16.101.254 8 msec 20 msec 8 msec
  2 10.0.24.2 [MPLS: Labels 18/23 Exp 0] 40 msec 40 msec 36 msec
  3 172.16.200.254 [MPLS: Label 23 Exp 0] 12 msec 32 msec 28 msec
  4 172.16.200.1 60 msec 40 msec 40 msec
  5 192.168.12.1 [AS 65001] 40 msec 60 msec 60 msec
LD-CE1#
LD-CE1#
LD-CE1#ping 192.168.11.1 source 172.16.30.1       

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.11.1, timeout is 2 seconds:
Packet sent with a source address of 172.16.30.1 
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 44/54/72 ms
LD-CE1#
LD-CE1#
LD-CE1#
LD-CE1#sh
LD-CE1#show ip rou
LD-CE1#show ip route 
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area 
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

B    192.168.12.0/24 [20/0] via 172.16.101.254, 01:19:31
     172.16.0.0/16 is variably subnetted, 2 subnets, 2 masks
C       172.16.30.1/32 is directly connected, Loopback0
C       172.16.101.0/24 is directly connected, FastEthernet0/0
B    192.168.11.0/24 [20/0] via 172.16.101.254, 01:19:31
LD-CE1#

So, what do we see when everything is configured?

From SP2-RR, we see all BGP peers up to PEs and in the vpnv4 table we can see the TY prefixes 192.168.11.0/24 and 192.168.12.0/24. But only the path from TY-SP1 is preferred….

SP2#show ip ospf neighbor 

Neighbor ID     Pri   State           Dead Time   Address         Interface
10.0.4.1          1   FULL/DR         00:00:39    10.0.24.1       FastEthernet3/0
10.0.3.1          1   FULL/DR         00:00:39    10.0.23.1       GigabitEthernet2/0
10.0.1.1          1   FULL/BDR        00:00:37    10.0.12.1       GigabitEthernet1/0
SP2#
SP2#
SP2#show ip bgp summary 
BGP router identifier 10.0.2.1, local AS number 100
BGP table version is 1, main routing table version 1

Neighbor        V          AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.1.1        4        100      98     111        1    0    0 01:25:16        0
10.0.3.1        4        100      93     108        1    0    0 01:25:05        0
10.0.4.1        4        100      96     114        1    0    0 00:55:06        0
10.0.5.1        4        100      29      32        1    0    0 00:28:02        0
SP2#
SP2#show ip bgp vpnv4 all 
BGP table version is 9, local router ID is 10.0.2.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf CUST-A)
*>i172.16.30.1/32   10.0.4.1                 0    100      0 65002 i
*>i192.168.11.0     10.0.1.1                 0    100      0 65001 i
* i                 10.0.3.1                 0    100      0 65001 i
*>i192.168.12.0     10.0.1.1                 0    100      0 65001 i
* i                 10.0.3.1                 0    100      0 65001 i
SP2#

Let’s confirm that the PEs only receive the best path from the RR. From LD-SP4, we can see the paths to TY 192.168.11/12 via TY-SP1 only:

LD-SP4#show ip bgp vpnv4 all 
BGP table version is 18, local router ID is 10.0.4.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf CUST-A)
*> 172.16.30.1/32   172.16.101.1             0             0 65002 i
*>i192.168.11.0     10.0.1.1                 0    100      0 65001 i
*>i192.168.12.0     10.0.1.1                 0    100      0 65001 i
LD-SP4#

How do we make RR-SP2 advertise both the TY-SP1 and the TY-SP3 paths? We need to use a different RD in TY-SP1 and TY-SP3.

We have RD 100:1 assigned to CUST-A in all PEs. We are going to change that in TY-SP1/3 so RR will see two different VPNv4 prefixes for the same destination.

Let’s change TY-SP1 RD 100:1 to 100:101 and TY-SP3 to 100:102. Watch out as all routing config related to VRF CUST-A will disappear.

And what about the RT config? Do we have to change anything? Actually, we need to keep it the same (we need to retype it), nothing changes here. Keep in mind that RT is used to import/export vpnv4 prefixes into the VRF. The RD is not used to import/export so for that reason (as we are going to see) we could actually use any RD for a VRF in a PE.
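
To make the RD trick more tangible, here is a toy model of what the RR does. It is only a sketch: the real BGP decision process has many more steps, and picking the lowest next-hop address is just my stand-in for it. The point is that the RR keeps one best path per VPNv4 NLRI, and the NLRI is the RD plus the IPv4 prefix.

```python
import ipaddress


def reflect_best_paths(paths):
    """Toy RR: keep one best path per VPNv4 NLRI, i.e. per (RD, prefix).

    paths is a list of (rd, prefix, next_hop) tuples. The tie-break
    (lowest next-hop address) is a stand-in for the real decision process.
    """
    best = {}
    for rd, prefix, next_hop in paths:
        key = (rd, prefix)
        hop = ipaddress.ip_address(next_hop)
        if key not in best or hop < best[key]:
            best[key] = hop
    return {key: str(hop) for key, hop in best.items()}


# Same RD on both PEs: a single NLRI, so only one path is reflected.
same_rd = [("100:1", "192.168.11.0/24", "10.0.1.1"),
           ("100:1", "192.168.11.0/24", "10.0.3.1")]

# Different RD per PE: two distinct NLRIs, so both are reflected.
unique_rd = [("100:101", "192.168.11.0/24", "10.0.1.1"),
             ("100:102", "192.168.11.0/24", "10.0.3.1")]

print(len(reflect_best_paths(same_rd)))
print(len(reflect_best_paths(unique_rd)))
```

The RT never appears in the key, which is why it can stay the same on every PE.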

Let’s see the changes for TY-SP1:

TY-SP1(config)#ip vrf CUST-A
TY-SP1(config-vrf)#no rd 100:1
% "rd 100:1" for VRF CUST-A scheduled for deletion
TY-SP1(config-vrf)#
*Apr 27 22:28:48.347: %BGP-5-ADJCHANGE: neighbor 172.16.100.1 vpn vrf CUST-A Down Neighbor deleted
TY-SP1(config-vrf)#rd 100:101
% Deletion of "rd" in progress; wait for it to complete
TY-SP1(config-vrf)#
TY-SP1(config-vrf)#rd 100:101
TY-SP1(config-vrf)#route-target export 100:1
TY-SP1(config-vrf)#route-target import 100:1
TY-SP1(config-vrf)#exit
TY-SP1(config)#router bgp 100
TY-SP1(config-router)#address-family ipv4 vrf CUST-A 
TY-SP1(config-router-af)#  neighbor 172.16.100.1 remote-as 65001
TY-SP1(config-router-af)#  neighbor 172.16.100.1 activate
TY-SP1(config-router-af)#  neighbor 172.16.100.1 soft-reconfiguration inbound
TY-SP1(config-router-af)#
*Apr 27 22:33:50.571: %BGP-5-ADJCHANGE: neighbor 172.16.100.1 vpn vrf CUST-A Up 
TY-SP1(config-router-af)#

So after repeating the same step in TY-SP3 (using RD 100:102), let’s see what happens in RR-SP2:

SP2#show ip bgp vpnv4 all 
BGP table version is 51, local router ID is 10.0.2.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf CUST-A)
*>i172.16.30.1/32   10.0.4.1                 0    100      0 65002 i
* i192.168.11.0     10.0.3.1                 0    100      0 65001 i
*>i                 10.0.1.1                 0    100      0 65001 i
* i192.168.12.0     10.0.3.1                 0    100      0 65001 i
*>i                 10.0.1.1                 0    100      0 65001 i
Route Distinguisher: 100:101
*>i192.168.11.0     10.0.1.1                 0    100      0 65001 i
*>i192.168.12.0     10.0.1.1                 0    100      0 65001 i
Route Distinguisher: 100:102
*>i192.168.11.0     10.0.3.1                 0    100      0 65001 i
*>i192.168.12.0     10.0.3.1                 0    100      0 65001 i
SP2#

Now we can see VPNv4 for 100:101 (TY-SP1) and 100:102 (TY-SP3)!!!

Ok, let’s see what the other PEs are seeing. In our case, let’s check LD-SP4:

LD-SP4#show ip bgp vpnv4 all 
BGP table version is 18, local router ID is 10.0.4.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf CUST-A)
*> 172.16.30.1/32   172.16.101.1             0             0 65002 i
* i192.168.11.0     10.0.3.1                 0    100      0 65001 i
*>i                 10.0.1.1                 0    100      0 65001 i
* i192.168.12.0     10.0.3.1                 0    100      0 65001 i
*>i                 10.0.1.1                 0    100      0 65001 i
Route Distinguisher: 100:101
*>i192.168.11.0     10.0.1.1                 0    100      0 65001 i
*>i192.168.12.0     10.0.1.1                 0    100      0 65001 i
Route Distinguisher: 100:102
*>i192.168.11.0     10.0.3.1                 0    100      0 65001 i
*>i192.168.12.0     10.0.3.1                 0    100      0 65001 i
LD-SP4#
LD-SP4#
LD-SP4#show ip route vrf CUST-A

Routing Table: CUST-A
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area 
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

B    192.168.12.0/24 [200/0] via 10.0.1.1, 00:01:05
     172.16.0.0/16 is variably subnetted, 2 subnets, 2 masks
B       172.16.30.1/32 [20/0] via 172.16.101.1, 00:33:47
C       172.16.101.0/24 is directly connected, FastEthernet0/0
B    192.168.11.0/24 [200/0] via 10.0.1.1, 00:01:05
LD-SP4#

So, LD-SP4 is receiving the VPNv4 100:101 and 100:102 from RR-SP2!!! That’s good, but we are still seeing the path to TY 192.168.11/12 prefixes via TY-SP1 (10.0.1.1) only.

So why is BGP ECMP not working? Because we have to enable it.

LD-SP4(config)#router bgp 100
LD-SP4(config-router)#address-family ipv4 vrf CUST-A
LD-SP4(config-router-af)#maximum-paths eibgp 2
LD-SP4(config-router-af)#
*Apr 27 22:58:25.447: BGP: VPNv4 Unicast multipath configuration changed
*Apr 27 22:58:25.447: BGP-VPN(4):  MPLS label changed for prefix 100:1:192.168.11.0/24
*Apr 27 22:58:25.447: BGP-VPN(4): multipath from neighbor 10.0.2.1 nexthop 10.0.3.1 new outlabel 24
*Apr 27 22:58:25.447: vpn: free local label 1048577 for remote prefix CUST-A:192.168.11.0/24
*Apr 27 22:58:25.447: vpn: get path labels: 100:1:192.168.11.0/255.255.255.0
*Apr 27 22:58:25.451: vpn(4): inlabel=nolabel, outlabel=22, outlabel owner=BGP
*Apr 27 22:58:25.451: vpn(4): Announce labels to IPRM CUST-A:192.168.11.0/24 gw 10.0.1.1 inlabel=nolabel, outlabel=22
*Apr 27 22:58:25.451: BGP-VPN(4):  MPLS label changed for prefix 100:1:192.168.12.0/24
*Apr 27 22:58:25.451: BGP-VPN(4): multipath from neighbor 10.0.2.1 nexthop 10.0.3.1 new outlabel 23
*Apr 27 22:58:25.451: vpn: free local label 1048577 for remote prefix CUST-A:192.168.12.0/24
*Apr 27 22:58:25.451: vpn: get path labels: 100:1:192.168.12.0/255.255.255.0
*Apr 27 22:58:25.451: vpn(4): inlabel=nolabel, outlabel=21, outlabel owner=BGP
*Apr 27 22:58:25.451: vpn(4): Announce labels to IPRM CUST-A:192.168.12.0/24 gw 10.0.1.1 inlabel=nolabel, outlabel=21
*Apr 27 22:58:25.455: vpn: get path labels: 100:1:192.168.11.0/255.255.255.0
*Apr 27 22:58:25.459: vpn(4): inlabel=nolabel, outlabel=24, outlabel owner=BGP
*Apr 27 22:58:25.459: vpn(4): Announce labels to IPRM CUST-A:192.168.11.0/24 gw 10.0.3.1 inlabel=nolabel, outlabel=24
*Apr 27 22:58:25.459: vpn(4): get path labels; 100:1:192.168.11.0/24 nexthop 10.0.3.1, not bestpath
*Apr 27 22:58:25.475: vpn: get path labels: 100:1:192.168.12.0/255.255.255.0
*Apr 27 22:58:25.475: vpn(4): inlabel=nolabel, outlabel=23, outlabel owner=BGP
*Apr 27 22:58:25.475: vpn(4): Announce labels to IPRM CUST-A:192.168.12.0/24 gw 10.0.3.1 inlabel=nolabel, outlabel=23
*Apr 27 22:58:25.479: vpn(4): get path labels; 100:1:192.168.12.0/24 nexthop 10.0.3.1, not bestpath
LD-SP4(config-router-af)#end
LD-SP4#
*Apr 27 22:58:27.411: %SYS-5-CONFIG_I: Configured from console by console
LD-SP4#
LD-SP4#
LD-SP4#show ip route vrf CUST-A

Routing Table: CUST-A
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area 
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

B    192.168.12.0/24 [200/0] via 10.0.3.1, 00:00:07
                     [200/0] via 10.0.1.1, 00:02:18
     172.16.0.0/16 is variably subnetted, 2 subnets, 2 masks
B       172.16.30.1/32 [20/0] via 172.16.101.1, 00:35:00
C       172.16.101.0/24 is directly connected, FastEthernet0/0
B    192.168.11.0/24 [200/0] via 10.0.3.1, 00:00:07
                     [200/0] via 10.0.1.1, 00:02:18
LD-SP4#

We finally got it! Our PE LD-SP4 is able to see two paths to TY prefixes!

In summary:

  • We need to change the VRF RD in each PE that we want to take part in the load-balancing
  • We need to enable eiBGP ECMP (maximum-paths eibgp) in the PEs that have to load-balance
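
Both points can be put together in a small sketch of the receiving PE. It assumes that all candidate paths are equal cost (as in the lab output, where local preference, AS path and metric match) and uses the RT 1:100 from the VRF config; the function and field names are made up for this example.

```python
def install_paths(vpnv4_routes, import_rt, prefix, maximum_paths=1):
    """Toy PE: import by RT (the RD is ignored), then install up to
    maximum_paths next hops. Assumes all candidates are equal cost."""
    next_hops = []
    for route in vpnv4_routes:
        if route["prefix"] != prefix or import_rt not in route["rts"]:
            continue
        if route["next_hop"] not in next_hops:
            next_hops.append(route["next_hop"])
    return next_hops[:maximum_paths]


# What LD-SP4 receives from the RR once TY-SP1/3 use different RDs.
routes = [
    {"rd": "100:101", "prefix": "192.168.11.0/24",
     "next_hop": "10.0.1.1", "rts": {"1:100"}},
    {"rd": "100:102", "prefix": "192.168.11.0/24",
     "next_hop": "10.0.3.1", "rts": {"1:100"}},
]

# Default behaviour: both paths are imported, only one is installed.
print(install_paths(routes, "1:100", "192.168.11.0/24"))

# With multipath enabled (like "maximum-paths eibgp 2"): both installed.
print(install_paths(routes, "1:100", "192.168.11.0/24", maximum_paths=2))
```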

GNS3: PE-CE OSPF, Down Bit and External LSA

This is a continuation of the other post about installing and configuring a basic MPLS L3VPN network in GNS3.

Normally, we always have a routing protocol running between the customer CPE and the provider PE. OSPF was very common and I used to take for granted the routing loop avoidance in a dual-homed CPE. I knew the idea but never really hammered it into my head, until a couple of months ago, when I hit an issue during the migration of my employer’s MPLS network to a new vendor. The new vendor didn’t implement the OSPF Down bit. /o\

Summary: If an LSA arrives at a PE with the down bit set, that will never be redistributed into BGP. This prevents the route from leaking in from one PE back into another PE.

The RFC for using OSPF in PE-CE in MPLS VPNs is here:

Note: In this lab the Down-Bit only shows up in LSA3! For LSA5 the loop prevention relies on the route tag (more on this later).

It was frustrating but it was a good excuse too because it pushed me (and I could justify) to move our PE-CE to BGP.

In general I always read these blogs when I want to refresh my knowledge of the OSPF Down Bit. So all credit goes to them:

http://dtdccie.blogspot.com/2016/03/ospf-down-bit-set.html

https://mellowd.co.uk/ccie/ospf-as-the-pe-ce-routing-protocols-deep-dive-part-1-of-2/

https://mellowd.co.uk/ccie/ospf-as-the-pe-ce-routing-protocols-deep-dive-part-3-of-3-loop-prevention/

So with this background, I built a GNS3 lab to show OSPF Down-Bit in action:

https://github.com/thomarite/mpls-down-bit

The big picture is: the CE (HQ, BRANCH) routers are running OSPF with the PE (SP1/3/4) routers. The PE routers redistribute these OSPF routes into BGP and then convert them to VPNv4 NLRI. These VPNv4 NLRI are advertised to the other PE routers via BGP. The PEs also convert these VPNv4 routes back into OSPF and send them on to the CE routers.

Now in more detail, let’s see where we can have a routing loop:

  • 1) HQ sends a LSA1 to SP1 with Lo:172.16.10.1/32 and the connected network to PE 172.16.100.0/24
HQ#show ip ospf database router internal self-originate 

            OSPF Router with ID (172.16.110.1) (Process ID 1)

		Router Link States (Area 10)

  Now in min table 
  Table index: 42 min 17 sec
  LS age: 321
  Options: (No TOS-capability, DC)
  LS Type: Router Links
  Link State ID: 172.16.110.1
  Advertising Router: 172.16.110.1
  LS Seq Number: 80000003
  Checksum: 0x7247
  Length: 48
  AS Boundary Router
  Number of Links: 2

    Link connected to: a Stub Network
     (Link ID) Network/subnet number: 172.16.10.1
     (Link Data) Network Mask: 255.255.255.255
      Number of TOS metrics: 0
       TOS 0 Metrics: 1
          
    Link connected to: a Transit Network
     (Link ID) Designated Router address: 172.16.100.1
     (Link Data) Router Interface address: 172.16.100.1
      Number of TOS metrics: 0
       TOS 0 Metrics: 1
  • 2) SP1 receives the new OSPF route from HQ (172.16.10.1/32) and redistributes it into BGP so the other PEs (SP3 and SP4) can receive it as a VPNv4 prefix. The connected 172.16.100.0/24 is redistributed into BGP as well.
SP1#show ip ospf database router internal adv-router 172.16.110.1

            OSPF Router with ID (10.0.1.1) (Process ID 1)

            OSPF Router with ID (172.16.100.254) (Process ID 10)

		Router Link States (Area 10)

  Routing Bit Set on this LSA
  Now in min table 
  Table index: 45 min 42 sec
  LS age: 648
  Options: (No TOS-capability, DC)
  LS Type: Router Links
  Link State ID: 172.16.110.1
  Advertising Router: 172.16.110.1
  LS Seq Number: 80000003
  Checksum: 0x7247
  Length: 48
  AS Boundary Router
  Number of Links: 2

    Link connected to: a Stub Network
     (Link ID) Network/subnet number: 172.16.10.1
     (Link Data) Network Mask: 255.255.255.255
      Number of TOS metrics: 0
       TOS 0 Metrics: 1

    Link connected to: a Transit Network
     (Link ID) Designated Router address: 172.16.100.1
     (Link Data) Router Interface address: 172.16.100.1
      Number of TOS metrics: 0
       TOS 0 Metrics: 1


SP1# 
SP1#show ip route vrf CUST-A

Routing Table: CUST-A
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area 
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

     172.16.0.0/16 is variably subnetted, 6 subnets, 2 masks
B       172.16.200.0/24 [200/0] via 10.0.3.1, 00:41:47
B       172.16.201.0/24 [200/0] via 10.0.4.1, 00:41:47
B       172.16.20.1/32 [200/2] via 10.0.3.1, 00:41:47
O       172.16.10.1/32 [110/2] via 172.16.100.1, 00:43:58, FastEthernet0/0
O E1    172.16.110.1/32 [110/21] via 172.16.100.1, 00:43:58, FastEthernet0/0
C       172.16.100.0/24 is directly connected, FastEthernet0/0
SP1#
SP1#show ip bgp vpnv4 all 
BGP table version is 14, local router ID is 10.0.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf CUST-A)
*> 172.16.10.1/32   172.16.100.1             2         32768 ?
* i172.16.20.1/32   10.0.4.1                 2    100      0 ?
*>i                 10.0.3.1                 2    100      0 ?
*> 172.16.100.0/24  0.0.0.0                  0         32768 ?
*> 172.16.110.1/32  172.16.100.1            21         32768 ?
* i172.16.200.0/24  10.0.4.1                 2    100      0 ?
*>i                 10.0.3.1                 0    100      0 ?
*>i172.16.201.0/24  10.0.4.1                 0    100      0 ?
* i                 10.0.3.1                 2    100      0 ?
SP1#

  • It is important to notice how the VPNv4 for 172.16.10.1/32 is built in SP1. Based on the rfc section 4.2.6 “Handling LSAs from the CE” we see the following:
When a PE router receives, from a CE router, any LSA with the DN bit [OSPF-DN] set, the information from that LSA MUST NOT be used by the route calculation. If a Type 5 LSA is received from the CE, and if it has an OSPF route tag value equal to the VPN Route Tag (see Section 4.2.5.2), then the information from that LSA MUST NOT be used by the route calculation.

Otherwise, the PE must examine the corresponding VRF. For every address prefix that was installed in the VRF by one of its associated OSPF instances, the PE must create a VPN-IPv4 route in BGP. Each such route will have some of the following Extended Communities attributes:

– The OSPF Domain Identifier Extended Communities attribute. If the OSPF instance that installed the route has a non-NULL primary Domain Identifier, this MUST be present; if that OSPF instance has only a NULL Domain Identifier, it MAY be omitted. This attribute is encoded with a two-byte type field, and its type is 0005, 0105, or 0205. For backward compatibility, the type 8005 MAY be used as well and is treated as if it were 0005. If the OSPF instance has a NULL Domain Identifier, and the OSPF Domain Identifier Extended Communities attribute is present, then the attribute’s value field must be all zeroes, and its type field may be any of 0005, 0105, 0205, or 8005.

– OSPF Route Type Extended Communities Attribute. This attribute MUST be present. It is encoded with a two-byte type field, and its type is 0306. To ensure backward compatibility, the type 8000 SHOULD be accepted as well and treated as if it were type 0306. The remaining six bytes of the Attribute are encoded as follows:

     Area Number – Route Type – Options

So the first sentence of the very first paragraph is our answer when we reach SP3 (when dealing with an LSA3) and there is no loop. The second sentence of that paragraph is our answer when dealing with an LSA5 to avoid a loop (more on this later). So this is our VPNv4 for 172.16.10.1/32

SP1#
SP1#show ip bgp vpnv4 rd 100:1 172.16.10.1/32 
BGP routing table entry for 100:1:172.16.10.1/32, version 5
Paths: (1 available, best #1, table CUST-A)
  Advertised to update-groups:
        2
  Local
    172.16.100.1 from 0.0.0.0 (10.0.1.1)
      Origin incomplete, metric 2, localpref 100, weight 32768, valid, sourced, best
      Extended Community: RT:1:100 OSPF DOMAIN ID:0x0005:0x0000000A0200 
        OSPF RT:0.0.0.10:2:0 OSPF ROUTER ID:172.16.100.254:0
      mpls labels in/out 21/nolabel
SP1#

So the extended communities generated from being a OSPF prefix are OSPF DOMAIN ID, OSPF Route Type (RT) and OSPF ROUTER ID.

I haven’t configured “ospf domain ID” in any router so Cisco IOS is generating one for itself (although it should be NULL) in OSPF DOMAIN ID.

For OSPF RT, we have area 10 (0.0.0.10) and LSA2 (although it should be LSA1). ROUTER ID is the expected one.
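
As a quick sanity check of how I read that community, this sketch splits the “OSPF RT” value into the three fields listed in the RFC text quoted above (Area Number, Route Type, Options). The route type names in the table are my reading of the RFC, so take them as an assumption; decode_ospf_rt is a made-up helper.

```python
import ipaddress

# Route Type values as I understand them from the RFC (assumption).
ROUTE_TYPES = {
    1: "intra-area (from a type 1 LSA)",
    2: "intra-area (from a type 2 LSA)",
    3: "inter-area (summary)",
    5: "external",
    7: "NSSA",
}


def decode_ospf_rt(text):
    """Decode 'area:route-type:options' as printed by IOS after 'OSPF RT:'."""
    area, route_type, options = text.rsplit(":", 2)
    return {
        "area": int(ipaddress.IPv4Address(area)),  # dotted quad -> number
        "route_type": int(route_type),
        "options": int(options),
    }


decoded = decode_ospf_rt("0.0.0.10:2:0")
print(decoded, ROUTE_TYPES[decoded["route_type"]])
```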

  • 3) SP2 is just a P router so it is transparent here. It doesn’t know anything about BGP, VPNv4, etc. It just does LDP and IGP.
SP2#show ip bgp summary 
% BGP not active

SP2#show ip route ospf 
     10.0.0.0/8 is variably subnetted, 7 subnets, 2 masks
O       10.0.3.1/32 [110/2] via 10.0.23.1, 00:45:04, GigabitEthernet2/0
O       10.0.1.1/32 [110/2] via 10.0.12.1, 00:44:54, GigabitEthernet1/0
O       10.0.4.1/32 [110/3] via 10.0.23.1, 00:44:54, GigabitEthernet2/0
O       10.0.34.0/24 [110/2] via 10.0.23.1, 00:44:54, GigabitEthernet2/0
SP2#
  • 4) SP3 receives the new VPNv4 prefix and redistributes it from BGP into OSPF as an LSA3 (the MPLS backbone is a super OSPF area 0). If we look at the details of the LSA3 (Summary) for the HQ prefix 172.16.10.1/32 with “show ip ospf database summary 172.16.10.1”, we can see two things. First, there are two LSAs: one from SP3 (advertising router 172.16.200.254) and the other from SP4 (advertising router 172.16.201.254). Second, both show “Downward” in the options field. As stated earlier, the RFC requires this from any PE sending an LSA3.

So, if iBGP has an AD of 200 and OSPF has an AD of 110, how come the routing table has the BGP prefix for 172.16.10.1/32 installed instead of the OSPF prefix coming from SP4? As per the standard mentioned earlier, if a PE router receives an OSPF prefix with the down bit set (“Downward”), the PE router ignores that prefix. The “Downward” bit says that the prefix comes from another PE in the same area, so accepting it would trigger a routing loop.

Keep in mind that SP4 is doing the same thing that we see below in the commands for SP3. Without the down bit, SP3 would accept the OSPF prefix from SP4 for reaching 172.16.10.1/32 (HQ), and SP4 would accept the one from SP3. So SP3 would send traffic to SP4, and SP4 would return it to SP3. Once SP3/SP4 learn the OSPF prefix from each other, they stop redistributing the BGP prefix (the one coming from SP1/HQ) into OSPF, so we reach a point where there is no LSA3 for 172.16.10.1 any more, and the process starts again. SP3/4 will also redistribute the OSPF prefix learned from the other PE into BGP. So we are back at the initial stage: SP3/SP4 only have the BGP prefix for 172.16.10.1 (from SP1 or SP3/4); as it is the best route, it is redistributed into OSPF, and you know what happens next.
SP3#show ip bgp vpnv4 all 
BGP table version is 13, local router ID is 10.0.3.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf CUST-A)
*>i172.16.10.1/32   10.0.1.1                 2    100      0 ?
* i172.16.20.1/32   10.0.4.1                 2    100      0 ?
*>                  172.16.200.1             2         32768 ?
*>i172.16.100.0/24  10.0.1.1                 0    100      0 ?
*>i172.16.110.1/32  10.0.1.1                21    100      0 ?
* i172.16.200.0/24  10.0.4.1                 2    100      0 ?
*>                  0.0.0.0                  0         32768 ?
* i172.16.201.0/24  10.0.4.1                 0    100      0 ?
*>                  172.16.200.1             2         32768 ?
SP3#
SP3#
SP3#show ip route vrf CUST-A

Routing Table: CUST-A
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area 
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

     172.16.0.0/16 is variably subnetted, 6 subnets, 2 masks
C       172.16.200.0/24 is directly connected, FastEthernet0/0
O       172.16.201.0/24 [110/2] via 172.16.200.1, 00:45:46, FastEthernet0/0
O       172.16.20.1/32 [110/2] via 172.16.200.1, 00:45:46, FastEthernet0/0
B       172.16.10.1/32 [200/2] via 10.0.1.1, 00:43:35
B       172.16.110.1/32 [200/21] via 10.0.1.1, 00:43:35
B       172.16.100.0/24 [200/0] via 10.0.1.1, 00:43:35
SP3#
SP3#show ip ospf database         

            OSPF Router with ID (10.0.3.1) (Process ID 1)

		Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link count
10.0.1.1        10.0.1.1        1076        0x80000003 0x00D9F2 2
10.0.2.1        10.0.2.1        1132        0x80000004 0x00D79A 3
10.0.3.1        10.0.3.1        1105        0x80000004 0x0083C1 3
10.0.4.1        10.0.4.1        1095        0x80000003 0x00D0C5 2

		Net Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
10.0.12.2       10.0.2.1        1132        0x80000002 0x00FFFA
10.0.23.1       10.0.3.1        1105        0x80000002 0x009F4E
10.0.34.2       10.0.4.1        1095        0x80000002 0x002BB3

            OSPF Router with ID (172.16.200.254) (Process ID 10)

		Router Link States (Area 10)

Link ID         ADV Router      Age         Seq#       Checksum Link count
172.16.20.1     172.16.20.1     1105        0x80000004 0x00750C 3
172.16.200.254  172.16.200.254  1116        0x80000003 0x0059C2 1
172.16.201.254  172.16.201.254  1121        0x80000003 0x005DBA 1

		Net Link States (Area 10)

Link ID         ADV Router      Age         Seq#       Checksum
172.16.200.254  172.16.200.254  1116        0x80000002 0x00F4E4
172.16.201.254  172.16.201.254  1121        0x80000002 0x00EBEA

		Summary Net Link States (Area 10)

Link ID         ADV Router      Age         Seq#       Checksum
172.16.10.1     172.16.200.254  1116        0x80000002 0x000C61
172.16.10.1     172.16.201.254  1121        0x80000002 0x000567
172.16.100.0    172.16.200.254  1116        0x80000002 0x002AEA
172.16.100.0    172.16.201.254  1121        0x80000002 0x0023F0

		Type-5 AS External Link States

Link ID         ADV Router      Age         Seq#       Checksum Tag
172.16.110.1    172.16.200.254  1116        0x80000002 0x005FD9 3489661028
172.16.110.1    172.16.201.254  1121        0x80000002 0x0058DF 3489661028
SP3#  
SP3#
SP3#
SP3#show ip ospf database  summary 172.16.10.1

            OSPF Router with ID (10.0.3.1) (Process ID 1)

            OSPF Router with ID (172.16.200.254) (Process ID 10)

		Summary Net Link States (Area 10)

  LS age: 1127
  Options: (No TOS-capability, DC, Downward)
  LS Type: Summary Links(Network)
  Link State ID: 172.16.10.1 (summary Network Number)
  Advertising Router: 172.16.200.254
  LS Seq Number: 80000002
  Checksum: 0xC61
  Length: 28
  Network Mask: /32
	TOS: 0 	Metric: 2 

  LS age: 1132
  Options: (No TOS-capability, DC, Downward)
  LS Type: Summary Links(Network)
  Link State ID: 172.16.10.1 (summary Network Number)
  Advertising Router: 172.16.201.254
  LS Seq Number: 80000002
  Checksum: 0x567
  Length: 28
  Network Mask: /32
	TOS: 0 	Metric: 2 

SP3# 

Like we did in SP1, let’s see how SP3 deals with the VPNv4 for 172.16.10.1/32.

Based on the RFC section “4.2.8” VPNv4 Routes received via BGP, we need to check “4.2.8.1 External Routes” (LSA5/7) and “4.2.8.2 Summary Routes” (LSA3) against the VPNv4 route received:

SP3#show ip bgp vpnv4 rd 100:1 172.16.10.1/32 
BGP routing table entry for 100:1:172.16.10.1/32, version 8
Paths: (1 available, best #1, table CUST-A)
  Not advertised to any peer
  Local
    10.0.1.1 (metric 3) from 10.0.1.1 (10.0.1.1)
      Origin incomplete, metric 2, localpref 100, valid, internal, best
      Extended Community: RT:1:100 OSPF DOMAIN ID:0x0005:0x0000000A0200 
        OSPF RT:0.0.0.10:2:0 OSPF ROUTER ID:172.16.100.254:0
      mpls labels in/out nolabel/21
SP3#

The DOMAIN ID matches because we haven’t defined it explicitly, so all PEs carry the same default value (0x0005:0x0000000A0200 in the outputs). The OSPF RT tells us that the route comes from OSPF area 10 and is non-external. So SP3 can generate an LSA3 for 172.16.10.1/32, as it has OSPF area 10 defined too.
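
To make that decision a bit more tangible, here is a minimal Python sketch of how I read sections 4.2.8.1 and 4.2.8.2. It is not what IOS runs internally: the names `parse_ospf_rt` and `lsa_to_originate` are made up for illustration, and the input is just the "area:type:options" string printed after "OSPF RT:" in the outputs above.

```python
from dataclasses import dataclass


@dataclass
class OspfRouteType:
    area: str        # dotted area ID, e.g. "0.0.0.10"
    route_type: int  # 1/2 intra-area, 3 inter-area, 5 external, 7 NSSA
    options: int     # value printed last in the OSPF RT string


def parse_ospf_rt(text: str) -> OspfRouteType:
    """Parse the 'area:type:options' string shown after 'OSPF RT:'."""
    area, route_type, options = text.split(":")
    return OspfRouteType(area, int(route_type), int(options))


def lsa_to_originate(rt: OspfRouteType, domain_id_matches: bool) -> str:
    """Pick the kind of LSA a receiving PE would originate towards the CE."""
    # External/NSSA routes, or routes from a different OSPF domain,
    # are re-originated as external LSAs.
    if rt.route_type in (5, 7) or not domain_id_matches:
        return "external (LSA5/7)"
    # Same domain and route type 1, 2 or 3: summary LSA.
    return "summary (LSA3)"


print(lsa_to_originate(parse_ospf_rt("0.0.0.10:2:0"), domain_id_matches=True))
```

With the values from SP3 (OSPF RT 0.0.0.10:2:0 and a matching Domain ID), the sketch picks the summary LSA, which is what we see in the OSPF database.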

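  • 5) From SP4’s perspective, the view is the same as on SP3. SP4 ignores the LSA3 with the Down-bit set: the LSA sits in the OSPF database but is not used for routing (there is no “Routing Bit Set on this LSA” line).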
SP4#show ip bgp vpnv4 all 
BGP table version is 13, local router ID is 10.0.4.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf CUST-A)
*>i172.16.10.1/32   10.0.1.1                 2    100      0 ?
* i172.16.20.1/32   10.0.3.1                 2    100      0 ?
*>                  172.16.201.1             2         32768 ?
*>i172.16.100.0/24  10.0.1.1                 0    100      0 ?
*>i172.16.110.1/32  10.0.1.1                21    100      0 ?
* i172.16.200.0/24  10.0.3.1                 0    100      0 ?
*>                  172.16.201.1             2         32768 ?
* i172.16.201.0/24  10.0.3.1                 2    100      0 ?
*>                  0.0.0.0                  0         32768 ?
SP4#
SP4#
SP4#show ip ospf database summary 172.16.10.1

            OSPF Router with ID (10.0.4.1) (Process ID 1)

            OSPF Router with ID (172.16.201.254) (Process ID 10)

		Summary Net Link States (Area 10)

  LS age: 1489
  Options: (No TOS-capability, DC, Downward)
  LS Type: Summary Links(Network)
  Link State ID: 172.16.10.1 (summary Network Number)
  Advertising Router: 172.16.200.254
  LS Seq Number: 80000003
  Checksum: 0xA62
  Length: 28
  Network Mask: /32
	TOS: 0 	Metric: 2 

  LS age: 1475
  Options: (No TOS-capability, DC, Downward)
  LS Type: Summary Links(Network)
  Link State ID: 172.16.10.1 (summary Network Number)
  Advertising Router: 172.16.201.254
  LS Seq Number: 80000003
  Checksum: 0x368
  Length: 28
  Network Mask: /32
	TOS: 0 	Metric: 2 

SP4#  
SP4#show ip route vrf CUST-A

Routing Table: CUST-A
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area 
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

     172.16.0.0/16 is variably subnetted, 6 subnets, 2 masks
O       172.16.200.0/24 [110/2] via 172.16.201.1, 01:31:12, FastEthernet3/0
C       172.16.201.0/24 is directly connected, FastEthernet3/0
O       172.16.20.1/32 [110/2] via 172.16.201.1, 01:31:12, FastEthernet3/0
B       172.16.10.1/32 [200/2] via 10.0.1.1, 01:28:57
B       172.16.110.1/32 [200/21] via 10.0.1.1, 01:28:57
B       172.16.100.0/24 [200/0] via 10.0.1.1, 01:28:57
SP4#
  • 6) And finally, BRANCH. It can see the prefix 172.16.10.1/32 (HQ) via two paths, as we would expect, and without routing loops (the routes have been installed for over 1h 30m). BRANCH doesn’t react to the Down-bit, so it accepts the LSA3 from SP3/SP4 and installs the OSPF prefix.
BRANCH#show ip route                 
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area 
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

     172.16.0.0/16 is variably subnetted, 6 subnets, 2 masks
C       172.16.200.0/24 is directly connected, FastEthernet0/0
C       172.16.201.0/24 is directly connected, FastEthernet3/0
C       172.16.20.0/24 is directly connected, Loopback0
O IA    172.16.10.1/32 [110/3] via 172.16.201.254, 01:30:38, FastEthernet3/0
                       [110/3] via 172.16.200.254, 01:30:39, FastEthernet0/0
O E1    172.16.110.1/32 [110/22] via 172.16.201.254, 01:30:34, FastEthernet3/0
                        [110/22] via 172.16.200.254, 01:30:34, FastEthernet0/0
O IA    172.16.100.0/24 [110/2] via 172.16.201.254, 01:30:38, FastEthernet3/0
                        [110/2] via 172.16.200.254, 01:30:39, FastEthernet0/0
BRANCH#
BRANCH#
BRANCH#
BRANCH#show ip ospf database summary 172.16.10.1

            OSPF Router with ID (172.16.20.1) (Process ID 1)

		Summary Net Link States (Area 10)

  Routing Bit Set on this LSA
  LS age: 1599
  Options: (No TOS-capability, DC, Downward)
  LS Type: Summary Links(Network)
  Link State ID: 172.16.10.1 (summary Network Number)
  Advertising Router: 172.16.200.254
  LS Seq Number: 80000003
  Checksum: 0xA62
  Length: 28
  Network Mask: /32
	TOS: 0 	Metric: 2 

  Routing Bit Set on this LSA
  LS age: 1587
  Options: (No TOS-capability, DC, Downward)
  LS Type: Summary Links(Network)
  Link State ID: 172.16.10.1 (summary Network Number)
  Advertising Router: 172.16.201.254
  LS Seq Number: 80000003
  Checksum: 0x368
  Length: 28
  Network Mask: /32
	TOS: 0 	Metric: 2 

BRANCH#  
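
The different behaviour of the PEs and BRANCH can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function names are hypothetical, and the bit values are the standard positions in the LSA Options byte (the DN bit being the high-order bit, as defined in RFC 4576).

```python
DN_BIT = 0x80  # "Downward" in the IOS output, high-order bit of Options
DC_BIT = 0x20  # "DC" (demand circuits) in the IOS output


def pe_uses_summary_lsa(options: int) -> bool:
    """A PE (OSPF process inside a VRF) skips LSA3s that carry the DN bit."""
    return not (options & DN_BIT)


def ce_uses_summary_lsa(options: int) -> bool:
    """A plain CE like BRANCH does not look at the DN bit at all."""
    return True


# Options as shown above: (No TOS-capability, DC, Downward)
options = DN_BIT | DC_BIT
print(pe_uses_summary_lsa(options))  # SP3/SP4: LSA stays in the LSDB, unused
print(ce_uses_summary_lsa(options))  # BRANCH: LSA is used and installed
```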

So, we have seen the Down-bit in action for LSA3. But what about the external LSAs, LSA5 and LSA7? How do we avoid routing loops for them?

In this case, we have the “tag” field. This is explained in the RFC too.

  • 1) In the same scenario, we have the HQ router advertising 172.16.110.1/32 as an external LSA5.
HQ#
HQ#show ip interface brief 
Interface                  IP-Address      OK? Method Status                Protocol
FastEthernet0/0            172.16.100.1    YES NVRAM  up                    up      
GigabitEthernet1/0         unassigned      YES NVRAM  administratively down down    
GigabitEthernet2/0         unassigned      YES NVRAM  administratively down down    
FastEthernet3/0            unassigned      YES NVRAM  administratively down down    
FastEthernet3/1            unassigned      YES NVRAM  administratively down down    
Loopback0                  172.16.10.1     YES NVRAM  up                    up      
Loopback1                  172.16.110.1    YES NVRAM  up                    up      
HQ#
HQ#
HQ#
HQ#show ip ospf database          

            OSPF Router with ID (172.16.110.1) (Process ID 1)

		Router Link States (Area 10)

Link ID         ADV Router      Age         Seq#       Checksum Link count
172.16.100.254  172.16.100.254  1270        0x80000005 0x00D7D1 1
172.16.110.1    172.16.110.1    1272        0x80000005 0x006E49 2

		Net Link States (Area 10)

Link ID         ADV Router      Age         Seq#       Checksum
172.16.100.1    172.16.110.1    1272        0x80000004 0x007824

		Summary Net Link States (Area 10)

Link ID         ADV Router      Age         Seq#       Checksum
172.16.20.1     172.16.100.254  1270        0x80000004 0x00586D
172.16.200.0    172.16.100.254  1270        0x80000004 0x00947E
172.16.201.0    172.16.100.254  1270        0x80000004 0x008988

		Type-5 AS External Link States

Link ID         ADV Router      Age         Seq#       Checksum Tag
172.16.110.1    172.16.110.1    1272        0x80000004 0x007253 0
HQ# 
HQ#
HQ#show ip ospf database external 

            OSPF Router with ID (172.16.110.1) (Process ID 1)

		Type-5 AS External Link States

  LS age: 1276
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 172.16.110.1 (External Network Number )
  Advertising Router: 172.16.110.1
  LS Seq Number: 80000004
  Checksum: 0x7253
  Length: 36
  Network Mask: /32
	Metric Type: 1 (Comparable directly to link state metric)
	TOS: 0 
	Metric: 20 
	Forward Address: 0.0.0.0
	External Route Tag: 0

HQ#
  • 2) SP1 sees 172.16.110.1/32 as OSPF E1, redistributes it into BGP and creates a VPNv4 route.
SP1#
SP1#show ip route vrf CUST-A       

Routing Table: CUST-A
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area 
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

     172.16.0.0/16 is variably subnetted, 6 subnets, 2 masks
B       172.16.200.0/24 [200/0] via 10.0.3.1, 02:00:18
B       172.16.201.0/24 [200/0] via 10.0.4.1, 02:00:18
B       172.16.20.1/32 [200/2] via 10.0.3.1, 02:00:18
O       172.16.10.1/32 [110/2] via 172.16.100.1, 02:02:29, FastEthernet0/0
O E1    172.16.110.1/32 [110/21] via 172.16.100.1, 02:02:29, FastEthernet0/0
C       172.16.100.0/24 is directly connected, FastEthernet0/0
SP1#
SP1#
SP1#       
SP1#show ip ospf database 

            OSPF Router with ID (10.0.1.1) (Process ID 1)

		Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link count
10.0.1.1        10.0.1.1        1303        0x80000005 0x00D5F4 2
10.0.2.1        10.0.2.1        1350        0x80000006 0x00D39C 3
10.0.3.1        10.0.3.1        1554        0x80000006 0x007FC3 3
10.0.4.1        10.0.4.1        1352        0x80000005 0x00CCC7 2

		Net Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
10.0.12.2       10.0.2.1        1350        0x80000004 0x00FBFC
10.0.23.1       10.0.3.1        1554        0x80000004 0x009B50
10.0.34.2       10.0.4.1        1352        0x80000004 0x0027B5

            OSPF Router with ID (172.16.100.254) (Process ID 10)

		Router Link States (Area 10)

Link ID         ADV Router      Age         Seq#       Checksum Link count
172.16.100.254  172.16.100.254  1400        0x80000005 0x00D7D1 1
172.16.110.1    172.16.110.1    1405        0x80000005 0x006E49 2

		Net Link States (Area 10)

Link ID         ADV Router      Age         Seq#       Checksum
172.16.100.1    172.16.110.1    1405        0x80000004 0x007824

		Summary Net Link States (Area 10)

Link ID         ADV Router      Age         Seq#       Checksum
172.16.20.1     172.16.100.254  1400        0x80000004 0x00586D
172.16.200.0    172.16.100.254  1400        0x80000004 0x00947E
172.16.201.0    172.16.100.254  1400        0x80000004 0x008988

		Type-5 AS External Link States

Link ID         ADV Router      Age         Seq#       Checksum Tag
172.16.110.1    172.16.110.1    1405        0x80000004 0x007253 0
SP1#  
SP1#
SP1#
SP1#show ip ospf database external 

            OSPF Router with ID (10.0.1.1) (Process ID 1)

            OSPF Router with ID (172.16.100.254) (Process ID 10)

		Type-5 AS External Link States

  Routing Bit Set on this LSA
  LS age: 1409
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 172.16.110.1 (External Network Number )
  Advertising Router: 172.16.110.1
  LS Seq Number: 80000004
  Checksum: 0x7253
  Length: 36
  Network Mask: /32
	Metric Type: 1 (Comparable directly to link state metric)
	TOS: 0 
	Metric: 20 
	Forward Address: 0.0.0.0
	External Route Tag: 0

SP1#
SP1#show ip bgp vpnv4 all 
BGP table version is 14, local router ID is 10.0.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf CUST-A)
*> 172.16.10.1/32   172.16.100.1             2         32768 ?
* i172.16.20.1/32   10.0.4.1                 2    100      0 ?
*>i                 10.0.3.1                 2    100      0 ?
*> 172.16.100.0/24  0.0.0.0                  0         32768 ?
*> 172.16.110.1/32  172.16.100.1            21         32768 ?
* i172.16.200.0/24  10.0.4.1                 2    100      0 ?
*>i                 10.0.3.1                 0    100      0 ?
*>i172.16.201.0/24  10.0.4.1                 0    100      0 ?
* i                 10.0.3.1                 2    100      0 ?
SP1#
SP1#show ip bgp vpnv4 rd 100:1 172.16.110.1/32                   
BGP routing table entry for 100:1:172.16.110.1/32, version 7
Paths: (1 available, best #1, table CUST-A)
  Advertised to update-groups:
        2
  Local
    172.16.100.1 from 0.0.0.0 (10.0.1.1)
      Origin incomplete, metric 21, localpref 100, weight 32768, valid, sourced, best
      Extended Community: RT:1:100 OSPF DOMAIN ID:0x0005:0x0000000A0200 
        OSPF RT:0.0.0.0:5:0 OSPF ROUTER ID:172.16.100.254:0
      mpls labels in/out 23/nolabel
SP1#


  • 3) Again, SP2 is transparent.
  • 4) SP3 receives the VPNv4 route for 172.16.110.1/32 from SP1, installs it into BGP and then redistributes it into OSPF. If we compare the OSPF database output of SP1 with SP3, we see that SP3 has a different value for “tag” in 172.16.110.1/32. That tag is created by SP3 when redistributing the BGP prefix into OSPF (based on the extended communities in the VPNv4 prefix). As per the RFC, the tag is generated from the ASN (100). As all our SPs are in the same ASN, the tag will be the same on every PE generating the LSA from the VPNv4 route.
SP3#show ip bgp vpnv4  all 
BGP table version is 13, local router ID is 10.0.3.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf CUST-A)
*>i172.16.10.1/32   10.0.1.1                 2    100      0 ?
* i172.16.20.1/32   10.0.4.1                 2    100      0 ?
*>                  172.16.200.1             2         32768 ?
*>i172.16.100.0/24  10.0.1.1                 0    100      0 ?
*>i172.16.110.1/32  10.0.1.1                21    100      0 ?
* i172.16.200.0/24  10.0.4.1                 2    100      0 ?
*>                  0.0.0.0                  0         32768 ?
* i172.16.201.0/24  10.0.4.1                 0    100      0 ?
*>                  172.16.200.1             2         32768 ?
SP3#
SP3#
SP3#show ip route vrf CUST-A 

Routing Table: CUST-A
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area 
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

     172.16.0.0/16 is variably subnetted, 6 subnets, 2 masks
C       172.16.200.0/24 is directly connected, FastEthernet0/0
O       172.16.201.0/24 [110/2] via 172.16.200.1, 02:06:43, FastEthernet0/0
O       172.16.20.1/32 [110/2] via 172.16.200.1, 02:06:43, FastEthernet0/0
B       172.16.10.1/32 [200/2] via 10.0.1.1, 02:04:33
B       172.16.110.1/32 [200/21] via 10.0.1.1, 02:04:33
B       172.16.100.0/24 [200/0] via 10.0.1.1, 02:04:33
SP3#
SP3#
SP3#show ip ospf database 

            OSPF Router with ID (10.0.3.1) (Process ID 1)

		Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link count
10.0.1.1        10.0.1.1        1556        0x80000005 0x00D5F4 2
10.0.2.1        10.0.2.1        1602        0x80000006 0x00D39C 3
10.0.3.1        10.0.3.1        1804        0x80000006 0x007FC3 3
10.0.4.1        10.0.4.1        1602        0x80000005 0x00CCC7 2

		Net Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
10.0.12.2       10.0.2.1        1602        0x80000004 0x00FBFC
10.0.23.1       10.0.3.1        1804        0x80000004 0x009B50
10.0.34.2       10.0.4.1        1602        0x80000004 0x0027B5

            OSPF Router with ID (172.16.200.254) (Process ID 10)

		Router Link States (Area 10)

Link ID         ADV Router      Age         Seq#       Checksum Link count
172.16.20.1     172.16.20.1     1640        0x80000006 0x00710E 3
172.16.200.254  172.16.200.254  1625        0x80000005 0x0055C4 1
172.16.201.254  172.16.201.254  1626        0x80000005 0x0059BC 1

		Net Link States (Area 10)

Link ID         ADV Router      Age         Seq#       Checksum
172.16.200.254  172.16.200.254  1625        0x80000004 0x00F0E6
172.16.201.254  172.16.201.254  1626        0x80000004 0x00E7EC

		Summary Net Link States (Area 10)

Link ID         ADV Router      Age         Seq#       Checksum
172.16.10.1     172.16.200.254  1625        0x80000004 0x000863
172.16.10.1     172.16.201.254  1626        0x80000004 0x000169
172.16.100.0    172.16.200.254  1625        0x80000004 0x0026EC
172.16.100.0    172.16.201.254  1626        0x80000004 0x001FF2

		Type-5 AS External Link States

Link ID         ADV Router      Age         Seq#       Checksum Tag
172.16.110.1    172.16.200.254  1625        0x80000004 0x005BDB 3489661028
172.16.110.1    172.16.201.254  1626        0x80000004 0x0054E1 3489661028
SP3#  
  • 5) So let’s look in detail at the VPNv4 prefixes for 172.16.10.1/32 (OSPF LSA3) and 172.16.110.1/32 (OSPF LSA5), both originated by HQ.
SP3#show ip bgp vpnv4 rd 100:1 172.16.10.1/32 
BGP routing table entry for 100:1:172.16.10.1/32, version 8
Paths: (1 available, best #1, table CUST-A)
  Not advertised to any peer
  Local
    10.0.1.1 (metric 3) from 10.0.1.1 (10.0.1.1)
      Origin incomplete, metric 2, localpref 100, valid, internal, best
      Extended Community: RT:1:100 OSPF DOMAIN ID:0x0005:0x0000000A0200 
        OSPF RT:0.0.0.10:2:0 OSPF ROUTER ID:172.16.100.254:0
      mpls labels in/out nolabel/21
SP3#
SP3#show ip bgp vpnv4 rd 100:1 172.16.110.1/32
BGP routing table entry for 100:1:172.16.110.1/32, version 11
Paths: (1 available, best #1, table CUST-A)
  Not advertised to any peer
  Local
    10.0.1.1 (metric 3) from 10.0.1.1 (10.0.1.1)
      Origin incomplete, metric 21, localpref 100, valid, internal, best
      Extended Community: RT:1:100 OSPF DOMAIN ID:0x0005:0x0000000A0200 
        OSPF RT:0.0.0.0:5:0 OSPF ROUTER ID:172.16.100.254:0
      mpls labels in/out nolabel/23
SP3#
  • 6) So SP3, based on the extended communities, knows that the VPNv4 prefix 172.16.110.1/32 was an OSPF LSA5 and creates a tag for it. Keep in mind that SP4 is doing exactly the same thing as SP3:
SP4#
SP4#show ip route vrf CUST-A                   

Routing Table: CUST-A
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area 
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

     172.16.0.0/16 is variably subnetted, 6 subnets, 2 masks
O       172.16.200.0/24 [110/2] via 172.16.201.1, 02:18:34, FastEthernet3/0
C       172.16.201.0/24 is directly connected, FastEthernet3/0
O       172.16.20.1/32 [110/2] via 172.16.201.1, 02:18:34, FastEthernet3/0
B       172.16.10.1/32 [200/2] via 10.0.1.1, 02:16:19
B       172.16.110.1/32 [200/21] via 10.0.1.1, 02:16:19
B       172.16.100.0/24 [200/0] via 10.0.1.1, 02:16:19
SP4#
SP4#
SP4#
SP4#show ip ospf database   

            OSPF Router with ID (10.0.4.1) (Process ID 1)

		Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link count
10.0.1.1        10.0.1.1        253         0x80000006 0x00D3F5 2
10.0.2.1        10.0.2.1        310         0x80000007 0x00D19D 3
10.0.3.1        10.0.3.1        504         0x80000007 0x007DC4 3
10.0.4.1        10.0.4.1        301         0x80000006 0x00CAC8 2

		Net Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
10.0.12.2       10.0.2.1        310         0x80000005 0x00F9FD
10.0.23.1       10.0.3.1        504         0x80000005 0x009951
10.0.34.2       10.0.4.1        301         0x80000005 0x0025B6

            OSPF Router with ID (172.16.201.254) (Process ID 10)

		Router Link States (Area 10)

Link ID         ADV Router      Age         Seq#       Checksum Link count
172.16.20.1     172.16.20.1     315         0x80000007 0x006F0F 3
172.16.200.254  172.16.200.254  347         0x80000006 0x0053C5 1
172.16.201.254  172.16.201.254  315         0x80000006 0x0057BD 1

		Net Link States (Area 10)

Link ID         ADV Router      Age         Seq#       Checksum
172.16.200.254  172.16.200.254  347         0x80000005 0x00EEE7
172.16.201.254  172.16.201.254  315         0x80000005 0x00E5ED

		Summary Net Link States (Area 10)

Link ID         ADV Router      Age         Seq#       Checksum
172.16.10.1     172.16.200.254  347         0x80000005 0x000664
172.16.10.1     172.16.201.254  315         0x80000005 0x00FE6A
172.16.100.0    172.16.200.254  347         0x80000005 0x0024ED
172.16.100.0    172.16.201.254  315         0x80000005 0x001DF3

		Type-5 AS External Link States

Link ID         ADV Router      Age         Seq#       Checksum Tag
172.16.110.1    172.16.200.254  347         0x80000005 0x0059DC 3489661028
172.16.110.1    172.16.201.254  315         0x80000005 0x0052E2 3489661028
SP4#   
SP4#
SP4#
SP4#show ip ospf database external 172.16.110.1

            OSPF Router with ID (10.0.4.1) (Process ID 1)

            OSPF Router with ID (172.16.201.254) (Process ID 10)

		Type-5 AS External Link States

  LS age: 350
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 172.16.110.1 (External Network Number )
  Advertising Router: 172.16.200.254
  LS Seq Number: 80000005
  Checksum: 0x59DC
  Length: 36
  Network Mask: /32
	Metric Type: 1 (Comparable directly to link state metric)
	TOS: 0 
	Metric: 21 
	Forward Address: 0.0.0.0
	External Route Tag: 3489661028

  LS age: 319
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 172.16.110.1 (External Network Number )
  Advertising Router: 172.16.201.254
  LS Seq Number: 80000005
  Checksum: 0x52E2
  Length: 36
  Network Mask: /32
	Metric Type: 1 (Comparable directly to link state metric)
	TOS: 0 
	Metric: 21 
	Forward Address: 0.0.0.0
	External Route Tag: 3489661028

SP4#   
SP4#
SP4#
SP4#show ip bgp vpnv4 rd 100:1 172.16.10.1/32
BGP routing table entry for 100:1:172.16.10.1/32, version 8
Paths: (1 available, best #1, table CUST-A)
  Not advertised to any peer
  Local
    10.0.1.1 (metric 4) from 10.0.1.1 (10.0.1.1)
      Origin incomplete, metric 2, localpref 100, valid, internal, best
      Extended Community: RT:1:100 OSPF DOMAIN ID:0x0005:0x0000000A0200 
        OSPF RT:0.0.0.10:2:0 OSPF ROUTER ID:172.16.100.254:0
      mpls labels in/out nolabel/21
SP4#
SP4#
SP4#show ip bgp vpnv4 rd 100:1 172.16.110.1/32
BGP routing table entry for 100:1:172.16.110.1/32, version 11
Paths: (1 available, best #1, table CUST-A)
  Not advertised to any peer
  Local
    10.0.1.1 (metric 4) from 10.0.1.1 (10.0.1.1)
      Origin incomplete, metric 21, localpref 100, valid, internal, best
      Extended Community: RT:1:100 OSPF DOMAIN ID:0x0005:0x0000000A0200 
        OSPF RT:0.0.0.0:5:0 OSPF ROUTER ID:172.16.100.254:0
      mpls labels in/out nolabel/23
SP4#
  • 7) As you can see, SP3 and SP4 generate the same “tag” 3489661028 for the LSA5 172.16.110.1/32 (because they are in the same ASN 100). So, as the LSA received from the other SP in the same Area 10 carries the same tag, SP3/SP4 ignore that LSA. And again, the BGP prefix is installed in the routing table instead of the OSPF (AD 110) route for 172.16.110.1/32, and we don’t have a routing loop.
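
Where does 3489661028 come from? As far as I understand the RFC, the default VPN route tag is a 32-bit value with the top four bits set to 1101, twelve zero bits, and the 16-bit ASN in the low half. A quick Python sketch (the function names are mine, and it assumes a 2-byte ASN) reproduces the value from the outputs:

```python
def vpn_route_tag(asn: int) -> int:
    """Default OSPF VPN route tag for a 2-byte ASN: 0xD000 in the high
    16 bits (bits 1101 followed by twelve zeros), ASN in the low 16 bits."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError("this sketch only covers 2-byte ASNs")
    return 0xD0000000 | asn


def pe_ignores_external_lsa(lsa_tag: int, local_asn: int) -> bool:
    """A PE skips an external LSA whose tag equals its own VPN route tag."""
    return lsa_tag == vpn_route_tag(local_asn)


print(vpn_route_tag(100))                         # 3489661028
print(hex(vpn_route_tag(100)))                    # 0xd0000064
print(pe_ignores_external_lsa(3489661028, 100))   # True: LSA5 from SP3/SP4
print(pe_ignores_external_lsa(0, 100))            # False: HQ's own LSA5, tag 0
```

So the LSA5 that HQ originates (tag 0) is accepted by SP1, while the LSA5s re-originated by SP3 and SP4 carry 0xD0000064 and are skipped by the other PE.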

Outages part 1

Cloudflare had an outage last week. This time I could really identify with the situation, as it could happen to me:

https://blog.cloudflare.com/cloudflare-dashboard-and-api-outage-on-april-15-2020/

Conclusions

  • Design: When you aim for HA, even a single patch panel is a SPOF, no matter how much redundancy you have in your transit providers, routers, switches, firewalls, etc. So, look for SPOFs!
  • Documentation: For DC stuff, at my current employer we use patchmanager. It is super handy for remote locations and it is our source of truth. Keep in mind that the tool is only as good as you keep it updated. For example, for the PoPs we visit more often and where we make more changes, we find more errors than we would like. For remote PoPs, as we know we are not going to come back for a couple of years, we are much more thorough. For network kit, we have RANCID+Git, so we always know the latest config and when changes were introduced (in 30m intervals at least).
  • Process: We follow a risk assessment for any change we plan to introduce. Then on Thursday we have a CAB meeting to schedule which changes are going to happen during the weekend. The aim is to have several people from different teams understand and have a say in what is going to happen. This has proved very useful. Four pairs of eyes are better than half 🙂 Still, you need to be rigorous in this process.

Even taking all this into account, you will have an outage. Have a retrospective, learn from it (no finger pointing) and apply it. Truly agile 😛