Design: When you aim for HA, even a single patch panel is a SPOF, no matter how much redundancy you have in your transit providers, routers, switches, firewalls, etc. So, look for SPOFs!
Documentation: For DC stuff, at my current employer we use patchmanager. It is super handy for remote locations and it is our source of truth. Keep in mind that the tool is only as good as you keep it updated… For example, at the PoPs we visit more often and where we make more changes, we find more mistakes than we would like… For remote PoPs, since we know we are not going to come back for a couple of years, we are much more thorough. For network kit, we have RANCID+Git, so we always know the latest config and when changes were introduced (in 30-minute intervals, at least).
Process: We do a risk assessment for any change we plan to introduce. Then, on Thursday, we have a CAB meeting to schedule the changes that are going to happen during the weekend. The aim is to have several people from different teams understand, and have a say in, what is going to happen. This has proven very useful. Four pairs of eyes are better than half 🙂 Still, you need to be rigorous in this process.
Even taking all this into account, you will have an outage. Have a retrospective, learn from it (no finger pointing) and apply what you learned. Truly agile 😛
When I was studying for the CCNP back in 2009, I found a fantastic MPLS lab. It showed how to build an MPLS L3VPN network from scratch. At the time I managed to build it on my laptop with GNS3.
Now I want to review some MPLS features, so I decided to install GNS3 and build that lab again. You can find it in my GitHub account (which was gathering dust…):
I searched several pages to find out how to do it nowadays. It seems the installation is mainly handled through Python. This is what I had to do for Debian 10 (Testing):
Once I managed to run the program, I found some other issues.
First, you need to get a software image to emulate the routers. I searched for images recommended for running MPLS, and it seems c7200-adventerprisek9-mz.124-24.T2 is a good one. If you search for it, it will not be difficult to find somewhere to download it.
It took me a while, but in the end I could create a lab with several routers, power them up and log in to them.
Also, I configured GNS3 to use “terminator” as the default terminal when connecting to the devices. That was handy.
There are many things you can configure in GNS3, like a basic Linux host for testing (Alpine). I installed it because I think it will be useful in the future:
There are many more things you can configure but for what I want, this is enough.
MPLS L3VPN
So, once we have a working GNS3 environment, we can get our hands dirty and create our MPLS L3VPN lab.
This is the diagram:
We are going to simulate a Service Provider (SP) network made up of SP1, SP2 and SP3. The customer network, CUST-A, is made up of HQ and BRANCH:
SP1 and SP3 will be PE routers (they will manage the L3VPN) (PE = Provider Edge)
SP2 will be just a P router (it doesn't have visibility of any L3VPN, it just switches labels) (P = Provider)
HQ and BRANCH are CPE routers (Customer Premises Equipment) acting as CE (Customer Edge) routers. They interact with the PEs.
The SP network uses OSPF (area 0, the backbone area) as the IGP underneath the iBGP full mesh (AS 100).
CUST-A is connected to our SP network at different locations. Internally it runs its own routing, and both locations are in the same OSPF area 10. Even so, each site should see the other site's prefixes as Inter-Area (IA) routes, because the PEs advertise them into area 10 as summary (Type 3) LSAs. This is quite important.
So, let's go step by step:
1- IP addressing
We need to configure IP connectivity on all links.
SP1
---
!
interface Loopback0
ip address 10.0.1.1 255.255.255.255
!
interface GigabitEthernet1/0
description to SP2-P
ip address 10.0.12.1 255.255.255.0
!
interface FastEthernet0/0
description to HQ
ip address 172.16.100.254 255.255.255.0
!
SP2
---
!
interface Loopback0
ip address 10.0.2.1 255.255.255.255
!
interface GigabitEthernet1/0
description to SP1-PE
ip address 10.0.12.2 255.255.255.0
!
interface GigabitEthernet2/0
description to SP3-PE
ip address 10.0.23.2 255.255.255.0
!
SP3
---
!
interface Loopback0
ip address 10.0.3.1 255.255.255.255
!
interface GigabitEthernet1/0
description to SP2-P
ip address 10.0.23.1 255.255.255.0
!
interface FastEthernet0/0
description to BRANCH
ip address 172.16.200.254 255.255.255.0
!
HQ
---
!
interface Loopback0
ip address 172.16.10.1 255.255.255.0
!
interface FastEthernet0/0
description to SP1-PE
ip address 172.16.100.1 255.255.255.0
!
BRANCH
---
!
interface Loopback0
ip address 172.16.20.1 255.255.255.0
!
interface FastEthernet0/0
description to SP3-PE
ip address 172.16.200.1 255.255.255.0
!
Verify you can ping each directly connected router:
HQ#ping 172.16.100.254
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.100.254, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/16/20 ms
HQ#
2- Routing in SP
We are going to configure OSPF (area 0 – backbone) as IGP in our SP network. We only want the SP loopbacks and backbone links in OSPF.
SP1
---
router ospf 1
network 10.0.1.0 0.0.0.255 area 0
network 10.0.12.0 0.0.0.255 area 0
SP2
---
router ospf 1
network 10.0.2.0 0.0.0.255 area 0
network 10.0.12.0 0.0.0.255 area 0
network 10.0.23.0 0.0.0.255 area 0
SP3
---
router ospf 1
network 10.0.3.0 0.0.0.255 area 0
network 10.0.23.0 0.0.0.255 area 0
Verify that OSPF comes up on all the expected links. If SP2 has two neighbors, all good:
SP2#show ip ospf neighbor
Neighbor ID Pri State Dead Time Address Interface
10.0.3.1 1 FULL/DR 00:00:36 10.0.23.1 GigabitEthernet2/0
10.0.1.1 1 FULL/DR 00:00:34 10.0.12.1 GigabitEthernet1/0
SP2#
3- MPLS in SP
Now we are going to configure MPLS in the SP network, using LDP for label distribution. The configuration is pretty easy: enable LDP with Lo0 as the router ID, and enable MPLS only on the backbone links.
SP1
---
mpls ldp router-id Loopback0 force
!
interface GigabitEthernet1/0
description to SP2-P
mpls ip
!
SP2
---
mpls ldp router-id Loopback0 force
!
interface GigabitEthernet1/0
description to SP1-PE
mpls ip
!
interface GigabitEthernet2/0
description to SP3-PE
mpls ip
!
SP3
---
mpls ldp router-id Loopback0 force
!
interface GigabitEthernet1/0
description to SP2-P
mpls ip
!
Check that the LDP neighbors come up. If SP2 has two, all good.
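One way to check this is with the standard IOS show commands below (output not reproduced here). On SP2, "show mpls ldp neighbor" should list two peers, 10.0.1.1:0 and 10.0.3.1:0 (the Lo0 router IDs forced above). "show mpls forwarding-table" shows the labels learned for the SP loopbacks.

```
SP2#show mpls ldp neighbor
SP2#show mpls forwarding-table
```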
4- VRF
In an SP network, you need each customer in its own VRF. That isolates customers from each other and lets each one use any IP addressing scheme. If you want to exchange those prefixes via a routing protocol, they must be unique inside the SP network. To achieve that, you build VPNv4 addresses, which combine the customer IP prefix with an RD (Route Distinguisher, 8 bytes). Each VRF has an RD. The VRF is locally significant to each PE, so you could configure CUST-A with a different RD on each PE, but as best practice we keep the same RD per VRF. Because each VRF has a different RD, every customer can use the same private IP prefix. In the SP's eyes, after the VPNv4 prefixes are built, those customer prefixes are different, and there is no leaking between VRFs (unless you configure it). To export and import prefixes in a VRF, we use RTs (Route Targets), which are also defined when creating the VRF. Keep in mind that we only define VRFs on the PE routers (SP1 and SP3). The P router (SP2) doesn't need to know about them. In our case, we are going to use RD 100:1 and RT 1:100.
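The VRF definition itself is not shown in this post. The sketch below shows what it could look like on both PEs. It assumes the classic "ip vrf" syntax, which is consistent with the "ip vrf forwarding" interface command used later, and uses the RD and RT values from the paragraph above.

```
SP1 and SP3
---
! sketch: assumed classic "ip vrf" syntax, RD/RT values taken from the text
ip vrf CUST-A
 rd 100:1
 route-target export 1:100
 route-target import 1:100
!
```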
The config above says that every VPNv4 prefix we export from the CUST-A VRF gets RT 1:100. It also says that any VPNv4 prefix carrying RT 1:100 that the router learns (more about this later in point 6) will be imported into the CUST-A VRF.
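Now we can put the links to the customer into its own VRF. Note that IOS removes the interface IP address when you apply "ip vrf forwarding", so the address has to be configured again: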
SP1
---
!
interface FastEthernet0/0
description to HQ
ip vrf forwarding CUST-A
ip address 172.16.100.254 255.255.255.0
!
SP3
---
!
interface FastEthernet0/0
description to BRANCH
ip vrf forwarding CUST-A
ip address 172.16.200.254 255.255.255.0
!
Now, let’s check that we still have IP connectivity with CUST-A from our PE routers.
SP1#ping vrf CUST-A 172.16.100.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.100.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/21/32 ms
SP1#
Keep in mind that, for CUST-A, all of this is transparent.
5- Customer Routing
So now we want our customer devices to exchange routes, because they want to reach each other (if not, why would they want a network? 🙂). We are going to use OSPF area 10 as the routing protocol between CUST-A and the SP. But isn't the SP already using OSPF? Yes, but keep in mind that we are using VRFs, and this OSPF process runs inside the customer VRF. It does not interact with the SP's OSPF area 0. So we need to configure OSPF on the interfaces connecting to the SP and on the interfaces we want to advertise (Lo0). And again, our SP2 (P) doesn't need to know anything about this.
HQ
---
!
router ospf 1
log-adjacency-changes
network 172.16.10.0 0.0.0.255 area 10
network 172.16.100.0 0.0.0.255 area 10
!
BRANCH
---
!
router ospf 1
log-adjacency-changes
network 172.16.20.0 0.0.0.255 area 10
network 172.16.200.0 0.0.0.255 area 10
SP1
---
!
router ospf 10 vrf CUST-A
network 172.16.100.0 0.0.0.255 area 10
!
SP3
---
!
router ospf 10 vrf CUST-A
network 172.16.200.0 0.0.0.255 area 10
!
Check that OSPF between CUST-A devices and SP comes up, and you are learning the CUST-A prefixes:
SP1#show ip ospf neighbor
Neighbor ID Pri State Dead Time Address Interface
10.0.2.1 1 FULL/BDR 00:00:36 10.0.12.2 GigabitEthernet1/0
172.16.10.1 1 FULL/BDR 00:00:38 172.16.100.1 FastEthernet0/0
SP1#
SP1#show ip route vrf CUST-A
Routing Table: CUST-A
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route
Gateway of last resort is not set
172.16.0.0/16 is variably subnetted, 3 subnets, 2 masks
O 172.16.10.1/32 [110/2] via 172.16.100.1, 00:38:48, FastEthernet0/0
C 172.16.100.0/24 is directly connected, FastEthernet0/0
SP1#
In the output above, you can see that SP1 has two OSPF neighbors: SP2 (OSPF area 0, backbone) and HQ (CUST-A VRF). You can also see that SP1 is learning HQ's Lo0 prefix via OSPF.
6- BGP
Now we have routing between SP1 and HQ and between SP3 and BRANCH, but HQ and BRANCH cannot communicate yet. And that is the goal at the end of the day.
So now, we need our PE routers to exchange the customer prefixes. We are going to use BGP/MP-BGP.
As we are in the same AS 100 (Autonomous System), we are going to use iBGP (internal BGP). Following best practice, we are going to build our iBGP full mesh on loopbacks. iBGP relies on an IGP to reach those loopbacks, and that is already in place via OSPF in our network.
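The BGP configuration is not shown in this post. A minimal sketch follows, assuming plain iBGP between the PE loopbacks, which matches the "show ip bgp summary" output below (SP1 peers with 10.0.3.1 in AS 100). "update-source Loopback0" makes each PE source the session from its own Lo0, and OSPF area 0 provides reachability to the remote loopback.

```
SP1
---
! sketch: assumed iBGP config, loopback to loopback
router bgp 100
 neighbor 10.0.3.1 remote-as 100
 neighbor 10.0.3.1 update-source Loopback0
!
SP3
---
! sketch: mirror of SP1, pointing at SP1's loopback
router bgp 100
 neighbor 10.0.1.1 remote-as 100
 neighbor 10.0.1.1 update-source Loopback0
!
```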
SP1#show ip bgp summary
BGP router identifier 10.0.1.1, local AS number 100
BGP table version is 1, main routing table version 1
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.0.3.1 4 100 53 53 1 0 0 00:50:08 0
SP1#
So we have BGP between our PE routers. Now we need to configure the exchange of VPNv4 routes so the CUST-A devices can learn prefixes from their own network.
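A sketch of that part, assuming the same neighbors as in the iBGP session: activate each peer under the VPNv4 address family and make sure extended communities (which carry the RTs) are sent. IOS typically adds "send-community extended" automatically when you activate a VPNv4 neighbor, so listing it explicitly is belt and braces.

```
SP1
---
! sketch: assumed MP-BGP (VPNv4) config
router bgp 100
 address-family vpnv4
  neighbor 10.0.3.1 activate
  neighbor 10.0.3.1 send-community extended
 exit-address-family
!
SP3
---
! sketch: mirror of SP1
router bgp 100
 address-family vpnv4
  neighbor 10.0.1.1 activate
  neighbor 10.0.1.1 send-community extended
 exit-address-family
!
```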
The part above is the MP-BGP (Multiprotocol BGP) part. On our BGP session between SP1 and SP3, we have enabled the VPNv4 address family, so the session can also carry VPNv4 prefixes.
But we don't have any VPNv4 prefixes yet:
SP1#show ip bgp vpnv4 all
BGP table version is 7, local router ID is 10.0.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
SP1#
That is because the prefixes need to be in the BGP table first. We have routing between PE and CE (OSPF area 10), but there is no redistribution between OSPF and BGP. Let’s do that:
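A sketch of the mutual redistribution on the PEs follows, assuming the OSPF process 10 in VRF CUST-A configured in point 5. OSPF is redistributed into BGP under the VRF address family. BGP is redistributed back into the VRF OSPF process so that HQ and BRANCH learn each other's prefixes. Some IOS versions also accept a plain "redistribute ospf 10" under the VRF address family.

```
SP1 (SP3 is identical)
---
! sketch: assumed redistribution config for VRF CUST-A
router bgp 100
 address-family ipv4 vrf CUST-A
  ! CUST-A OSPF routes (PE-CE) into MP-BGP
  redistribute ospf 10 vrf CUST-A
 exit-address-family
!
router ospf 10 vrf CUST-A
 ! routes learned from the remote PE via MP-BGP back into CUST-A OSPF
 redistribute bgp 100 subnets
!
```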
SP1#show ip bgp vpnv4 all
BGP table version is 9, local router ID is 10.0.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf CUST-A)
*> 172.16.10.1/32 172.16.100.1 2 32768 ?
*>i172.16.20.1/32 10.0.3.1 2 100 0 ?
*> 172.16.100.0/24 0.0.0.0 0 32768 ?
*>i172.16.200.0/24 10.0.3.1 0 100 0 ?
SP1#
So now, we can see in SP1 the prefixes from HQ and BRANCH routers!
Now, let’s check the CUST-A routing table from SP1 and HQ:
SP1#show ip route vrf CUST-A
Routing Table: CUST-A
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route
Gateway of last resort is not set
172.16.0.0/16 is variably subnetted, 4 subnets, 2 masks
B 172.16.200.0/24 [200/0] via 10.0.3.1, 01:08:19
B 172.16.20.1/32 [200/2] via 10.0.3.1, 00:01:33
O 172.16.10.1/32 [110/2] via 172.16.100.1, 01:15:49, FastEthernet0/0
C 172.16.100.0/24 is directly connected, FastEthernet0/0
SP1#
HQ#show ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route
Gateway of last resort is not set
172.16.0.0/16 is variably subnetted, 4 subnets, 2 masks
O IA 172.16.200.0/24 [110/2] via 172.16.100.254, 01:07:09, FastEthernet0/0
O IA 172.16.20.1/32 [110/3] via 172.16.100.254, 00:00:22, FastEthernet0/0
C 172.16.10.0/24 is directly connected, Loopback0
C 172.16.100.0/24 is directly connected, FastEthernet0/0
HQ#
HQ#
So SP1 is learning BRANCH's 172.16.200.0/24 and 172.16.20.1/32 via iBGP (from SP3's loopback, 10.0.3.1), and 172.16.10.1/32 (the HQ loopback) via OSPF.
HQ is learning the BRANCH prefixes too, and they show up as O IA. This is very important, and it is material for another post about the OSPF Down-bit.
7- Conclusion
We have IP connectivity between our CUST-A devices across an MPLS L3VPN:
HQ#ping 172.16.20.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.20.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 44/76/156 ms
HQ#
HQ#traceroute 172.16.20.1
Type escape sequence to abort.
Tracing the route to 172.16.20.1
1 172.16.100.254 16 msec 8 msec 12 msec
2 10.0.12.2 [MPLS: Labels 17/20 Exp 0] 72 msec 40 msec 60 msec
3 172.16.200.254 [MPLS: Label 20 Exp 0] 52 msec 8 msec 60 msec
4 172.16.200.1 40 msec 52 msec 40 msec
HQ#
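The traceroute also shows the label stack at work. SP2 receives the packet with two labels: 17, the LDP transport label towards SP3, and 20, the VPN label SP3 assigned to the CUST-A prefix. SP2 pops the outer label (penultimate hop popping), so SP3 only receives label 20.
We have built an MPLS L3VPN from the bottom up. Many points could be explained in much more detail, but that wasn't the goal. I just wanted to build this MPLS network so I can do some hands-on troubleshooting and review a couple of concepts.
Ideally, at some point, I would like to build an MPLS Segment Routing network in GNS3.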
Today I had to troubleshoot a websocket issue. I had never dealt with this before. I was told that the HAproxy config was fine and that it had to be our NGFW doing something nasty at L7.
Connecting directly from my PC to the server running the websocket service worked fine, but due to a requirement we need to put that server behind HAproxy. Connecting from my PC to the HAproxy that proxies the websocket service failed…
Funnily enough, HAproxy and the websocket service were running on the same host.
As usual, I took a look at the firewall logs. Nothing wrong there at first sight. I took a tcpdump on my PC while connecting to the websocket service and to HAproxy.
The service is very verbose, and the capture files are difficult to follow because it spawns several connections. I went for the easy part: the capture to HAproxy was showing a lot of TCP retransmissions… The other trace, to the websocket service, was pretty clean.
Taking into account that the path from my PC to the HAproxy server is always the same (and I was going through a VPN), I suspected either an NGFW issue or something between HAproxy and the websocket service (which is a localhost connection).
I was also seeing weird things latency-wise. Some TCP resets were taking more than 200 ms to reach the server when the average RTT was 3 ms.
I tried to take a tcpdump between HAproxy and the websocket service, in case the packet loss was happening locally. The capture was chaos to follow. I needed to understand the HAproxy sessions better.
I changed direction: I went to the NGFW and created a rule that disabled all the fancy security checks for my traffic to the HAproxy server. I wanted to be sure the firewall was innocent.
It was. Same issue. I tried different browsers, and it was always the same.
So I was nearly sure the problem was in HAproxy, but I had to prove it. I had more or less failed at checking the backend connection (HAproxy to the websocket service), so I took another look at the trace from my PC to HAproxy. I was quite frustrated: so many connections were opened, and then retransmissions started, that I couldn't really see any problem.
By luck, I noticed that in the good trace (the one going directly to the websocket service) there was an HTTP GET request for “socket” from my PC. Keep in mind that I had no idea how websockets work. I looked for a similar request in the HAproxy trace, and I saw the problem…
Rejected HTTP GET socket request
And this is a good connection:
Successful HTTP GET socket request
So in the end, HAproxy was at fault (although we don't know how to fix it yet), and my firewall was innocent (for once).
In summary, I got overwhelmed by the TCP retransmissions. I was lucky to spot the GET socket request, and I assumed that had to be how the websocket connection gets established. I should have started by investigating how a websocket connection is established. I also didn't manage to find the HAproxy logs, and I am pretty sure they would have given me the same answer. So I need to learn how to check them.
I learned something new. As usual, it didn't come easily or quickly 🙂
I use gkrellm as my Linux monitoring app, and I have used it since I started. One thing I miss, though, is seeing which app and which destination IPs are causing a traffic spike on my laptop.
After searching a bit, I came across this page with several tools:
I am subscribed to the Cloudflare blog because its posts are, in general, really good. You definitely always learn something new (and want to cry because you have so much to learn from these guys).
This time it was a dissection of conntrack in iptables to improve their firewall performance.
At work, we use a vendor whose Network Operating System (NOS) is based on Linux. I am a network engineer, and I was troubleshooting an issue inside a VRF. Many of the normal commands were of little use, because they run in the default VRF. So I opened a ticket with the vendor and learned a bit about how VRFs are implemented under the hood. Obviously (though not to me), they use Linux network namespaces, as I found out after googling the commands they sent. My search brought me to the following links:
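$ sudo ip netns list                                      # list namespaces (one per VRF)
$ sudo ip netns exec ns-INET ip link list                 # interfaces inside ns-INET
$ sudo ip netns exec ns-VRF1 arp -a                       # ARP table of VRF1
$ sudo ip netns exec ns-VRF1 route -n                     # routing table of VRF1
$ sudo ip netns exec ns-VRF1 telnet -b src_ip dst_ip port # TCP test sourced from src_ip
$ sudo ip netns exec ns-VRF1 tcpdump -i lo4 -nn tcp port 179  # capture BGP (TCP 179) on lo4
$ sudo ip netns exec ns-VRF1 ss --tcp --info               # TCP sockets with internal info
$ sudo ip netns exec ns-VRF1 ss --tcp --info -nt src IP    # same, numeric, filtered by source IP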
Also, “ss” is such a useful troubleshooting command, and I always feel that I don't make the most of it: