Today I have had “fun” troubleshooting an issue that looked easy at first sight. A colleague was trying to PXE boot some server from a network that we haven’t used for a while.
When the server boots up, asks for an IP via DHCP. As we have a centralized DHCP server infrastructure, we have configured DHCP relay in the firewall facing that server to send that request to the DHCP server.
First, let’s take a look at how DHCP relay works. This is a very good link. And this diagram from the mentioned link it is really useful:
One think I learned is the reply (DCHP Offer) doesnt have to use as destination IP the same IP it received as source in DHCP Discover. In the picture, it is packet 2a.
Checking in our environment, we confirm that:
Our server is in 10.94.240.x network. Our firewall is acting as DHCP relay, and send the DHCP Discovery (unicast) to our VIP DHCP Server IP.
The DHCP offer, uses as source the physical IP of the DHCP server and destination is the DHCP relay IP (so it is 10.94.240.1 – the firewall IP in 10.94.240.x network)
Ok, so everything looks fine? No really. The server receives the query, it answers… but we dont see a DCHP Request/ACK.
BTW, keep in mind that DHCP is UDP….
So, we need to see where the packets are lost.
This is a high level path flow between the client and server:
So we need to check this connection is three different firewall vendors….
The initial troubleshooting was just using the GUI tools from Palo/Fortigate. We couldn see anything…. but the server was constantly receiving DHCP Discover and sending DHCP Offer… I dont get it:
# tcpdump -i X udp port 67 or 66 -nn 14:58:06.969462 IP 10.81.25.1.67 > 10.81.251.47.67: BOOTP/DHCP, Request from 6c:2b:59:c1:32:73, length 300 14:58:06.969564 IP 10.81.251.201.67 > 10.94.240.1.67: BOOTP/DHCP, Reply, length 300 14:58:28.329048 IP 10.81.25.1.67 > 10.81.251.47.67: BOOTP/DHCP, Request from 6c:2b:59:c1:32:73, length 300 14:58:28.329157 IP 10.81.251.201.67 > 10.94.240.1.67: BOOTP/DHCP, Reply, length 300
Initially it took me a while to see the request/reply because I was assuming the dhcp request had source 10.94.240.1. So I was seeing only the Reply but not the Request. That was when I went to clarify my head about DHCP Relay and found the link.
So ok, we have the DHCP Request/Reply, but absolutely nothing in the Palo. Is the palo dropping the packets or is forwarding? No idea. The GUI says nothing, I took a packet capture and couldnt see that traffic neither…
Doesnt makes sense.
Let’s get back to basic.
Did I mention DHCP is UDP? So how a next generation firewall (like paloalto) with all the fancy features enable (we have nearly all of them enable…) treats a UDP connection? UDP is stateless… but the firewall is statefull… the firewall creates a flow with the first packet so it can track, any new packet is considered part of that flow. But why we dont see the flows? We actually have only one flow. The firewall has created that session and offloaded to hardware. So you dont see anything else in the control-plane / GUI. The GUI only shows the end of a connection/flow. And as our flow DHCP Relay hasnt’ terminated (it is UDP) and the firewall keeps receiving packets, it is considered life (the firewall doesnt really know what is going on). So for that reason we dont see the connection in the PaloUI. Ok, I got to that point after a while…. I need to proof that the packet from the server is reaching the firewall and it is leaving it too.
How can I do that? Well, I need to delete that flow so the firewall considers a new connection and the tcpdump can see the packets.
This is the a good link from paloalto to take captures. So I found my connection and the cleared it:
palo(active)> show session all filter destination 10.94.240.1 ID Application State Type Flag Src[Sport]/Zone/Proto (translated IP[Port]) Vsys Dst[Dport]/Zone (translated IP[Port]) 135493 dhcp ACTIVE FLOW 10.81.251.201/ZONE1/17 (10.81.251.201) vsys1 10.94.240.1/ZONE2 (10.94.240.1) palo(active)> palo(active)> clear session id 135493
And now, my packet capture in paloalto confirms that it is sending the packet to the next firewall (checking the destination MAC) !!!
Ok, so we confirm the first firewall in the return path was fine…. next one, it is fortigate.
BTW, we were checked and assumed that the routing is fine in all routers, firewalls, etc. Sometimes is not the case… so when things dont follow your thoughts, get back to the very basics….
We have exactly the same issue as in PaloAlto. I can’t see anything in the logs about receiving a dhcp offer from palo and forwarding it to the last firewall Cisco.
And again, we apply the same reasoning. We have an UDP connection, we have a next-generation firewall (with fancy ASIC). And one more thing, in this fortigate firewall, we allow intra-zone traffic, so it is not going to show anyway in the GUI monitor…
So we confirm that we have a flow and cleared it
forti # diag debug flow filter vf: any proto: any Host addr: any Host saddr: any host daddr: 10.94.240.1-10.94.240.1 port: any sport: any dport: any co1fw02 # co1fw02 # diag sys session list session info: proto=17 proto_state=00 duration=2243 expire=170 timeout=0 flags=00000000 sockflag=00000000 sockport=0 av_idx=0 use=5 origin-shaper= reply-shaper= per_ip_shaper= class_id=0 ha_id=0 policy_dir=0 tunnel=/ vlan_cos=8/8 state=may_dirty npu synced statistic(bytes/packets/allow_err): org=86840/254/1 reply=0/0/0 tuples=2 tx speed(Bps/kbps): 36/0 rx speed(Bps/kbps): 0/0 orgin->sink: org pre->post, reply pre->post dev=39->35/35->39 gwy=10.81.25.1/0.0.0.0 hook=pre dir=org act=noop 10.81.251.201:67->10.94.240.1:67(0.0.0.0:0) hook=post dir=reply act=noop 10.94.240.1:67->10.81.251.201:67(0.0.0.0:0) misc=0 policy_id=4294967295 auth_info=0 chk_client_info=0 vd=0 serial=141b05fb tos=ff/ff app_list=0 app=0 url_cat=0 rpdb_link_id = 00000000 dd_type=0 dd_mode=0 npu_state=0x001000 npu info: flag=0x81/0x00, offload=6/0, ips_offload=0/0, epid=8/0, ipid=8/0, vlan=0x00f5/0x0000 vlifid=0/0, vtag_in=0x0000/0x0000 in_npu=0/0, out_npu=0/0, fwd_en=0/0, qid=0/0 no_ofld_reason: total session 1 forti # forti # diag sys session clear
In other session, I have a packet capture in the expected egress interface:
forti # diagnose sniffer packet Zone3 'host 10.94.240.1' interfaces=[Zone3] filters=[host 10.94.240.1] 301.555231 10.81.251.201.67 -> 10.94.240.1.67: udp 300 316.545677 10.81.251.201.67 -> 10.94.240.1.67: udp 300
Fantastic, we have confirmation that the second firewall receives and forwards the DHCP Reply!!!
Ok, now the last stop, Cisco ASA. This is an old firewall, I think it could be my father or Darth Vader.
I dont have the fancy tools for packet capture like Palo/Fortigate…. so I went to the basic “debug” commands and “packet-tracer”.
First, this was the dhcp config in Cisco:
vader/pri/act# show run | i dhcp dhcprelay server 10.81.251.47 EGRESS dhcprelay enable SERVERS-ZONE dhcprelay timeout 60
And, the ACL allows all IP traffic in those interfaces… and couldnt see any deny in the logs.
So, I enabled all debugging things I could find for dhcp:
vader/pri/act# show debug debug dhcpc detail enabled at level 1 debug dhcpc error enabled at level 1 debug dhcpc packet enabled at level 1 debug dhcpd packet enabled at level 1 debug dhcpd event enabled at level 1 debug dhcpd ddns enabled at level 1 debug dhcprelay error enabled at level 1 debug dhcprelay packet enabled at level 1 debug dhcprelay event enabled at level 200 vader/pri/act# DHCPD: Relay msg received, fip=ANY, fport=0 on SERVERS-ZONE interface DHCPRA: relay binding found for client f48e.38c7.1b6e. DHCPD: setting giaddr to 10.94.240.1. dhcpd_forward_request: request from f48e.38c7.1b6e forwarded to 10.81.251.47. DHCPD: Relay msg received, fip=ANY, fport=0 on SERVERS-ZONE interface DHCPRA: relay binding found for client 6c2b.59c1.3273. DHCPD: setting giaddr to 10.94.240.1. dhcpd_forward_request: request from 6c2b.59c1.3273 forwarded to 10.81.251.47. vader/pri/act#
So, the debugging doesnt says anything regarding the packet coming back from Fortigate… Not looking good I am afraid. I wasnt running out of ideas about debug commands. I coudn’t increase an log level neither….
Let’s give a go to packet tracer… doesnt looks good:
vader/pri/act# packet-tracer input EGRESS udp 10.81.251.201 67 10.94.240.1 67 Phase: 1 Type: ACCESS-LIST Subtype: Result: ALLOW Config: Implicit Rule Additional Information: MAC Access list Phase: 2 Type: ACCESS-LIST Subtype: Result: DROP Config: Implicit Rule Additional Information: Result: input-interface: EGRESS input-status: up input-line-status: up Action: drop Drop-reason: (acl-drop) Flow is denied by configured rule
So, we are sure our ACL is totally open but the firewall is dropping the packet coming from fortigate. Why? How to fix it?
Ok, get back to basics. Focus in Cisco config. It uses as DHCP relay server, 10.81.251.47 (VIP). But the DHCP reply is coming from the physical IP 10.81.251.201….. maybe Cisco doesnt like that…. Let’s try to add the physical IPs as a new DHCP server:
vader/pri/act# sri dhcp dhcprelay server 10.81.251.47 EGRESS dhcprelay server 10.81.251.201 EGRESS dhcprelay server 10.81.251.202 EGRESS
Let’s check packet tracer again:
vader/pri/act# packet-tracer input EGRESS udp 10.81.251.201 67 10.94.240.1 67 Phase: 1 Type: ACCESS-LIST Subtype: Result: ALLOW Config: Implicit Rule Additional Information: MAC Access list Phase: 2 Type: ACCESS-LIST Subtype: Result: ALLOW Config: Implicit Rule Additional Information: Phase: 3 Type: IP-OPTIONS Subtype: Result: ALLOW Config: Additional Information: Phase: 4 Type: Subtype: Result: ALLOW Config: Additional Information: Phase: 5 Type: Subtype: Result: ALLOW Config: Additional Information: Phase: 6 Type: VPN Subtype: ipsec-tunnel-flow Result: ALLOW Config: Additional Information: Phase: 7 Type: FLOW-CREATION Subtype: Result: ALLOW Config: Additional Information: New flow created with id 340328245, packet dispatched to next module Result: input-interface: EGRESS input-status: up input-line-status: up Action: allow vader/pri/act#
Good, that’s a good sign finally!!!
I think I nearly cried after seeing this in the dhcp logs in our server:
May 12 16:16:27 dhcp1 dhcpd: DHCPDISCOVER from f4:8e:38:c7:1b:6e via 10.94.240.1 May 12 16:16:28 dhcp1 dhcpd: DHCPOFFER on 10.94.240.50 to f4:8e:38:c7:1b:6e (cmc-111) via 10.94.240.1 May 12 16:16:28 dhcp1 dhcpd: Wrote 0 class decls to leases file. May 12 16:16:28 dhcp1 dhcpd: Wrote 0 deleted host decls to leases file. May 12 16:16:28 dhcp1 dhcpd: Wrote 0 new dynamic host decls to leases file. May 12 16:16:28 dhcp1 dhcpd: Wrote 1 leases to leases file. May 12 16:16:28 dhcp1 dhcpd: DHCPREQUEST for 10.94.240.50 (10.81.251.202) from f4:8e:38:c7:1b:6e (cmc-111) via 10.94.240.1 May 12 16:16:28 dhcp1 dhcpd: DHCPACK on 10.94.240.50 to f4:8e:38:c7:1b:6e (cmc-111) via 10.94.240.1
So at the end, finally fixed…. it took too many hours.
- DHCP Realy: It is not that obvious the flow regarding IPs.
- UDP and firewalls, debugging it is a bit more challenging.
- Cisco ASA dhcprelay server IPs…. VIPs and non-VIPs please.
All this would be easier/quicker with TCP 😛