Sure thing! I marked 21 as the best answer. Thanks again. Have a great weekend!
Original Message:
Sent: 03-21-2025 10:21
From: Nikolay Semov
Subject: Cisco WLC 3504 failing to forward/dropping client traffic originating from SRX320
I'm glad everything is sorted out now. That was an interesting problem!
Could you please mark post #13 or #21 as the answer, so that people who come across this thread don't have to go through lots of posts?
------------------------------
Nikolay Semov
Original Message:
Sent: 03-21-2025 10:08
From: TacticalDonut164
Yep... disabling DHCP proxy fixed it. I tested again with 1681 and everything worked. Wow. One check box resulted in months of grief.
Genuinely I cannot thank you enough. This was giving me gray hairs.
Original Message:
Sent: 03-21-2025 09:26
From: Nikolay Semov
Yes, DHCP proxy, DHCP relay, same thing. Very useful when you have a centralized DHCP server that is out of broadcast reach, but really not necessary when the server is listening in the same broadcast domain as the clients. And, as we have now learned, it can sometimes have terrible side effects.
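For anyone landing here later, a rough Python sketch of how a relayed request differs from a direct one as seen by the DHCP server. Per RFC 2131 a relay agent fills in the giaddr field, while a client broadcasting on its own leaves it at 0.0.0.0. The `DhcpPacket` class and the giaddr value used for the WLC are assumptions for illustration; the thread only shows the request arriving from 192.168.1.253.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class DhcpPacket:
    """Hypothetical, minimal view of a DHCP request as seen by the server."""
    src_ip: str   # IP source address of the packet
    giaddr: str   # relay agent address field from the DHCP header (RFC 2131)


def is_relayed(pkt: DhcpPacket) -> bool:
    """A non-zero giaddr means a relay agent (or proxy) forwarded the request."""
    return IPv4Address(pkt.giaddr) != IPv4Address("0.0.0.0")


# Client broadcasting directly in the server's broadcast domain.
direct = DhcpPacket(src_ip="0.0.0.0", giaddr="0.0.0.0")

# Request forwarded by the WLC. The source IP matches the capture in this
# thread; the giaddr value is an assumption, it was not shown in the capture.
relayed = DhcpPacket(src_ip="192.168.1.253", giaddr="192.168.1.253")

for name, pkt in (("direct", direct), ("relayed", relayed)):
    print(name, "-> relayed" if is_relayed(pkt) else "-> not relayed")
```

With the proxy disabled, the server should see the first kind of request, which is the case where the relay is not needed anyway.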
------------------------------
Nikolay Semov
Original Message:
Sent: 03-21-2025 09:04
From: TacticalDonut164
I was looking but can't seem to find an option, unless DHCP proxy == DHCP relay. I'll ask around on the Cisco community and let you know.
Original Message:
Sent: 03-20-2025 19:28
From: Nikolay Semov
See if there's an option on the WLC for it to NOT relay DHCP requests. Basically turn off everything DHCP-related for 1681 on the WLC. I think my original theory was right. On the SRX, the DHCP server interacts with the ARP table.
------------------------------
Nikolay Semov
Original Message:
Sent: 03-20-2025 19:15
From: TacticalDonut164
That did it! Changing the DHCP server to the PA-220 fixed the issues. I guess I could live with that as a workaround.
I can't thank you enough for this. Genuinely. I've been fighting with this for a very long time and it never once occurred to me that DHCP might be the issue, especially since on the surface it appears to work.
I really wonder why this works. Genuinely, it doesn't make much sense to me.
Original Message:
Sent: 03-20-2025 18:46
From: Nikolay Semov
Here's a fun test ... link up 1681 to both the PA and the SRX, but turn off the DHCP server on the SRX; let the PA give out the IP addresses with the SRX as the gateway. Watch out for IP conflicts, of course. "clear dhcp server bindings" and "clear arp" on the SRX after the config is committed. Maybe ARP does get poisoned by DHCP at some point.
If that doesn't work, another fun test -- at an appropriate time -- shed ae1 links -- leave one of the yellow WLC ports active, the rest off (interface disable on the EX side should do, I think). Or maybe just the blue port active.
------------------------------
Nikolay Semov
Original Message:
Sent: 03-20-2025 18:25
From: TacticalDonut164
Edit - We know that configuring a static IP will resolve the issue, it seems like 100% of the time. Why would this be? What is different about static vs dynamic IP that results in this behavior?
It seems like everything is making it and yet nothing is making it. DHCP supposedly doesn't work because it's spamming Discover, but the client gets an IP. ARP supposedly also doesn't work because it's also spamming, but then the client has the gateway in its ARP table.
For the interfaces they are not that balanced to be honest. But I'm not sure how much of an issue this is, because it works consistently every single time on the PA-220.
MDCAS0> show interfaces ge-0/0/10 extensive | match Input
Input bytes : 4343711460 56 bps
Input packets: 7158556 0 pps
MDCAS0> show interfaces ge-0/0/11 extensive | match Input
Input bytes : 197365265768 3264 bps
Input packets: 159601957 1 pps
MDCAS0> show interfaces ge-0/0/12 extensive | match Input
Input bytes : 1122777686 0 bps
Input packets: 3920792 0 pps
MDCAS0> show interfaces ge-0/0/13 extensive | match Input
Input bytes : 2700329505 0 bps
Input packets: 9138510 0 pps
MDCAS0> show interfaces ge-0/0/14 extensive | match Input
Input bytes : 3432636085 0 bps
Input packets: 6330120 0 pps
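To put a number on "not that balanced", here is a small Python sketch that turns the input packet counters above into per-link shares. The counters are copied from the output in this post; note that they are lifetime totals for every VLAN on ae1, so they only give a rough picture.

```python
# Input packet counters copied from the "show interfaces ... extensive" output.
input_packets = {
    "ge-0/0/10": 7158556,
    "ge-0/0/11": 159601957,
    "ge-0/0/12": 3920792,
    "ge-0/0/13": 9138510,
    "ge-0/0/14": 6330120,
}


def input_shares(counters):
    """Return each member link's fraction of the total input packets."""
    total = sum(counters.values())
    return {name: count / total for name, count in counters.items()}


shares = input_shares(input_packets)
busiest = max(shares, key=shares.get)

for name, share in sorted(shares.items()):
    print(f"{name}: {share:.1%}")
print("busiest member:", busiest)
```

Every member shows some input, but one link carries the large majority of it.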
Original Message:
Sent: 03-20-2025 18:10
From: Nikolay Semov
It's not just DHCP ... It's spamming the ARP requests. Clearly the ARP replies are not making it to the wifi client. Something fishy going on for sure...
Also, for ae1 on MDCAS, I'm interested to see if the input stats are balanced. You'll have to get them from the individual member interfaces of ae1 rather than ae1 itself.
------------------------------
Nikolay Semov
Original Message:
Sent: 03-20-2025 17:54
From: TacticalDonut164
Sure thing...
[edit interfaces ge-0/0/36 unit 0 family ethernet-switching vlan]
- members [ VLAN160 VLAN2328 VLAN2531 VLAN2538 VLAN3400 VLAN1681 ];
+ members [ VLAN160 VLAN2328 VLAN2531 VLAN2538 VLAN3400 ];
[edit interfaces ge-0/0/39 unit 0 family ethernet-switching vlan]
- members [ VLAN160 VLAN2328 VLAN2531 VLAN2538 VLAN3400 VLAN1681 ];
+ members [ VLAN160 VLAN2328 VLAN2531 VLAN2538 VLAN3400 ];
[edit interfaces ae4 unit 0 family ethernet-switching vlan]
- members [ VLAN161 VLAN2329 VLAN3700 VLAN3732 ];
+ members [ VLAN161 VLAN2329 VLAN3700 VLAN3732 VLAN1681 ];
MDCBR2# show network dhcp interface ae2.1681
ae2.1681 {
  server {
    option {
      dns {
        primary 8.8.8.8;
        secondary 8.8.4.4;
      }
      lease {
        timeout 719;
      }
      gateway 192.168.1.254;
      subnet-mask 255.255.255.0;
    }
    ip-pool 192.168.1.1-192.168.1.250;
    mode auto;
  }
}
MDCBR2# show network interface aggregate-ethernet ae2 layer3 units ae2.1681
ae2.1681 {
  ipv6 {
    neighbor-discovery {
      router-advertisement {
        enable no;
      }
    }
  }
  sdwan-link-settings {
    upstream-nat {
      enable no;
      static-ip;
    }
    enable no;
  }
  ndp-proxy {
    enabled no;
  }
  adjust-tcp-mss {
    enable no;
  }
  ip {
    G-W-INT-VLAN1681;
  }
  interface-management-profile IFM-Ping_Only;
  tag 1681;
  comment "VLAN1681 MDC-WLAN-TEST";
}
I made no changes to the WLC. Just cut over to the PA-220. And it works perfectly fine. Clients connect, clients get an IP address, everything works.
The ae1 links are very uniformly load balanced.
Link:
ge-0/0/10.0
Input : 0 0 0 0
Output: 19093 0 10099568 0
ge-0/0/11.0
Input : 0 0 0 0
Output: 19082 0 10093705 0
ge-0/0/12.0
Input : 0 0 0 0
Output: 19088 0 10096945 0
ge-0/0/13.0
Input : 0 0 0 0
Output: 19077 0 10091038 0
ge-0/0/14.0
Input : 0 0 0 0
Output: 19091 0 10098498 0
I don't understand this. Why does a static IP work? Why is it that clients get an IP from the SRX but keep spamming DHCP Discover? I think we've really made progress in narrowing it down to somehow being a DHCP problem, since static IPs work perfectly fine. But as to what the fix is, I don't know. I've never encountered a problem like this, and I've never read about one either.
Original Message:
Sent: 03-20-2025 17:15
From: Nikolay Semov
Could you dig more into the config for "lag enable" to confirm it really does put all 5 ports into LAG by default and not, say, just 4 of them, etc. It's conceivable, I suppose, that by some divine luck, most traffic that matters goes over an ae1 link that ends up in the LAG, but some might hash into a link that connects to the WLC but is not in the LAG. Another possible way to confirm it is to check the stats for the member interfaces of ae1 on MDCAS0 and see if input traffic is well-balanced across them which would indicate the WLC is indeed using all 5 ports for the LAG.
My theory about the MAC address confusion on the SRX didn't pan out. Clearly packets from the SRX to the MAC of the client are not making it to the client.
It appears that wifi 1681 is cursed.
I apologize if you previously mentioned somewhere that you tested this, but could you please try moving 1681 off the SRX and onto the PA without making any changes on the WLC?
If 1681 works with the PA, then the SRX is cursed. If it does the same thing on the PA, then I guess double and triple-check that 1681 is set up like the rest on the WLC. Though, to loop back to what I mentioned at the top, with the PA having a different MAC, your test traffic may select a different LAG member interface.
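To illustrate the point about a different MAC landing on a different LAG member, here is a toy Python sketch. The hash (CRC32 over source and destination MAC) is purely an assumption for illustration; real switches and controllers use vendor-specific hashing, often including IP and port fields. The SRX and client MACs are the ones posted elsewhere in this thread; the PA MAC is a made-up placeholder.

```python
import zlib

# Member links of ae1 on MDCAS0, as named in this thread.
MEMBERS = ["ge-0/0/10", "ge-0/0/11", "ge-0/0/12", "ge-0/0/13", "ge-0/0/14"]


def pick_member(src_mac: str, dst_mac: str, members: list[str]) -> str:
    """Toy LAG hash: CRC32 of src+dst MAC modulo the number of members.

    This is NOT the algorithm any particular device uses.
    """
    key = bytes.fromhex(src_mac.replace(":", "")) + bytes.fromhex(dst_mac.replace(":", ""))
    return members[zlib.crc32(key) % len(members)]


client = "20:2b:20:7a:c7:13"   # wireless client MAC from the thread
srx = "00:10:db:ff:10:02"      # SRX interface MAC from the thread
pa = "02:00:00:00:00:01"       # hypothetical placeholder, the PA MAC was never posted

print("via SRX:", pick_member(srx, client, MEMBERS))
print("via PA: ", pick_member(pa, client, MEMBERS))
```

The same flow always hashes to the same link, so a single bad member can consistently break one gateway's traffic while another gateway's traffic looks fine.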
------------------------------
Nikolay Semov
Original Message:
Sent: 03-20-2025 16:42
From: TacticalDonut164
edit - static IP did not originally work. Now it is working. I can ping out with a static IP on .100. What the SPAN on the ae1 showed was the client spamming DHCP Discover and getting no response. I don't understand this, because the client did get an IP; you can see it in ipconfig and on the SRX.
(Note: The WLC interface in these output/capture is 192.168.1.252, as I changed it to that when I was testing the core switch and forgot to change it back.)
I think all the config that it does is just putting 'config lag enable' and then every single port goes into a LAG. Could be wrong on that, but that's what it seems like.
I've got no idea what the '8' is, this is not actually a valid command...
(MDCWC1) >config interface port management 8
Incorrect usage. Use the '?' or <TAB> key to list commands.
(MDCWC1) >config interface port?
[no output]
I can certainly try again with the packet capture, but I'm not seeing a way to get much more detail. I could just do a SPAN on the AE, I guess. But on the WLC I just end up doing:
> debug packet logging acl ip 1 permit 192.168.1.1 any any any any
(usually also .2, .3, .4 as well due to DHCP bindings and new leases)
> debug packet logging acl ip 2 permit 192.168.1.253 any any any any
> debug packet logging acl ip 3 permit 192.168.1.254 any any any any
> debug packet logging format text2pcap
> debug packet logging enable all 10000
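Since the ACL list keeps changing with new leases, a tiny Python helper can generate the command sequence. It only formats the exact commands shown above; the function name and parameters are made up for illustration, and nothing here talks to the WLC.

```python
def build_capture_commands(ips, max_packets=10000):
    """Build the WLC debug commands used above, one ACL entry per IP address."""
    commands = [
        f"debug packet logging acl ip {index} permit {ip} any any any any"
        for index, ip in enumerate(ips, start=1)
    ]
    commands.append("debug packet logging format text2pcap")
    commands.append(f"debug packet logging enable all {max_packets}")
    return commands


# Client, WLC interface and gateway addresses as used in this thread.
for line in build_capture_commands(["192.168.1.1", "192.168.1.253", "192.168.1.254"]):
    print(line)
```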
Let's use 192.168.1.1 > 192.168.1.254 as an example. As mentioned, the traffic dies on the WLC. So ICMP echo replies show up on the packet capture. When I take a closer look, I can see that the source MAC address of the reply is the MAC address of the SRX, and the destination MAC address is the MAC address assigned to the dynamic interface of the WLC (vlan1681).
Now here's something interesting. And I don't know if this is a symptom or just completely unrelated/me being stupid. But when I try to capture ICMP packets by doing this exact same process for a subnet running on the PA-220 (or previously on a PA-850), literally nothing shows up. Even doing a 'debug packet logging acl ip 1 permit any any icmp any any', it never yields any user ICMP traffic, only the occasional PRTG ICMP probe. The instant I switch it over to the SRX the ICMP packets start showing up. Of course, now that I have said this, I went and tried to pcap and now no ICMP packets show up :/
For the pcap I just did, see the attached file hexdump.txt, you can import it into Wireshark. I will do a SPAN on the ae1 and let you know.
I do have many DHCP pools configured at the [edit system services dhcp-local-server] hierarchy, so I went ahead and added that command.
For pcap purposes:
- Juniper interface: 00:10:db:ff:10:02
- WLC interface: 30:8b:b2:88:9c:63
- Client interface: 20:2b:20:7a:c7:13
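When reading the pcap, a small lookup like the following Python sketch makes it quicker to see who a frame is really addressed to. The MAC-to-device mapping is the list above; the helper names are made up for illustration.

```python
# MAC addresses as listed above.
KNOWN_MACS = {
    "00:10:db:ff:10:02": "SRX (Juniper interface)",
    "30:8b:b2:88:9c:63": "WLC interface",
    "20:2b:20:7a:c7:13": "wireless client",
}


def normalize(mac: str) -> str:
    """Lower-case and use ':' separators so Windows-style MACs also match."""
    return mac.strip().lower().replace("-", ":")


def describe_frame(src_mac: str, dst_mac: str) -> str:
    """Describe a frame by the devices that own its source and destination MACs."""
    src = KNOWN_MACS.get(normalize(src_mac), "unknown")
    dst = KNOWN_MACS.get(normalize(dst_mac), "unknown")
    return f"{src} -> {dst}"


# The echo replies described earlier: sourced by the SRX, addressed to the WLC.
print(describe_frame("00:10:DB:FF:10:02", "30-8b-b2-88-9c-63"))
```

A reply meant for the wireless client that is addressed to the WLC's own MAC is the pattern to look out for.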
Here is the ARP table on the MDCBR with a client associated:
> show arp interface reth2.1681
MAC Address Address Name Interface Flags
20:2b:20:7a:c7:13 192.168.1.1 192.168.1.1 reth2.1681 none
30:8b:b2:88:9c:63 192.168.1.252 192.168.1.252 reth2.1681 none
Total entries: 2
Here is the switch and kernel ARP tables on the MDCWC1 for the relevant addresses:
(MDCWC1) >show arp sw
Number of arp entries................................ 11
MAC Address IP Address Port VLAN Type
------------------- ---------------- ------ ------ ------
20:2B:20:7A:C7:13 192.168.1.1 8 1681 Client
00:10:DB:FF:10:02 192.168.1.254 8 1681 Host
(MDCWC1) >show arp ker
IP address HW type Flags HW address Mask Device
192.168.1.1 0x1 0x6 20:2b:20:7a:c7:13 * dtl0.1681
192.168.1.254 0x1 0x2 00:10:db:ff:10:02 * dtl0.1681
And for the heck of it, here's the ARP table from the wireless client (forgive the formatting as I am typing it verbatim):
Interface: 192.168.1.1 --- 0x14
Internet Address Physical Address Type
192.168.1.252 30-8b-b2-88-9c-63 dynamic
192.168.1.254 00-10-db-ff-10-02 dynamic
192.168.1.255 ff-ff-ff-ff-ff-ff static
Doing a 'monitor traffic interface reth2.1681 extensive no-resolve size 1518' shows some DHCP and some ARP (please see attached file).
Original Message:
Sent: 03-20-2025 15:18
From: Nikolay Semov
Very interesting ...
I was going to say that I can't find the part of the WLC configuration that mentions anything about LAG across the five ports ... There's this:
interface port management 8
interface port vlan161 8
interface port vlan1681 8
[...]
But not sure what's this port 8. Anyway, the working wifi is obviously using the very same link bundle, so it ought to be configured somewhere, I just can't seem to find it in the config file ...
When you capture traffic on the WLC, is there a way to include more details, like the port involved or the VLAN interface? It'd be interesting to see packets entering the WLC and then leaving towards the AP.
What does seem to be working is that the wifi client is successfully getting IP 192.168.1.1 from the SRX. Looks like the WLC is relaying those requests, though, since your previous capture shows the DHCP request from 1.253 to 1.254. You may want to include "requested-ip-interface-match" under dhcp-local-server, just in case, since you have several.
I'm curious to see if the wifi client shows up with its own MAC address, or the MAC of the WLC ... Can you check the ARP table on MDCBR and see if 1.1 and 1.253 have separate MACs? It may be worth capturing the DHCP request and response on the SRX, too (you should be able to see it with traffic monitoring on reth2.1681 with details). What may be different from the PA firewalls is that DHCP on the SRX creates an ARP table entry when addresses are assigned. I wonder if maybe the SRX adds 1.1 with the MAC of the WLC, but then actual user traffic comes from the MAC of the actual wifi client. If that's the case, then response traffic from the SRX would have the MAC of the WLC, which means the packets would reach the WLC but that's where they would end. And it seems that's exactly what you're observing.
Maybe try static IP on your wifi client? (clear dhcp server binding and arp entry, if any, after making the IP static on the client; use some different IP, too, just in case).
The switch config seems alright. I guess you can check the ethernet-switching table on the switches for VLAN 1681 but I doubt that will show you anything strange. The same switches are pushing traffic to and from the PA firewalls.
------------------------------
Nikolay Semov
Original Message:
Sent: 03-20-2025 14:34
From: TacticalDonut164
It's a CAPWAP AP, so the WLC is required to be trunked, the AP just needs to be on the management VLAN. So yes the AP will tunnel user traffic to the WLC, I assume over 1020. At least that is my understanding of it, this is not something I am super well-versed in.
While any wireless VLAN running on the Juniper will have the issue, let's focus on the VLAN 1681 - SSID 'mdc-test' - 192.168.1.0/24 - as that is separate from production. Currently the rest you see on the config are running off of the PA-220, which is why they are working. (however as the 1020 is a wired VLAN, it is okay to run off of the SRX)
And attached please see the output for ae1.
Original Message:
Sent: 03-20-2025 12:53
From: Nikolay Semov
Help me understand how the WLC works, as I'm not familiar with that configuration. Does the AP trunk user traffic to the WLC via VLAN1020, or is the AP directly connected to the WLC with a connection that's not shown on the diagram?
I see the WLC is connected to MDCAS0 via ae1 (on the switch), but that carries a number of VLANs. Which VLAN ends up carrying the user traffic that's having issues? (If there's more than one SSID / VLAN / etc. that's exhibiting the same problem, let's pick just one to focus on for troubleshooting.)
------------------------------
Nikolay Semov
Original Message:
Sent: 03-20-2025 12:20
From: TacticalDonut164
I don't understand why my earlier comment with the attached LLDP neighbors output and some additional port configuration detail just disappeared and never posted.
I guess the WLC maybe doesn't work with LLDP or only does CDP or something, as it does not appear.
The SRX "should" have connectivity. It's firewall > core > access switch > wireless controller/AP.
The WLC can ping the SRX gateway, the SRX can ping the WLC interface.
SRX goes down to the core on various interfaces, then the core goes to the access switch via ae0, then the access switch gets to the WLC via ae1.
Original Message:
Sent: 03-20-2025 12:17
From: Nikolay Semov
With the information provided, having only the diagram to show what's connected (as opposed to what's configured):
- The SRX is only connected to MDCCR over VLANs 201 and 998;
- MDCCR has VLANs 201 and 998 going only upstream towards Lumen.
So, essentially, your wireless controller and access point have no connectivity to the SRX at all.
------------------------------
Nikolay Semov
Original Message:
Sent: 03-20-2025 11:17
From: TacticalDonut164
Yes, my apologies. I should have included that to begin with. I also tried moving the gateway to the core switch, which ended with the same result - it seems like maybe the issue is with all Juniper products.
Please see updated attached configurations, as the topology has changed from the original.
MDCBR = Firewall
MDCCR = Core Switch
MDCAS0 = Access switch 0
MDCWC1 = Wireless controller
MDCAP01 = Access point
Original Message:
Sent: 03-20-2025 10:50
From: Nikolay Semov
Alright, since there's a switch in the mix, and it's a bit difficult for an outsider with no knowledge of your network to easily decipher all the abbreviations and shorthand in the configuration, could you please provide some sort of a diagram of what's connected to what exactly (with specific interface names and numbers)?
------------------------------
Nikolay Semov
Original Message:
Sent: 03-20-2025 09:40
From: TacticalDonut164
Thank you - however this is not the issue. It all appears fine. The issue is with all traffic originating from the SRX going to the WLC, including internal only traffic such as pinging the gateway.
All wired traffic is okay, putting a wired client on a wireless VLAN, that is okay as well.
Original Message:
Sent: 03-17-2025 14:42
From: Nikolay Semov
Check DHCP client status on reth2.201 while seeing the "192.168.1.254 192.168.1.1 ICMP 74 Destination unreachable (Port unreachable)" messages. Also whether a default route is installed. My suspicion lies in the negotiation between the SRX and Lumen.
------------------------------
Nikolay Semov
Original Message:
Sent: 03-15-2025 10:47
From: TacticalDonut164
(disclaimer: homelab)
Hey guys,
I am having an issue with the interaction between the SRX320 and the Cisco WLC 3504. Wireless clients that have their gateway set and traffic handled on/by the SRX320, have their traffic either dropped or not forwarded by the WLC 3504.
From packet captures, I can see that traffic leaves the client, egresses through the WLC up to the SRX, from where sessions are created. The SRX then sends return traffic out to the WLC (take DHCP and ICMP traffic as an example), and it appears in packet captures conducted on the WLC:
192.168.1.253 192.168.1.254 DHCP 382 DHCP Request
192.168.1.254 192.168.1.253 DHCP 325 DHCP ACK
192.168.1.254 192.168.1.1 ICMP 74 Destination unreachable (Port unreachable)
[previous message repeats many times]
192.168.1.254 192.168.1.1 ICMP 78 Echo (ping) reply id=0xc00c, seq=2/512, ttl=64
192.168.1.254 192.168.1.1 ICMP 78 Echo (ping) reply id=0xc00c, seq=3/768, ttl=64
8.8.8.8 192.168.1.1 ICMP 78 Echo (ping) reply id=0x2c26, seq=9/2304, ttl=117
8.8.8.8 192.168.1.1 ICMP 78 Echo (ping) reply id=0x2c26, seq=10/2560, ttl=117
8.8.8.8 192.168.1.1 ICMP 78 Echo (ping) reply id=0x2c26, seq=11/2816, ttl=117
8.8.8.8 192.168.1.1 ICMP 78 Echo (ping) reply id=0x2c26, seq=12/3072, ttl=117
On the layer 2 side of this pcap, the source MAC is the SRX, the destination MAC is the dynamic interface for that subnet on the WLC.
However, from here, this traffic does not go anywhere. Either it is never forwarded to the AP, or the AP never forwards it to the client. This results in clients having zero layer 3 or higher connectivity to the gateway = clients cannot leave their subnet.
Layer 2 is perfectly fine. All devices in the chain have correct/valid ARP entries.
Of course, you might be thinking that this is an issue with the Cisco product, not the Juniper product. Normally I would absolutely agree. However, the confounder is that there are zero issues when the firewall is not an SRX, but instead any Palo Alto firewall. Which is why the current workaround is to run wireless subnets off of a spare PA-220, and therefore there is something special about how the SRX handles traffic that results in this behavior.
Before I post on the Cisco forum, I would like to see if there is anything I should be checking on the SRX. Because clearly there is something about how the SRX handles traffic that makes the WLC fail to handle it properly.
Thanks guys!
- Firewall: 2x Juniper SRX320-SYS-JB 23.4R2-S4.9 (config)
- Core switch: 1x Juniper EX3400-24P 23.4R2-S3.9 (config)
- Controller: 1x Cisco AIR-CT3504-K9 8.10.196.0 (config)
- AP: 1x Cisco C9130AXI-B