Wireless




  • 1.  Clients intermittently "hanging" using Mist AP32 and Juniper EX2300/EX4600

    Posted 02-21-2021 15:26
    [crossposted to Switching before joining Wireless, and I can't find a way to edit my Switching post to crosspost to Wireless]

    At least twice a day - generally busier times of the school day (10ish and 1ish) - clients will appear to have no internet for anywhere from a couple minutes to ~5 or sometimes up to 10 minutes. They have full bars on their WIFI connection, and may or may not have an IP address (sometimes self-assigned, but usually a valid IP), but can't use a web browser. Unfortunately, by the time the issue is reported, clients are working again, so I've not been able to do extensive testing from the client's perspective.

    Not all buildings are affected at the same time, though sometimes the internet "brownout" hits a second building within a minute or two of the first occurrence.

    Default storm-control is enabled on all ports of my EX2300 edge switches and EX4600 core, with Mist AP32 access points and a mix of macOS/Windows/iOS/Android/Chromebook clients. The core is a dual EX4600 Virtual Chassis with 9 EX2300 edge VCs (1, 2, or 3 switches, depending on the size of the building), all connected back to the EX4600s via two 10G SFP+ fibre links in a 20G LACP aggregated interface. The EX4600 is running 19.3R2.9 and the EX2300s are running several different firmware versions, from 18 to 19.3R2.9 to 20.2R1.10 to 20.4R1.12. DHCP is relayed by the EX4600 from a Windows DHCP server, and DNS comes from a couple of Linux BIND servers; both are on different VLANs from the clients.

    All my EX2300 configs are done using interface-range statements like this one:
    set interfaces interface-range wap unit 0 family ethernet-switching storm-control default
    and the definition of EX2300 storm-control is boilerplate:
    set forwarding-options storm-control-profiles default all
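
    For anyone comparing against their own setup, a minimal sketch of how the two statements above could be checked on an EX2300. The interface-range name wap comes from the config above; the match pattern is an assumption about how storm-control events are worded in the log and may need adjusting for your release.

    ```
    # Show the storm-control profile and the interface-range that applies it
    show configuration forwarding-options storm-control-profiles
    show configuration interfaces interface-range wap

    # Look for storm-control events in the system log
    # (pattern is a guess at the message wording; no output means no match, not proof of no event)
    show log messages | match "storm|ST_CTL"
    ```

    An empty result from the last command during a brownout window would suggest storm-control is not what is dropping the traffic, but it does not rule it out on its own.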
    Mist APs complain that DHCP and DNS servers aren't working for periods of 5-10 minutes (sometimes longer), and clients have apparent 802.1X authentication timeouts or errors during this time (since passwords are saved on the clients, this is a red herring - the client isn't fat-fingering the password, they're just having communication issues).

    No error messages in the Juniper switch logs.

    DHCP server is on VLAN 10 (relayed through EX4600 to other VLANs), DNS on VLAN 12, most clients are on VLAN 4 (BYOD network), teachers are on VLAN 3, roughly 1 AppleTV per classroom on VLAN 6.

    I haven't yet been able to capture traffic during the "brownout" event.
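
    One possible starting point for a capture, sketched under assumptions: irb.4 is a hypothetical name for the layer 3 interface of the BYOD VLAN on the EX4600, and this command only sees packets sent to or from the switch's Routing Engine, so it would show relayed DHCP but not ordinary transit traffic such as client DNS queries.

    ```
    # Run on the EX4600 core during a brownout; irb.4 is an assumed interface name
    # Shows DHCP requests and replies handled by the relay, without name resolution
    monitor traffic interface irb.4 no-resolve matching "udp port 67 or udp port 68"
    ```

    If DHCP discovers arrive here but no offers come back, the problem is likely between the core and the DHCP server; if nothing arrives at all, it points toward the edge switches or the APs.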

    I'm wondering whether any of this sounds familiar to you.

    Is it possible it's some sort of unicast storm that's causing DNS and DHCP unicast packets to be undeliverable and resulting in the client experiencing this as "no internet"? Are there any gotchas with storm-control?

    I've heard that AppleTVs can still have issues with becoming the default gateway due to some weird long-standing bug with the bonjour sleep proxy service...but the AppleTVs are all on a different VLAN from the clients, so this is probably not the cause. This is also affecting Windows laptops and Chromebooks just as much as Apple devices.

    Perhaps something else entirely?

    Any assistance would be much appreciated!


  • 2.  RE: Clients intermittently "hanging" using Mist AP32 and Juniper EX2300/EX4600

    Posted 02-22-2021 09:13
    Have you tried setting trace options on the switches to up the logging level? Especially if you have some locations that this happens consistently or more often.
    Where is your authentication server?
    What does your SLE data say during that time? If it's consistently DNS, DHCP, and 802.1X errors, it definitely sounds like clients are having problems reaching your servers.

    This sounds to me like something further upstream that's causing the communication outage. Does this affect wired clients or is it just wireless? Are the APs losing communication with the cloud?

    ------------------------------
    ALLYN CROWE
    ------------------------------



  • 3.  RE: Clients intermittently "hanging" using Mist AP32 and Juniper EX2300/EX4600

    Posted 02-22-2021 12:17
    I *DID* have trace options enabled, but they didn't show anything and JTAC told me to turn them off to lower switch CPU usage on the EX2300s - their theory was that the switch was experiencing a brownout due to pegging the processor. While processor usage has gone down from periodic 80-90% to regular 40-50%, it hasn't had any effect on the daily brownouts.

    We have 1 physical host running AD, DHCP, and NPS and two secondary VMs running the same services (with DHCP on a 1s delay). Both BIND servers are virtual. All servers (hosts or VMs running on them) are directly connected to the core EX4600 VC that all the EX2300 VCs connect to.

    I believe the DNS, DHCP, and resulting RADIUS errors are symptoms of UDP traffic problems. Wired clients do not seem to be affected - this only seems to happen on devices connected through the Mist AP32s. The APs do not seem to have communication issues - at least not reported in the admin interface.

    Again, we have full WIFI signal bars on the clients...just nowhere to go using IP. 

    And this affects clients on WPA2-PSK too, so it's not just RADIUS on the WPA2-Enterprise SSID. And I've turned off WIFI6, which hasn't made much difference (apart from a couple Windows clients who seem more reliable than when 6 was on).


  • 4.  RE: Clients intermittently "hanging" using Mist AP32 and Juniper EX2300/EX4600

    Posted 02-22-2021 12:47
    It definitely is strange. And yeah, it makes sense to only run trace (and be tight in what you enable) when you're looking for something, because it does peg the processor.

    And agreed, the DHCP/DNS/RADIUS issues definitely sound like a traffic issue not specific to those services. With wired clients on the same switches not affected, maybe it's a port/ASIC-level overload? What does the port load look like for the AP ports?

    Another next step could be turning off storm control on the AP ports in a building where it happens regularly. I haven't seen this kind of issue with storm control on, though, and you should see something in the switch logs saying storm control is kicking in.
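
    A rough sketch of that test, assuming the interface-range is named wap as in the original post and using ge-0/0/5 as a hypothetical AP port. The commit confirmed step is just a safety net so the change rolls back by itself if the switch becomes unreachable.

    ```
    # Check the current load on one AP port first (ge-0/0/5 is a placeholder)
    show interfaces ge-0/0/5 extensive | match "bps|pps"

    # Remove storm-control from the AP interface-range on one edge switch
    configure
    delete interfaces interface-range wap unit 0 family ethernet-switching storm-control
    show | compare
    commit confirmed 10

    # If everything still works, make the change permanent within 10 minutes
    commit
    ```

    Leaving the other buildings unchanged gives a comparison: if the brownouts continue everywhere except the test building, storm-control becomes a much stronger suspect.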

    ------------------------------
    ALLYN CROWE
    ------------------------------