Junos OS

LAG ports flap after RSTP TOPO_CHANGE triggered by layer-3 IP Address

    Posted 04-03-2023 06:20

    Howdy,

    We have been having internal network issues where my organization loses connectivity to the outside world whenever a new device on VLAN100 is assigned an L3 IP Address, either through DHCP or by manual configuration. Until a couple of weeks ago, our network infrastructure was rock solid! We have a couple of EX2300 Switches and an SRX340 router, all interconnected and with RSTP enabled.

    On our network, we have a Top-of-Rack Switch (ToR), an Access Switch (AS) as well as our default gateway (Firewall). It is such a simple network layout and our spanning tree infrastructure is as simple as it can get! The main character for this discussion is the ToR Switch, which is connected to our Router through a LAG (ae0) interface.

    Apart from that connection, we also have fiber optic connections from the ToR Switch to our two (2) Access Switches (ASs), which have a direct connection to the Router as well; that is the reason why we have RSTP enabled. This is where it gets trickier, with a strange scenario that we have been dealing with for the last couple of weeks.

    Our configuration is that we are running a global RSTP on all of our Switches, including our Router, which has layer2-switching enabled for the ports belonging to the LAG interface. On both the Router and ToR Switch, we have RSTP configured in point-to-point mode on the LAG interface (ae0) as well as on the connections between the ToR Switch and the ASs. The other switches that do not participate in RSTP have a mode of edge.

    Now let's get into our problem. Buckle up! A couple of weeks ago, I was working on adding new devices (Virtual Devices) to our network infrastructure. For context, our hypervisor is connected to one of the ports on the ToR Switch. We also have 5 different VLANs, one of them being VLAN100, which is the one with all this trouble. When I created a VM and assigned its vNIC to our VLAN100 network, I was notified that, after 10 seconds, everyone in my organization had lost internet connectivity. After about 5 minutes, everything was back up and running by itself. I jumped into our logs and saw this:

     

    root@ACIT-TOR-SW01> show log messages
    Apr  3 07:59:04 ACIT-TOR-SW01 clear-log[31759]: logfile cleared
    Apr  3 07:59:27  ACIT-TOR-SW01 lacpd[17161]: LACPD_TIMEOUT: ge-0/0/46: lacp current while timer expired current Receive State: CURRENT
    Apr  3 07:59:27  ACIT-TOR-SW01 dc-pfe[16887]: ETH: ifd (ge-0/0/46) unknown boolean option 112
    Apr  3 07:59:27  ACIT-TOR-SW01 dc-pfe[16887]: IFFPC: 'IFD Ether boolean set' (opcode 55) failed
    Apr  3 07:59:27  ACIT-TOR-SW01 dc-pfe[16887]:   ifd 707; Ether boolean set error (22)
    Apr  3 07:59:27  ACIT-TOR-SW01 kernel: bundle ae0.0: bundle IFL state changed to UP
    Apr  3 07:59:27  ACIT-TOR-SW01 lacpd[17161]: LACP_INTF_MUX_STATE_CHANGED: ae0: ge-0/0/46: Lacp state changed from COLLECTING_DISTRIBUTING to ATTACHED, actor port state : |EXP|-|-|-|IN_SYNC|AGG|SHORT|ACT|, partner port state : |-|-|DIS|COL|OUT_OF_SYNC|AGG|SHORT|ACT|
    Apr  3 07:59:27  ACIT-TOR-SW01 lacpd[17161]: LACPD_TIMEOUT: ge-0/0/44: lacp current while timer expired current Receive State: CURRENT
    Apr  3 07:59:27  ACIT-TOR-SW01 kernel: bundle ae0.0: bundle IFL state changed to UP
    Apr  3 07:59:27  ACIT-TOR-SW01 lacpd[17161]: LACP_INTF_MUX_STATE_CHANGED: ae0: ge-0/0/44: Lacp state changed from COLLECTING_DISTRIBUTING to ATTACHED, actor port state : |EXP|-|-|-|IN_SYNC|AGG|SHORT|ACT|, partner port state : |-|-|DIS|COL|OUT_OF_SYNC|AGG|SHORT|ACT|
    Apr  3 07:59:27  ACIT-TOR-SW01 lacpd[17161]: LACPD_TIMEOUT: ge-0/0/45: lacp current while timer expired current Receive State: CURRENT
    Apr  3 07:59:27  ACIT-TOR-SW01 kernel: bundle ae0.0: bundle IFL state changed to UP
    Apr  3 07:59:27  ACIT-TOR-SW01 lacpd[17161]: LACP_INTF_MUX_STATE_CHANGED: ae0: ge-0/0/45: Lacp state changed from COLLECTING_DISTRIBUTING to ATTACHED, actor port state : |EXP|-|-|-|IN_SYNC|AGG|SHORT|ACT|, partner port state : |-|-|DIS|COL|OUT_OF_SYNC|AGG|SHORT|ACT|
    Apr  3 07:59:27  ACIT-TOR-SW01 kernel: lag_bundlestate_ifd_change: bundle ae0: bundle IFD minimum bandwidth or minimum links not met, Bandwidth (Current : Required) 0 : 1000000000 Number of links (Current : Required) 0 : 1
    Apr  3 07:59:27  ACIT-TOR-SW01 kernel: bundle ae0.0: bundle IFL state changed to UP
    Apr  3 07:59:27  ACIT-TOR-SW01 lacpd[17161]: LACPD_TIMEOUT: ge-0/0/47: lacp current while timer expired current Receive State: CURRENT
    Apr  3 07:59:27  ACIT-TOR-SW01 lacpd[17161]: LACP_INTF_DOWN: ae0: Interface marked down due to lacp timeout on member ge-0/0/47
    Apr  3 07:59:27  ACIT-TOR-SW01 lacpd[17161]: LACP_INTF_MUX_STATE_CHANGED: ae0: ge-0/0/47: Lacp state changed from COLLECTING_DISTRIBUTING to ATTACHED, actor port state : |EXP|-|-|-|IN_SYNC|AGG|SHORT|ACT|, partner port state : |-|-|DIS|COL|OUT_OF_SYNC|AGG|SHORT|ACT|
    Apr  3 07:59:27  ACIT-TOR-SW01 fpc0 ETH: ifd (ge-0/0/46) unknown boolean option 112
    Apr  3 07:59:27  ACIT-TOR-SW01 dc-pfe[16887]: ETH: ifd (ge-0/0/44) unknown boolean option 112
    Apr  3 07:59:27  ACIT-TOR-SW01 dc-pfe[16887]: IFFPC: 'IFD Ether boolean set' (opcode 55) failed
    Apr  3 07:59:27  ACIT-TOR-SW01 dc-pfe[16887]:   ifd 705; Ether boolean set error (22)
    Apr  3 07:59:27  ACIT-TOR-SW01 fpc0 IFFPC: 'IFD Ether boolean set' (opcode 55) failed
    Apr  3 07:59:27  ACIT-TOR-SW01 fpc0   ifd 707; Ether boolean set error (22)
    Apr  3 07:59:27  ACIT-TOR-SW01 dc-pfe[16887]: ETH: ifd (ge-0/0/45) unknown boolean option 112
    Apr  3 07:59:27  ACIT-TOR-SW01 dc-pfe[16887]: IFFPC: 'IFD Ether boolean set' (opcode 55) failed
    Apr  3 07:59:27  ACIT-TOR-SW01 dc-pfe[16887]:   ifd 706; Ether boolean set error (22)
    Apr  3 07:59:27  ACIT-TOR-SW01 dc-pfe[16887]: ETH: ifd (ge-0/0/47) unknown boolean option 112
    Apr  3 07:59:27  ACIT-TOR-SW01 dc-pfe[16887]: IFFPC: 'IFD Ether boolean set' (opcode 55) failed
    Apr  3 07:59:27  ACIT-TOR-SW01 dc-pfe[16887]:   ifd 708; Ether boolean set error (22)
    Apr  3 07:59:27  ACIT-TOR-SW01 fpc0 ETH: ifd (ge-0/0/44) unknown boolean option 112
    Apr  3 07:59:27  ACIT-TOR-SW01 fpc0 IFFPC: 'IFD Ether boolean set' (opcode 55) failed
    Apr  3 07:59:27  ACIT-TOR-SW01 fpc0   ifd 705; Ether boolean set error (22)
    Apr  3 07:59:27  ACIT-TOR-SW01 fpc0 ETH: ifd (ge-0/0/45) unknown boolean option 112
    Apr  3 07:59:27  ACIT-TOR-SW01 fpc0 IFFPC: 'IFD Ether boolean set' (opcode 55) failed
    Apr  3 07:59:27  ACIT-TOR-SW01 fpc0   ifd 706; Ether boolean set error (22)
    Apr  3 07:59:27  ACIT-TOR-SW01 fpc0 ETH: ifd (ge-0/0/47) unknown boolean option 112
    Apr  3 07:59:27  ACIT-TOR-SW01 fpc0 IFFPC: 'IFD Ether boolean set' (opcode 55) failed
    Apr  3 07:59:27  ACIT-TOR-SW01 fpc0   ifd 708; Ether boolean set error (22)
    Apr  3 07:59:28  ACIT-TOR-SW01 mib2d[17124]: SNMP_TRAP_LINK_DOWN: ifIndex 610, ifAdminStatus up(1), ifOperStatus down(2), ifName ae0
    


    Our LAG interface between the ToR Switch and the Router started flapping, and we started seeing this boolean error all over the place. We thought that there was a broadcast storm somewhere, so I checked the RSTP log messages and found TOPO_CHANGES all across the network.
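
As a rough way to summarize the ToR log, here is a minimal Python sketch that pulls out the `LACPD_TIMEOUT` events and records when each LAG member timed out. The sample lines are copied from the log excerpt in this post; the regular expression and the names `EVENT_RE` and `timeouts_by_member` are my own assumptions for illustration, not part of any Junos tooling.

```python
import re

# Lines copied from the ToR syslog excerpt in this post (trimmed for brevity).
LOG = """\
Apr  3 07:59:27  ACIT-TOR-SW01 lacpd[17161]: LACPD_TIMEOUT: ge-0/0/46: lacp current while timer expired current Receive State: CURRENT
Apr  3 07:59:27  ACIT-TOR-SW01 lacpd[17161]: LACPD_TIMEOUT: ge-0/0/44: lacp current while timer expired current Receive State: CURRENT
Apr  3 07:59:27  ACIT-TOR-SW01 lacpd[17161]: LACPD_TIMEOUT: ge-0/0/45: lacp current while timer expired current Receive State: CURRENT
Apr  3 07:59:27  ACIT-TOR-SW01 lacpd[17161]: LACPD_TIMEOUT: ge-0/0/47: lacp current while timer expired current Receive State: CURRENT
Apr  3 07:59:27  ACIT-TOR-SW01 lacpd[17161]: LACP_INTF_DOWN: ae0: Interface marked down due to lacp timeout on member ge-0/0/47
Apr  3 07:59:28  ACIT-TOR-SW01 mib2d[17124]: SNMP_TRAP_LINK_DOWN: ifIndex 610, ifAdminStatus up(1), ifOperStatus down(2), ifName ae0
"""

# Hypothetical pattern: timestamp, hostname, lacpd process, then the member name.
EVENT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"lacpd\[\d+\]: LACPD_TIMEOUT: (?P<member>\S+):",
    re.MULTILINE,
)


def timeouts_by_member(log_text):
    """Map each LAG member interface to the timestamp of its LACP timeout."""
    return {m["member"]: m["stamp"] for m in EVENT_RE.finditer(log_text)}


by_member = timeouts_by_member(LOG)
for member in sorted(by_member):
    print(member, by_member[member])

# One distinct timestamp means every member timed out within the same second.
distinct_stamps = set(by_member.values())
print(len(distinct_stamps))
```

On these lines, all four members (ge-0/0/44 through ge-0/0/47) time out within the same second. That may suggest the whole bundle stopped receiving LACPDUs at once rather than a single link going bad, although the log alone does not prove that.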

    Here are our Router's logs that were recorded when the outage happened: 

    Apr  3 07:59:41  ACIT-RT01 l2cpd[2014]: TOPO_CH: for Instance 0 in  routing-instance default received on port ae0.0
    Apr  3 08:00:09  ACIT-RT01 l2cpd[2014]: TOPO_CH: for Instance 0 in  routing-instance default received on port ae0.0
    Apr  3 08:00:11  ACIT-RT01 l2cpd[2014]: TOPO_CH: for Instance 0 in  routing-instance default received on port ae0.0
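
To line the two logs up, a small Python sketch can compute how long after the LACP timeouts on the ToR Switch each `TOPO_CH` message was logged on the Router. It assumes the clocks on both devices are in sync (not verified here) and that the year is 2023, taken from the post date, since the syslog lines carry no year. `parse_stamp` is a hypothetical helper name.

```python
from datetime import datetime

YEAR = 2023  # assumption: taken from the post date; syslog lines carry no year


def parse_stamp(stamp):
    """Parse a syslog-style timestamp such as 'Apr  3 07:59:27'."""
    return datetime.strptime(f"{YEAR} {stamp}", "%Y %b %d %H:%M:%S")


# Timestamp of the LACPD_TIMEOUT events in the ToR log.
lacp_timeout = parse_stamp("Apr  3 07:59:27")

# Timestamps of the TOPO_CH messages in the Router log.
topo_changes = [
    parse_stamp(s)
    for s in ("Apr  3 07:59:41", "Apr  3 08:00:09", "Apr  3 08:00:11")
]

# Seconds between the LACP timeouts and each topology-change message.
offsets = [int((t - lacp_timeout).total_seconds()) for t in topo_changes]
print(offsets)
```

In these excerpts, every `TOPO_CH` entry on the Router is logged after the LACP timeouts on the ToR Switch. The excerpts may be incomplete (the ToR log had just been cleared), so treat the ordering as a hint rather than a conclusion.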
    


    After a while, I decided to replicate the issue. With authorization, I created a new VM and assigned it an IP Address on VLAN100 through DHCP. Lo and behold, it happened again: TOPO_CHANGES, LACP_DOWN messages, and our ae0 was flapping again. Then I tried something else. I took an old laptop of mine and connected it to our ToR Switch, right after configuring one of the ToR interfaces as an access port on VLAN20. Bear in mind that VLAN20 is already in production and is configured on our Router as well. After plugging the device into that interface (configured with RSTP edge mode), my laptop got a DHCP address and nothing happened; everything was working fine. But when I changed the interface to an access port for VLAN100... Bob's your uncle, the network went down.

    I started looking into why our LAG interface (ae0) starts to flap on VLAN100 but not on other VLANs. We do not have PVSTP but a global RSTP, and our Root Bridge is the ToR Switch.

    Strangely enough, I tried the same approach of connecting my laptop on VLAN100, but this time to one of our Access Switches. Nothing; everything worked fine. It is only triggered on the ToR Switch. I even updated all Switches to version 22.4R1.10, but the only thing that changed was more information in the logs.

    If anyone could comment on this issue that would be great. It's truly weird, and, again, the first problem we've ever had with an astounding product.

    This is my first post here so apologies if the inline logs are not conventional.

    Thanks!



    ------------------------------
    Nick
    ------------------------------