Loop / flood in network after moving from EX to QFX

  • 1.  Loop / flood in network after moving from EX to QFX

    Posted 04-27-2020 00:29

    Hello,

    We have the following topology:

    CORE3 = EX4550-32T, 12.3R6.6

    CORE3-New = QFX5100, 18.1R3-S7.1

    CORE1 = EX4550-32F, 12.3R6.6


    CORE3 config:

    set forwarding-options analyzer VOIP input ingress interface xe-1/0/24.0
    set forwarding-options analyzer VOIP input ingress interface xe-0/0/24.0
    set forwarding-options analyzer VOIP input ingress interface xe-1/1/2.0 <--- UCS-B, P.25
    set forwarding-options analyzer VOIP input ingress interface xe-0/1/2.0 <--- UCS-A, P.25
    set forwarding-options analyzer VOIP input egress interface xe-1/0/24.0
    set forwarding-options analyzer VOIP input egress interface xe-0/0/24.0
    set forwarding-options analyzer VOIP input egress interface xe-1/1/2.0 <--- UCS-B, P.25
    set forwarding-options analyzer VOIP input egress interface xe-0/1/2.0 <--- UCS-A, P.25
    set forwarding-options analyzer VOIP output interface xe-0/0/27.0 <--- Recording Server (Output Mirror)

    Above topology works on CORE3.

    When we moved this topology and its cables to the new switch "CORE3-New", the network flooded: pings were lost, some switches froze, and connectivity became intermittent.

     

    We then deactivated the analyzer on CORE3-New:

    deactivate forwarding-options analyzer VOIP

    but still the same behavior.


    The solution *seems* to be configuring no-mac-learning on the mirror ports on the CORE3-New, we haven't tried that yet.

    For now we had to roll back and move the mirror ports to the old switch CORE3, and everything works again.

    What is the difference between CORE3 and CORE3-New that might cause this behavior?

    The configuration is the same, no-mac-learning is not configured on CORE3 and still everything works.


    Any ideas?

     

     

    network.png

     


    #flood
    #EX
    #QFX
    #loop
    #maclearning


  • 2.  RE: Loop / flood in network after moving from EX to QFX

     
    Posted 04-27-2020 06:47

    @iNc0g there are differences in operation between the older products (often referred to as Legacy, now all EOL'd) and the newer ones, as they use different internal ASICs.  Older/Legacy products are Marvell based, while the new ones are Broadcom (or Juniper) based.

     

    This (no-mac-learning with mirroring) looks to be something that differs between these [somewhat equivalent] products.  I am sure this QFX5100 behavior would apply equally to the EX4600.

     

    FYI only.  HTH



  • 3.  RE: Loop / flood in network after moving from EX to QFX

    Posted 04-27-2020 07:44

    Hello,

     


    @iNc0g wrote:

     

    We then deactivated the analyzer on CORE3-New:

    deactivate forwarding-options analyzer VOIP

    but still the same behavior.

     


     

    While I don't have a solution for this particular problem, did You actually attempt to break Your triangle topology to quell the flood while the QFX was in place?

    L2 frames have no TTL and can circulate indefinitely, so You need to logically break the loop to stop an L2 flood.

    You did exactly that when You put the old EX back in, which probably led You to think that the QFX somehow loops frames even without the analyzer enabled.
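
    A minimal sketch of breaking the triangle by hand while the flood is in progress. The choice of ae4 is only an assumption for illustration; the link to disable is whichever trunk closes the loop in the actual topology.

    ```
    [edit]
    set interfaces ae4 disable <--- hypothetical choice of link, not taken from the thread
    commit confirmed 5
    ```

    With commit confirmed, the change rolls back automatically after 5 minutes unless a plain commit follows, which is a useful safety net if the disabled link turns out to carry management traffic.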

     

     


    @iNc0g wrote:

     


    The solution *seems* to be configuring no-mac-learning on the mirror ports on the CORE3-New, we haven't tried that yet.

     

    Just wondering where this piece of advice comes from? Is it from JTAC or from some yet-unnamed source You found on the wider internet?

     

    HTH

    Thx

    Alex

     

     

     

     

     



  • 4.  RE: Loop / flood in network after moving from EX to QFX

    Posted 04-27-2020 13:58

    Hello iNc0g,

     

    Since you are migrating from EX4550 (legacy), to QFX5100 (ELS), did you make sure that all the configuration was updated to ELS style?

     

    Also, if the packets that the switch receives via xe-1/1/2 and xe-0/1/2 have a destination MAC learned through the trunk to CORE1, the switch is expected to forward that traffic. Even if the destination MAC is unknown, the switch will simply flood the traffic throughout the VLAN, and the behavior will be the same if you remove MAC learning.

     

    Based on the diagram, you have the voice VLAN on all the trunks. What loop prevention feature have you configured?

     

    If this solves your problem, please mark this post as "Accepted Solution".

     

     

     

     

     



  • 5.  RE: Loop / flood in network after moving from EX to QFX

     
    Posted 04-27-2020 19:42

    Hi, 

     

    Since you are moving to ELS, and to QFX in particular, please make sure this is not a loop caused by RSTP, which is configured differently on ELS than on legacy EX. Here is a KB that explains the differences: https://kb.juniper.net/InfoCenter/index?page=content&id=KB33693&actp=METADATA

     

    As a first step, I just want to make sure this is not a loop caused by the interfaces not being explicitly listed under RSTP, especially since the config is not completely shown. Could you add the STP interface status to this thread? It might help identify the underlying issue.
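
    For reference, a sketch of the operational commands that would collect that status on each switch. The CORE3-NEW prompt is only illustrative.

    ```
    CORE3-NEW> show configuration protocols rstp | display set
    CORE3-NEW> show spanning-tree bridge
    CORE3-NEW> show spanning-tree interface
    CORE3-NEW> show spanning-tree statistics interface
    ```

    Comparing the interface list in the third output against the ports that actually carry the VOIP VLAN shows quickly whether any of them is outside RSTP.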



  • 6.  RE: Loop / flood in network after moving from EX to QFX

    Posted 04-30-2020 00:10

    Thank you all for your comments!

    CORE3:

    CORE3# run show configuration | display set | match stp
    set protocols rstp interface ae0.0 mode point-to-point <--- ae0 was the trunk connecting CORE3 to CORE1 
    
    CORE3# run show spanning-tree bridge
    
    STP bridge parameters
    Context ID                          : 0
    Enabled protocol                    : RSTP
      Root ID                           : 4096.64:64:9b:1f:91:81
      Root cost                         : 21000
      Root port                         : ae4.0 <-- trunk connecting CORE3 to CORE3-NEW (temporary solution until we move back the VOIP mirror ports to CORE3-NEW)
    
      Hello time                        : 2 seconds
      Maximum age                       : 20 seconds
      Forward delay                     : 15 seconds
      Message age                       : 2
      Number of topology changes        : 339
      Time since last topology change   : 1467966 seconds
      Topology change initiator         : ae4.0
      Topology change last recvd. from  : 78:4f:9b:18:91:c2
      Local parameters
        Bridge ID                       : 32768.3c:8a:b0:e8:8b:41
    
    
    CORE3# run show spanning-tree interface
    
    Spanning tree interface parameters for instance 0
    
    Interface    Port ID    Designated      Designated         Port    State  Role
                             port ID        bridge ID          Cost
    ae4.0            128:5        128:7  32768.784f9b1891c2     20000  FWD    ROOT
    xe-0/0/24.0    128:537      128:537  32768.3c8ab0e88b41    200000  FWD    DESG
    xe-0/0/28.0    128:541      128:541  32768.3c8ab0e88b41     20000  FWD    DESG
    xe-0/1/2.0     128:563      128:563  32768.3c8ab0e88b41      2000  FWD    DESG
    xe-1/0/24.0    128:593      128:593  32768.3c8ab0e88b41     20000  FWD    DESG
    xe-1/1/2.0     128:619      128:619  32768.3c8ab0e88b41      2000  FWD    DESG
    
    
    


    CORE3-NEW:

    CORE3-NEW# run show configuration | display set | match stp
    set protocols rstp interface all
    
    CORE3-NEW# run show spanning-tree bridge
    STP bridge parameters
    Routing instance name               : GLOBAL
    Context ID                          : 0
    Enabled protocol                    : RSTP
      Root ID                           : 4096.64:64:9b:1f:91:81
      Root cost                         : 1000
      Root port                         : ae0 <--- trunk connecting CORE3-NEW to CORE1 (all vlans)
      Hello time                        : 2 seconds
      Maximum age                       : 20 seconds
      Forward delay                     : 15 seconds
      Message age                       : 1
      Number of topology changes        : 35
      Time since last topology change   : 1452853 seconds
      Local parameters
        Bridge ID                       : 32768.78:4f:9b:18:91:c2
        Extended system ID              : 0
    
    
    CORE3-NEW# run show spanning-tree interface
    
    Spanning tree interface parameters for instance 0
    
    Interface                  Port ID    Designated         Designated         Port    State  Role
                                           port ID           bridge ID          Cost
    ae0                          128:3        128:9   4096.64649b1f9181         1000    FWD    ROOT
    ae1                          128:4        128:4  32768.784f9b1891c2        10000    FWD    DESG
    ae4                          128:7        128:7  32768.784f9b1891c2        10000    FWD    DESG
    ae8                         128:11       128:11  32768.784f9b1891c2        10000    FWD    DESG
    ae9                         128:12       128:12  32768.784f9b1891c2        10000    FWD    DESG
    ae10                        128:13       128:13  32768.784f9b1891c2        10000    FWD    DESG
    
    
    CORE3-NEW# run show spanning-tree statistics interface
    
    
    Interface     BPDUs       BPDUs        Next BPDU       TCs        Proposal    Agreement
                  Sent        Received     Transmission    Tx/Rx      Tx/Rx       Tx/Rx
    ae0             12      772744             0           0/0         0/0         0/0
    ae1         791803          19             0           0/0         0/0         0/0
    ae4         781141          12             1           0/0         0/0         0/0
    ae8         790094           0             0           0/0         0/0         0/0
    ae9         769765           4             0           0/0         0/0         0/0
    ae10        790036           0             1           0/0         0/0         0/0
    ae11        789117           2             1           0/0         0/0         0/0
    ae12        791858           0             1           0/0         0/0         0/0
    ae13        791811           0             1           0/0         0/0         0/0
    ae14        770779           0             1           0/0         0/0         0/0
    ae15        770765           0             0           0/0         0/0         0/0
    ae16        770827           0             0           0/0         0/0         0/0
    ae17        770767           0             1           0/0         0/0         0/0
    ae20        791766          75             1           0/0         0/0         0/0
    
    
    CORE3-NEW# run show spanning-tree statistics bridge
    
    
    STP Context  : default
    STP Instance : 0
    Number of Root Bridge Changes: 43           Last Changed: Mon Apr 13 09:22:17 2020
    Number of Root Port Changes:   29           Last Changed: Mon Apr 13 09:22:17 2020
    Recent TC  Received:  ae0.0                 Received    : Mon Apr 13 13:58:17 2020
    
    
    
    
    
    

    The "set protocols rstp interface ae0.0 mode point-to-point" config that was in use on CORE3 was not copied over to CORE3-NEW, since this is the default and doesn't need to be explicitly set on the QFX, AFAIK.

    While the flood/loop was going on, we disabled the LACP interfaces on CORE1 one by one until disabling either ae8 (connecting CORE1 to CORE3-New) or the aeXX connecting CORE1 to the UCS (I don't remember which one) stopped the flood.

     

    CORE1:

    CORE1# run show configuration | display set | match stp
    set protocols rstp bridge-priority 4k
    set protocols rstp interface xe-0/1/0.0 edge
    set protocols rstp interface xe-0/1/1.0 edge
    set protocols rstp interface xe-0/1/2.0 edge
    set protocols rstp interface xe-0/1/3.0 edge
    set protocols rstp interface ae0.0 mode point-to-point <--- connecting CORE1 to CORE3-NEW
    set protocols rstp interface ae0.0 no-root-port
    set protocols rstp interface ae1.0 mode point-to-point
    set protocols rstp interface ae1.0 no-root-port
    set protocols rstp interface ae2.0 mode point-to-point
    set protocols rstp interface ae2.0 no-root-port
    set protocols rstp interface ae3.0 mode point-to-point
    set protocols rstp interface ae3.0 no-root-port
    set protocols rstp interface ae4.0 mode point-to-point
    set protocols rstp interface ae4.0 no-root-port
    set protocols rstp interface ae8.0 mode point-to-point
    set protocols rstp interface ae8.0 no-root-port
    set protocols rstp interface ae9.0 mode point-to-point
    set protocols rstp interface ae9.0 no-root-port
    set protocols rstp interface ae21.0 mode point-to-point
    set protocols rstp interface ae21.0 no-root-port
    set protocols rstp bpdu-block-on-edge
    

    The idea of configuring no-mac-learning on CORE3-NEW to solve the issue was brought up by a vendor we work with; we haven't checked it yet. I am trying to understand exactly what happened and why before we start trying things that cause downtime again.
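
    For reference, a sketch of what the vendor's suggestion might look like on the ELS-based QFX. It is untested here. The hierarchy, and whether the QFX5100 on 18.1R3 supports the statement there, are assumptions to verify against the Junos documentation before committing. xe-0/0/27.0 is the mirror output port from the analyzer config above.

    ```
    [edit]
    set switch-options interface xe-0/0/27.0 no-mac-learning <--- assumed ELS hierarchy, not verified on this platform
    commit confirmed 5
    ```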

     

     



  • 7.  RE: Loop / flood in network after moving from EX to QFX

     
    Posted 04-30-2020 05:26

    @iNc0g - I thought you said no-mac-learning "solved" your situation, not "The thought of no-mac-learning configuration on CORE3-NEW to solve the issue was brought up by a vendor we work with, we havn't checked it yet."

     

    I very much doubt that this setting on the mirror output port would have any effect, especially for the situation you were reporting.

     

    Good luck



  • 8.  RE: Loop / flood in network after moving from EX to QFX

    Posted 05-01-2020 18:12

    Hello iNc0g,

     

    So based on the outputs, it seems that CORE1 is the root bridge, so all of its ports must be forwarding. In that case the trunk between CORE3-NEW and the UCS should be blocked, but based on the outputs everything is forwarding. It sounds like the UCS device is not running RSTP, so when you disabled the LACP interfaces you stopped the loop.

     

    Could you check the UCS rstp configuration? If the UCS is the root bridge then the port between core3-new and core1 should be blocked. 

     

    If this solves your problem, please mark this post as "Accepted Solution".

     

     

     



  • 9.  RE: Loop / flood in network after moving from EX to QFX

    Posted 05-02-2020 07:52
    This exact topology worked/works with CORE3, only when moving to CORE3-NEW the issues started.

    I noticed that on CORE3 there's RSTP specifically on ae0, whereas on CORE3-NEW there's RSTP on all interfaces by default.

    I am trying to understand why it isn't happening on CORE3 but is happening on CORE3-NEW.


  • 10.  RE: Loop / flood in network after moving from EX to QFX

    Posted 05-03-2020 23:32

    Hi,

    On CORE3-NEW, the 2 ports connected to UCS-A and UCS-B are access ports, not trunks, and are members of the VOIP VLAN.

    There is no RSTP config on the Cisco UCS.

     

    I am still baffled about why this configuration works on CORE3 but causes a loop/flood on CORE3-NEW, no one seems to have an explanation for that, only theories about the solution.



  • 11.  RE: Loop / flood in network after moving from EX to QFX

    Posted 05-04-2020 11:40

    Hello iNc0g,

     

    From what I see the switch is working as expected, forwarding the traffic. It would be good to compare the full configuration of both units to see if we are missing something.

     

    Is the role of the ports going to the UCSs only to receive the mirrored traffic and forward it downstream, or do they also send traffic upstream? As a workaround, if the role is only to mirror the traffic, you can try deleting the VLAN from the ports and leaving them with only family ethernet-switching. That should keep the analyzer up and break the loop.

     

    This is from a lab switch

     

    {master:0}[edit]
    root@R1# show interfaces xe-0/0/0
    unit 0 {
        family ethernet-switching;
    }

    {master:0}[edit]
    root@R1# run show forwarding-options analyzer
    Analyzer name : test
    Mirror rate : 1
    Maximum packet length : 0
    State : up
    Ingress monitored interfaces : xe-0/0/0.0
    Egress monitored interfaces : xe-0/0/0.0
    Output interface : xe-0/0/1.0
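
    A minimal sketch of the same workaround in set-style commands. The port names are assumed from the CORE3 analyzer config posted earlier and may differ on CORE3-NEW.

    ```
    [edit]
    delete interfaces xe-0/1/2 unit 0 family ethernet-switching vlan <--- assumed port name (UCS-A)
    delete interfaces xe-1/1/2 unit 0 family ethernet-switching vlan <--- assumed port name (UCS-B)
    commit confirmed 5
    run show forwarding-options analyzer
    ```

    If the analyzer still shows State : up and the network stays stable, a plain commit makes the change permanent. Otherwise the config rolls back on its own after 5 minutes.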

     

    About the xSTP interfaces, the change you are seeing seems to be related to KB33693, shared by jospina. On ELS devices (CORE3-NEW) "interface all" is not implicit, so it must be configured manually, and based on the output you have done that. On legacy devices there is an implicit "all", so all interfaces should be part of RSTP by default. I guess you added ae0 manually later, but it shouldn't make a difference.

     

    CORE3-NEW# run show configuration | display set | match stp
    set protocols rstp interface all

     

    Please let me know if it helps!