EVPN VXLAN Overlay Peering

  • 1.  EVPN VXLAN Overlay Peering

     
    Posted 15 days ago

    Greetings-

    I have a simple two-leaf EVPN-VXLAN setup with a single ESI-LAG configured. When I reboot one of the leafs (or disable and re-enable all of its interfaces), the leaf-to-leaf overlay BGP session sits in Active state and takes up to 30 seconds to establish, even though the underlay comes up almost immediately and I have layer 3 connectivity between the two leafs. The IBGP peering appears to be subject to some sort of rolling 30-second holddown that the underlay EBGP is not, so the delay is anywhere from 1 to 30 seconds. I've tried adjusting hold-down settings but see no change in behavior. The problem is that the ESI-LAG interface comes up before the overlay peering has established, so traffic from the host is blackholed. I've set a 60-second hold-time up on the LAG interfaces, which gives peering enough time to catch up, but I'd like to know whether there is a way to peer quickly once layer 3 is established between the leafs.

    group OVERLAY {
        type internal;
        local-address 10.92.101.194;
        family evpn {
            signaling;
        }
        local-as 64830;
        multipath;
        bfd-liveness-detection {
            minimum-interval 1000;
            multiplier 3;
            session-mode automatic;
        }
        neighbor 10.92.101.195;
    }
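
    For readers following along, the 60-second hold-time workaround described above might look roughly like the sketch below. This is an assumption-laden illustration, not the poster's actual configuration: the interface name xe-0/0/10 is hypothetical, the post does not say whether the timer was applied to the ae bundle or to its member links, and the down value of 0 is a guess. Junos takes both values in milliseconds.

    interfaces {
        /* hypothetical host-facing member link of the ESI-LAG */
        xe-0/0/10 {
            /* wait 60 s before declaring the link up; down value assumed */
            hold-time up 60000 down 0;
        }
    }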



    -Paul



  • 2.  RE: EVPN VXLAN Overlay Peering

     
    Posted 12 days ago
    Hi Paul,

    During the 30-second window, can you grab the output of show bgp summary and see what state the OVERLAY group session is in?

    It sounds a bit like the OVERLAY group session might be failing to connect initially (neighbour address may not yet be reachable), then having to wait for a full session timeout before trying again.  You could prove this by configuring static routes from [L1]->[S1]->[L2 Peering Address] and [L2]->[S1]->[L1 Peering Address] and seeing if the session comes up faster.
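
    A minimal sketch of the static-route test suggested above, assuming the peering addresses from the original post (10.92.101.194 and 10.92.101.195). The next-hop addresses 192.0.2.1 and 192.0.2.5 are placeholders for the spine-facing next hops on each leaf, which the thread does not give.

    /* On L1 (peering address 10.92.101.194); next hop is a placeholder for S1 */
    routing-options {
        static {
            route 10.92.101.195/32 next-hop 192.0.2.1;
        }
    }

    /* On L2 (peering address 10.92.101.195); next hop is a placeholder for S1 */
    routing-options {
        static {
            route 10.92.101.194/32 next-hop 192.0.2.5;
        }
    }

    If the overlay session still takes around 30 seconds with these routes in place, the delay is unlikely to be caused by underlay route convergence.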


    ------------------------------
    Cheers,

    Ben Dale
    JNCIE-SEC #63
    JNCIP-SP
    JNCIP-ENT
    JNCIP-DC
    ------------------------------



  • 3.  RE: EVPN VXLAN Overlay Peering

     
    Posted 12 days ago
    I think you've accurately described the situation. After setting static routes and disabling the underlay I see the following:

    root@test> show bgp summary
    Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
    10.92.101.194         64830         13         13       0       3           1 Active

    root@test> show bgp summary
    Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
    10.92.101.194         64830         13         13       0       3           2 Active

    . . .

    root@test> show bgp summary
    Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
    10.92.101.194         64830         13         13       0       3          29 Active

    root@test> show bgp summary
    Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
    10.92.101.194         64830         13         13       0       3          30 Active

    root@test> show bgp summary
    Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
    10.92.101.194         64830         13         13       0       3          31 Active

    root@test> show bgp summary
    Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
    10.92.101.194         64830          1          1       0       3           0 Establ




  • 4.  RE: EVPN VXLAN Overlay Peering

     
    Posted 12 days ago
    Oddly, this new forum is silently failing when I try to post the remaining text, so here it is as an image.



  • 5.  RE: EVPN VXLAN Overlay Peering

    Posted 12 days ago
    This is exactly how I've solved this problem in the past: hold timers on the interfaces. It's a pain, but unfortunately there isn't really any other option. You will find that interface hold-down is essential when using a chassis-based spine, because convergence times vary during a reload, card reset, etc.

    ------------------------------
    DANIEL HEARTY
    Principal Engineer
    TELENT TECHNOLOGY SERVICES LIMITED
    Southampton
    ------------------------------



  • 6.  RE: EVPN VXLAN Overlay Peering

    Posted 12 days ago

    How is the underlay routing done? If it's OSPF, are the interfaces configured as p2p? That might help bring the underlay up faster so the overlay can get there sooner... Just a thought.
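
    If the underlay were OSPF, the p2p setting mentioned here could be sketched as follows. The area and the interface name xe-0/0/0.0 are assumptions for illustration only; nothing in the thread gives the actual fabric interfaces.

    protocols {
        ospf {
            area 0.0.0.0 {
                /* hypothetical leaf-to-spine fabric link */
                interface xe-0/0/0.0 {
                    /* treat the link as point-to-point: no DR/BDR election */
                    interface-type p2p;
                }
            }
        }
    }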

     

    Michel




  • 7.  RE: EVPN VXLAN Overlay Peering

     
    Posted 12 days ago
    Thanks all for the feedback. The overlay convergence delay occurs identically with either an EBGP or a static-routing underlay, so I'll continue with a hold-time up timer on the LAG interfaces. It's still curious that the EBGP underlay peers immediately but the IBGP overlay doesn't. Following on from Michel's suggestion, is EBGP smart enough to know when it's peering over a directly connected segment, and skip the timers?


  • 8.  RE: EVPN VXLAN Overlay Peering

     
    Posted 12 days ago
    I think you've described the situation accurately--after peering is lost, every 30 seconds the hosts attempt to reconnect. This is what I see after configuring static routes (to rule out the underlay), and clearing the underlay BGP neighborship:

    root@test> show bgp summary          
    Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
    10.92.101.194         64830         13         13       0       3           1 Active

    root@test> show bgp summary
    Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
    10.92.101.194         64830         13         13       0       3           2 Active

    . . .

    root@test> show bgp summary    
    Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
    10.92.101.194         64830         13         13       0       3          29 Active

    root@test> show bgp summary    
    Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
    10.92.101.194         64830         13         13       0       3          30 Active

    root@test> show bgp summary    
    Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
    10.92.101.194         64830         13         13       0       3          31 Active

    root@test> show bgp summary    
    Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
    10.92.101.194         64830          1          1       0       3           0 Establ

    Consistently, after exactly 31 seconds, it establishes. Setting the BGP hold-time to 5 seconds has no effect. So until the IBGP peering is established, inbound traffic from the ESI-LAG interface is blackholed.

    I believe this situation may be unique to cases where no-core-isolation is used (as it is here), namely a two-leaf environment. Without no-core-isolation set, ESI-LAG interfaces are placed in standby mode whenever the IBGP peering is down, which prevents inbound traffic from using the interface until IBGP is established. With only two leafs you don't want your remaining leaf to shut down its ESI-LAG interfaces when its only peer goes down, so no-core-isolation is used to prevent this. The side effect on the recovering peer, however, is that inbound traffic is blackholed until IBGP has established.
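
    For reference, the no-core-isolation setting discussed above sits under the EVPN protocol hierarchy. A minimal sketch, assuming the default routing instance (the thread does not show this part of the configuration):

    protocols {
        evpn {
            /* keep ESI-LAG links up even when all overlay BGP peers are down */
            no-core-isolation;
        }
    }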

    So I guess the question is: why does the underlay peering not see this same delay? Is it something unique to EBGP vs. IBGP?