Segment Routing Circle

  • 1.  Microloop avoidance in SR

    Posted 08-15-2020 21:43

    Hi All, 

     

    When we talk about SR, it brings up many challenges that were handled by RSVP earlier. As the world moves to SR, which is purely stateless and IGP-driven, microloops are one of the key negative elements that come along with it. TI-LFA promises 50 ms convergence, which cannot be achieved if the vendor's microloop avoidance implementation does not work properly. Microloops come in local and remote forms, and they play a key role in the overall convergence of the network. Does anyone have experience implementing this in their network? What is the Juniper way of making this a successful implementation?

     



  • 2.  RE: Microloop avoidance in SR

     
    Posted 08-17-2020 11:21

    Praveen,

     

    From a JUNOS perspective, we will be adding support for micro-loop avoidance in ISIS within the next few releases.  We are leveraging the algorithms from our existing TI-LFA implementation to represent arbitrary post-convergence paths using compressed label stacks.  This allows us to support high levels of ECMP on the post-convergence paths.

     

    We are also trying to make the implementation deal with multiple failures in a graceful manner.  There are two extremes when doing MLA in the presence of multiple failures.  One extreme is to abort the use of the MLA paths whenever a multiple failure situation is detected.  The other extreme is to recompute and install MLA paths as more information about multiple failures is received over time.  The first extreme runs the risk of effectively deactivating MLA a good portion of the time. The second extreme runs the risk of doing more harm than good in a network experiencing a great deal of churn, by continually chasing after a final state that never stabilizes.

     

    The JUNOS implementation tries to strike a balance between these two extremes. When the code decides to do a micro-loop avoidance computation, the implementation takes into account all information available at that instant to compute the post-convergence paths that get installed in the FIB.  As the router gets subsequent information from other LSPDUs that may arrive later, the code decides whether or not the post-convergence paths to a given destination are still valid based on the new information.  For post-convergence paths that are no longer valid, we revert to the primary path early, and only for those destinations.

     

    I am quite interested in hearing from other network operators as well regarding micro-loop avoidance.

     

    Thanks,

    Chris

     



  • 3.  RE: Microloop avoidance in SR

    Posted 08-18-2020 19:06

    Thanks for replying, Chris. So how far is Juniper from delivering microloop avoidance?  Do we have any insight into how the remote microloop avoidance feature would work in the Juniper implementation?  A local microloop is an easy fix by delaying route installation in the FIB, but that is not good enough because a single failure in the topology causes churn on multiple nodes, so it would be interesting to know what the remote microloop fix would look like.  I was thinking Juniper had done this already 🙂

     

     



  • 4.  RE: Microloop avoidance in SR
    Best Answer

     
    Posted 08-19-2020 15:17

    I should have been clearer.  The description in my previous email relates to the upcoming JUNOS implementation for micro-loop avoidance when a remote link fails (draft-ietf-rtgwg-segment-routing-ti-lfa).   JUNOS has had the delay-based micro-loop avoidance feature that applies only to local links (RFC 8333) for several years.  We are actively working on the implementation of the remote MLA feature now, and we expect to release it within the next few JUNOS releases. 

     

    When a router running remote MLA detects a single link-down event based on a received LSPDU, it computes the post-convergence paths for all destinations.  Any destination that could be subject to micro-loops based on that link-down event gets forwarding entries corresponding to the post-convergence path to that destination.  JUNOS uses a compressed label stack consisting of node-SIDs and possibly adj-SIDs to instantiate that post-convergence path.  Destinations that are not subject to micro-looping for that link-down event just get the normal next-hop installed. 
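
    As a rough illustration of the per-destination decision described above, here is a minimal Python sketch. It is not the JUNOS algorithm. It assumes, as a simplification, that a destination is "subject to micro-loops" when this router's pre-failure shortest path to it crossed the failed link, and it represents the post-convergence path as a plain hop list rather than a compressed stack of node-SIDs and adj-SIDs. The topology, the metrics, and the names shortest_paths, without_link, uses_link and plan_fib are all hypothetical.

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra. Returns {dest: [source, ..., dest]} for every reachable node."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neigh, metric in graph.get(node, {}).items():
            nd = d + metric
            if nd < dist.get(neigh, float("inf")):
                dist[neigh] = nd
                prev[neigh] = node
                heapq.heappush(heap, (nd, neigh))
    paths = {}
    for dest in dist:
        path = [dest]
        while path[-1] != source:
            path.append(prev[path[-1]])
        paths[dest] = path[::-1]
    return paths

def without_link(graph, a, b):
    """Copy of the topology with the bidirectional link a-b removed."""
    return {n: {m: c for m, c in nbrs.items() if {n, m} != {a, b}}
            for n, nbrs in graph.items()}

def uses_link(path, a, b):
    """True if the hop list crosses link a-b in either direction."""
    return any({x, y} == {a, b} for x, y in zip(path, path[1:]))

def plan_fib(graph, me, failed):
    """Decide, per destination, between an explicit path and the normal next-hop."""
    before = shortest_paths(graph, me)
    after = shortest_paths(without_link(graph, *failed), me)
    plan = {}
    for dest, new_path in after.items():
        if dest == me:
            continue
        if uses_link(before[dest], *failed):
            # Assumed criterion: old path crossed the failed link, so pin the
            # post-convergence path explicitly.
            plan[dest] = ("explicit", new_path)
        else:
            plan[dest] = ("normal", new_path[1])
    return plan

# Hypothetical five-node topology with symmetric metrics.
graph = {
    "A": {"B": 1, "E": 2},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "D": 1},
    "D": {"C": 1, "E": 2},
    "E": {"A": 2, "D": 2},
}
plan = plan_fib(graph, "A", ("C", "D"))
for dest in sorted(plan):
    print(dest, plan[dest])
```

    In this example router A learns that the remote link C-D is down. Only destination D had a path over that link, so only D gets an explicit post-convergence path (A, E, D); B, C and E keep their normal next-hops.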

     

    Instead of a single link-down event, the router may get LSPDUs corresponding to several remote link-down events.  In that case, the router running remote MLA computes the post-convergence paths using all of the information about the current state of the topology that it has received from LSPDUs at the time the computation is run.  If all of the LSPDUs arrive at the computing router before the MLA computation is done, then everything just works. 

     

    However, if LSPDUs with information about some remote link failures arrive after the initial MLA computation is done, we don't try to change the post-convergence paths that have already been installed to take into account the new information.  Either we determine that the previously installed path is still valid and continue to use it, or we determine that the subsequently failed link has made the post-convergence path invalid, so we stop using it and revert to the normal primary next-hop.  We can deal with many combinations of link-down, link-up, and metric change events in this way.  We decide which paths are still valid based on the effect of any subsequent network changes on a given destination in the topology.
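
    The keep-or-revert decision above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the JUNOS logic: it handles late link-down events only, and it treats an installed post-convergence path as invalid exactly when its hop list crosses the newly failed link. The FIB layout, the primary_next_hop table and the function names are hypothetical.

```python
def uses_link(path, link):
    """True if the hop list crosses the given link in either direction."""
    a, b = link
    return any({x, y} == {a, b} for x, y in zip(path, path[1:]))

def revalidate(installed, primary_next_hop, newly_failed):
    """Return the FIB after a link-down event that arrived after the MLA run.

    installed:        {dest: ("explicit", [hops...]) or ("normal", next_hop)}
    primary_next_hop: {dest: next_hop} from the normal SPF (assumed input)
    newly_failed:     (node_a, node_b) link reported down by a later LSPDU
    """
    fib = {}
    for dest, entry in installed.items():
        kind, value = entry
        if kind == "explicit" and uses_link(value, newly_failed):
            # Installed path is no longer valid: do not recompute it,
            # just fall back to the primary next-hop for this destination.
            fib[dest] = ("normal", primary_next_hop[dest])
        else:
            # Still valid (or never explicit): leave it untouched.
            fib[dest] = entry
    return fib

# Hypothetical state on router A after an earlier MLA computation.
installed = {
    "B": ("normal", "B"),
    "D": ("explicit", ["A", "E", "D"]),
    "F": ("explicit", ["A", "B", "C", "F"]),
}
primary_next_hop = {"B": "B", "D": "B", "F": "B"}

fib = revalidate(installed, primary_next_hop, ("E", "D"))
for dest in sorted(fib):
    print(dest, fib[dest])
```

    Here the late failure of link E-D invalidates only the explicit path to D, which reverts to its primary next-hop, while the explicit path to F stays installed. Nothing is recomputed, which is the point of the approach when the network is still changing.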

     

    In this way, the implementation tries to deal with multiple failures in a way that doesn't end up doing more harm than good when the network is experiencing a great deal of change.