Junos OS

Ask questions and share experiences about Junos OS.
  • 1.  MX240 MCLAG-ICCP Link BFD Flapping

    Posted 12-27-2016 05:59

    Hi There,

    I have two Juniper MX240 routers running MC-LAG and ICCP with ICL interfaces. IRBs are configured and connected to the LAN side of the network. Both routers are using very little CPU. ICCP liveness detection is configured as follows; both routers have the same configuration except for the peer IPs.

     

    show protocols iccp
    local-ip-addr 10.10.10.1;
    peer 10.10.10.2 {
        redundancy-group-id-list 1;
        backup-liveness-detection {
            backup-peer-ip 172.16.255.201;
        }
        liveness-detection {
            minimum-receive-interval 1000;
            multiplier 1;
            transmit-interval {
                minimum-interval 1000;
            }
            detection-time {
                threshold 2000;
            }
        }
    }

    Router 1

    show bfd session
                                                      Detect   Transmit
    Address                  State     Interface      Time     Interval  Multiplier
    10.10.10.1               Up                       1.000     0.900        1

    1 sessions, 1 clients
    Cumulative transmit rate 1.1 pps, cumulative receive rate 1.0 pps
    Client ICCP realm 10.10.10.2, TX interval 1.000, RX interval 1.000
    Session up time 03:59:30, previous down time 00:00:02
    Local diagnostic None, remote diagnostic None
    Remote state Up, version 1
    Session type: Multi hop BFD
    Min async interval 1.000, min slow interval 1.000
    Adaptive async TX interval 1.000, RX interval 1.000
    Local min TX interval 1.000, minimum RX interval 1.000, multiplier 1
    Remote min TX interval 1.000, min RX interval 1.000, multiplier 1
    Threshold for detection time 2.000
    Local discriminator 18, remote discriminator 18
    Echo mode disabled/inactive
    Multi-hop route table 0, local-address 10.10.10.1
      Session ID: 0x0

    1 sessions, 1 clients
    Cumulative transmit rate 1.1 pps, cumulative receive rate 1.0 pps

    Router 2

    show bfd session
                                                      Detect   Transmit
    Address                  State     Interface      Time     Interval  Multiplier
    10.10.10.2               Up                       1.000     0.900        1

    Client ICCP realm 10.10.10.1, TX interval 1.000, RX interval 1.000
    Session up time 04:01:29, previous down time 00:00:01
    Local diagnostic None, remote diagnostic None
    Remote state Up, version 1
    Session type: Multi hop BFD
    Min async interval 1.000, min slow interval 1.000
    Adaptive async TX interval 1.000, RX interval 1.000
    Local min TX interval 1.000, minimum RX interval 1.000, multiplier 1
    Remote min TX interval 1.000, min RX interval 1.000, multiplier 1
    Threshold for detection time 2.000
    Local discriminator 18, remote discriminator 18
    Echo mode disabled/inactive
    Multi-hop route table 0, local-address 10.10.10.2
      Session ID: 0x0

    1 sessions, 1 clients
    Cumulative transmit rate 1.1 pps, cumulative receive rate 1.0 pps

     

    The above BFD session flaps frequently, with the following errors:

    Router 1, same time:

    BFD Session 10.10.10.2 (IFL 0) state Up -> Down LD/RD(18/18) Up time:1w4d 08:08 Local diag: NbrSignal Remote diag: CtlExpire Reason: Received DOWN from PEER.
    Dec 27 08:18:40  edge1.da1 bfdd[3903]: BFDD_TRAP_MHOP_STATE_DOWN: local discriminator: 18, new state: down, peer addr: 10.10.10.2
    Dec 27 08:18:40  edge1.da1 lacpd[24612]: mcae_icl_event_handle_icl_up_iccp_down_from_iccp_up: for mcae ae7 preferred_active is TRUE
    Dec 27 08:18:40  edge1.da1 mib2d[3892]: SNMP_TRAP_LINK_DOWN: ifIndex 563, ifAdminStatus up(1), ifOperStatus down(2), ifName ae6.0
    Dec 27 08:18:41  edge1.da1 rpd[3949]: RPD_OSPF_NBRDOWN: OSPF neighbor 216.52.72.9 (realm ospf-v2 irb.3312 area 0.0.0.0) state changed from Full to Init due to 1WayRcvd (event reason: neighbor is in one-way mode)

    Router 2, same time:

    BFD Session 10.10.10.1 (IFL 0) state Up -> Down LD/RD(18/18) Up time:1w4d 08:08 Local diag: CtlExpire Remote diag: None Reason: Detect Timer Expiry.

    Any idea how to tune the BFD settings to make the session stable?

     


    #mclag
    #BFD
    #iccp


  • 2.  RE: MX240 MCLAG-ICCP Link BFD Flapping

    Posted 12-27-2016 08:45

    Hi,

     

    Please note that the BFD session for ICCP is a multihop BFD session, which works in centralized mode. This means that the keepalive exchange is handled by the Routing Engine. It is not recommended to use very aggressive BFD timers when it is operating in centralized mode. For more details, please refer to these links:

     

    Multichassis Link Aggregation Guidelines

    multichassis link aggregation with ICCP

    troubleshooting checklist - bfd

     

    Hope this helps.

     

    Thanks



  • 3.  RE: MX240 MCLAG-ICCP Link BFD Flapping

    Posted 12-27-2016 09:02

    Even though it is multihop and handled in the RE, the peer is just the next hop for these routers. Do you have any idea what the recommended values are for multihop mode handled through the RE? Additionally, all of the configurations in those documents refer to QFX platforms, not MX.

     

     



  • 4.  RE: MX240 MCLAG-ICCP Link BFD Flapping

    Posted 12-27-2016 22:11

    The documents I shared also apply to MX. Because BFD for ICCP is a multihop session handled by the Routing Engine, the timers should not be very aggressive. The timers you have used should be fine; the only thing I see here is the multiplier, which is set to 1. This value specifies the maximum allowable number of liveness detection requests missed by the peer. The default is 3, but you have set it to 1, which means a single lost packet is enough to bring the session down. If possible, can you set this value back to the default and check?
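
    To illustrate why the multiplier matters, here is a minimal Python sketch of the asynchronous-mode detection-time arithmetic described in RFC 5880: the multiplier received from the peer times the larger of the local minimum RX interval and the peer's minimum TX interval. The function name `detection_time_ms` and its parameter names are made up for this example; it only reproduces the arithmetic with the values from the posted output and is not what bfdd actually runs.

```python
def detection_time_ms(remote_multiplier: int,
                      local_min_rx_ms: int,
                      remote_min_tx_ms: int) -> int:
    """Async-mode BFD detection time per RFC 5880 (illustrative helper, hypothetical name)."""
    # The peer sends no faster than the slower of what we are willing to
    # receive and what it is willing to transmit.
    negotiated_interval_ms = max(local_min_rx_ms, remote_min_tx_ms)
    # The session is declared down if no control packet arrives within
    # multiplier * negotiated interval.
    return remote_multiplier * negotiated_interval_ms


# Values from the posted output: 1000 ms intervals, multiplier 1.
current = detection_time_ms(remote_multiplier=1,
                            local_min_rx_ms=1000,
                            remote_min_tx_ms=1000)

# Same intervals with the default multiplier of 3.
default = detection_time_ms(remote_multiplier=3,
                            local_min_rx_ms=1000,
                            remote_min_tx_ms=1000)

print(f"multiplier 1: session declared down after {current} ms without a packet")
print(f"multiplier 3: session declared down after {default} ms without a packet")
```

    With multiplier 1 the result is 1000 ms, which matches the 1.000 detect time in the posted output, so one late or lost packet is enough for the timer to expire. With multiplier 3 and the same intervals it becomes 3000 ms, which leaves room for roughly two consecutive missed packets.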

     

    Thanks

     



  • 5.  RE: MX240 MCLAG-ICCP Link BFD Flapping

    Posted 12-28-2016 08:23

    Based on the reading and your suggestion, can you explain the importance of session-establishment-hold-time for the ICCP BFD settings? Can I set it along with a multiplier value of around 3?



  • 6.  RE: MX240 MCLAG-ICCP Link BFD Flapping

    Posted 12-29-2016 22:46

    There is no dependency between the multiplier value and session-establishment-hold-time.

    session-establishment-hold-time specifies the time during which an Inter-Chassis Control Protocol (ICCP) connection must be established between peers. Configuring a session establishment hold time results in faster ICCP connection establishment. The recommended value is 50 seconds.

     

    http://www.juniper.net/techpubs/en_US/junos/topics/task/configuration/interfaces-configuring-multi-chassis-link-aggregation.html

     

    Hope this helps.

     

    Thanks

     



  • 7.  RE: MX240 MCLAG-ICCP Link BFD Flapping

     
    Posted 12-28-2016 11:53

    Hi Folks,

    To add, here are some pointers on BFD timers specific to the QFX Series.

     

    Configure the ICCP liveness-detection interval (the BFD timer) to be at least 8 seconds if you have configured ICCP connectivity through an IRB interface. A liveness-detection interval of 8 seconds or more allows graceful Routing Engine switchover (GRES) to work seamlessly. By default, ICCP liveness detection uses multihop BFD, which runs in centralized mode.

     

    This recommendation does not apply if you have configured ICCP connectivity through a dedicated physical interface. In this case, you can configure single-hop BFD.

    Use the peer loopback address to establish ICCP peering. Doing so avoids any direct link failure between MC-LAG peers. As long as the logical connection between the peers remains up, ICCP stays up.

     

    http://www.juniper.net/documentation/en_US/junos16.1/topics/concept/lag-multichassis--guidelines.html

     

    -A.Rengaramalingam