Additional BFD information (distributed, centralized, additional considerations).
MNHA - Activeness
The node with the highest activeness priority becomes the active node in the MNHA pair. SRX-1 is configured with a priority of 200, while SRX-2 uses a lower priority of 100:
set chassis high-availability services-redundancy-group 1 activeness-priority 200
(Activeness configurations)
MNHA-SRX-1> show chassis high-availability services-redundancy-group 1
SRG failure event codes:
BF BFD monitoring
IP IP monitoring
IF Interface monitoring
CP Control Plane monitoring
Services Redundancy Group: 1
Deployment Type: HYBRID
Status: ACTIVE
Activeness Priority: 200
Preemption: DISABLED
Process Packet In Backup State: NO
Control Plane State: READY
System Integrity Check: N/A
Failure Events: NONE
Peer Information:
Peer Id: 2
Status : BACKUP
Health Status: HEALTHY
Failover Readiness: READY
(SRX-1 Active status for SRG-1)
MNHA-SRX-2> show chassis high-availability services-redundancy-group 1
SRG failure event codes:
BF BFD monitoring
IP IP monitoring
IF Interface monitoring
CP Control Plane monitoring
Services Redundancy Group: 1
Deployment Type: HYBRID
Status: BACKUP
Activeness Priority: 100
Preemption: DISABLED
Process Packet In Backup State: NO
Control Plane State: READY
System Integrity Check: COMPLETE
Failure Events: NONE
Peer Information:
Peer Id: 1
Status : ACTIVE
Health Status: HEALTHY
Failover Readiness: N/A
(SRX-2 Backup status for SRG-1)
MNHA - Signal Routes
The signal route prefixes are arbitrary and are installed only in the local routing tables of the SRXs. However, do not use prefixes that are active in your environment, as this may cause unwanted connectivity issues with those production prefixes. The signal routes used here are 169.254.200.1/32 (active) and 169.254.200.2/32 (backup).
set chassis high-availability services-redundancy-group 1 active-signal-route 169.254.200.1
set chassis high-availability services-redundancy-group 1 backup-signal-route 169.254.200.2
(Signal route configuration)
The node with the highest activeness priority (active) installs the “active” signal route, while the node with the lowest priority (backup) installs the “backup” signal route, each in its local routing table.
The active SRX (SRX-1) installs the active signal route 169.254.200.1 and does not have an entry for the backup signal route 169.254.200.2.
MNHA-SRX-1> show route 169.254.200.0/30
inet.0: 43 destinations, 70 routes (43 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
169.254.200.1/32 *[Static/12] 16:24:08
Receive
MNHA-SRX-1>
(SRX-1 active signal route)
The backup SRX (SRX-2) installs the backup signal route 169.254.200.2 and does not have an entry for the active signal route 169.254.200.1.
MNHA-SRX-2> show route 169.254.200.0/30
inet.0: 41 destinations, 66 routes (41 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
169.254.200.2/32 *[Static/12] 16:24:47
Receive
MNHA-SRX-2>
(SRX-2 backup signal route)
BGP Export Policy
The export policy advertises the 192.168.252.0/24 prefix to the respective BGP peers with a MED that depends on which signal route is present in the routing table: MED 10 if the active signal route exists, or MED 20 if the backup signal route exists.
The policy options and conditions are configured identically on the SRXs.
MNHA-SRX-1> show configuration policy-options condition ACTIVE_ROUTE_EXISTS_SRG1 | display set
set policy-options condition ACTIVE_ROUTE_EXISTS_SRG1 if-route-exists address-family inet 169.254.200.1/32
set policy-options condition ACTIVE_ROUTE_EXISTS_SRG1 if-route-exists address-family inet table inet.0
MNHA-SRX-1> show configuration policy-options condition BACKUP_ROUTE_EXISTS_SRG1 | display set
set policy-options condition BACKUP_ROUTE_EXISTS_SRG1 if-route-exists address-family inet 169.254.200.2/32
set policy-options condition BACKUP_ROUTE_EXISTS_SRG1 if-route-exists address-family inet table inet.0
(Policy Options)
MNHA-SRX-1> show configuration | display set | grep MNHA_
set policy-options policy-statement MNHA_ROUTE_POLICY term 1 from protocol direct
set policy-options policy-statement MNHA_ROUTE_POLICY term 1 from route-filter 192.168.252.0/24 exact
set policy-options policy-statement MNHA_ROUTE_POLICY term 1 from condition ACTIVE_ROUTE_EXISTS_SRG1
set policy-options policy-statement MNHA_ROUTE_POLICY term 1 then metric 10
set policy-options policy-statement MNHA_ROUTE_POLICY term 1 then accept
set policy-options policy-statement MNHA_ROUTE_POLICY term 2 from protocol direct
set policy-options policy-statement MNHA_ROUTE_POLICY term 2 from route-filter 192.168.252.0/24 exact
set policy-options policy-statement MNHA_ROUTE_POLICY term 2 from condition BACKUP_ROUTE_EXISTS_SRG1
set policy-options policy-statement MNHA_ROUTE_POLICY term 2 then metric 20
set policy-options policy-statement MNHA_ROUTE_POLICY term 2 then accept
set policy-options policy-statement MNHA_ROUTE_POLICY term 99 from protocol direct
set policy-options policy-statement MNHA_ROUTE_POLICY term 99 then metric 30
set policy-options policy-statement MNHA_ROUTE_POLICY term 99 then accept
set policy-options policy-statement MNHA_ROUTE_POLICY term default then reject
set protocols bgp group trust export MNHA_ROUTE_POLICY
(Export Policy)
RTR-1's preferred route to 192.168.252.0/24 is via SRX-1 (active, best metric), and RTR-2's preferred route to 192.168.252.0/24 is also via SRX-1 (active, best metric). If the link (BGP peering) from either RTR-1 or RTR-2 to SRX-1 fails, traffic is routed across the cross-connect between RTR-1 and RTR-2 to reach SRX-1.
MNHA-RTR-1> show route 192.168.252.0/24 receive-protocol bgp 10.10.99.5
inet.0: 45 destinations, 87 routes (45 active, 0 holddown, 0 hidden)
Prefix Nexthop MED Lclpref AS path
* 192.168.252.0/24 10.10.99.5 10 65022 I
MNHA-RTR-1> show route 192.168.252.0/24 receive-protocol bgp 10.10.99.131
inet.0: 45 destinations, 87 routes (45 active, 0 holddown, 0 hidden)
Prefix Nexthop MED Lclpref AS path
192.168.252.0/24 10.10.99.131 20 65022 I
MNHA-RTR-1> show route 192.168.252.0/24
inet.0: 45 destinations, 87 routes (45 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
192.168.252.0/24 *[BGP/170] 16:31:53, MED 10, localpref 100
AS path: 65022 I, validation-state: unverified
> to 10.10.99.5 via ge-0/0/0.0
[BGP/170] 00:17:46, MED 10, localpref 100
AS path: 65022 I, validation-state: unverified
> to 10.10.101.2 via ge-0/0/3.101
[BGP/170] 16:31:22, MED 20, localpref 100
AS path: 65022 I, validation-state: unverified
> to 10.10.99.131 via ge-0/0/2.0
(RTR-1 route to 192.168.252.0/24)
MNHA-RTR-2> show route 192.168.252.0/24 receive-protocol bgp 10.10.99.69
inet.0: 45 destinations, 87 routes (45 active, 0 holddown, 0 hidden)
Prefix Nexthop MED Lclpref AS path
* 192.168.252.0/24 10.10.99.69 10 65022 I
MNHA-RTR-2> show route 192.168.252.0/24 receive-protocol bgp 10.10.99.195
inet.0: 45 destinations, 87 routes (45 active, 0 holddown, 0 hidden)
Prefix Nexthop MED Lclpref AS path
192.168.252.0/24 10.10.99.195 20 65022 I
MNHA-RTR-2> show route 192.168.252.0/24
inet.0: 45 destinations, 87 routes (45 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
192.168.252.0/24 *[BGP/170] 00:19:32, MED 10, localpref 100
AS path: 65022 I, validation-state: unverified
> to 10.10.99.69 via ge-0/0/2.0
[BGP/170] 16:33:38, MED 10, localpref 100
AS path: 65022 I, validation-state: unverified
> to 10.10.101.1 via ge-0/0/3.101
[BGP/170] 15:04:14, MED 20, localpref 100
AS path: 65022 I, validation-state: unverified
> to 10.10.99.195 via ge-0/0/0.0
(RTR-2 route to 192.168.252.0/24)
MNHA – L2 Default Gateways
The active SRX uses 192.168.253.1 as the gateway for VLAN 500 and 192.168.252.254 for the northbound L2 hand-off. A failover event triggers a GARP advertisement to update the connected switching tables when the VIPs move from the previously active SRX to the newly active SRX.
set chassis high-availability services-redundancy-group 1 virtual-ip 1 ip 192.168.252.254/24
set chassis high-availability services-redundancy-group 1 virtual-ip 1 interface ge-0/0/2.0
set chassis high-availability services-redundancy-group 1 virtual-ip 2 ip 192.168.253.1/24
set chassis high-availability services-redundancy-group 1 virtual-ip 2 interface ge-0/0/0.500
(VIP configuration)
The active SRX (SRX-1) installs the VIPs, while the backup SRX shows the VIPs as not installed.
Virtual IP Info:
Index: 2
IP: 192.168.253.1/24
VMAC: N/A
Interface: ge-0/0/0.500
Status: INSTALLED
Index: 1
IP: 192.168.252.254/24
VMAC: N/A
Interface: ge-0/0/2.0
Status: INSTALLED
(SRX-1 – Active)
Virtual IP Info:
Index: 2
IP: 192.168.253.1/24
VMAC: N/A
Interface: ge-0/0/0.500
Status: NOT INSTALLED
Index: 1
IP: 192.168.252.254/24
VMAC: N/A
Interface: ge-0/0/2.0
Status: NOT INSTALLED
(SRX-2 – Backup)
You can configure a virtual MAC address that floats with the active IP address; otherwise, the physical MAC address of the interface is used.
MNHA-SRX-1> show interfaces ge-0/0/0 | grep address:
Current address: 00:0c:29:3d:90:62, Hardware address: 00:0c:29:3d:90:62
MNHA-SRX-1> show interfaces ge-0/0/2 | grep address:
Current address: 00:0c:29:3d:90:76, Hardware address: 00:0c:29:3d:90:76
MNHA-SRX-2> show interfaces ge-0/0/0 | grep address:
Current address: 00:0c:29:ed:4b:42, Hardware address: 00:0c:29:ed:4b:42
MNHA-SRX-2> show interfaces ge-0/0/2 | grep address:
Current address: 00:0c:29:ed:4b:56, Hardware address: 00:0c:29:ed:4b:56
(SRX-1 and SRX-2 Hardware Addresses)
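If a floating virtual MAC is preferred over the physical interface MAC, it can be tied to the VIP. The lines below are a sketch only; the placement of a virtual-mac statement under the SRG virtual-ip hierarchy and the example MAC values (from the IANA VRRP range) are assumptions, so verify the exact knob for your Junos release before use.
set chassis high-availability services-redundancy-group 1 virtual-ip 1 virtual-mac 00:00:5e:00:01:01
set chassis high-availability services-redundancy-group 1 virtual-ip 2 virtual-mac 00:00:5e:00:01:02
(Hypothetical virtual MAC configuration)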
Hosts within the L2 domains have their respective default gateways pointing to the VIP addresses on the SRXs, as shown in Figure 5.
Figure 5. Hosts’ Default Gateways
Monitoring and Failover Criteria
Several combinations of monitoring thresholds and weights are possible. For this example, BFD and interface monitoring are configured such that:
- If a single L3 BFD-monitored interface fails, SRG failover will not occur (weight 50)
- If both L3 BFD-monitored interfaces fail, SRG failover will occur (combined weight 100)
- If a single L2 interface fails, SRG failover will occur (weight 100)
[edit chassis high-availability services-redundancy-group 1]
MNHA-SRX-2# show | display set | grep monitor
…monitor-object BFD_UPLINKS object-threshold 100
…monitor-object BFD_UPLINKS bfd-liveliness threshold 100
…monitor-object BFD_UPLINKS bfd-liveliness destination-ip 10.10.99.193 src-ip 10.10.99.195
…monitor-object BFD_UPLINKS bfd-liveliness destination-ip 10.10.99.193 session-type singlehop
…monitor-object BFD_UPLINKS bfd-liveliness destination-ip 10.10.99.193 interface ge-0/0/0.10
…monitor-object BFD_UPLINKS bfd-liveliness destination-ip 10.10.99.193 weight 50
…monitor-object BFD_UPLINKS bfd-liveliness destination-ip 10.10.99.129 src-ip 10.10.99.131
…monitor-object BFD_UPLINKS bfd-liveliness destination-ip 10.10.99.129 session-type singlehop
…monitor-object BFD_UPLINKS bfd-liveliness destination-ip 10.10.99.129 interface ge-0/0/1.0
…monitor-object BFD_UPLINKS bfd-liveliness destination-ip 10.10.99.129 weight 50
…monitor-object L2-GWS object-threshold 100
…monitor-object L2-GWS interface threshold 100
…monitor-object L2-GWS interface interface-name ge-0/0/2 weight 100
…monitor-object L2-GWS interface interface-name ge-0/0/1 weight 100
…monitor srg-threshold 100
(SRG Monitored Objects)
Traffic Flows
During a normalized state, with SRX-1 as the active node, traffic first hits the hosts' respective default gateways. For the MNHA-L3 flow (Figure 6), SRX-1 has an ECMP route to the destination and flows are routed accordingly.
Figure 6. MNHA-L3 Traffic Flow
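For ECMP to actually spread flows across the uplinks, the SRX needs a per-flow load-balancing policy exported to the forwarding table. The following is a minimal sketch, not configuration taken from the lab devices; the policy name PFE_LB is illustrative.
set policy-options policy-statement PFE_LB then load-balance per-packet
set routing-options forwarding-table export PFE_LB
(Illustrative ECMP load-balancing policy)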
In the MNHA-DFG scenario (Figure 7), return traffic (or originating traffic) from 192.168.253.244 is sent to its default gateway on the active SRX.
Figure 7. MNHA-DFG Traffic Flow
During failover testing, an L3 flow is represented by a long-lived TCP session (port 5201) from 192.168.99.100 (internal, behind RTR-1/RTR-2) to 192.168.252.99 (external). An L2 flow is represented by a long-lived TCP session (port 5202) from 192.168.253.244 (internal) to 192.168.252.99 (external).
Sessions on the active SRX will be installed in the table with an HA Wing state of Active. Session information, under normalized conditions:
MNHA-SRX-1> show security flow session destination-prefix 192.168.252.99/32 protocol tcp
Session ID: 118631, Policy name: default-policy-logical-system-00/2, HA State: Active, Timeout: 1800, Session State: Valid
In: 192.168.253.244/55293 --> 192.168.252.99/5202;tcp, Conn Tag: 0x0, If: ge-0/0/0.500, Pkts: 565563, Bytes: 846279017, HA Wing State: Active,
Out: 192.168.252.99/5202 --> 192.168.253.244/55293;tcp, Conn Tag: 0x0, If: ge-0/0/2.0, Pkts: 93439, Bytes: 3737596, HA Wing State: Active,
Session ID: 119130, Policy name: default-policy-logical-system-00/2, HA State: Active, Timeout: 1800, Session State: Valid
In: 192.168.99.100/52650 --> 192.168.252.99/5201;tcp, Conn Tag: 0x0, If: ge-0/0/0.10, Pkts: 244665, Bytes: 366100861, HA Wing State: Active,
Out: 192.168.252.99/5201 --> 192.168.99.100/52650;tcp, Conn Tag: 0x0, If: ge-0/0/2.0, Pkts: 23385, Bytes: 938184, HA Wing State: Active,
Total sessions: 2
(Active SRX Security Flows)
Sessions on the backup SRX will be installed in the table with an HA Wing state of Warm.
MNHA-SRX-2> show security flow session destination-prefix 192.168.252.99/32 protocol tcp
Session ID: 114931, Policy name: default-policy-logical-system-00/2, HA State: Warm, Timeout: 13604, Session State: Valid
In: 192.168.253.244/55293 --> 192.168.252.99/5202;tcp, Conn Tag: 0x0, If: ge-0/0/0.500, Pkts: 0, Bytes: 0, HA Wing State: Warm,
Out: 192.168.252.99/5202 --> 192.168.253.244/55293;tcp, Conn Tag: 0x0, If: ge-0/0/2.0, Pkts: 0, Bytes: 0, HA Wing State: Warm,
Session ID: 115414, Policy name: default-policy-logical-system-00/2, HA State: Warm, Timeout: 13942, Session State: Valid
In: 192.168.99.100/52650 --> 192.168.252.99/5201;tcp, Conn Tag: 0x0, If: ge-0/0/0.10, Pkts: 0, Bytes: 0, HA Wing State: Warm,
Out: 192.168.252.99/5201 --> 192.168.99.100/52650;tcp, Conn Tag: 0x0, If: ge-0/0/2.0, Pkts: 0, Bytes: 0, HA Wing State: Warm,
Total sessions: 2
(Backup SRX Security Flows)
Active Forwarding Path – BFD Detection
If there is a failure on the active forwarding path in the bow-tie design, traffic is immediately routed across the cross-connect between RTR-1 and RTR-2 (MED 10) and a failover event does not occur. This is shown below by the ingress interface of session ID 119130 changing from ge-0/0/0.10 to ge-0/0/1.0. The L2 forwarding path is unchanged (ge-0/0/0.500).
MNHA-SRX-1> show bfd session
Detect Transmit
Address State Interface Time Interval Multiplier
10.10.99.1 Down ge-0/0/0.10 1.500 2.000 3
10.10.99.65 Up ge-0/0/1.0 1.500 0.500 3
100.0.0.1 Up 2.000 0.400 5
3 sessions, 5 clients
Cumulative transmit rate 5.0 pps, cumulative receive rate 6.5 pps
jadmin@MNHA-SRX-1> show security flow session destination-prefix 192.168.252.99/32 protocol tcp
Session ID: 118631, Policy name: default-policy-logical-system-00/2, HA State: Active, Timeout: 1800, Session State: Valid
In: 192.168.253.244/55293 --> 192.168.252.99/5202;tcp, Conn Tag: 0x0, If: ge-0/0/0.500, Pkts: 1166223, Bytes: 1745079945, HA Wing State: Active,
Out: 192.168.252.99/5202 --> 192.168.253.244/55293;tcp, Conn Tag: 0x0, If: ge-0/0/2.0, Pkts: 186388, Bytes: 7455580, HA Wing State: Active,
Session ID: 119130, Policy name: default-policy-logical-system-00/2, HA State: Active, Timeout: 1800, Session State: Valid
In: 192.168.99.100/52650 --> 192.168.252.99/5201;tcp, Conn Tag: 0x0, If: ge-0/0/1.0, Pkts: 854061, Bytes: 1277977581, HA Wing State: Active,
Out: 192.168.252.99/5201 --> 192.168.99.100/52650;tcp, Conn Tag: 0x0, If: ge-0/0/2.0, Pkts: 79932, Bytes: 3212636, HA Wing State: Active,
Total sessions: 2
(Active forwarding path failure, SRX-1)
With the fault condition active (a single BFD session down between SRX-1 and RTR-1), the backup SRX session table remains unchanged; no failover event is triggered.
MNHA-SRX-2> show security flow session destination-prefix 192.168.252.99/32 protocol tcp
Session ID: 114931, Policy name: default-policy-logical-system-00/2, HA State: Warm, Timeout: 13070, Session State: Valid
In: 192.168.253.244/55293 --> 192.168.252.99/5202;tcp, Conn Tag: 0x0, If: ge-0/0/0.500, Pkts: 0, Bytes: 0, HA Wing State: Warm,
Out: 192.168.252.99/5202 --> 192.168.253.244/55293;tcp, Conn Tag: 0x0, If: ge-0/0/2.0, Pkts: 0, Bytes: 0, HA Wing State: Warm,
Session ID: 115414, Policy name: default-policy-logical-system-00/2, HA State: Warm, Timeout: 13408, Session State: Valid
In: 192.168.99.100/52650 --> 192.168.252.99/5201;tcp, Conn Tag: 0x0, If: ge-0/0/0.10, Pkts: 0, Bytes: 0, HA Wing State: Warm,
Out: 192.168.252.99/5201 --> 192.168.99.100/52650;tcp, Conn Tag: 0x0, If: ge-0/0/2.0, Pkts: 0, Bytes: 0, HA Wing State: Warm,
(Active forwarding path failure, SRX-2)
Failover Event – BFD Detection
If both redundant BFD sessions fail, a failover event is triggered (Figure 8). The initially active SRX (SRX-1) transitions into an Ineligible state while the previously backup SRX (SRX-2) moves into the active state. The sessions transition from Warm to Active on the newly active SRX-2 and vice versa for SRX-1.
MNHA-SRX-1> show bfd session
Detect Transmit
Address State Interface Time Interval Multiplier
10.10.99.1 Down ge-0/0/0.10 1.500 2.000 3
10.10.99.65 Down ge-0/0/1.0 1.500 2.000 3
100.0.0.1 Up 2.000 0.400 5
3 sessions, 5 clients
Cumulative transmit rate 3.5 pps, cumulative receive rate 6.5 pps
jadmin@MNHA-SRX-1> show security flow session destination-prefix 192.168.252.99/32 protocol tcp
Session ID: 118631, Policy name: default-policy-logical-system-00/2, HA State: Warm, Timeout: 1808, Session State: Valid
In: 192.168.253.244/55293 --> 192.168.252.99/5202;tcp, Conn Tag: 0x0, If: ge-0/0/0.500, Pkts: 1507953, Bytes: 2256429529, HA Wing State: Warm,
Out: 192.168.252.99/5202 --> 192.168.253.244/55293;tcp, Conn Tag: 0x0, If: ge-0/0/2.0, Pkts: 240892, Bytes: 9635740, HA Wing State: Warm,
Session ID: 119130, Policy name: default-policy-logical-system-00/2, HA State: Warm, Timeout: 1806, Session State: Valid
In: 192.168.99.100/52650 --> 192.168.252.99/5201;tcp, Conn Tag: 0x0, If: ge-0/0/1.0, Pkts: 1202249, Bytes: 1798992517, HA Wing State: Warm,
Out: 192.168.252.99/5201 --> 192.168.99.100/52650;tcp, Conn Tag: 0x0, If: ge-0/0/2.0, Pkts: 111920, Bytes: 4496748, HA Wing State: Warm,
Total sessions: 2
(BFD Triggered Failover – SRX-1)
MNHA-SRX-2> show bfd session
Detect Transmit
Address State Interface Time Interval Multiplier
10.10.99.129 Up ge-0/0/1.0 1.500 0.500 3
10.10.99.193 Up ge-0/0/0.10 1.500 0.500 3
100.0.0.0 Up 2.000 0.400 5
3 sessions, 5 clients
Cumulative transmit rate 6.5 pps, cumulative receive rate 6.5 pps
jadmin@MNHA-SRX-2> show security flow session destination-prefix 192.168.252.99/32 protocol tcp
Session ID: 114931, Policy name: default-policy-logical-system-00/2, HA State: Active, Timeout: 1798, Session State: Valid
In: 192.168.253.244/55293 --> 192.168.252.99/5202;tcp, Conn Tag: 0x0, If: ge-0/0/0.500, Pkts: 26550, Bytes: 39728240, HA Wing State: Active,
Out: 192.168.252.99/5202 --> 192.168.253.244/55293;tcp, Conn Tag: 0x0, If: ge-0/0/2.0, Pkts: 4299, Bytes: 171960, HA Wing State: Active,
Session ID: 115414, Policy name: default-policy-logical-system-00/2, HA State: Active, Timeout: 1798, Session State: Valid
In: 192.168.99.100/52650 --> 192.168.252.99/5201;tcp, Conn Tag: 0x0, If: ge-0/0/0.10, Pkts: 26291, Bytes: 39337804, HA Wing State: Active,
Out: 192.168.252.99/5201 --> 192.168.99.100/52650;tcp, Conn Tag: 0x0, If: ge-0/0/2.0, Pkts: 2998, Bytes: 123396, HA Wing State: Active,
Total sessions: 2
(BFD Triggered Failover – SRX-2)
Figure 8 visualizes the BFD detection time, impacting L3 traffic for 1.5 seconds.
Figure 8. BFD triggered Failover
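The 1.5-second detection window corresponds to the BFD timers in the output above (Detect Time 1.500 with a 0.500 transmit interval). As a minimal sketch, and assuming these sessions are tied to the BGP BFD configuration referenced later (the BGP BFD hold-down discussion), timers like the following would yield that detect time; the values and their placement under group trust are inferred, not copied from the lab configuration.
set protocols bgp group trust bfd-liveness-detection minimum-interval 500
set protocols bgp group trust bfd-liveness-detection multiplier 3
(Assumed BGP BFD timer configuration)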
Failover Event – Reboot Active Node and Split-Brain Detection
In hybrid deployments, when ICL connectivity is impacted, for example by the active node losing power, the mechanism that tells the backup SRX to transition an SRG to active is gone. Once the configurable detection time expires (2 seconds in these scenarios), either ICMP-based or BFD-based split-brain detection takes over. Split-brain detection ensures that the VIP cannot be active on both SRXs at the same time (a duplicate IP address), which would negatively impact traffic flows.
ICMP activeness probing is the default method for split-brain detection. The following example uses a 1-second probe interval and requires 3 missed responses before the SRX declares itself active for the VIP, triggering the GARP process (if it was previously the backup for the SRG). With ICMP detection, the default gateway (VIP) move can impact traffic for roughly 3 to 8 seconds.
set chassis high-availability services-redundancy-group 1 activeness-probe minimum-interval 1000
set chassis high-availability services-redundancy-group 1 activeness-probe multiplier 3
(ICMP Activeness Configuration)
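The probe interval and multiplier above work together with a probe destination. As a sketch only, the statement could look like the line below; the dest-ip keyword and the choice of 192.168.252.99 as the probed address are assumptions, so confirm the exact knob and an appropriate target for your environment.
set chassis high-availability services-redundancy-group 1 activeness-probe dest-ip 192.168.252.99
(Assumed activeness-probe destination)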
Figure 9. Reboot Active SRG SRX ICMP Based
Notice that the L3 traffic (traffic routed toward the destination, as opposed to switched) was not impacted. Because the interfaces went hard down during the reboot, traffic toward 192.168.252.99 immediately shifted from SRX-1 (00:0C:29:3D:90:76) to SRX-2 (00:0C:29:ED:4B:56) (Figure 10).
Figure 10. Reboot Active SRG SRX - L3 flow
Single-hop BFD is an alternative to ICMP-based split-brain detection. While ICMP-based detection only initiates after an ICL failure is detected, BFD runs constantly on the SRG and can therefore achieve faster default gateway failovers. Be aware that ICMP-based and BFD-based split-brain detection cannot be configured simultaneously.
Additional details about Split-Brain Detection.
Preemption
With preemption, an automatic fail-back occurs once the monitoring criteria are met. The BGP BFD hold-down timer mentioned previously is used simply to illustrate potential convergence/instability issues along the active SRX's forwarding path(s). In the following example (Figure 11), SRX-1 preempts and becomes active again. L2 traffic is not impacted (with interface failover). However, L3 traffic is impacted for the duration of the hold-down timer (10s), because the BGP peerings are not yet up while SRX-1 is active.
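Preemption is disabled in the SRG output shown earlier; for the preempted fail-back behavior described here it is enabled under the SRG on both nodes. A minimal sketch (any optional preemption delay/hold knobs are omitted):
set chassis high-availability services-redundancy-group 1 preemption
(Preemption enabled for SRG 1)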
Failover event graphs were from packet captures taken on 192.168.252.99 (L2 external host).
Figure 11. Preempt Failover/Failback with hold-down
Figure 12 shows the same traffic flows during failover and a preempted failback, without the BGP BFD hold-down timer configured.
Figure 12. Preempt Failover/Failback without hold-down
Figure 13 shows the same failover, with the BGP BFD hold-down (10s) configured, without preemption, and a manual fail-back to SRX-1 from SRX-2 using the following command:
request chassis high-availability failover peer-id 1 services-redundancy-group 1
Figure 13. Failover/Manual Failback with hold-down
Alternative Designs
iBGP Considerations
Under normalized operations with no failed links, a bow-tie design with iBGP and MNHA operates as expected: symmetrical traffic to the active SRX. However, without additional design considerations there can be an issue during an active link failure. For example, if the BGP peering on the active forwarding path fails, say RTR-1 to SRX-1, traffic will not flow across the interconnect between RTR-1 and RTR-2 to reach SRX-1. Instead, traffic will flow over the link between RTR-1 and SRX-2 (the backup SRX), creating an asymmetric flow and causing stateful flows to break. This is due to BGP loop prevention, which prevents an iBGP peer (RTR-1/2) from forwarding iBGP-learned prefixes (from SRX-1/2) to other iBGP neighbors.
Figure 14. Asymmetrical Traffic Flow
To mitigate stateful traffic disruption during an active-path failure with an iBGP bow-tie configuration, implement an IGP with appropriate redistribution or a “floating static route” between RTR-1 and RTR-2. Another mitigation method is to use ICD (Inter-Chassis Datapath) link(s).
LAG/No Bow-tie
The same level of physical redundancy can be achieved by deploying 802.3ad (LACP) aggregated interfaces (Figure 15) between the routers and the SRXs without “crossing” the links. Instead of ECMP load-balancing, load-balancing is performed by the LAG's hashing algorithm.
Figure 15. LAG Uplinks
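As a minimal sketch of what an aggregated uplink could look like on an SRX in this design, the following configures a two-member LACP bundle; the member interfaces, ae unit, and addressing are illustrative only and not taken from the lab.
set chassis aggregated-devices ethernet device-count 1
set interfaces ge-0/0/0 gigether-options 802.3ad ae0
set interfaces ge-0/0/1 gigether-options 802.3ad ae0
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 unit 0 family inet address 192.0.2.2/30
(Illustrative LACP/LAG uplink configuration)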
Active/Active Service Redundancy Groups
Another alternative is to use two SRGs (Services Redundancy Groups) in an active/active scenario (Figure 16), such that the redundant L3 links from the routers are split between the two groups. This also requires an additional VIP.
Figure 16. Active-Active Multiple SRG
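Reusing the statements already shown in this document, a second SRG on SRX-1 could be sketched as follows; the priority, signal routes, and additional VIP value are illustrative assumptions, and SRX-2 would mirror this with the higher activeness priority so that SRG 2 is active there.
set chassis high-availability services-redundancy-group 2 activeness-priority 100
set chassis high-availability services-redundancy-group 2 active-signal-route 169.254.201.1
set chassis high-availability services-redundancy-group 2 backup-signal-route 169.254.201.2
set chassis high-availability services-redundancy-group 2 virtual-ip 1 ip 192.168.252.253/24
set chassis high-availability services-redundancy-group 2 virtual-ip 1 interface ge-0/0/2.0
(Illustrative second SRG for active/active)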
Summary
With any high-availability design, prudent planning is imperative. Understanding the requirements, goals, capabilities, and integration with the network provides for a successful MNHA deployment:
- Which MNHA model to deploy: Layer 3, Layer 2 (Default Gateway), or Hybrid?
- Active/Active or Active/Backup?
- If L3, what protocols will be used to integrate with the network (eBGP, iBGP, OSPF)?
- If L2, what type of fabric to integrate? Spanning Tree concerns?
- Any convergence issues?
- How quickly to fail over? Automatic fail-back?
- What mechanism to track for failover (BFD, IP, Interface, or combination of the 3)?
Once decided, it’s just a matter of testing and configuring your deployment.
Useful links
Glossary
- AS – Autonomous System
- BFD – Bi-Directional Forwarding Detection
- BGP – Border Gateway Protocol
- HA – High Availability
- ICD – Inter Chassis Datapath Link
- ICL – Inter-Chassis Link
- L2 – Layer 2
- L3 – Layer 3
- MED – Multi-Exit Discriminator
- SRG – Services Redundancy Group
- VIP – Virtual IP
Acknowledgements
Thanks to Karel Hendrych for reviewing and sanity checking, and to Steven Jacques for providing an excellent foundational MNHA post to build on.