Failover doesn't work between HA SRX240

1. Failover doesn't work between HA SRX240

Erdem

Posted 09-10-2009 21:11

Hi all,

I have configured a cluster of two SRX240s, and routing seems to work fine.

However, failover doesn't work.

The HA configuration is as follows:

## Last commit: 2009-09-10 06:37:49 UTC by root
version 9.6R1.13;
groups {
    node1 {
        system {
            host-name au.fw_node1;
            root-authentication {
                encrypted-password "XXX"; ## SECRET-DATA
            }
        }
        interfaces {
            fxp0 {
                unit 0 {
                    family inet {
                        address 192.168.103.253/24;
                    }
                }
            }
        }
    }
    node0 {
        system {
            host-name au.fw_node0;
        }
        interfaces {
            fxp0 {
                unit 0 {
                    family inet {
                        address 192.168.103.252/24;
                    }
                }
            }
        }
    }
}
apply-groups "${node}";
system {
    root-authentication {
        encrypted-password "XXX"; ## SECRET-DATA
    }
    services {
        ssh;
        web-management {
            http {
                interface [ fxp0.0 reth2.0 ];
            }
        }
    }
    syslog {
        user * {
            any emergency;
        }
        file messages {
            any critical;
            authorization info;
        }
        file interactive-commands {
            interactive-commands error;
        }
    }
    max-configurations-on-flash 5;
    max-configuration-rollbacks 5;
    license {
        autoupdate {
            url https://ae1.juniper.net/junos/key_retrieval;
        }
    }
}
chassis {
    cluster {
        reth-count 2;
        heartbeat-interval 1000;
        heartbeat-threshold 3;
        node 0; ## Warning: 'node' is deprecated
        node 1; ## Warning: 'node' is deprecated
        redundancy-group 1 {
            node 0 priority 100;
            node 1 priority 1;
            preempt;
            interface-monitor {
                ge-0/0/5 weight 255;
                ge-5/0/5 weight 255;
            }
        }
        redundancy-group 0 {
            node 0 priority 100;
            node 1 priority 1;
        }
        redundancy-group 2 {
            node 0 priority 100;
            node 1 priority 1;
            preempt;
            interface-monitor {
                ge-0/0/15 weight 255;
                ge-5/0/15 weight 255;
            }
        }
    }
}
interfaces {
    ge-0/0/2 {
        unit 0;
    }
    ge-0/0/5 {
        gigether-options {
            redundant-parent reth1;
        }
    }
    ge-0/0/15 {
        gigether-options {
            redundant-parent reth0;
        }
    }
    ge-5/0/2 {
        unit 0;
    }
    ge-5/0/5 {
        gigether-options {
            redundant-parent reth1;
        }
    }
    ge-5/0/15 {
        gigether-options {
            redundant-parent reth0;
        }
    }
    fab0 {
        fabric-options {
            member-interfaces {
                ge-0/0/3;
            }
        }
    }
    fab1 {
        fabric-options {
            member-interfaces {
                ge-5/0/3;
            }
        }
    }
    reth0 {
        redundant-ether-options {
            redundancy-group 2;
        }
        unit 0 {
            family inet {
                mtu 1500;
                address 192.168.1.240/24;
            }
        }
    }
    reth1 {
        vlan-tagging;
        redundant-ether-options {
            redundancy-group 1;
        }
        unit 0 {
            vlan-id 43;
            family inet {
                mtu 1500;
                address 202.43.4.254/24;
            }
        }
        unit 1 {
            vlan-id 101;
            family inet {
                mtu 1500;
                address 192.168.101.254/24;
            }
        }
        unit 2 {
            vlan-id 102;
            family inet {
                mtu 1500;
                address 192.168.102.254/24;
            }
        }
    }
}
security {
    screen {
        ids-option untrust-screen {
            icmp {
                ping-death;
            }
            ip {
                source-route-option;
                tear-drop;
            }
            tcp {
                syn-flood {
                    alarm-threshold 1024;
                    attack-threshold 200;
                    source-threshold 1024;
                    destination-threshold 2048;
                    queue-size 2000; ## Warning: 'queue-size' is deprecated
                    timeout 20;
                }
                land;
            }
        }
    }
    zones {
        functional-zone management;
        security-zone untrust {
            screen untrust-screen;
            interfaces {
                reth0.0;
            }
        }
        security-zone dmz {
            tcp-rst;
            screen untrust-screen;
            interfaces {
                reth1.0 {
                    host-inbound-traffic {
                        system-services {
                            all;
                        }
                    }
                }
            }
        }
        security-zone private {
            tcp-rst;
            screen untrust-screen;
            interfaces {
                reth1.1 {
                    host-inbound-traffic {
                        system-services {
                            all;
                        }
                    }
                }
            }
        }
        security-zone management {
            tcp-rst;
            screen untrust-screen;
            interfaces {
                reth1.2 {
                    host-inbound-traffic {
                        system-services {
                            all;
                        }
                    }
                }
            }
        }
    }
    policies {
        default-policy {
            permit-all;
        }
    }
}
vlans {
    DMZ {
        vlan-id 43;
    }
    Management {
        vlan-id 102;
    }
    Private {
        vlan-id 101;
    }
}

I've also noticed a few other things:

1) Neither node sees the ge-5/0/X interfaces.

root@au.fw_node0> show interfaces terse
Admin Link Proto Local Remote
ge-0/0/0 up down
ge-0/0/1 up up
ge-0/0/2 up down
ge-0/0/2.0 up down
ge-0/0/3 up up
ge-0/0/3.0 up up aenet --> fab0.0
ge-0/0/4 up down
ge-0/0/5 up up
ge-0/0/5.0 up up aenet --> reth1.0
ge-0/0/5.1 up up aenet --> reth1.1
ge-0/0/5.2 up up aenet --> reth1.2
ge-0/0/6 up up
ge-0/0/7 up down
ge-0/0/8 up down
ge-0/0/9 up down
ge-0/0/10 up down
ge-0/0/11 up down
ge-0/0/12 up down
ge-0/0/13 up down
ge-0/0/14 up down
ge-0/0/15 up up
ge-0/0/15.0 up up aenet --> reth0.0
fab0 up up
fab0.0 up up inet 30.17.0.200/24
fab1 up down
fab1.0 up down inet 30.18.0.200/24
fxp0 up up
fxp0.0 up up inet 192.168.103.252/24
fxp1 up up
fxp1.0 up up inet 129.16.0.1/2
tnp 0x1100001
gre up up
ipip up up
lo0 up up
lo0.16384 up up inet 127.0.0.1 --> 0/0
lo0.16385 up up inet 10.0.0.1 --> 0/0
10.0.0.16 --> 0/0
128.0.0.1 --> 0/0
128.0.1.16 --> 0/0
inet6 fe80::226:88ff:fe06:1280
lsi up up
mtun up up
pimd up up
pime up up
pp0 up up
reth0 up up
reth0.0 up up inet 192.168.1.240/24
reth1 up up
reth1.0 up up inet 202.43.4.254/24
reth1.1 up up inet 192.168.101.254/24
reth1.2 up up inet 192.168.102.254/24
reth1.32767 up down
st0 up up
tap up up
vlan up up

2) The ge-5/0/5 interface is shown as Down in the cluster interface status, even though the cable is plugged in and connected to the switch and the port LEDs are blinking green.

root@au.fw_node0> show chassis cluster interfaces
Control link name: fxp1

Redundant-ethernet Information:
Name Status Redundancy-group
reth0 Up 2
reth1 Up 1

Interface Monitoring:
Interface Weight Status Redundancy-group
ge-5/0/5 255 Down 1
ge-0/0/5 255 Up 1
ge-5/0/15 255 Up 2
ge-0/0/15 255 Up 2

3) Fabric probes don't seem to get through the fabric link (I'm not sure whether this is expected, but it looks concerning):

node0
=====

root@au.fw_node0> show chassis cluster control-plane statistics
Control link statistics:
Heartbeat packets sent: 63898
Heartbeat packets received: 63754
Fabric link statistics:
Probes sent: 63892
Probes received: 0

root@au.fw_node0> show chassis cluster data-plane statistics
Services Synchronized:
Service name RTOs sent RTOs received
Translation context 0 0
Incoming NAT 0 0
Resource manager 0 0
Session create 8296 0
Session close 5572 0
Session change 0 0
Gate create 0 0
Session ageout refresh requests 0 0
Session ageout refresh replies 0 0
IPSec VPN 0 0
Firewall user authentication 0 0
MGCP ALG 0 0
H323 ALG 0 0
SIP ALG 0 0
SCCP ALG 0 0
PPTP ALG 0 0
RPC ALG 0 0
RTSP ALG 0 0
RAS ALG 0 0
MAC address learning 0 0

{primary:node0}
root@au.fw_node0> show interfaces ge-0/0/3
Physical interface: ge-0/0/3, Enabled, Physical link is Up
Interface index: 134, SNMP ifIndex: 123
Link-level type: 64, MTU: 9014, Link-mode: Full-duplex, Speed: 1000mbps,
BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled,
Source filtering: Disabled, Flow control: Enabled, Auto-negotiation: Enabled,
Remote fault: Online
Device flags : Present Running
Interface flags: SNMP-Traps Internal: 0x0
Link flags : None
CoS queues : 8 supported, 8 maximum usable queues
Current address: 00:26:88:06:12:ff, Hardware address: 00:26:88:06:12:83
Last flapped : 2009-09-10 06:51:07 UTC (17:52:06 ago)
Input rate : 0 bps (0 pps)
Output rate : 0 bps (3 pps)
Active alarms : None
Active defects : None

Logical interface ge-0/0/3.0 (Index 75) (SNMP ifIndex 151)
Flags: SNMP-Traps Encapsulation: ENET2
Input packets : 0
Output packets: 206253
Security: Zone: Null
Protocol aenet, AE bundle: fab0.0 Link Index: 0

{primary:node0}
root@au.fw_node0> show interfaces fab0
Physical interface: fab0, Enabled, Physical link is Up
Interface index: 130, SNMP ifIndex: 117
Link-level type: Ethernet, MTU: 9014, Speed: 1000mbps, BPDU Error: None,
MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
Flow control: Disabled, Minimum links needed: 1, Minimum bandwidth needed: 0
Device flags : Present Running
Interface flags: SNMP-Traps Internal: 0x0
Current address: 00:26:88:06:12:ff, Hardware address: 00:26:88:06:12:ff
Last flapped : 2009-09-10 06:51:07 UTC (17:52:29 ago)
Input rate : 0 bps (0 pps)
Output rate : 0 bps (2 pps)

Logical interface fab0.0 (Index 72) (SNMP ifIndex 120)
Flags: SNMP-Traps 0x0 Encapsulation: ENET2
Statistics Packets pps Bytes bps
Bundle:
Input : 0 0 0 0
Output: 206328 2 0 0
Security: Zone: Null
Protocol inet, MTU: 9000
Flags: None
Addresses, Flags: Is-Preferred Is-Primary
Destination: 30.17.0/24, Local: 30.17.0.200, Broadcast: 30.17.0.255

{primary:node0}
root@au.fw_node0> show interfaces fab1
Physical interface: fab1, Enabled, Physical link is Down
Interface index: 150, SNMP ifIndex: 142
Link-level type: Ethernet, MTU: 9014, Speed: Unspecified, BPDU Error: None,
MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
Flow control: Disabled, Minimum links needed: 1, Minimum bandwidth needed: 0
Device flags : Present Running
Interface flags: Hardware-Down SNMP-Traps Internal: 0x0
Current address: 00:26:88:06:0f:7f, Hardware address: 00:26:88:06:0f:7f
Last flapped : 2009-09-10 06:52:09 UTC (17:51:35 ago)
Input rate : 0 bps (0 pps)
Output rate : 0 bps (0 pps)

Logical interface fab1.0 (Index 80) (SNMP ifIndex 143)
Flags: Hardware-Down Device-Down SNMP-Traps 0x0 Encapsulation: ENET2
Statistics Packets pps Bytes bps
Bundle:
Input : 0 0 0 0
Output: 0 0 0 0
Security: Zone: Null
Protocol inet, MTU: 9000
Flags: None
Addresses, Flags: Is-Preferred Is-Primary
Destination: 30.18.0/24, Local: 30.18.0.200, Broadcast: 30.18.0.255

node1
=====

{secondary:node1}
root@au.fw_node1> show chassis cluster data-plane statistics
Services Synchronized:
Service name RTOs sent RTOs received
Translation context 0 0
Incoming NAT 0 0
Resource manager 0 0
Session create 0 0
Session close 0 0
Session change 0 0
Gate create 0 0
Session ageout refresh requests 0 0
Session ageout refresh replies 0 0
IPSec VPN 0 0
Firewall user authentication 0 0
MGCP ALG 0 0
H323 ALG 0 0
SIP ALG 0 0
SCCP ALG 0 0
PPTP ALG 0 0
RPC ALG 0 0
RTSP ALG 0 0
RAS ALG 0 0
MAC address learning 0 0

{secondary:node1}
root@au.fw_node1> show chassis cluster control-plane statistics
Control link statistics:
Heartbeat packets sent: 64147
Heartbeat packets received: 64114
Fabric link statistics:
Probes sent: 64144
Probes received: 0

{secondary:node1}
root@au.fw_node1> show interfaces ge-0/0/3
Physical interface: ge-0/0/3, Enabled, Physical link is Up
Interface index: 134, SNMP ifIndex: 123
Link-level type: 64, MTU: 9014, Link-mode: Half-duplex, Speed: Unspecified,
BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled,
Source filtering: Disabled, Flow control: Enabled, Auto-negotiation: Enabled,
Remote fault: Online
Device flags : Present Running
Interface flags: SNMP-Traps Internal: 0x0
Link flags : None
CoS queues : 8 supported, 8 maximum usable queues
Current address: 00:26:88:06:12:ff, Hardware address: 00:26:88:06:12:83
Last flapped : 2009-09-10 06:51:07 UTC (17:55:39 ago)
Input rate : 0 bps (0 pps)
Output rate : 0 bps (0 pps)
Active alarms : None
Active defects : None

Logical interface ge-0/0/3.0 (Index 75) (SNMP ifIndex 151)
Flags: SNMP-Traps Encapsulation: ENET2
Input packets : 0
Output packets: 0
Security: Zone: Null
Protocol aenet, AE bundle: fab0.0 Link Index: 0

{secondary:node1}
root@au.fw_node1> show interfaces fab0
Physical interface: fab0, Enabled, Physical link is Up
Interface index: 130, SNMP ifIndex: 117
Link-level type: Ethernet, MTU: 9014, Speed: 1000mbps, BPDU Error: None,
MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
Flow control: Disabled, Minimum links needed: 1, Minimum bandwidth needed: 0
Device flags : Present Running
Interface flags: SNMP-Traps Internal: 0x0
Current address: 00:26:88:06:12:ff, Hardware address: 00:26:88:06:12:ff
Last flapped : 2009-09-10 06:51:07 UTC (17:55:48 ago)
Input rate : 0 bps (0 pps)
Output rate : 0 bps (0 pps)

Logical interface fab0.0 (Index 72) (SNMP ifIndex 120)
Flags: SNMP-Traps 0x0 Encapsulation: ENET2
Statistics Packets pps Bytes bps
Bundle:
Input : 0 0 0 0
Output: 0 0 0 0
Security: Zone: Null
Protocol inet, MTU: 9000
Flags: None
Addresses, Flags: Is-Preferred Is-Primary
Destination: 30.17.0/24, Local: 30.17.0.200, Broadcast: 30.17.0.255

{secondary:node1}
root@au.fw_node1> show interfaces fab1
Physical interface: fab1, Enabled, Physical link is Down
Interface index: 150, SNMP ifIndex: 142
Link-level type: Ethernet, MTU: 9014, Speed: Unspecified, BPDU Error: None,
MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
Flow control: Disabled, Minimum links needed: 1, Minimum bandwidth needed: 0
Device flags : Present Running
Interface flags: Hardware-Down SNMP-Traps Internal: 0x0
Current address: 00:26:88:06:0f:7f, Hardware address: 00:26:88:06:0f:7f
Last flapped : 2009-09-10 06:52:09 UTC (17:54:50 ago)
Input rate : 0 bps (0 pps)
Output rate : 0 bps (0 pps)

Logical interface fab1.0 (Index 80) (SNMP ifIndex 143)
Flags: Hardware-Down Device-Down SNMP-Traps Encapsulation: ENET2
Statistics Packets pps Bytes bps
Bundle:
Input : 0 0 0 0
Output: 0 0 0 0
Security: Zone: Null
Protocol inet, MTU: 9000
Flags: None
Addresses, Flags: Is-Preferred Is-Primary
Destination: 30.18.0/24, Local: 30.18.0.200, Broadcast: 30.18.0.255

4) The HA LEDs on both nodes are amber.

Any ideas would be highly appreciated.

Kind regards,
Vladimir

#failover
#HA
#SRX240
#fab
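Why the Down status of ge-5/0/5 matters for failover can be shown with a simplified model of interface monitoring. The sketch below assumes that each redundancy group starts with a threshold of 255 per node, that the weight of every failed monitored interface is subtracted from it, and that a node whose threshold reaches 0 gets priority 0 for that group (so it cannot become primary). It also assumes, as the interface names in this thread suggest, that ge-5/0/x ports belong to node1. It ignores preempt, hold-down timers, and IP monitoring; `MonitoredInterface`, `owning_node`, and `effective_priorities` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumption: each redundancy group starts with a threshold of 255 per node,
# and the weight of every failed monitored interface is subtracted from it.
RG_THRESHOLD = 255


@dataclass
class MonitoredInterface:
    name: str
    weight: int
    up: bool


def owning_node(ifname: str) -> int:
    """Assumption for the SRX240: FPC 0 ports belong to node0, FPC 5 ports to node1."""
    m = re.match(r"[a-z]+-(\d+)/", ifname)
    if m is None:
        raise ValueError(f"unexpected interface name: {ifname!r}")
    return 1 if int(m.group(1)) >= 5 else 0


def effective_priorities(configured: dict[int, int],
                         monitored: list[MonitoredInterface]) -> dict[int, int]:
    """Each node's priority for one redundancy group after monitored failures."""
    result = {}
    for node, priority in configured.items():
        lost = sum(m.weight for m in monitored
                   if owning_node(m.name) == node and not m.up)
        # Threshold exhausted: the node is no longer eligible for this group.
        result[node] = 0 if RG_THRESHOLD - lost <= 0 else priority
    return result


if __name__ == "__main__":
    # Redundancy group 1: priorities from the configuration, statuses from
    # the 'show chassis cluster interfaces' output in the post above.
    rg1 = [
        MonitoredInterface("ge-5/0/5", 255, up=False),
        MonitoredInterface("ge-0/0/5", 255, up=True),
    ]
    print(effective_priorities({0: 100, 1: 1}, rg1))  # {0: 100, 1: 0}
```

Under this model, node1 has priority 0 for redundancy group 1 while ge-5/0/5 is reported Down, so reth1 would have nowhere to fail over to even if node1 were otherwise healthy.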

2. RE: Failover doesn't work between HA SRX240
Best Answer

Erdem
Posted 09-10-2009 23:32

My guess is that node1 may be in the disabled state. You haven't shown the cluster state from the CLI prompt, but since you don't see any ge-5/0/x interfaces, this seems likely. The only way out of the disabled state is to reboot the node. Also, are you connecting the fabric link through a switch, or are you directly connecting ge-0/0/3 on both nodes? At present, only directly connected control and fabric links are supported on the SRX branch series. Finally, just double-checking that you have JUNOS 9.6R1, as JSRP on the SRX240 isn't supported in the 9.5 release.

Once you have confirmed that the JUNOS version is 9.6, that the control and fabric links are directly connected (not going through a switch), and that the node has been rebooted (to clear the disabled state), check that heartbeats and probes are both sent and received on both nodes. If they are not, the node will go into the disabled state again. Finally, check the jsrpd log for any issues, for example with cold-sync monitoring.

-Richard
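The heartbeat and probe check described above can be scripted against saved output of `show chassis cluster control-plane statistics`. This is a minimal sketch that assumes the output format shown in this thread; the function names are hypothetical. Because the counters are cumulative, a single nonzero reading does not prove traffic is still flowing, so `stalled_counters` compares two snapshots taken a few seconds apart.

```python
import re

COUNTERS = (
    "Heartbeat packets sent",
    "Heartbeat packets received",
    "Probes sent",
    "Probes received",
)


def parse_control_plane_stats(text: str) -> dict[str, int]:
    """Extract heartbeat and fabric-probe counters from the CLI output."""
    stats = {}
    for label in COUNTERS:
        m = re.search(rf"{re.escape(label)}:\s*(\d+)", text)
        if m:
            stats[label] = int(m.group(1))
    return stats


def zero_counters(stats: dict[str, int]) -> list[str]:
    """Counters that are missing or still zero in a single snapshot."""
    return [c for c in COUNTERS if stats.get(c, 0) == 0]


def stalled_counters(before: dict[str, int], after: dict[str, int]) -> list[str]:
    """Counters that did not increase between two snapshots."""
    return [c for c in COUNTERS if after.get(c, 0) <= before.get(c, 0)]


if __name__ == "__main__":
    # node1's counters from the original post (before the reboot).
    node1_before = parse_control_plane_stats(
        "Heartbeat packets sent: 64147\n"
        "Heartbeat packets received: 64114\n"
        "Probes sent: 64144\n"
        "Probes received: 0\n"
    )
    # node1's counters as reported in the follow-up (after the reboot).
    node1_after = parse_control_plane_stats(
        "Heartbeat packets sent: 287\n"
        "Heartbeat packets received: 285\n"
        "Probes sent: 284\n"
        "Probes received: 180\n"
    )
    print(zero_counters(node1_before))  # ['Probes received']
    print(zero_counters(node1_after))   # []
```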
3. RE: Failover doesn't work between HA SRX240

Erdem
Posted 09-11-2009 00:06

Hi Richard,

Thanks for your reply.

You have solved my problem, and you have my greatest appreciation for it!

Just to clarify my situation:

I have JUNOS 9.6R1.13 (the latest available) installed on both nodes.
The fabric link is a DIRECT connection with a straight-through Cat5e cable between ge-0/0/3 and ge-5/0/3 (during troubleshooting I also tried a crossover cable and a connection through a switch, but neither helped).
The control link is also DIRECTLY connected, with a straight-through Cat5e cable between ge-0/0/1 and ge-5/0/1.

I rebooted node1 by running the following command on node1:

set chassis cluster cluster-id 1 node 1 reboot

Output of the tnpdump utility on node1:

root@au% tnpdump
Name TNPaddr MAC address IF MTU E H R
cluster1.node0 0x1100001 00:26:88:06:12:81 fxp1 1500 2 1 3
cluster1.node1 0x2100001 00:26:88:06:0f:01 fxp1 1500 0 1 3
cluster1.master 0xf100001 00:26:88:06:12:81 fxp1 1500 2 1 3
bcast 0xffffffff ff:ff:ff:ff:ff:ff fxp1 1500 0 1 3

Fabric probes now seem to be getting through:

root@au.fw_node1> show chassis cluster control-plane statistics
Control link statistics:
Heartbeat packets sent: 287
Heartbeat packets received: 285
Fabric link statistics:
Probes sent: 284
Probes received: 180

The cluster status now shows the correct priorities for the RGs:

root@au.fw_node1> show chassis cluster status
Cluster ID: 1
Node name Priority Status Preempt Manual failover

Redundancy group: 0 , Failover count: 0
node0 100 primary no no
node1 1 secondary no no

Redundancy group: 1 , Failover count: 8
node0 100 primary yes no
node1 1 secondary yes no

Redundancy group: 2 , Failover count: 2
node0 100 primary yes no
node1 1 secondary yes no

ge-5/0/X interfaces are visible now:

root@au.fw_node1> show interfaces terse | match aenet
ge-0/0/3.0 up up aenet --> fab0.0
ge-0/0/5.0 up up aenet --> reth1.0
ge-0/0/5.1 up up aenet --> reth1.1
ge-0/0/5.2 up up aenet --> reth1.2
ge-0/0/15.0 up up aenet --> reth0.0
ge-5/0/3.0 up up aenet --> fab1.0
ge-5/0/5.0 up up aenet --> reth1.0
ge-5/0/5.1 up up aenet --> reth1.1
ge-5/0/5.2 up up aenet --> reth1.2
ge-5/0/15.0 up up aenet --> reth0.0
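Whether node1's ports are visible can be checked mechanically from saved `show interfaces terse` output. This sketch assumes, as the interface names in this thread suggest, that the SRX240 renumbers node1's ports with an FPC offset of 5 (node1's local ge-0/0/3 becomes ge-5/0/3). `cluster_name`, `ports_in_output`, and `missing_node1_ports` are hypothetical helpers, not part of Junos.

```python
import re

# Assumption: in an SRX240 cluster, node1's ports are renumbered with FPC 5,
# so node1's local ge-0/0/3 is addressed as ge-5/0/3 (as seen in this thread).
NODE1_FPC_OFFSET = 5
LOCAL_PORT_RE = re.compile(r"ge-(\d+)/(\d+)/(\d+)")
# Matches a port name, optionally followed by a logical unit such as ".0".
TERSE_NAME_RE = re.compile(r"(ge-\d+/\d+/\d+)(?:\.\d+)?")


def cluster_name(node: int, local_name: str) -> str:
    """Translate a node-local port name (e.g. 'ge-0/0/3') to its cluster-wide name."""
    m = LOCAL_PORT_RE.fullmatch(local_name)
    if m is None:
        raise ValueError(f"unexpected interface name: {local_name!r}")
    fpc, pic, port = (int(g) for g in m.groups())
    if node == 1:
        fpc += NODE1_FPC_OFFSET
    return f"ge-{fpc}/{pic}/{port}"


def ports_in_output(terse_output: str) -> set[str]:
    """Collect ge- port names (unit suffix dropped) from 'show interfaces terse'."""
    ports = set()
    for line in terse_output.splitlines():
        fields = line.split()
        if fields:
            m = TERSE_NAME_RE.fullmatch(fields[0])
            if m:
                ports.add(m.group(1))
    return ports


def missing_node1_ports(terse_output: str, local_ports: list[str]) -> list[str]:
    """Return the cluster-wide names of node1 ports absent from the output."""
    present = ports_in_output(terse_output)
    expected = [cluster_name(1, p) for p in local_ports]
    return [p for p in expected if p not in present]


if __name__ == "__main__":
    local = ["ge-0/0/3", "ge-0/0/5", "ge-0/0/15"]
    # Lines from node0's output in the original post: no ge-5/0/x at all.
    before = "ge-0/0/3 up up\nge-0/0/5 up up\nge-0/0/15 up up\n"
    # Lines from node1's output after the reboot.
    after = (
        "ge-5/0/3.0 up up aenet --> fab1.0\n"
        "ge-5/0/5.0 up up aenet --> reth1.0\n"
        "ge-5/0/15.0 up up aenet --> reth0.0\n"
    )
    print(missing_node1_ports(before, local))  # ['ge-5/0/3', 'ge-5/0/5', 'ge-5/0/15']
    print(missing_node1_ports(after, local))   # []
```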

HA LEDs turned green.

Actually, that's very strange, as I had already rebooted both nodes at least twice, both with this command (chassis cluster reboot) and with plain 'request system reboot'.

Clustering works fine now. I've done a few simple tests and confirmed that failover completes within 5 seconds (I guess the delay is caused by RSTP being enabled on the switches).

Thanks again for your help!

Kind regards,
Vladimir
Message Edited by Vladimir on 09-11-2009 12:09 AM
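On the 5-second failover figure: the heartbeat settings in the original configuration (`heartbeat-interval 1000`, `heartbeat-threshold 3`) only bound how quickly a lost peer is detected over the control link. Assuming the peer is declared lost after `heartbeat-threshold` consecutive missed heartbeats, that time is interval × threshold. This rough sketch ignores switch-side convergence such as RSTP; `control_link_detection_ms` is a hypothetical helper.

```python
def control_link_detection_ms(heartbeat_interval_ms: int, heartbeat_threshold: int) -> int:
    """Approximate time to declare the peer lost: interval x consecutive misses."""
    if heartbeat_interval_ms <= 0 or heartbeat_threshold <= 0:
        raise ValueError("interval and threshold must be positive")
    return heartbeat_interval_ms * heartbeat_threshold


if __name__ == "__main__":
    # Values from the chassis cluster stanza of the configuration in this thread.
    print(control_link_detection_ms(1000, 3))  # 3000
```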
4. RE: Failover doesn't work between HA SRX240

Erdem
Posted 11-18-2009 15:02

Hi Richard,

You wrote that, at present, this functionality works only over a direct connection. Do you have confirmed information that it will be implemented in new software versions? On the other hand, have a look at this document:
http://junos.juniper.net/content/Resources/!Rebranded_Resources/3500165-EN.pdf

Personally, I have a problem with a cluster of two SRX650s; maybe there is some way to configure the switch so that HA works.

Marcin
5. RE: Failover doesn't work between HA SRX240

Pato
Posted 12-21-2009 20:07

Vladimir

The problem you are seeing might be related to the fact that, prior to some 10.0 versions, fab link probes and traffic were sent skipping some of the checks normally done on IP traffic (such as checksum and length). The reason was to avoid checks that are expensive in terms of CPU utilization and are not required if you assume the devices are connected back-to-back.

Because of the way chassis clusters are being deployed, the latest Junos releases have added those checks in order to interoperate better with switches that do inspect the IP layer.

When trying to deploy a cluster over an L2 transport network, you should either:

Disable all IP checks in the switching layer (i.e., no checksum, same-address, or length checks), or
Try using Junos 10.0R3, 10.1R1, or later builds.
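The version advice above (together with Richard's earlier note that JSRP on the SRX240 needs 9.6) can be turned into a quick check against a configuration's `version` line, such as `version 9.6R1.13;` in this thread. This is a sketch that assumes version strings of the form major.minor, a release-type letter, then a release number, and reads "or later" as every train from 10.1 onward; the function names are hypothetical.

```python
import re


def parse_junos_version(version: str) -> tuple[int, int, int]:
    """Parse strings such as '9.6R1.13' into (major, minor, release)."""
    m = re.match(r"(\d+)\.(\d+)[A-Z]+(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized Junos version: {version!r}")
    major, minor, release = (int(g) for g in m.groups())
    return major, minor, release


def supports_srx240_cluster(version: str) -> bool:
    """Per this thread: JSRP on the SRX240 requires 9.6 or later."""
    major, minor, _ = parse_junos_version(version)
    return (major, minor) >= (9, 6)


def adds_fabric_ip_checks(version: str) -> bool:
    """Per this thread: 10.0R3, 10.1R1, or later builds add the fabric IP checks."""
    major, minor, release = parse_junos_version(version)
    if (major, minor) == (10, 0):
        return release >= 3
    return (major, minor) >= (10, 1)


if __name__ == "__main__":
    for v in ("9.5R1", "9.6R1.13", "10.0R2", "10.0R3", "10.1R1"):
        print(v, supports_srx240_cluster(v), adds_fabric_ip_checks(v))
```

By this reading, the 9.6R1.13 release used in this thread supports the SRX240 cluster but predates the fabric-link change, which is consistent with the fabric link working only when directly connected.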
