SRX

Ask questions and share experiences about the SRX Series.
  • 1.  Failover doesn't work between HA SRX240

    Posted 09-10-2009 21:11

    Hi all,

    I have configured a cluster of two SRX240s, and routing seems to work fine.

    Failover doesn't work.

    HA config is as follows:

    ## Last commit: 2009-09-10 06:37:49 UTC by root
    version 9.6R1.13;
    groups {
        node1 {
            system {
                host-name au.fw_node1;
                root-authentication {
                    encrypted-password "XXX"; ## SECRET-DATA
                }
            }
            interfaces {
                fxp0 {
                    unit 0 {
                        family inet {
                            address 192.168.103.253/24;
                        }
                    }
                }
            }
        }
        node0 {
            system {
                host-name au.fw_node0;
            }
            interfaces {
                fxp0 {
                    unit 0 {
                        family inet {
                            address 192.168.103.252/24;
                        }
                    }
                }
            }
        }
    }
    apply-groups "${node}";
    system {
        root-authentication {
            encrypted-password "XXX"; ## SECRET-DATA
        }
        services {
            ssh;
            web-management {
                http {
                    interface [ fxp0.0 reth2.0 ];
                }
            }
        }
        syslog {
            user * {
                any emergency;
            }
            file messages {
                any critical;
                authorization info;
            }
            file interactive-commands {
                interactive-commands error;
            }
        }
        max-configurations-on-flash 5;
        max-configuration-rollbacks 5;
        license {
            autoupdate {
                url https://ae1.juniper.net/junos/key_retrieval;
            }
        }
    }
    chassis {
        cluster {
            reth-count 2;
            heartbeat-interval 1000;
            heartbeat-threshold 3;
            node 0; ## Warning: 'node' is deprecated
            node 1; ## Warning: 'node' is deprecated
            redundancy-group 1 {
                node 0 priority 100;
                node 1 priority 1;
                preempt;
                interface-monitor {
                    ge-0/0/5 weight 255;
                    ge-5/0/5 weight 255;
                }
            }
            redundancy-group 0 {
                node 0 priority 100;
                node 1 priority 1;
            }
            redundancy-group 2 {
                node 0 priority 100;
                node 1 priority 1;
                preempt;
                interface-monitor {
                    ge-0/0/15 weight 255;
                    ge-5/0/15 weight 255;
                }
            }
        }
    }
    interfaces {
        ge-0/0/2 {
            unit 0;
        }
        ge-0/0/5 {
            gigether-options {
                redundant-parent reth1;
            }
        }
        ge-0/0/15 {
            gigether-options {
                redundant-parent reth0;
            }
        }
        ge-5/0/2 {
            unit 0;
        }
        ge-5/0/5 {
            gigether-options {
                redundant-parent reth1;
            }
        }
        ge-5/0/15 {
            gigether-options {
                redundant-parent reth0;
            }
        }
        fab0 {
            fabric-options {
                member-interfaces {
                    ge-0/0/3;
                }
            }
        }
        fab1 {
            fabric-options {
                member-interfaces {
                    ge-5/0/3;
                }
            }
        }
        reth0 {
            redundant-ether-options {
                redundancy-group 2;
            }
            unit 0 {
                family inet {
                    mtu 1500;
                    address 192.168.1.240/24;
                }
            }
        }
        reth1 {
            vlan-tagging;
            redundant-ether-options {
                redundancy-group 1;
            }
            unit 0 {
                vlan-id 43;
                family inet {
                    mtu 1500;
                    address 202.43.4.254/24;
                }
            }
            unit 1 {
                vlan-id 101;
                family inet {
                    mtu 1500;
                    address 192.168.101.254/24;
                }
            }
            unit 2 {
                vlan-id 102;
                family inet {
                    mtu 1500;
                    address 192.168.102.254/24;
                }
            }
        }
    }
    security {
        screen {
            ids-option untrust-screen {
                icmp {
                    ping-death;
                }
                ip {
                    source-route-option;
                    tear-drop;
                }
                tcp {
                    syn-flood {
                        alarm-threshold 1024;
                        attack-threshold 200;
                        source-threshold 1024;
                        destination-threshold 2048;
                        queue-size 2000; ## Warning: 'queue-size' is deprecated
                        timeout 20;
                    }
                    land;
                }
            }
        }
        zones {
            functional-zone management;
            security-zone untrust {
                screen untrust-screen;
                interfaces {
                    reth0.0;
                }
            }
            security-zone dmz {
                tcp-rst;
                screen untrust-screen;
                interfaces {
                    reth1.0 {
                        host-inbound-traffic {
                            system-services {
                                all;
                            }
                        }
                    }
                }
            }
            security-zone private {
                tcp-rst;
                screen untrust-screen;
                interfaces {
                    reth1.1 {
                        host-inbound-traffic {
                            system-services {
                                all;
                            }
                        }
                    }
                }
            }
            security-zone management {
                tcp-rst;
                screen untrust-screen;
                interfaces {
                    reth1.2 {
                        host-inbound-traffic {
                            system-services {
                                all;
                            }
                        }
                    }
                }
            }
        }
        policies {
            default-policy {
                permit-all;
            }
        }
    }
    vlans {
        DMZ {
            vlan-id 43;
        }
        Management {
            vlan-id 102;
        }
        Private {
            vlan-id 101;
        }
    }

    I've also noticed a few other things:

    1) Neither node sees the ge-5/0/X interfaces:

    root@au.fw_node0> show interfaces terse
    Interface Admin Link Proto Local Remote
    ge-0/0/0 up down
    ge-0/0/1 up up
    ge-0/0/2 up down
    ge-0/0/2.0 up down
    ge-0/0/3 up up
    ge-0/0/3.0 up up aenet --> fab0.0
    ge-0/0/4 up down
    ge-0/0/5 up up
    ge-0/0/5.0 up up aenet --> reth1.0
    ge-0/0/5.1 up up aenet --> reth1.1
    ge-0/0/5.2 up up aenet --> reth1.2
    ge-0/0/6 up up
    ge-0/0/7 up down
    ge-0/0/8 up down
    ge-0/0/9 up down
    ge-0/0/10 up down
    ge-0/0/11 up down
    ge-0/0/12 up down
    ge-0/0/13 up down
    ge-0/0/14 up down
    ge-0/0/15 up up
    ge-0/0/15.0 up up aenet --> reth0.0
    fab0 up up
    fab0.0 up up inet 30.17.0.200/24
    fab1 up down
    fab1.0 up down inet 30.18.0.200/24
    fxp0 up up
    fxp0.0 up up inet 192.168.103.252/24
    fxp1 up up
    fxp1.0 up up inet 129.16.0.1/2
    tnp 0x1100001
    gre up up
    ipip up up
    lo0 up up
    lo0.16384 up up inet 127.0.0.1 --> 0/0
    lo0.16385 up up inet 10.0.0.1 --> 0/0
    10.0.0.16 --> 0/0
    128.0.0.1 --> 0/0
    128.0.1.16 --> 0/0
    inet6 fe80::226:88ff:fe06:1280
    lsi up up
    mtun up up
    pimd up up
    pime up up
    pp0 up up
    reth0 up up
    reth0.0 up up inet 192.168.1.240/24
    reth1 up up
    reth1.0 up up inet 202.43.4.254/24
    reth1.1 up up inet 192.168.101.254/24
    reth1.2 up up inet 192.168.102.254/24
    reth1.32767 up down
    st0 up up
    tap up up
    vlan up up

     


    2) The ge-5/0/5 interface is shown as Down in the cluster interface status, even though the cable is plugged in and connected to the switch and the port LEDs are blinking green:

     


    root@au.fw_node0> show chassis cluster interfaces
    Control link name: fxp1

    Redundant-ethernet Information:
    Name Status Redundancy-group
    reth0 Up 2
    reth1 Up 1

    Interface Monitoring:
    Interface Weight Status Redundancy-group
    ge-5/0/5 255 Down 1
    ge-0/0/5 255 Up 1
    ge-5/0/15 255 Up 2
    ge-0/0/15 255 Up 2
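
    For reference, here is a minimal Python sketch of how interface-monitor weights add up, assuming the Junos behaviour as I understand it from the documentation: once the weights of a node's failed monitored interfaces in a redundancy group sum to 255, that node's priority for the group drops to 0. With weight 255 on every link, as configured above, a single Down link is enough. The ge-0/... = node0, ge-5/... = node1 mapping is taken from this SRX240 cluster's numbering, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed Junos rule: a node loses a redundancy group once the summed weight
# of its failed monitored interfaces for that group reaches this value.
FAILOVER_THRESHOLD = 255

# One row of the "Interface Monitoring" table: interface, weight, status, RG.
ROW = re.compile(r"(ge-(\d+)/\d+/\d+)\s+(\d+)\s+(Up|Down)\s+(\d+)")


def failed_weight(table: str) -> Dict[Tuple[int, int], int]:
    """Return {(redundancy_group, node): summed weight of Down interfaces}."""
    totals: Dict[Tuple[int, int], int] = defaultdict(int)
    for _ifname, fpc, weight, status, rg in ROW.findall(table):
        # Assumption: on this SRX240 cluster, FPC 5 and above belong to node1.
        node = 0 if int(fpc) < 5 else 1
        if status == "Down":
            totals[(int(rg), node)] += int(weight)
    return dict(totals)


def lost_groups(table: str) -> List[Tuple[int, int]]:
    """(redundancy_group, node) pairs whose failed weight reaches the threshold."""
    return sorted(k for k, w in failed_weight(table).items() if w >= FAILOVER_THRESHOLD)
```

    Fed the table above, lost_groups reports redundancy group 1 as lost on node1, because ge-5/0/5 alone carries weight 255. It only does the arithmetic; it does not explain why the link is Down.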

     

    3) Probes do not seem to get through the fabric link (I'm not sure whether this is expected, but it looks concerning):

    node0
    =====

     

    root@au.fw_node0> show chassis cluster control-plane statistics
    Control link statistics:
    Heartbeat packets sent: 63898
    Heartbeat packets received: 63754
    Fabric link statistics:
    Probes sent: 63892
    Probes received: 0

    root@au.fw_node0> show chassis cluster data-plane statistics
    Services Synchronized:
    Service name RTOs sent RTOs received
    Translation context 0 0
    Incoming NAT 0 0
    Resource manager 0 0
    Session create 8296 0
    Session close 5572 0
    Session change 0 0
    Gate create 0 0
    Session ageout refresh requests 0 0
    Session ageout refresh replies 0 0
    IPSec VPN 0 0
    Firewall user authentication 0 0
    MGCP ALG 0 0
    H323 ALG 0 0
    SIP ALG 0 0
    SCCP ALG 0 0
    PPTP ALG 0 0
    RPC ALG 0 0
    RTSP ALG 0 0
    RAS ALG 0 0
    MAC address learning 0 0

    {primary:node0}
    root@au.fw_node0> show interfaces ge-0/0/3
    Physical interface: ge-0/0/3, Enabled, Physical link is Up
    Interface index: 134, SNMP ifIndex: 123
    Link-level type: 64, MTU: 9014, Link-mode: Full-duplex, Speed: 1000mbps,
    BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled,
    Source filtering: Disabled, Flow control: Enabled, Auto-negotiation: Enabled,
    Remote fault: Online
    Device flags : Present Running
    Interface flags: SNMP-Traps Internal: 0x0
    Link flags : None
    CoS queues : 8 supported, 8 maximum usable queues
    Current address: 00:26:88:06:12:ff, Hardware address: 00:26:88:06:12:83
    Last flapped : 2009-09-10 06:51:07 UTC (17:52:06 ago)
    Input rate : 0 bps (0 pps)
    Output rate : 0 bps (3 pps)
    Active alarms : None
    Active defects : None

    Logical interface ge-0/0/3.0 (Index 75) (SNMP ifIndex 151)
    Flags: SNMP-Traps Encapsulation: ENET2
    Input packets : 0
    Output packets: 206253
    Security: Zone: Null
    Protocol aenet, AE bundle: fab0.0 Link Index: 0

    {primary:node0}
    root@au.fw_node0> show interfaces fab0
    Physical interface: fab0, Enabled, Physical link is Up
    Interface index: 130, SNMP ifIndex: 117
    Link-level type: Ethernet, MTU: 9014, Speed: 1000mbps, BPDU Error: None,
    MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
    Flow control: Disabled, Minimum links needed: 1, Minimum bandwidth needed: 0
    Device flags : Present Running
    Interface flags: SNMP-Traps Internal: 0x0
    Current address: 00:26:88:06:12:ff, Hardware address: 00:26:88:06:12:ff
    Last flapped : 2009-09-10 06:51:07 UTC (17:52:29 ago)
    Input rate : 0 bps (0 pps)
    Output rate : 0 bps (2 pps)

    Logical interface fab0.0 (Index 72) (SNMP ifIndex 120)
    Flags: SNMP-Traps 0x0 Encapsulation: ENET2
    Statistics Packets pps Bytes bps
    Bundle:
    Input : 0 0 0 0
    Output: 206328 2 0 0
    Security: Zone: Null
    Protocol inet, MTU: 9000
    Flags: None
    Addresses, Flags: Is-Preferred Is-Primary
    Destination: 30.17.0/24, Local: 30.17.0.200, Broadcast: 30.17.0.255

    {primary:node0}
    root@au.fw_node0> show interfaces fab1
    Physical interface: fab1, Enabled, Physical link is Down
    Interface index: 150, SNMP ifIndex: 142
    Link-level type: Ethernet, MTU: 9014, Speed: Unspecified, BPDU Error: None,
    MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
    Flow control: Disabled, Minimum links needed: 1, Minimum bandwidth needed: 0
    Device flags : Present Running
    Interface flags: Hardware-Down SNMP-Traps Internal: 0x0
    Current address: 00:26:88:06:0f:7f, Hardware address: 00:26:88:06:0f:7f
    Last flapped : 2009-09-10 06:52:09 UTC (17:51:35 ago)
    Input rate : 0 bps (0 pps)
    Output rate : 0 bps (0 pps)

    Logical interface fab1.0 (Index 80) (SNMP ifIndex 143)
    Flags: Hardware-Down Device-Down SNMP-Traps 0x0 Encapsulation: ENET2
    Statistics Packets pps Bytes bps
    Bundle:
    Input : 0 0 0 0
    Output: 0 0 0 0
    Security: Zone: Null
    Protocol inet, MTU: 9000
    Flags: None
    Addresses, Flags: Is-Preferred Is-Primary
    Destination: 30.18.0/24, Local: 30.18.0.200, Broadcast: 30.18.0.255

    node1
    =====

     

    {secondary:node1}
    root@au.fw_node1> show chassis cluster data-plane statistics
    Services Synchronized:
    Service name RTOs sent RTOs received
    Translation context 0 0
    Incoming NAT 0 0
    Resource manager 0 0
    Session create 0 0
    Session close 0 0
    Session change 0 0
    Gate create 0 0
    Session ageout refresh requests 0 0
    Session ageout refresh replies 0 0
    IPSec VPN 0 0
    Firewall user authentication 0 0
    MGCP ALG 0 0
    H323 ALG 0 0
    SIP ALG 0 0
    SCCP ALG 0 0
    PPTP ALG 0 0
    RPC ALG 0 0
    RTSP ALG 0 0
    RAS ALG 0 0
    MAC address learning 0 0

    {secondary:node1}
    root@au.fw_node1> show chassis cluster control-plane statistics
    Control link statistics:
    Heartbeat packets sent: 64147
    Heartbeat packets received: 64114
    Fabric link statistics:
    Probes sent: 64144
    Probes received: 0

    {secondary:node1}
    root@au.fw_node1> show interfaces ge-0/0/3
    Physical interface: ge-0/0/3, Enabled, Physical link is Up
    Interface index: 134, SNMP ifIndex: 123
    Link-level type: 64, MTU: 9014, Link-mode: Half-duplex, Speed: Unspecified,
    BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled,
    Source filtering: Disabled, Flow control: Enabled, Auto-negotiation: Enabled,
    Remote fault: Online
    Device flags : Present Running
    Interface flags: SNMP-Traps Internal: 0x0
    Link flags : None
    CoS queues : 8 supported, 8 maximum usable queues
    Current address: 00:26:88:06:12:ff, Hardware address: 00:26:88:06:12:83
    Last flapped : 2009-09-10 06:51:07 UTC (17:55:39 ago)
    Input rate : 0 bps (0 pps)
    Output rate : 0 bps (0 pps)
    Active alarms : None
    Active defects : None

    Logical interface ge-0/0/3.0 (Index 75) (SNMP ifIndex 151)
    Flags: SNMP-Traps Encapsulation: ENET2
    Input packets : 0
    Output packets: 0
    Security: Zone: Null
    Protocol aenet, AE bundle: fab0.0 Link Index: 0

    {secondary:node1}
    root@au.fw_node1> show interfaces fab0
    Physical interface: fab0, Enabled, Physical link is Up
    Interface index: 130, SNMP ifIndex: 117
    Link-level type: Ethernet, MTU: 9014, Speed: 1000mbps, BPDU Error: None,
    MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
    Flow control: Disabled, Minimum links needed: 1, Minimum bandwidth needed: 0
    Device flags : Present Running
    Interface flags: SNMP-Traps Internal: 0x0
    Current address: 00:26:88:06:12:ff, Hardware address: 00:26:88:06:12:ff
    Last flapped : 2009-09-10 06:51:07 UTC (17:55:48 ago)
    Input rate : 0 bps (0 pps)
    Output rate : 0 bps (0 pps)

    Logical interface fab0.0 (Index 72) (SNMP ifIndex 120)
    Flags: SNMP-Traps 0x0 Encapsulation: ENET2
    Statistics Packets pps Bytes bps
    Bundle:
    Input : 0 0 0 0
    Output: 0 0 0 0
    Security: Zone: Null
    Protocol inet, MTU: 9000
    Flags: None
    Addresses, Flags: Is-Preferred Is-Primary
    Destination: 30.17.0/24, Local: 30.17.0.200, Broadcast: 30.17.0.255

    {secondary:node1}
    root@au.fw_node1> show interfaces fab1
    Physical interface: fab1, Enabled, Physical link is Down
    Interface index: 150, SNMP ifIndex: 142
    Link-level type: Ethernet, MTU: 9014, Speed: Unspecified, BPDU Error: None,
    MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
    Flow control: Disabled, Minimum links needed: 1, Minimum bandwidth needed: 0
    Device flags : Present Running
    Interface flags: Hardware-Down SNMP-Traps Internal: 0x0
    Current address: 00:26:88:06:0f:7f, Hardware address: 00:26:88:06:0f:7f
    Last flapped : 2009-09-10 06:52:09 UTC (17:54:50 ago)
    Input rate : 0 bps (0 pps)
    Output rate : 0 bps (0 pps)

    Logical interface fab1.0 (Index 80) (SNMP ifIndex 143)
    Flags: Hardware-Down Device-Down SNMP-Traps Encapsulation: ENET2
    Statistics Packets pps Bytes bps
    Bundle:
    Input : 0 0 0 0
    Output: 0 0 0 0
    Security: Zone: Null
    Protocol inet, MTU: 9000
    Flags: None
    Addresses, Flags: Is-Preferred Is-Primary
    Destination: 30.18.0/24, Local: 30.18.0.200, Broadcast: 30.18.0.255

     

    4) The HA LEDs on both nodes are amber.

    Any ideas would be highly appreciated.

     
    Kind regards,
    Vladimir

    #failover
    #HA
    #SRX240
    #fab


  • 2.  RE: Failover doesn't work between HA SRX240
    Best Answer

    Posted 09-10-2009 23:32

    My guess is that node1 may be in the disabled state. You don't show the cluster state from the CLI prompt, but the fact that you don't see any ge-5/0/x interfaces makes this seem likely. The only way out of the disabled state is to reboot the node. Also, are you connecting the fab link through a switch, or are you directly connecting ge-0/0/3 on both nodes? At present, only directly connected control and fabric links are supported on the SRX branch series. Finally, just double-checking that you have JUNOS 9.6R1, as JSRP on the SRX240 isn't supported in the 9.5 release.
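
    A minimal Python sketch for spotting the disabled state from `show chassis cluster status` output, assuming the column order seen in the status output posted in this thread (node, priority, status, preempt, manual failover). The helper names are hypothetical and the regex is an assumption about that layout, not a Juniper API.

```python
import re
from typing import Dict, List, Tuple

# Row layout assumed from this thread: node, priority, status, preempt, manual failover.
ROW = re.compile(r"(node\d+)\s+(\d+)\s+(\S+)\s+(yes|no)\s+(yes|no)")
RG = re.compile(r"Redundancy group:\s*(\d+)")


def parse_cluster_status(text: str) -> Dict[int, List[Tuple[str, int, str]]]:
    """Map redundancy-group number to a list of (node, priority, status)."""
    groups: Dict[int, List[Tuple[str, int, str]]] = {}
    # Splitting on a capturing pattern yields [preamble, rg, body, rg, body, ...].
    parts = RG.split(text)
    for rg, body in zip(parts[1::2], parts[2::2]):
        groups[int(rg)] = [(node, int(prio), status)
                           for node, prio, status, _preempt, _manual in ROW.findall(body)]
    return groups


def disabled_nodes(text: str) -> List[str]:
    """Sorted names of nodes reported as 'disabled' in any redundancy group."""
    return sorted({node
                   for rows in parse_cluster_status(text).values()
                   for node, _prio, status in rows
                   if status == "disabled"})
```

    Because the patterns match on whitespace rather than line breaks, the same code handles both the normal multi-line output and a copy-pasted single line.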

     

    Once you have confirmed that the JUNOS version is 9.6, that the control and fabric links are directly connected (not going through a switch), and that the node has been rebooted (to clear the disabled state), check that heartbeats and probes are both sent and received on both nodes. If they are not, the node will go into the disabled state again. Finally, check the jsrpd log for any issues, for example with cold-sync monitoring.
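
    The heartbeat/probe check described above can be sketched in Python as a parser for `show chassis cluster control-plane statistics`, assuming the counter labels shown in this thread. The function names are hypothetical; comparing two snapshots is a suggestion of mine, not something Richard prescribed.

```python
import re
from typing import Dict

# Counter labels as they appear in `show chassis cluster control-plane statistics`.
FIELDS = {
    "heartbeat_sent": r"Heartbeat packets sent:\s*(\d+)",
    "heartbeat_received": r"Heartbeat packets received:\s*(\d+)",
    "probes_sent": r"Probes sent:\s*(\d+)",
    "probes_received": r"Probes received:\s*(\d+)",
}


def parse_control_plane_stats(text: str) -> Dict[str, int]:
    """Extract the four counters; raise if any label is missing."""
    stats: Dict[str, int] = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"counter not found: {key}")
        stats[key] = int(match.group(1))
    return stats


def links_look_healthy(text: str) -> bool:
    """True only if both heartbeats (control link) and probes (fabric link) were received."""
    stats = parse_control_plane_stats(text)
    return stats["heartbeat_received"] > 0 and stats["probes_received"] > 0


def received_delta(before: str, after: str) -> Dict[str, int]:
    """How far the received counters advanced between two snapshots."""
    old, new = parse_control_plane_stats(before), parse_control_plane_stats(after)
    return {key: new[key] - old[key] for key in ("heartbeat_received", "probes_received")}
```

    With node0's numbers from the original post, links_look_healthy returns False because Probes received is 0. With node1's numbers after the reboot, it returns True. A single non-zero reading does not prove the link is still passing traffic, so received_delta over two snapshots is the more telling check.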

     

    -Richard



  • 3.  RE: Failover doesn't work between HA SRX240

    Posted 09-11-2009 00:06

     Hi Richard,

     

    Thanks for your reply.

     

    You have solved my problem, and you have my greatest appreciation for it!

     

    Just to clarify my situation:

     

    I have JUNOS 9.6R1.13 (the latest available) installed on both nodes.

    The fab link is a DIRECT connection with a straight-through Cat5e cable between ge-0/0/3 and ge-5/0/3 (during troubleshooting I also tried a crossover cable and a connection through a switch, but neither helped).

    The control link is also DIRECTLY connected, with a straight-through Cat5e cable between ge-0/0/1 and ge-5/0/1.

     

    I rebooted node1 by running the following on node1:

     

    set chassis cluster cluster-id 1 node 1 reboot

    Output of the tnpdump utility on node1:

     

    root@au% tnpdump
    Name TNPaddr MAC address IF MTU E H R
    cluster1.node0 0x1100001 00:26:88:06:12:81 fxp1 1500 2 1 3
    cluster1.node1 0x2100001 00:26:88:06:0f:01 fxp1 1500 0 1 3
    cluster1.master 0xf100001 00:26:88:06:12:81 fxp1 1500 2 1 3
    bcast 0xffffffff ff:ff:ff:ff:ff:ff fxp1 1500 0 1 3

     

    Traffic seems to have started passing over the fabric link:

     

    root@au.fw_node1> show chassis cluster control-plane statistics
    Control link statistics:
    Heartbeat packets sent: 287
    Heartbeat packets received: 285
    Fabric link statistics:
    Probes sent: 284
    Probes received: 180

     

    The cluster status now shows the correct priorities for the RGs:

     

    root@au.fw_node1> show chassis cluster status
    Cluster ID: 1
    Node name Priority Status Preempt Manual failover

    Redundancy group: 0 , Failover count: 0
    node0 100 primary no no
    node1 1 secondary no no

    Redundancy group: 1 , Failover count: 8
    node0 100 primary yes no
    node1 1 secondary yes no

    Redundancy group: 2 , Failover count: 2
    node0 100 primary yes no
    node1 1 secondary yes no

     

    ge-5/0/X interfaces are visible now:

     

    root@au.fw_node1> show interfaces terse | match aenet
    ge-0/0/3.0 up up aenet --> fab0.0
    ge-0/0/5.0 up up aenet --> reth1.0
    ge-0/0/5.1 up up aenet --> reth1.1
    ge-0/0/5.2 up up aenet --> reth1.2
    ge-0/0/15.0 up up aenet --> reth0.0
    ge-5/0/3.0 up up aenet --> fab1.0
    ge-5/0/5.0 up up aenet --> reth1.0
    ge-5/0/5.1 up up aenet --> reth1.1
    ge-5/0/5.2 up up aenet --> reth1.2
    ge-5/0/15.0 up up aenet --> reth0.0
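
    A minimal Python sketch of the check this output answers: does every reth have a child link on both nodes? It assumes the `show interfaces terse | match aenet` column layout shown here and this cluster's numbering (FPC 0 = node0, FPC 5 = node1). Fabric children (fab0/fab1) are ignored because each belongs to a single node. The function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# "ge-<fpc>/<pic>/<port>.<unit> <admin> <link> aenet --> reth<N>.<unit>"
CHILD = re.compile(r"(ge-(\d+)/\d+/\d+)\.\d+\s+\S+\s+\S+\s+aenet\s+-->\s+(reth\d+)\.\d+")


def reth_nodes(terse: str) -> Dict[str, Set[int]]:
    """Map each reth to the set of nodes contributing a child link."""
    nodes: Dict[str, Set[int]] = defaultdict(set)
    for _ifname, fpc, parent in CHILD.findall(terse):
        # Assumption: FPC 5 and above belong to node1 on this SRX240 cluster.
        nodes[parent].add(0 if int(fpc) < 5 else 1)
    return dict(nodes)


def reths_missing_a_node(terse: str) -> Dict[str, Set[int]]:
    """Return {reth: nodes with no child link}; an empty dict means both nodes are present."""
    return {parent: {0, 1} - present
            for parent, present in reth_nodes(terse).items()
            if {0, 1} - present}
```

    With the post-reboot output above, reths_missing_a_node returns an empty dict. With node0's earlier terse output, where only ge-0/0/x children appear, it would report both reth0 and reth1 as missing node 1.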

    HA LEDs turned green.

    Actually, that's very strange, as I had rebooted both nodes at least twice earlier, using both this command (the chassis cluster reboot) and plain 'request system reboot'.

    Clustering works fine now. I've done a few simple tests and confirmed that failover completes within 5 seconds (I guess the delay is caused by RSTP being enabled on the switches).

    Thanks again for your help!

    Kind regards,

    Vladimir

    Message Edited by Vladimir on 09-11-2009 12:09 AM


  • 4.  RE: Failover doesn't work between HA SRX240

    Posted 11-18-2009 15:02

    Hi Richard,

     

    You wrote that, at present, this functionality works only over a direct connection. Do you have confirmed information that it will be implemented in newer software versions? On the other hand, have a look at this document:
    http://junos.juniper.net/content/Resources/!Rebranded_Resources/3500165-EN.pdf

    Personally, I have a problem with a pair of SRX650s; maybe there is some way to configure the switch so that HA works.

    Marcin



  • 5.  RE: Failover doesn't work between HA SRX240

    Posted 12-21-2009 20:07

    Vladimir

     

    The problem you are seeing might be related to the fact that, prior to some 10.0 versions, fab link probes and traffic were sent without some of the checks normally done on IP traffic (such as checksum and length). The reason was to avoid some expensive checks (in terms of CPU utilization, of course) that are not required if you assume the devices are connected back-to-back.

     

    Because of the way chassis clusters are being deployed, the latest Junos releases have added those checks in order to interoperate better with switches that do inspect the IP layer.
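
    To make "checks done to IP traffic (like checksums and length)" concrete, here is a minimal Python sketch of the standard IPv4 header checksum (RFC 791 / RFC 1071) plus a total-length consistency check, the kind of validation an IP-aware switch might apply. It illustrates generic IPv4 rules, not Juniper's fab-probe code. The sample header in the test is the common textbook example, not a captured probe, and the function names are hypothetical.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, computed with the checksum field (bytes 10-11) zeroed."""
    if len(header) % 2:
        raise ValueError("IPv4 header length must be even")
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    total = sum(struct.unpack(f"!{len(zeroed) // 2}H", zeroed))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def passes_ip_checks(packet: bytes) -> bool:
    """Checksum and total-length checks that a strict L3-aware device might perform."""
    ihl = (packet[0] & 0x0F) * 4  # header length in bytes
    header = packet[:ihl]
    stored = struct.unpack("!H", header[10:12])[0]
    total_length = struct.unpack("!H", packet[2:4])[0]
    return stored == ipv4_header_checksum(header) and total_length == len(packet)
```

    A packet that skips either computation (a zero checksum or a mismatched length field) fails passes_ip_checks, which is the kind of frame a switch inspecting the IP layer could drop.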

     

    When trying to deploy a cluster over an L2 transport network, you should either:

    • Disable all IP checks in the switching layer (i.e. no checksum, same-address, or length checks), or
    • Try using Junos 10.0R3, 10.1R1, or later builds.
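
    A minimal Python sketch of the version rules mentioned in this thread: JSRP on the SRX240 needs 9.6 or later (per Richard), and the fab-link IP checks arrived in 10.0R3 or any 10.1+ release (my reading of "10.0R3, 10.1R1 or later"). It handles only plain "R" release strings such as 9.6R1.13; other release types are out of scope, and the function names are hypothetical.

```python
import re
from typing import Tuple

VERSION = re.compile(r"^(\d+)\.(\d+)R(\d+)")


def parse_release(version: str) -> Tuple[int, int, int]:
    """Split e.g. '9.6R1.13' into (9, 6, 1); only 'R' releases are handled."""
    match = VERSION.match(version)
    if match is None:
        raise ValueError(f"not an R-release string: {version!r}")
    major, minor, release = map(int, match.groups())
    return major, minor, release


def has_fab_ip_checks(version: str) -> bool:
    """Assumed reading: 10.0R3 and later in the 10.0 train, or any 10.1+ release."""
    major, minor, release = parse_release(version)
    if (major, minor) == (10, 0):
        return release >= 3
    return (major, minor) >= (10, 1)


def srx240_cluster_supported(version: str) -> bool:
    """JSRP on the SRX240 requires 9.6 or later, per this thread."""
    major, minor, _release = parse_release(version)
    return (major, minor) >= (9, 6)
```

    For the 9.6R1.13 release used in this thread, srx240_cluster_supported returns True and has_fab_ip_checks returns False.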