SRX

Ask questions and share experiences about the SRX Series, vSRX, and cSRX.
  • 1.  Problems and more problems in a SRX340 cluster.... the neverending story

    Posted 09-11-2019 14:08

    Hi guys, 

    This story is coming from here https://forums.juniper.net/t5/SRX-Services-Gateway/Junos-upgrade-fails-on-SRX340-cluster-from-15-1X49-D170-4-to-17/td-p/467752

     

    I was struggling to upgrade an SRX340 cluster to a newer Junos version, and finally, with the help of some gurus, I managed to upgrade both nodes to version 18.3R2.7.

     

    Now, after the upgrade, I'm facing new issues... I can't SSH to the device anymore on its single configured reth interface, while I can still log in on the console port with the same root password... Also, sometimes the HA status looks fine, but other times the HA LED is amber and the usual commands show the output below:

     

    root@SPCFW-BRAVO> show chassis firmware  
    node0:
    --------------------------------------------------------------------------
    Part                     Type       Version
    FPC 0                    O/S        Version 18.3R2.7 by builder on 2019-05-03 09:17:52 UTC
    FWDD                     O/S        Version 18.3R2.7 by builder on 2019-05-03 09:17:52 UTC
    
    node1:
    --------------------------------------------------------------------------
    Part                     Type       Version
    FPC 0                    O/S        Version 18.3R2.7 by builder on 2019-05-03 09:17:52 UTC
    FWDD                     O/S        Version 18.3R2.7 by builder on 2019-05-03 09:17:52 UTC
    
    root@SPCFW-BRAVO> show chassis cluster information 
    node0:
    --------------------------------------------------------------------------
    Redundancy Group Information:
    
        Redundancy Group 0 , Current State: primary, Weight: 255
    
            Time            From                 To                   Reason
            Sep 11 20:57:13 hold                 secondary            Hold timer expired
            Sep 11 20:57:22 secondary            primary              Better priority (200/100)
    
        Redundancy Group 1 , Current State: primary, Weight: 0
    
            Time            From                 To                   Reason
            Sep 11 20:57:13 hold                 secondary            Hold timer expired
            Sep 11 20:57:24 secondary            primary              Remote yield (0/0)
    
    Chassis cluster LED information:
        Current LED color: Amber
        Last LED change reason: Monitored objects are down
    Control port tagging:                   
        Disabled
    
    Failure Information:
    
        Coldsync Monitoring Failure Information:
            Statistics:
                Coldsync Total SPUs: 1
                Coldsync completed SPUs: 0
                Coldsync not complete SPUs: 1
    
        Fabric-link Failure Information:
            Fabric Interface: fab0
              Child interface   Physical / Monitored Status     
              ge-0/0/2              Up   / Down 
    
    node1:
    --------------------------------------------------------------------------
    Redundancy Group Information:
    
        Redundancy Group 0 , Current State: secondary, Weight: 0
    
            Time            From                 To                   Reason
            Sep 11 20:57:21 hold                 secondary            Hold timer expired
    
        Redundancy Group 1 , Current State: secondary, Weight: -255
    
            Time            From                 To                   Reason
            Sep 11 20:57:22 hold                 secondary            Hold timer expired
    
    Chassis cluster LED information:
        Current LED color: Amber
        Last LED change reason: Monitored objects are down
    Control port tagging:
        Disabled
    
    Failure Information:
    
        Coldsync Monitoring Failure Information:
            Statistics:
                Coldsync Total SPUs: 1
                Coldsync completed SPUs: 0
                Coldsync not complete SPUs: 1
    
        Fabric-link Failure Information:    
            Fabric Interface: fab1
              Child interface   Physical / Monitored Status     
              ge-5/0/2              Up   / Down 
    
    {secondary:node1}
    root@SPCFW-BRAVO> show chassis cluster status        
    Monitor Failure codes:
        CS  Cold Sync monitoring        FL  Fabric Connection monitoring
        GR  GRES monitoring             HW  Hardware monitoring
        IF  Interface monitoring        IP  IP monitoring
        LB  Loopback monitoring         MB  Mbuf monitoring
        NH  Nexthop monitoring          NP  NPC monitoring              
        SP  SPU monitoring              SM  Schedule monitoring
        CF  Config Sync monitoring      RE  Relinquish monitoring
     
    Cluster ID: 1
    Node   Priority Status               Preempt Manual   Monitor-failures
    
    Redundancy group: 0 , Failover count: 0
    node0  200      primary              no      no       None           
    node1  0        secondary            no      no       FL             
    
    Redundancy group: 1 , Failover count: 0
    node0  0        primary              yes     no       CS             
    node1  0        secondary            yes     no       CS FL          
    root@SPCFW-BRAVO> show chassis cluster interfaces 
    Control link status: Up
    
    Control interfaces: 
        Index   Interface   Monitored-Status   Internal-SA   Security
        0       fxp1        Up                 Disabled      Disabled  
    
    Fabric link status: Down
    
    Fabric interfaces: 
        Name    Child-interface    Status                    Security
                                   (Physical/Monitored)
        fab0    ge-0/0/2           Up   / Down               Disabled   
        fab0   
        fab1    ge-5/0/2           Up   / Down               Disabled   
        fab1   
    
    Redundant-ethernet Information:     
        Name         Status      Redundancy-group
        reth0        Down        Not configured   
        reth1        Up          1                
        reth2        Down        Not configured   
        reth3        Down        Not configured   
        reth4        Down        Not configured   
                                            
    Redundant-pseudo-interface Information:
        Name         Status      Redundancy-group
        lo0          Up          0                

    It seems that, for some reason I can't understand, fab0 (ge-0/0/2) sometimes comes up and other times goes down.
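    For anyone triaging similar output, here is a minimal Python sketch that decodes the Monitor-failures column of "show chassis cluster status". It assumes the command output has been saved as plain text; the function name parse_cluster_status and the regular expressions are illustrative and not part of any Juniper tool. The legend is copied from the output above.

```python
import re

# Legend copied from the "Monitor Failure codes" header in the output above.
FAILURE_CODES = {
    "CS": "Cold Sync monitoring", "FL": "Fabric Connection monitoring",
    "GR": "GRES monitoring", "HW": "Hardware monitoring",
    "IF": "Interface monitoring", "IP": "IP monitoring",
    "LB": "Loopback monitoring", "MB": "Mbuf monitoring",
    "NH": "Nexthop monitoring", "NP": "NPC monitoring",
    "SP": "SPU monitoring", "SM": "Schedule monitoring",
    "CF": "Config Sync monitoring", "RE": "Relinquish monitoring",
}

# Assumed line shapes, based only on the output pasted in this thread.
GROUP_LINE = re.compile(r"^\s*Redundancy group:\s*(\d+)")
NODE_LINE = re.compile(
    r"^\s*(node\d+)\s+(\d+)\s+(\S+)\s+(yes|no)\s+(yes|no)\s+(.*?)\s*$"
)


def parse_cluster_status(text):
    """Return {(group, node): [failure descriptions]} for nodes reporting failures."""
    failures = {}
    group = None
    for line in text.splitlines():
        match = GROUP_LINE.match(line)
        if match:
            group = int(match.group(1))
            continue
        match = NODE_LINE.match(line)
        if match and group is not None:
            codes = match.group(6).split()
            if codes and codes != ["None"]:
                # Unknown codes are passed through unchanged.
                failures[(group, match.group(1))] = [
                    FAILURE_CODES.get(code, code) for code in codes
                ]
    return failures


# Status lines taken from the output above (trailing spaces trimmed).
SAMPLE = """\
Redundancy group: 0 , Failover count: 0
node0  200      primary              no      no       None
node1  0        secondary            no      no       FL

Redundancy group: 1 , Failover count: 0
node0  0        primary              yes     no       CS
node1  0        secondary            yes     no       CS FL
"""

if __name__ == "__main__":
    for (group, node), reasons in sorted(parse_cluster_status(SAMPLE).items()):
        print(f"RG{group} {node}: {', '.join(reasons)}")
```

    Run against the status above, this reports a fabric connection failure on node1 in both redundancy groups and a cold sync failure on both nodes in redundancy group 1, which matches the "Fabric link status: Down" line.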

     

    What do you think? Should I reinstall the same Junos version? Go back to 15.1?

     

    Any help would be much appreciated

    Thanks!



  • 2.  RE: Problems and more problems in a SRX340 cluster.... the neverending story

    Posted 09-11-2019 14:10

    BTW, this is the full config of the cluster:

     

    root@SPCFW-BRAVO> show configuration 
    ## Last commit: 2019-09-10 23:53:54 CEST by root
    version 18.3R2.7;
    groups {
        node0 {
            system {
                host-name SPCFW-ALPHA;
            }
            interfaces {
                fxp0 {
                    unit 0 {
                        family inet {
                            address 10.101.44.1/24;
                        }
                    }
                }
            }
        }
        node1 {
            system {
                host-name SPCFW-BRAVO;
            }
            interfaces {
                fxp0 {
                    unit 0 {                
                        family inet {
                            address 10.101.44.2/24;
                        }
                    }
                }
            }
        }
    }
    apply-groups "${node}";
    system {
        root-authentication {
            encrypted-password "$5$ ## SECRET-DATA
        }
        time-zone Europe/Madrid;
        name-server {
            8.8.8.8;
            8.8.4.4;
        }
        services {
            ssh;
            netconf {
                ssh;                        
            }
            web-management {
                https {
                    system-generated-certificate;
                }
            }
        }
        syslog {
            archive size 100k files 3;
            user * {
                any emergency;
            }
            file messages {
                any notice;
                authorization info;
            }
            file interactive-commands {
                interactive-commands any;
            }
        }
        max-configurations-on-flash 5;
        max-configuration-rollbacks 5;
        license {                           
            autoupdate {
                url https://ae1.juniper.net/junos/key_retrieval;
            }
        }
        ntp {
            server 69.164.198.192 prefer;
            server 216.239.35.8 prefer;
        }
        phone-home {
            server https://redirect.juniper.net;
        }
    }
    chassis {
        alarm {
            management-ethernet {
                link-down ignore;
            }
        }
        cluster {
            control-link-recovery;
            reth-count 5;
            redundancy-group 0 {
                node 0 priority 200;        
                node 1 priority 100;
            }
            redundancy-group 1 {
                node 0 priority 200;
                node 1 priority 100;
                preempt;
            }
        }
    }
    security {
        log {
            mode stream;
            report;
        }
        screen {
            ids-option untrust-screen {
                icmp {
                    ping-death;
                }
                ip {
                    source-route-option;
                    tear-drop;
                }                           
                tcp {
                    syn-flood {
                        alarm-threshold 1024;
                        attack-threshold 200;
                        source-threshold 1024;
                        destination-threshold 2048;
                        timeout 20;
                    }
                    land;
                }
            }
        }
        zones {
            security-zone Internal {
                host-inbound-traffic {
                    system-services {
                        all;
                    }
                    protocols {
                        all;
                    }
                }
                interfaces {                
                    reth1.0;
                }
            }
            security-zone External;
            security-zone VPN;
            security-zone DMZ;
        }
    }
    interfaces {
        ge-0/0/3 {
            gigether-options {
                redundant-parent reth1;
            }
        }
        ge-5/0/3 {
            gigether-options {
                redundant-parent reth1;
            }
        }
        fab0 {
            fabric-options {
                member-interfaces {
                    ge-0/0/2;               
                }
            }
        }
        fab1 {
            fabric-options {
                member-interfaces {
                    ge-5/0/2;
                }
            }
        }
        reth1 {
            description MGMT;
            redundant-ether-options {
                redundancy-group 1;
            }
            unit 0 {
                family inet {
                    address 10.101.40.254/24;
                }
            }
        }
    }
    protocols {                             
        l2-learning {
            global-mode switching;
        }
        rstp {
            interface all;
        }
    }
    access {
        address-assignment {
            pool junosDHCPPool1 {
                family inet {
                    network 192.168.1.0/24;
                    range junosRange {
                        low 192.168.1.2;
                        high 192.168.1.254;
                    }
                    dhcp-attributes {
                        router {
                            192.168.1.1;
                        }
                        propagate-settings ge-0/0/0.0;
                    }
                }                           
            }
            pool junosDHCPPool2 {
                family inet {
                    network 192.168.2.0/24;
                    range junosRange {
                        low 192.168.2.2;
                        high 192.168.2.254;
                    }
                    dhcp-attributes {
                        router {
                            192.168.2.1;
                        }
                        propagate-settings ge-0/0/0.0;
                    }
                }
            }
        }
    }
    vlans {
        vlan-trust {
            vlan-id 3;
            l3-interface irb.0;
        }                                   
    }
    
    {secondary:node1}


  • 3.  RE: Problems and more problems in a SRX340 cluster.... the neverending story
    Best Answer

    Posted 09-12-2019 01:00

    Trasgu,

     

    Can you change the cable connecting ge-0/0/2 of both nodes in order to isolate a bad cable?

    Can you change the fabric link to an interface different than ge-0/0/2 on both nodes?

    Capture the output of "show interfaces terse" while the issue is occurring to confirm whether the physical interfaces are going down.
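    If the "show interfaces terse" output is captured repeatedly while the problem occurs, a small script can pull out the admin and link state of just the fabric child interfaces. The sketch below is a minimal Python example, assuming the usual column order (Interface, Admin, Link); the sample text is hypothetical and was not captured from this cluster, and the function names are illustrative.

```python
def link_states(terse_output, wanted):
    """Return {interface: (admin, link)} for the interfaces named in `wanted`."""
    states = {}
    for line in terse_output.splitlines():
        fields = line.split()
        # Physical interface rows start with: name, admin state, link state.
        if len(fields) >= 3 and fields[0] in wanted:
            states[fields[0]] = (fields[1], fields[2])
    return states


def down_links(terse_output, wanted):
    """Return the sorted list of wanted interfaces whose link state is not 'up'."""
    states = link_states(terse_output, wanted)
    return sorted(name for name, (_, link) in states.items() if link != "up")


# Hypothetical sample, NOT taken from the cluster in this thread.
SAMPLE_TERSE = """\
Interface               Admin Link Proto    Local                 Remote
ge-0/0/2                up    up
ge-0/0/3                up    up
ge-5/0/2                up    down
ge-5/0/3                up    up
"""

if __name__ == "__main__":
    fabric_children = {"ge-0/0/2", "ge-5/0/2"}
    print(link_states(SAMPLE_TERSE, fabric_children))
    print("down:", down_links(SAMPLE_TERSE, fabric_children))
```

    In the thread's own output the fabric children show Physical Up but Monitored Down, so a check like this would be expected to report both links as up, pointing away from a cable fault.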

     



  • 4.  RE: Problems and more problems in a SRX340 cluster.... the neverending story

    Posted 09-12-2019 03:22

    Hi Andres, 

    Of course I can try a different cable, but I don't think it will help: the same interface on the second node is fine, the left LED is green, and this only started after the Junos upgrade...

     

    Also, there's the problem with SSH... everything smells really bad.

     

    I'll run those tests this evening.

     

    Thanks



  • 5.  RE: Problems and more problems in a SRX340 cluster.... the neverending story

    Posted 09-18-2019 02:14

    Trasgu,

     

    Can you check the following command on both nodes: show chassis cluster statistics

     

    Given that the fabric link is down on only one node, can you reboot both nodes simultaneously so that they sync?

     



  • 6.  RE: Problems and more problems in a SRX340 cluster.... the neverending story

    Posted 09-18-2019 02:17

    I finally got it working stably with the help of a Juniper guru; he basically deleted all the configuration and started from scratch. After that, however, SSH was still failing from SecureCRT but worked from PuTTY. We had to change the SSH authentication options.

     

    Thanks



  • 7.  RE: Problems and more problems in a SRX340 cluster.... the neverending story

    Posted 09-18-2019 02:24

    Nice! Based on my research, a simultaneous reboot should have helped, but you did that anyway while reconfiguring the cluster. I'm glad your SRX cluster is back on track.