
SRX 1500 cluster issue: received fabric probe count is ALWAYS zero on the primary node, and fabric monitoring shows interface fab0 as down although it is physically up

    Posted 01-16-2022 15:10
    Dear Juniper Community,
    I'm facing a really strange issue with an SRX 1500 cluster. The nodes are directly connected, with no switch in between, and the two nodes seem to have lost communication with each other. The primary node does not seem able to become the master, and when the secondary node becomes the master, it cannot route anything. All services, such as BGP and IPsec VPN, are currently down. Even when I force the primary node to take over mastership by executing "request chassis cluster failover redundancy-group 0 node 0 force", still nothing works and no traffic is processed.
    After checking, I noticed that the received probe count on the fabric link is always 0 no matter what troubleshooting I do, and "show chassis cluster interfaces" always shows interface fab0 as down although it is physically up. I even used the hidden command "set chassis cluster no-fabric-monitoring" and rebooted node0, but the result is exactly the same.

    Software version:
    JUNOS Software Release [15.1X49-D180.2]

    Below is the cluster configuration (I deactivated interface monitoring to troubleshoot further):

    @sn-dx-node0> show configuration chassis cluster
    no-fabric-monitoring;
    reth-count 128;
    redundancy-group 0 {
        node 0 priority 100;
        node 1 priority 1;
    }
    redundancy-group 1 {
        node 0 priority 100;
        node 1 priority 1;
        inactive: interface-monitor {
            xe-0/0/16 weight 255;
            xe-7/0/16 weight 255;
        }
    }
    redundancy-group 2 {
        node 0 priority 100;
        node 1 priority 1;
        inactive: interface-monitor {
            ge-0/0/12 weight 255;
            ge-7/0/12 weight 255;
        }
    }

    Below are most of the "show chassis cluster" outputs after I manually forced a failover to node 0:

    @sn-dx-node0> show chassis cluster status
    Cluster ID: 1
    Node   Priority Status         Preempt Manual   Monitor-failures

    Redundancy group: 0 , Failover count: 1
    node0  255      primary        no      yes      None
    node1  1        secondary      no      yes      None

    Redundancy group: 1 , Failover count: 1
    node0  255      primary        no      yes      HW
    node1  1        secondary      no      yes      None

    Redundancy group: 2 , Failover count: 1
    node0  255      primary        no      yes      HW
    node1  1        secondary      no      yes      None


    @sn-dx-node0> show chassis cluster statistics
    Control link statistics:
        Control link 0:
        Heartbeat packets sent: 8688
        Heartbeat packets received: 8695
        Heartbeat packet errors: 0
    Fabric link statistics:
        Child link 0
        Probes sent: 19323
        Probes received: 0
        Child link 1
        Probes sent: 0
        Probes received: 0
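
The pattern in the fabric statistics above (probes sent on child link 0, none received) can also be checked mechanically when comparing repeated captures of the output. The following is only a minimal Python sketch, assuming the text layout shown in this post; the function names `parse_fabric_probes` and `one_way_links` are hypothetical helpers, not part of any Juniper tool, and the script only reports counters without diagnosing a cause.

```python
import re

# Sample text copied from the "show chassis cluster statistics" output above.
SAMPLE = """\
Control link statistics:
    Control link 0:
    Heartbeat packets sent: 8688
    Heartbeat packets received: 8695
    Heartbeat packet errors: 0
Fabric link statistics:
    Child link 0
    Probes sent: 19323
    Probes received: 0
    Child link 1
    Probes sent: 0
    Probes received: 0
"""


def parse_fabric_probes(text):
    """Return {child_link_index: {"sent": n, "received": n}} (hypothetical helper)."""
    links = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Child link (\d+)", line)
        if m:
            # Start collecting counters for a new fabric child link.
            current = int(m.group(1))
            links[current] = {"sent": 0, "received": 0}
            continue
        if current is None:
            # Still in the control-link section; nothing to record yet.
            continue
        m = re.match(r"Probes (sent|received):\s*(\d+)", line)
        if m:
            links[current][m.group(1)] = int(m.group(2))
    return links


def one_way_links(links):
    """Child links that send probes but have counted none received."""
    return [i for i, c in sorted(links.items())
            if c["sent"] > 0 and c["received"] == 0]


if __name__ == "__main__":
    print(one_way_links(parse_fabric_probes(SAMPLE)))
```

Running the same check against the output captured on node1 would show whether the loss is in one direction only or in both.
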

    @sn-dx-node0> show chassis cluster interfaces
    Control link status: Up

    Control interfaces:
        Index   Interface   Monitored-Status   Internal-SA   Security
        0       em0         Up                 Disabled      Disabled

    Fabric link status: Down

    Fabric interfaces:
        Name    Child-interface    Status                    Security
                                   (Physical/Monitored)
        fab0    ge-0/0/11          Up   / Down               Disabled
        fab0
        fab1    ge-7/0/11          Up   / Up                 Disabled
        fab1
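
The mismatch in the fabric interface table above (fab0 physically Up but Monitored Down) can be extracted the same way. This is a rough Python sketch that assumes the column layout shown in this post; `fabric_children` and `up_but_unmonitored` are hypothetical names introduced only for illustration.

```python
import re

# Sample rows copied from the "show chassis cluster interfaces" output above.
SAMPLE = """\
Fabric link status: Down
Fabric interfaces:
    Name    Child-interface    Status                    Security
                               (Physical/Monitored)
    fab0    ge-0/0/11          Up   / Down               Disabled
    fab0
    fab1    ge-7/0/11          Up   / Up                 Disabled
    fab1
"""

# Matches rows such as "fab0  ge-0/0/11  Up / Down  Disabled".
# Rows that carry only the fab name (empty second child slot) do not match.
ROW = re.compile(r"^\s*(fab\d+)\s+(\S+)\s+(Up|Down)\s*/\s*(Up|Down)\b")


def fabric_children(text):
    """Return (name, child, physical, monitored) tuples (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append(m.groups())
    return rows


def up_but_unmonitored(rows):
    """Children whose physical link is Up while the monitored status is Down."""
    return [(name, child) for name, child, phys, mon in rows
            if phys == "Up" and mon == "Down"]


if __name__ == "__main__":
    print(up_but_unmonitored(fabric_children(SAMPLE)))
```
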

    Below is the configuration of fab0:

    @sn-dx-node0> show configuration interfaces fab0
    fabric-options {
        member-interfaces {
            ge-0/0/11;
        }
    }
    And here is the "show interfaces ge-0/0/11" output:

    msaidani@sn-dx-node0> show interfaces ge-0/0/11
    Physical interface: ge-0/0/11, Enabled, Physical link is Up
      Interface index: 323, SNMP ifIndex: 523
      Link-level type: 64, MTU: 9014, LAN-PHY mode, Link-mode: Full-duplex,
      Speed: 1000mbps, BPDU Error: None, MAC-REWRITE Error: None,
      Loopback: Disabled, Source filtering: Disabled, Flow control: Enabled,
      Auto-negotiation: Enabled, Remote fault: Online
      Device flags   : Present Running
      Interface flags: SNMP-Traps Internal: 0x4000
      Link flags     : None
      CoS queues     : 8 supported, 8 maximum usable queues
      Current address: c0:bf:a7:a5:30:30, Hardware address: c0:bf:a7:a5:2f:0b
      Last flapped   : 2022-01-15 20:06:49 UTC (02:27:45 ago)
      Input rate     : 0 bps (0 pps)
      Output rate    : 2264 bps (1 pps)
      Active alarms  : None
      Active defects : None
      Interface transmit statistics: Disabled

      Logical interface ge-0/0/11.0 (Index 77) (SNMP ifIndex 572)
        Flags: Up SNMP-Traps 0x4000 Encapsulation: ENET2
        Input packets : 123
        Output packets: 21292
        Security: Zone: Null
        Protocol aenet, AE bundle: fab0.0   Link Index: 0


    Troubleshooting actions taken so far:

    • Rebooting both devices several times.
    • Rebooting a single device.
    • Configuring "set chassis cluster no-fabric-monitoring", then rebooting node 0.
    • Running "request chassis cluster failover redundancy-group 0 node 0 force".
    • Logging in to the secondary node and running "request chassis cluster configuration-synchronize".
    • Moving the physical cable to a different port and moving the configuration with it.
    • Replacing the physical cable with a new one.

    Please help me, as I'm really running out of options :(
    Thank you in advance.



    ------------------------------
    Maroua Saidani
    ------------------------------