srx 1500 cluster issue: received probe packet is ALWAYS zero on the primary node - fabric link is physically up, monitor status down


1. srx 1500 cluster issue: received probe packet is ALWAYS zero on the primary node - fabric link is physically up, monitor status down

Maroua Saidani
Posted 01-16-2022 15:10

Dear Security community,
I'm facing a really strange issue with an SRX1500 cluster: it seems that both nodes have lost communication with each other and traffic has stopped being processed. Currently all services, such as IPsec VPN and BGP sessions, are down. After checking, I noticed that the received probe count on node0 (the primary) is always 0, and the fabric link fab0 is physically up but its monitored status shows down. I even disabled fabric link monitoring with "set chassis cluster no-fabric-monitoring" and then rebooted node0, but the issue persists. I forced failover between the nodes many times and rebooted both nodes several times, but the behaviour is still the same. Please find below all the information regarding the cluster:

Software version: JUNOS Software Release [15.1X49-D180.2]
Cluster configuration (I deactivated the monitoring for now to troubleshoot further):

@sn-dx-node0> show configuration chassis cluster
no-fabric-monitoring;
reth-count 128;
redundancy-group 0 {
    node 0 priority 100;
    node 1 priority 1;
}
redundancy-group 1 {
    node 0 priority 100;
    node 1 priority 1;
    inactive: interface-monitor {
        xe-0/0/16 weight 255;
        xe-7/0/16 weight 255;
    }
}
redundancy-group 2 {
    node 0 priority 100;
    node 1 priority 1;
    inactive: interface-monitor {
        ge-0/0/12 weight 255;
        ge-7/0/12 weight 255;
    }
}
Cluster status:

Cluster ID: 1
Node   Priority Status     Preempt Manual Monitor-failures

Redundancy group: 0 , Failover count: 1
node0  255      primary    no      yes    None
node1  1        secondary  no      yes    None

Redundancy group: 1 , Failover count: 1
node0  255      primary    no      yes    HW
node1  1        secondary  no      yes    None

Redundancy group: 2 , Failover count: 1
node0  255      primary    no      yes    HW
node1  1        secondary  no      yes    None

Cluster statistics:

{primary:node0}
@sn-dx-node0> show chassis cluster statistics
Control link statistics:
    Control link 0:
        Heartbeat packets sent: 70200
        Heartbeat packets received: 70207
        Heartbeat packet errors: 0
Fabric link statistics:
    Child link 0
        Probes sent: 159815
        Probes received: 0
    Child link 1
        Probes sent: 0
        Probes received: 0
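For readers triaging similar output, the pattern described in this post (probes leaving a fabric child link but none coming back) can be checked mechanically. The following is only a minimal sketch: the function name `silent_fabric_links` is hypothetical, and it assumes the text layout of `show chassis cluster statistics` matches what is pasted above.

```python
import re

def silent_fabric_links(stats_text):
    """Return fabric child-link numbers that sent probes but received none.

    Hypothetical helper: assumes the 'Child link N' / 'Probes sent' /
    'Probes received' layout shown in this thread.
    """
    flagged = []
    link = None
    sent = None
    for raw in stats_text.splitlines():
        line = raw.strip()
        m = re.match(r"Child link (\d+)", line)
        if m:
            link = int(m.group(1))
            sent = None
            continue
        m = re.match(r"Probes sent:\s*(\d+)", line)
        if m and link is not None:
            sent = int(m.group(1))
            continue
        m = re.match(r"Probes received:\s*(\d+)", line)
        if m and link is not None and sent is not None:
            # Flag only links that are actively sending but hear nothing back.
            if sent > 0 and int(m.group(1)) == 0:
                flagged.append(link)
    return flagged

sample = """\
Fabric link statistics:
    Child link 0
        Probes sent: 159815
        Probes received: 0
    Child link 1
        Probes sent: 0
        Probes received: 0
"""
print(silent_fabric_links(sample))  # prints [0]
```

Child link 1 is not flagged because it has sent nothing, which in this output simply means it is unused rather than broken.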

Cluster interfaces:

msaidani@sn-dx-node0> show chassis cluster interfaces
Control link status: Up

Control interfaces:
    Index   Interface   Monitored-Status   Internal-SA   Security
    0       em0         Up                 Disabled      Disabled

Fabric link status: Down

Fabric interfaces:
    Name    Child-interface    Status                    Security
                               (Physical/Monitored)
    fab0    ge-0/0/11          Up / Down                 Disabled
    fab0
    fab1    ge-7/0/11          Up / Up                   Disabled
    fab1

Redundant-ethernet Information:
    Name         Status      Redundancy-group
    reth0        Down        Not configured
    reth1        Up          1
    reth2        Down        2
Interface ge-0/0/11 status:

msaidani@sn-dx-node0> show interfaces ge-0/0/11
Physical interface: ge-0/0/11, Enabled, Physical link is Up
  Interface index: 323, SNMP ifIndex: 523
  Link-level type: 64, MTU: 9014, LAN-PHY mode, Link-mode: Full-duplex,
  Speed: 1000mbps, BPDU Error: None, MAC-REWRITE Error: None,
  Loopback: Disabled, Source filtering: Disabled, Flow control: Enabled,
  Auto-negotiation: Enabled, Remote fault: Online
  Device flags : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Link flags : None
  CoS queues : 8 supported, 8 maximum usable queues
  Current address: c0:bf:a7:a5:30:30, Hardware address: c0:bf:a7:a5:2f:0b
  Last flapped : 2022-01-15 20:06:49 UTC (19:34:08 ago)
  Input rate : 0 bps (0 pps)
  Output rate : 2264 bps (1 pps)
  Active alarms : None
  Active defects : None
  Interface transmit statistics: Disabled

  Logical interface ge-0/0/11.0 (Index 77) (SNMP ifIndex 572)
    Flags: Up SNMP-Traps 0x4000 Encapsulation: ENET2
    Input packets : 123
    Output packets: 170252
    Security: Zone: Null
    Protocol aenet, AE bundle: fab0.0 Link Index: 0

Troubleshooting actions taken so far:

Rebooting both devices several times.
Rebooting a single device.
Running "set chassis cluster no-fabric-monitoring", then rebooting node0.
Running "request chassis cluster failover redundancy-group 0 node 0 force".
Logging onto the secondary and running "request chassis cluster configuration-synchronize".
Moving the physical cable to a different port and moving the configuration to match.
Replacing the physical cable with a new one.

As a last option I'm thinking about disabling the cluster and building it all over again, but before doing that I wanted to check whether you have other suggestions.
Thank you in advance.

------------------------------
Maroua Saidani
------------------------------
2. RE: srx 1500 cluster issue: received probe packet is ALWAYS zero on the primary node - fabric link is physically up, monitor status down

Nellikka
Posted 01-17-2022 05:39

Hi Maroua,

The cluster status output shows a hardware monitoring failure (HW) on node0. Please check for any active alarms (show chassis alarms) and core dumps (show system core-dumps) on node0. The cluster priority of node0 is 255. When you perform a manual failover, the priority is set to 255, and it should be cleared using the 'request chassis cluster failover reset redundancy-group <0/1>' command. Otherwise it will prevent automatic failover when there is a monitoring failure and cause an outage.
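As an illustration of the reset step, the sketch below scans cluster status text for redundancy groups whose node rows still show "yes" in the Manual column and prints one reset command per group. It is only a sketch under stated assumptions: `reset_commands` is a hypothetical helper, the column order is assumed to match the status output posted earlier in this thread, and the script only prints commands to be entered by hand in operational mode.

```python
import re

def reset_commands(status_text):
    """Build one reset command per redundancy group with a manual-failover flag.

    Hypothetical helper: assumes node rows of the form
    'node0 255 primary no yes None' (Manual is the fifth column).
    """
    groups = []
    current = None
    for raw in status_text.splitlines():
        line = raw.strip()
        m = re.match(r"Redundancy group:\s*(\d+)", line)
        if m:
            current = int(m.group(1))
            continue
        fields = line.split()
        is_node_row = len(fields) >= 5 and re.fullmatch(r"node\d+", fields[0])
        if current is not None and is_node_row:
            if fields[4] == "yes" and current not in groups:
                groups.append(current)
    return [
        f"request chassis cluster failover reset redundancy-group {g}"
        for g in groups
    ]

status = """\
Node Priority Status Preempt Manual Monitor-failures
Redundancy group: 0 , Failover count: 1
node0 255 primary no yes None
node1 1 secondary no yes None
Redundancy group: 1 , Failover count: 1
node0 255 primary no yes HW
node1 1 secondary no yes None
Redundancy group: 2 , Failover count: 1
node0 255 primary no yes HW
node1 1 secondary no yes None
"""
for cmd in reset_commands(status):
    print(cmd)
```

With the status posted in this thread, all three redundancy groups carry the manual flag, so a reset would be needed for groups 0, 1, and 2.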

Thanks.
Nellikka

3. RE: srx 1500 cluster issue: received probe packet is ALWAYS zero on the primary node - fabric link is physically up, monitor status down

Maroua Saidani
Posted 01-17-2022 07:16
Edited by spuluka 01-17-2022 19:51

Hi Nellikka,
Thank you so much for your reply.
Yes, I forced the failover as one of my troubleshooting steps, and there is already an outage right now: no service works even when automatic failover to node1 happens.
I'm also aware of the HW failure and I'm in the process of checking and replacing my SFPs and cables, but does that affect the issue with fab0? That is what I cannot figure out, and I wanted to check with you.
Here is the output of show chassis alarms on node0:

@sn-dx-node0> show chassis alarms
node0:
--------------------------------------------------------------------------
1 alarms currently active
Alarm time               Class  Description
2022-01-15 20:19:52 UTC  Major  FPC 0 Major Errors

And below is the core dump listing:

@sn-dx-node0> show system core-dumps
node0:
--------------------------------------------------------------------------
-rw-rw---- 1 root wheel 65404663 Jul 18 2019 /var/crash/vmcore.0.gz
-rw-rw---- 1 root wheel 52874807 Jul 18 2019 /var/crash/vmcore.1.gz
-rw-rw---- 1 root wheel 64108763 Jul 22 2019 /var/crash/vmcore.2.gz
/var/tmp/*core*: No such file or directory
/var/tmp/pics/*core*: No such file or directory
/var/crash/kernel.*: No such file or directory
/var/jails/rest-api/tmp/*core*: No such file or directory
/tftpboot/corefiles/*core*: No such file or directory
total files: 3

/var/crash/corefiles:
total blocks: 12
total files: 0

Thank you in advance

------------------------------
Maroua Saidani
------------------------------
