Dear Juniper Community,
I'm facing a really strange issue with an SRX 1500 cluster. The nodes are directly connected, with no switch in between. It seems that the two nodes have lost communication with each other. The primary node seems unable to become the master, and when the secondary node becomes the master, it is not able to route anything. All services, such as BGP and IPsec VPN, are currently down. Even when I force the primary node to take over mastership by executing "request chassis cluster failover redundancy-group 0 node 0 force", still nothing works and no traffic is processed.
After checking, I noticed that the received probe count on the fabric link is always 0, no matter what troubleshooting I did, and "show chassis cluster interfaces" always shows fab0 as down although it is physically up. I even used the hidden command "set chassis cluster no-fabric-monitoring" and rebooted node0, but the result is exactly the same.
Software version: JUNOS Software Release [15.1X49-D180.2]
Below is the cluster configuration (I deactivated the interface monitoring to troubleshoot further):
@sn-dx-node0> show configuration chassis cluster
no-fabric-monitoring;
reth-count 128;
redundancy-group 0 {
    node 0 priority 100;
    node 1 priority 1;
}
redundancy-group 1 {
    node 0 priority 100;
    node 1 priority 1;
    inactive: interface-monitor {
        xe-0/0/16 weight 255;
        xe-7/0/16 weight 255;
    }
}
redundancy-group 2 {
    node 0 priority 100;
    node 1 priority 1;
    inactive: interface-monitor {
        ge-0/0/12 weight 255;
        ge-7/0/12 weight 255;
    }
}
Below are most of the "show chassis cluster" outputs, taken after I manually forced a failover to node 0:
@sn-dx-node0> show chassis cluster status
Cluster ID: 1
Node   Priority Status      Preempt Manual   Monitor-failures
Redundancy group: 0 , Failover count: 1
node0  255      primary     no      yes      None
node1  1        secondary   no      yes      None
Redundancy group: 1 , Failover count: 1
node0  255      primary     no      yes      HW
node1  1        secondary   no      yes      None
Redundancy group: 2 , Failover count: 1
node0  255      primary     no      yes      HW
node1  1        secondary   no      yes      None
@sn-dx-node0> show chassis cluster statistics
Control link statistics:
    Control link 0:
        Heartbeat packets sent: 8688
        Heartbeat packets received: 8695
        Heartbeat packet errors: 0
Fabric link statistics:
    Child link 0
        Probes sent: 19323
        Probes received: 0
    Child link 1
        Probes sent: 0
        Probes received: 0
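In case it helps anyone compare against their own cluster, here is a minimal Python sketch for checking these counters from a saved copy of the output. It is my own helper, not a Juniper tool, and the function names (`fabric_probe_counters`, `one_way_links`) are made up for illustration. It assumes the text layout matches the output above.

```python
import re

def fabric_probe_counters(text):
    """Return {child_link_index: (probes_sent, probes_received)}.

    Only the 'Fabric link statistics' part of the output is read;
    the control link heartbeat counters above it are skipped.
    """
    links = {}
    current = None
    in_fabric = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Fabric link statistics"):
            in_fabric = True
            continue
        if not in_fabric:
            continue
        m = re.match(r"Child link (\d+)", line)
        if m:
            current = int(m.group(1))
            links[current] = [0, 0]
            continue
        m = re.match(r"Probes (sent|received):\s*(\d+)", line)
        if m and current is not None:
            idx = 0 if m.group(1) == "sent" else 1
            links[current][idx] = int(m.group(2))
    return {k: tuple(v) for k, v in links.items()}

def one_way_links(counters):
    """Child links that send probes but have never received one."""
    return [k for k, (sent, rcvd) in sorted(counters.items())
            if sent > 0 and rcvd == 0]

# Sample copied from the output in this post.
SAMPLE = """\
Control link statistics:
Control link 0:
Heartbeat packets sent: 8688
Heartbeat packets received: 8695
Heartbeat packet errors: 0
Fabric link statistics:
Child link 0
Probes sent: 19323
Probes received: 0
Child link 1
Probes sent: 0
Probes received: 0
"""

if __name__ == "__main__":
    counters = fabric_probe_counters(SAMPLE)
    print(counters)
    print(one_way_links(counters))
```

With my output, child link 0 is the only link flagged: probes leave node0 but none come back.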
@sn-dx-node0> show chassis cluster interfaces
Control link status: Up
Control interfaces:
    Index   Interface   Monitored-Status   Internal-SA   Security
    0       em0         Up                 Disabled      Disabled
Fabric link status: Down
Fabric interfaces:
    Name    Child-interface    Status                    Security
                               (Physical/Monitored)
    fab0    ge-0/0/11          Up   / Down               Disabled
    fab0
    fab1    ge-7/0/11          Up   / Up                 Disabled
    fab1
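The mismatch I am describing (physical state Up, monitored state Down) can also be picked out of a saved copy of this output with a short Python sketch. Again, this is only an illustration under the assumption that the rows look like the ones above; `fabric_children` and `up_but_not_monitored` are hypothetical names of my own.

```python
import re

# A fabric row looks like: fab0 ge-0/0/11 Up / Down Disabled
# Rows that carry only the fabric name (no child interface) are ignored.
ROW = re.compile(r"^(fab\d+)\s+(\S+)\s+(Up|Down)\s*/\s*(Up|Down)\b")

def fabric_children(text):
    """Return a list of (fabric, child_interface, physical, monitored) tuples."""
    rows = []
    for raw in text.splitlines():
        m = ROW.match(raw.strip())
        if m:
            rows.append(m.groups())
    return rows

def up_but_not_monitored(rows):
    """Child links whose physical state is Up while the monitored state is Down."""
    return [(name, child) for name, child, phys, mon in rows
            if phys == "Up" and mon == "Down"]

# Sample copied from the output in this post.
SAMPLE = """\
Fabric link status: Down
Fabric interfaces:
Name Child-interface Status Security
(Physical/Monitored)
fab0 ge-0/0/11 Up / Down Disabled
fab0
fab1 ge-7/0/11 Up / Up Disabled
fab1
"""

if __name__ == "__main__":
    rows = fabric_children(SAMPLE)
    print(rows)
    print(up_but_not_monitored(rows))
```

For my cluster this reports only fab0 with ge-0/0/11, which matches the probe counters: the port has link, but fabric monitoring never sees the peer.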
Below is the configuration of fab0:
@sn-dx-node0> show configuration interfaces fab0
fabric-options {
    member-interfaces {
        ge-0/0/11;
    }
}
And here is the "show interfaces ge-0/0/11" output:
msaidani@sn-dx-node0> show interfaces ge-0/0/11
Physical interface: ge-0/0/11, Enabled, Physical link is Up
  Interface index: 323, SNMP ifIndex: 523
  Link-level type: 64, MTU: 9014, LAN-PHY mode, Link-mode: Full-duplex,
  Speed: 1000mbps, BPDU Error: None, MAC-REWRITE Error: None,
  Loopback: Disabled, Source filtering: Disabled, Flow control: Enabled,
  Auto-negotiation: Enabled, Remote fault: Online
  Device flags : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Link flags : None
  CoS queues : 8 supported, 8 maximum usable queues
  Current address: c0:bf:a7:a5:30:30, Hardware address: c0:bf:a7:a5:2f:0b
  Last flapped : 2022-01-15 20:06:49 UTC (02:27:45 ago)
  Input rate : 0 bps (0 pps)
  Output rate : 2264 bps (1 pps)
  Active alarms : None
  Active defects : None
  Interface transmit statistics: Disabled
  Logical interface ge-0/0/11.0 (Index 77) (SNMP ifIndex 572)
    Flags: Up SNMP-Traps 0x4000 Encapsulation: ENET2
    Input packets : 123
    Output packets: 21292
    Security: Zone: Null
    Protocol aenet, AE bundle: fab0.0 Link Index: 0
Troubleshooting actions taken so far:
- Rebooting both devices several times.
- Rebooting a single device.
- Setting "set chassis cluster no-fabric-monitoring", then rebooting node 0.
- Performing "request chassis cluster failover redundancy-group 0 node 0 force".
- Logging on to the secondary node and performing "request chassis cluster configuration-synchronize".
- Moving the fabric cable to a different port and moving the configuration to that port.
- Replacing the physical cable with a new one.
Please help me, as I'm really running out of options :(
Thank you in advance
------------------------------
Maroua Saidani
------------------------------