Hello all,
We are running two SRX1400s as a chassis cluster.
After running for quite some time without issues, the secondary node suddenly lost all of its network interfaces.
Checking the chassis hardware, it seems that the secondary node has lost its FPC0, or at least the connection to it.
admin@node0> show chassis hardware
node0:
--------------------------------------------------------------------------
Hardware inventory:
Item             Version  Part number  Serial number     Description
...
FPC 0            REV 19   750-031019   XXXXXXXX          SRX1k 10GE SYSIO
  PIC 0                   BUILTIN      BUILTIN           6x 1GE RJ45 3x 1GE SFP 3x 10GE SFP+
    Xcvr 6       REV 02   740-013111   XXXXXXX           SFP-T
    Xcvr 7       REV 02   740-013111   XXXXXXX           SFP-T
    Xcvr 8                NON-JNPR     XXXXXXXXXXX       SFP+-10G-SR
    Xcvr 9                NON-JNPR     XXXXXXXXXXX       SFP+-10G-SR
...
node1:
--------------------------------------------------------------------------
Hardware inventory:
Item             Version  Part number  Serial number     Description
...
FPC 0            REV 19   750-031019   YYYYYYYY          SRX1k 10GE SYSIO
  PIC 0
...
All production interfaces are located on this FPC0.
In addition, the production redundancy group 1 was active on the secondary node at that time, so we completely lost connectivity to and through that device.
Interface monitoring was configured, but it did not trigger a failover.
We had to fail over manually to get the connections working again.
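For anyone reading along, the commands involved look roughly like the sketch below. The interface name ge-0/0/0, the weight 255, and the target node are placeholders for illustration, not our actual configuration:

```
# Configuration mode: monitor an interface for redundancy group 1
# (placeholder interface and weight)
admin@node0# set chassis cluster redundancy-group 1 interface-monitor ge-0/0/0 weight 255

# Operational mode: check the cluster state and the monitored interfaces
admin@node0> show chassis cluster status redundancy-group 1
admin@node0> show chassis cluster interfaces

# Operational mode: fail redundancy group 1 over manually (here to node 0),
# then clear the manual-failover flag afterwards
admin@node0> request chassis cluster failover redundancy-group 1 node 0
admin@node0> request chassis cluster failover reset redundancy-group 1
```

With a weight of 255, a single monitored interface going down should normally be enough to cross the failover threshold, which is why the missing failover surprised us.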
I am currently planning to reboot the secondary node to see whether the system fully recognizes FPC0 again.
Apart from that, I wonder why interface monitoring did not trigger a failover. Any hints?
Thanks in advance.
EDIT: The reboot has been done (without re-seating the card). The card was recognized again, and the interfaces are listed and available in the system again. I marked the reply as "Solution accepted" for the hint that jsrpd might be stuck and for the recommended software version.