

SRX1400 - lost "contact" to the SYSIO card in FPC0

  • 1.  SRX1400 - lost "contact" to the SYSIO card in FPC0

    Posted 09-09-2020 04:01

    Hello all,

     

    we are running two SRX1400 as a cluster.

    After running for quite some time without issues, the secondary node suddenly lost all its network interfaces.

    Checking the cluster hardware, it seems that the secondary node has lost its FPC0, or at least the connection to it.

     

    admin@node0> show chassis hardware
    node0:
    --------------------------------------------------------------------------
    Hardware inventory:
    Item             Version  Part number  Serial number     Description
    ...
    FPC 0            REV 19   750-031019   XXXXXXXX          SRX1k 10GE SYSIO
    PIC 0                     BUILTIN      BUILTIN           6x 1GE RJ45 3x 1GE SFP 3x 10GE SFP+
    Xcvr 6           REV 02   740-013111   XXXXXXX           SFP-T
    Xcvr 7           REV 02   740-013111   XXXXXXX           SFP-T
    Xcvr 8                    NON-JNPR     XXXXXXXXXXX       SFP+-10G-SR
    Xcvr 9                    NON-JNPR     XXXXXXXXXXX       SFP+-10G-SR
    ...

    node1:
    --------------------------------------------------------------------------
    Hardware inventory:
    Item             Version  Part number  Serial number     Description
    ...
    FPC 0            REV 19   750-031019   YYYYYYYY          SRX1k 10GE SYSIO
    PIC 0
    ...

     

    All production interfaces are located on this FPC0.

     

    In addition, redundancy group 1, which carries the production traffic, was running on the secondary node at that time, so we completely lost connectivity to and through that device.

    Interface monitoring was configured, but it did not trigger a failover.
    We had to do a manual failover to get the connections working again.

     

    I am currently planning a reboot of the secondary node to see whether the system fully recognizes FPC0 again.

    Apart from that, I wonder why interface monitoring did not trigger a failover. Any hints?

     

    Thanks in advance.

     

    EDIT: The reboot has been done (without re-seating the card). The card has been recognized again, and the interfaces are listed and available in the system again. I accepted the solution for the hint that jsrpd might have been stuck and for the recommended software version.



  • 2.  RE: SRX1400 - lost "contact" to the SYSIO card in FPC0

    Posted 09-09-2020 04:19

    Hello Hermod,

     

    I would suggest re-seating the SYSIO card; if the issue persists after that, an RMA has to be initiated.

     

    Regarding interface monitoring, have you configured the weights properly? Each redundancy group starts with a threshold of 255. When a monitored interface goes down, its configured weight is subtracted from that threshold, and the threshold has to reach 0 in order to initiate a failover.
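The weight arithmetic can be sketched in a few lines of Python. This is an illustrative model only, not Junos code: it assumes each redundancy group starts from a threshold of 255 and that the weight of every monitored interface that is down is subtracted from it. The function names and the interface names in the example are made up for illustration.

```python
# Illustrative model of interface-monitor weights (not Junos code).
# Assumption: a redundancy group starts with a threshold of 255 and
# fails over once the threshold has been reduced to 0.
RG_THRESHOLD = 255


def remaining_threshold(weights, down):
    """Return the threshold left after subtracting the weights of down interfaces.

    weights: dict mapping interface name -> configured weight
    down:    iterable of interface names that are currently down
    """
    lost = sum(weights[name] for name in down)
    return max(RG_THRESHOLD - lost, 0)


def should_fail_over(weights, down):
    """True when the down interfaces have used up the whole threshold."""
    return remaining_threshold(weights, down) == 0


if __name__ == "__main__":
    # Hypothetical example: three monitored links with weight 100 each.
    weights = {"ge-0/0/1": 100, "ge-0/0/2": 100, "ge-0/0/3": 100}
    print(should_fail_over(weights, ["ge-0/0/1"]))
    print(should_fail_over(weights, ["ge-0/0/1", "ge-0/0/2", "ge-0/0/3"]))
```

With a weight of 255 on an interface, that interface alone is enough to bring the threshold to 0 in this model. With smaller weights, several interfaces have to be down at the same time.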

     

    Send me the output of user@host> show configuration chassis cluster | display set



  • 3.  RE: SRX1400 - lost "contact" to the SYSIO card in FPC0

    Posted 09-09-2020 04:28

    Thanks for your reply.

    This is the requested output:

     

    set chassis cluster control-link-recovery
    set chassis cluster reth-count 12
    set chassis cluster redundancy-group 1 node 0 priority 200
    set chassis cluster redundancy-group 1 node 1 priority 100
    set chassis cluster redundancy-group 1 preempt
    set chassis cluster redundancy-group 1 interface-monitor xe-0/0/8 weight 255
    set chassis cluster redundancy-group 1 interface-monitor xe-0/0/9 weight 255
    set chassis cluster redundancy-group 1 interface-monitor xe-4/0/8 weight 255
    set chassis cluster redundancy-group 1 interface-monitor xe-4/0/9 weight 255
    set chassis cluster redundancy-group 1 interface-monitor ge-0/0/0 weight 255
    set chassis cluster redundancy-group 1 interface-monitor ge-4/0/0 weight 255
    set chassis cluster redundancy-group 0 node 0 priority 200
    set chassis cluster redundancy-group 0 node 1 priority 100

     

     

     

    Edit2: The xx-4/y/z interfaces are those from the "disappeared" FPC0 on the second node.

     

    Edit1: One idea: if interface monitoring only works for "existing" interfaces, that could explain why nothing happened. The interfaces disappeared and are no longer accessible... But my knowledge of SRX isn't that deep, so I am only guessing here.



  • 4.  RE: SRX1400 - lost "contact" to the SYSIO card in FPC0
    Best Answer

    Posted 09-09-2020 04:38

    Hi Hermod,

     

    This seems like a strange issue.

     

    Even though the interfaces disappeared on the secondary node, jsrpd should have triggered a failover. I have seen this behaviour in older Junos versions, where jsrpd could get stuck and fail to trigger a failover due to a software bug.

     

    I would suggest that you stay on the recommended Junos version.