What steps do I need to take when one of the High-Availability nodes in the cluster shows the status "down"?

By Erdem posted 03-14-2016 15:50

  

Question

What steps do I need to take when one of the High-Availability nodes in the fabric (cluster) shows the status "down"?

Answer

  1. Collect all troubleshooting data and logs for the fabric (cluster) from the Space Troubleshooting page (Administration > Space Troubleshooting).
    NOTE: For information about how to download the troubleshooting data and log files, see Downloading the Troubleshooting Log File in Server Mode, Downloading the Troubleshooting Log File in Maintenance Mode, or Downloading Troubleshooting System Log Files Through the Junos Space CLI.
    These logs might be needed later for troubleshooting.
  2. Try to log in to the console of the node that is down.
  3. If you are able to log in to the console, proceed to step 4. If you are unable to log in to the console, go to step 5.
  4. If you are able to log in to the console:
    1. Using the Junos Space Settings menu, access the debug shell.
    2. Check whether the jmp-watchdog and jboss services are up by executing the following command: service service-name status, where service-name is the name of the service; for example, jboss.
    3. If the services are down:
      1. Restart the services by executing the following command: service service-name restart, where service-name is the name of the service; for example, jboss.
      2. Wait for 15 to 20 minutes and check the status of the node on the Administration > Fabric page.
        If the node status is still Down, go to step 5.
    4. If the services are up, but the node status is still Down, open a case with the Juniper Networks Technical Assistance Center (JTAC).
  5. If you are unable to log in to the console or if the node status is still Down after restarting the services, reboot the node. For more information, see Shutting Down or Rebooting Nodes in the Junos Space Fabric.
  6. Wait for 15 to 20 minutes and check the status of the node on the Administration > Fabric page.
    If the node status is still Down, proceed to step 7.
  7. Delete the node from the Junos Space fabric (on the Administration > Fabric page).
    NOTE: If another node in the fabric can be promoted to a high availability node, you are prompted to enable high availability on that node before deleting the node that is down.
    For more information, see Deleting a Node from the Junos Space Fabric.
  8. For the deleted node, depending on whether you are using a virtual appliance or a hardware appliance, you can redeploy the virtual appliance or reimage the hardware appliance by using a USB drive. For more information, refer to the Junos Space hardware or virtual appliance documentation at Junos Space and Applications.
    NOTE: Ensure that the Junos Space version that you are reimaging or redeploying is the same version as the Junos Space version running on other nodes in the fabric.
  9. (Optional) If you are unable to reimage the hardware appliance or redeploy the virtual appliance, you might need to procure a new hardware or virtual appliance as a replacement.
  10. Reconfigure the hardware or virtual appliance with exactly the same network settings that were previously configured.
  11. Navigate to the Fabric page (Administration > Fabric) and add the node back to the fabric.
    For more information, see Adding a Node to an Existing Junos Space Fabric.
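
The service check and restart in step 4 can be sketched as a small shell helper run from the debug shell. This is only an illustration, not a Juniper-provided script: the function names check_and_restart and check_services and the SERVICE_CMD variable are hypothetical, and the sketch assumes that "service service-name status" returns exit code 0 when the service is running, which is the usual init-script convention but should be confirmed on your node.

```shell
#!/bin/sh
# Sketch of step 4: check jmp-watchdog and jboss, restart any that are not up.
# ASSUMPTION: "service <name> status" exits with 0 when the service is running.
# SERVICE_CMD is a hypothetical override, useful for dry runs; it defaults to "service".
SERVICE_CMD="${SERVICE_CMD:-service}"

# Check one service and restart it if the status check fails.
check_and_restart() {
    name="$1"
    if "$SERVICE_CMD" "$name" status >/dev/null 2>&1; then
        echo "$name: up"
    else
        echo "$name: down, restarting"
        "$SERVICE_CMD" "$name" restart
    fi
}

# Check the two services named in step 4.
check_services() {
    for svc in jmp-watchdog jboss; do
        check_and_restart "$svc"
    done
}
```

After sourcing the file, call check_services, wait 15 to 20 minutes, and then check the node status on the Administration > Fabric page as described in the procedure. If both services report up but the node is still Down, the helper has nothing to restart and the next action is to open a JTAC case.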

#FAQ