
  • 1.  Sudden infrequent reboots of EX4200 0x2:watchdog

    Posted 04-25-2022 18:45

    My EX4200 has started rebooting intermittently, about once a month, after roughly 7 years in operation with no reboots.

    The reboot reason is 0x2:watchdog

    show chassis routing-engine
    Routing Engine status:
      Slot 0:
        Current state                  Master
        Temperature                 48 degrees C / 118 degrees F
        CPU temperature             48 degrees C / 118 degrees F
        DRAM                      1024
        Memory utilization          49 percent
        CPU utilization:
          User                       6 percent
          Background                 0 percent
          Kernel                     4 percent
          Interrupt                  0 percent
          Idle                      89 percent
        Model                          EX4200-48T, 8 POE
        Serial ID                      BP0208249061
        Start time                     2013-12-17 20:31:57 PST
        Uptime                         1 hour, 4 minutes, 54 seconds
        Last reboot reason             0x2:watchdog
        Load averages:                 1 minute   5 minute  15 minute
                                           0.16       0.35       0.39
    

    There was an uninformative core dump:

    > show system core-dumps core-file-info /var/tmp/vmcore.0.gz
    fpc0:
    --------------------------------------------------------------------------
    'junos' process terminated
    Stack trace:
    #0  0x00000000 in ?? ()
    #0  0x00000000 in ?? ()
    
    {master:0}

    The main log has nothing: it stopped writing mid-line and resumed with the boot messages, without even inserting a line break:

    > show log messages.4.gz
    Apr 25 11:01:49  365main-c0326-swi-01 sshd: SSHD_LOGIN_FAILED: Login failed for user 'root' from host '134.209.168.212'                               
    Apr 25 11:01:49  365main-c0326-swi-01 sshd[10932]: failed to update number of consecutive login attempts for user: root                               
    Apr 25 11:01:49  365main-c0326-swi-01 sshd[10932]: Received disconnect from 134.209.168.212: 11: Bye Bye [preauth]                                    
    Apr 25 11:01:49  365main-c0326-swi-01 inetd[1330]: /usr/sbin/sshd[10932]: exited, status 255                                                          
    Apr 25 11:01:52  365main-c0326-swi-01 sshd[10935]: Failed password for root from 61.177.173.5 port 52375 ssh2                                         
    Apr 25 11:01:52  365main-c0326-swi-01 sshd: SSHD_LOGIN_FAILED: Login failed for user 'root' from host '61.177.173.5'                                  
    Apr 25 11:01:52  365main-c0326-swi-01 sshd[10935]: failedDec 17 20:33:57  365main-c0326-swi-01 eventd[967]: SYSTEM_OPERATIONAL: System is operational 
    Dec 17 20:33:57  365main-c0326-swi-01 /kernel: Copyright (c) 1996-2013, Juniper Networks, Inc.                                                        
    Dec 17 20:33:57  365main-c0326-swi-01 /kernel: All rights reserved.                                                                                   
    Dec 17 20:33:57  365main-c0326-swi-01 /kernel: Copyright (c) 1992-2006 The FreeBSD Project.                                                           
    Dec 17 20:33:57  365main-c0326-swi-01 /kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994                               
    Dec 17 20:33:57  365main-c0326-swi-01 /kernel:  The Regents of the University of California. All rights reserved.                                     
    Dec 17 20:33:57  365main-c0326-swi-01 /kernel: JUNOS 12.3R5.7 #0: 2013-12-18 01:32:43 UTC           

    I have two identically configured EX4200s, and both started these random reboots around the same time, a few months ago, so I assumed it was not a hardware problem. show system uptime reported load averages of about 1.5, and tcpdump in the shell showed a UDP NTP flood. I reconfigured the firewall to protect the routing engine from the NTP traffic, and the load averages dropped to 0.3-0.5. The second switch has not rebooted since, but this one has. There are no chassis alarms.
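
    For anyone hitting the same NTP flood, a routing-engine protection filter of the kind described above could look roughly like this. It is only a sketch, not the configuration running on these switches: the filter name protect-re, the term names, and the NTP server address 192.0.2.10 are placeholders chosen for illustration.

    firewall {
        family inet {
            filter protect-re {
                /* placeholder address: replace with the NTP server(s) you actually use */
                term ntp-trusted {
                    from {
                        source-address {
                            192.0.2.10/32;
                        }
                        protocol udp;
                        destination-port 123;
                    }
                    then accept;
                }
                /* drop all other NTP aimed at the routing engine */
                term ntp-drop {
                    from {
                        protocol udp;
                        destination-port 123;
                    }
                    then discard;
                }
                /* a filter ends with an implicit discard, so accept the rest explicitly */
                term everything-else {
                    then accept;
                }
            }
        }
    }
    interfaces {
        lo0 {
            unit 0 {
                family inet {
                    filter {
                        input protect-re;
                    }
                }
            }
        }
    }

    The last term matters: without it, all other traffic to the routing engine, including SSH, would be dropped. Applying a filter like this with commit confirmed leaves a way back if it cuts off management access.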

    What might a possible reboot reason be?



  • 2.  RE: Sudden infrequent reboots of EX4200 0x2:watchdog

     
    Posted 04-25-2022 19:24
    There is a known software bug that is noted in PR1047142 and will require a software update to one of the fixed versions.

    https://supportportal.juniper.net/s/article/Junos-Service-Release-12-3R9-S1-Released

    ------------------------------
    Steve Puluka BSEET - Juniper Ambassador
    IP Architect - DQE Communications Pittsburgh, PA (Metro Ethernet & ISP)
    http://puluka.com/home
    ------------------------------



  • 3.  RE: Sudden infrequent reboots of EX4200 0x2:watchdog

    Posted 04-26-2022 05:25

    Thanks. The PR says:

    • PR1047142   -  EX4200 rebooting randomly due to watchdog:0x2 without any logs and core.
    In this case there was a core dump, but it was uninformative. Do you think this update can still fix it?


  • 4.  RE: Sudden infrequent reboots of EX4200 0x2:watchdog

     
    Posted 04-26-2022 05:26
    Assuming your Junos version is earlier than the listed one, yes. It looks like the same issue, since the core is empty.
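
    For reference, the kernel banner in the log from the first post shows JUNOS 12.3R5.7, which is earlier than the 12.3R9-S1 release named in the linked article. A rough sketch of checking the running version, installing a newer image, and confirming the reboot reason afterwards follows; the package path is a placeholder, not a real file name.

    > show version
    > request system software add /var/tmp/<fixed-release-package>.tgz reboot
    > show chassis routing-engine | match "reboot reason"

    The first command confirms what is currently running; the last one can be repeated over the following weeks to see whether 0x2:watchdog comes back.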

    ------------------------------
    Steve Puluka BSEET - Juniper Ambassador
    IP Architect - DQE Communications Pittsburgh, PA (Metro Ethernet & ISP)
    http://puluka.com/home
    ------------------------------