Routing


alarm

  • 1.  alarm

     
    Posted 05-12-2020 16:58

    Hi all,

    This is on an MX. Any idea?

    show system alarms
    1 alarms currently active
    Alarm time               Class  Description
    2020-04-11 09:11:21  Minor  Backup RE Active
    


  • 2.  RE: alarm

     
    Posted 05-12-2020 17:10

     

    Hi,

     

    This alarm is raised when the Routing Engine that is expected to be master (by configuration or by default) is not master, and the backup RE has taken over mastership instead.

     

    Take my router output as an example:

    1. The alarm is active.

    2. RE0 is the default master when nothing is configured. You can override this by configuring which RE should be master.

    3. But RE1 is master here, hence the alarm is active.

    labroot@usm> show chassis alarms
    2 alarms currently active
    Alarm time Class Description
    2020-04-24 18:32:39 UTC Minor Backup RE Active

     

    labroot@usm> show configuration chassis redundancy

     

    labroot@usm> show chassis routing-engine
    Routing Engine status:
    Slot 0:
    Current state Backup    <<<
    Election priority Master (default)    <<<
    ...

    Routing Engine status:
    Slot 1:
    Current state Master    <<<
    Election priority Backup (default). 
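The check described above (the RE whose election priority is Master is currently in the Backup state) can also be done on saved CLI text. A minimal Python sketch, assuming the output was copied into a string with the field layout shown; the sample and the helper names parse_re_status and mastership_mismatch are hypothetical and not part of any Juniper tool:

```python
import re

# Hypothetical sample, trimmed to the two fields discussed above.
SAMPLE = """\
Routing Engine status:
  Slot 0:
    Current state                  Backup
    Election priority              Master (default)
  Slot 1:
    Current state                  Master
    Election priority              Backup (default)
"""


def parse_re_status(text):
    """Return {slot: {"state": ..., "priority": ...}} from the CLI text."""
    slots = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Slot (\d+):", line)
        if m:
            current = slots.setdefault(int(m.group(1)), {})
        elif current is not None and line.startswith("Current state"):
            current["state"] = line.split()[2]      # e.g. "Backup"
        elif current is not None and line.startswith("Election priority"):
            current["priority"] = line.split()[2]   # e.g. "Master"
    return slots


def mastership_mismatch(slots):
    """Slots whose current state differs from their election priority."""
    return [s for s, v in sorted(slots.items())
            if v.get("state") != v.get("priority")]


status = parse_re_status(SAMPLE)
print(mastership_mismatch(status))  # [0, 1]
```

Any slot listed is one whose running state does not match its configured or default priority, which is the situation this alarm reports.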

    Recovery procedure:

    1. Make the current master RE (RE1) the configured master, make RE0 the backup, and commit.

     

    labroot@usm# set chassis redundancy routing-engine 1 master

    [edit]
    labroot@usm# set chassis redundancy routing-engine 0 backup

     

    labroot@usm# commit

     

     

     

    labroot@usm# run show chassis alarms
    1 alarms currently active
    Alarm time Class Description
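If the same fix has to be applied on several routers, the two set commands above can be generated for whichever slot is currently master. A minimal Python sketch; the helper name redundancy_set_commands is hypothetical, and it only produces the command text, it does not connect to the router:

```python
def redundancy_set_commands(current_master, slots=(0, 1)):
    """Return 'set chassis redundancy' lines that make the current master RE
    the configured master and every other slot a backup (hypothetical helper)."""
    if current_master not in slots:
        raise ValueError(f"unknown Routing Engine slot: {current_master}")
    commands = [f"set chassis redundancy routing-engine {current_master} master"]
    for slot in slots:
        if slot != current_master:
            commands.append(f"set chassis redundancy routing-engine {slot} backup")
    return commands


# RE1 is master in the output above.
for cmd in redundancy_set_commands(1):
    print(cmd)
```

This only aligns the configuration with the current state so the alarm clears; it does not explain why the switchover happened.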



  • 3.  RE: alarm

     
    Posted 05-12-2020 17:36

    Hi Arix,

     

    If my procedure solves your problem, you can flag it as an Accepted Solution for the benefit of others.

     

    Thanks.



  • 4.  RE: alarm

     
    Posted 05-12-2020 17:48

    Thanks for that. But shouldn't the reason be found? Why did the mastership change?



  • 5.  RE: alarm

    Posted 05-12-2020 20:02

    Hi Arix,

     

    Many different things could have caused the mastership switch. Note that this is not a risky condition in itself, since the other RE is equally capable of handling the same functions.

     

    Some possible reasons: loss of keepalives between the master RE and the backup RE, after which the backup RE assumes mastership even though the master shows no visible damage; high memory utilization, i.e. very little free memory left on the master RE; a hardware failure; or the master RE being unable to communicate with the chassis manager.

     

    To figure out the real reason behind this, we might need log snippets starting from 5 minutes before the issue was recorded.

    > show log messages    (make sure the output covers the timestamp shown in "show system alarms" and the 5 minutes before it)

     

    If the messages file does not cover the required timestamp, keep looking in the rotated files, which are named messages.0.gz, messages.1.gz, and so on.
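To pull the 5-minute window out of messages and its rotated .gz files in one pass, here is a minimal Python sketch. It assumes the files were copied off the router to a machine with Python, that each line starts with the classic "Mon DD HH:MM:SS" syslog timestamp, and that the log and the alarm use the same time zone; the function names read_logs and lines_before_alarm are hypothetical.

```python
import gzip
from datetime import datetime, timedelta


def read_logs(paths):
    """Yield lines from plain or gzip-compressed log files (e.g. messages, messages.0.gz)."""
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as fh:
            yield from fh


def lines_before_alarm(lines, alarm_time, minutes_before=5):
    """Keep lines stamped from (alarm_time - minutes_before) up to alarm_time.

    Assumes the classic syslog prefix 'Mon DD HH:MM:SS', which has no year,
    so the alarm's year is assumed for every line.
    """
    start = alarm_time - timedelta(minutes=minutes_before)
    for line in lines:
        try:
            stamp = datetime.strptime(f"{alarm_time.year} {line[:15]}",
                                      "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # continuation or non-timestamped line
        if start <= stamp <= alarm_time:
            yield line.rstrip("\n")


# Usage, with the files copied off the router (file names are illustrative):
# alarm = datetime(2020, 5, 13, 9, 12, 53)
# for line in lines_before_alarm(read_logs(["messages", "messages.0.gz"]), alarm):
#     print(line)
```

Keeping file reading separate from filtering means the same window logic works on a single file, on all rotated files, or on text pasted from "show log messages".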

     

    Hope this helps 🙂

     

    Please mark this as "Accepted Solution" if this addresses your query.

    Kudos would be much appreciated too 🙂



  • 6.  RE: alarm

     
    Posted 05-13-2020 02:07

    Hi all,

    There is nothing in the messages log for that time; those entries are gone. I only have the chassis output:

     show chassis routing-engine no-forwarding
    Routing Engine status:
      Slot 0:
        Current state                  Backup
        Election priority              Master
        Temperature                 39 degrees C / 102 degrees F
        CPU temperature             52 degrees C / 125 degrees F
        DRAM                      49105 MB (49152 MB installed)
        Memory utilization           7 percent
        5 sec CPU utilization:
          User                       0 percent
          Background                 0 percent
          Kernel                     1 percent
       
        Start time                     2020-05-01 19:33:20 
    Uptime                         10 hours, 5 minutes, 35 seconds    <<<
        Last reboot reason             0x4000:VJUNOS reboot
        Load averages:                 1 minute   5 minute  15 minute
                                           0.27       0.19       0.32
    
    
    

     



  • 7.  RE: alarm

     
    Posted 05-13-2020 02:28

    Hello Arix,

     

    I see three core dumps generated around the time of the RE switchover. The last reboot reason is:

        Last reboot reason             0x4000:VJUNOS reboot

     

    > show system alarms
    3 alarms currently active
    Alarm time               Class  Description
    2020-05-13 09:12:53 EUPE Minor  Backup RE Active

    > show system core-dumps re0
    re0:
    --------------------------------------------------------------------------
    /var/crash/*core*: No such file or directory
    -rw-rw----  1 root  wheel   11593830 May 13 09:22 /var/tmp/bbe-smgd.core-tarball.3.tgz
    -rw-rw----  1 root  wheel      24194 May 13 09:24 /var/tmp/bbe-smgd.core-tarball.4.tgz
    -rw-r--r--  1 root  wheel   37761024 May 13 09:10 /var/tmp/cpcdd_jtac.core

     I would recommend opening a JTAC ticket, since finding the root cause of the RE mastership switchover requires analysis of the core dumps and log files.
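Lining the core files up against the alarm time makes the sequence easier to see before handing everything to JTAC. A minimal Python sketch, assuming the "ls -l" style lines shown above and that the file times and the alarm time use the same time zone; the helper names parse_core_listing and minutes_from_alarm are hypothetical:

```python
from datetime import datetime

# Lines copied from the 'show system core-dumps re0' output above.
SAMPLE = """\
re0:
--------------------------------------------------------------------------
/var/crash/*core*: No such file or directory
-rw-rw----  1 root  wheel   11593830 May 13 09:22 /var/tmp/bbe-smgd.core-tarball.3.tgz
-rw-rw----  1 root  wheel      24194 May 13 09:24 /var/tmp/bbe-smgd.core-tarball.4.tgz
-rw-r--r--  1 root  wheel   37761024 May 13 09:10 /var/tmp/cpcdd_jtac.core
"""


def parse_core_listing(text, year):
    """Return [(path, mtime)] sorted by time from 'ls -l' style lines."""
    cores = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 9 or not parts[0].startswith("-"):
            continue  # skip headers, separators and 'No such file' lines
        month, day, when, path = parts[5], parts[6], parts[7], parts[8]
        if ":" in when:   # recent files: "May 13 09:22"
            stamp = datetime.strptime(f"{year} {month} {day} {when}", "%Y %b %d %H:%M")
        else:             # older files show a year instead: "May 13  2019"
            stamp = datetime.strptime(f"{when} {month} {day}", "%Y %b %d")
        cores.append((path, stamp))
    return sorted(cores, key=lambda core: core[1])


def minutes_from_alarm(cores, alarm_time):
    """Offset of each core file from the alarm time, in minutes (negative = before)."""
    return [(path, round((stamp - alarm_time).total_seconds() / 60, 1))
            for path, stamp in cores]


alarm = datetime(2020, 5, 13, 9, 12, 53)  # from 'show system alarms' above
for path, offset in minutes_from_alarm(parse_core_listing(SAMPLE, 2020), alarm):
    print(f"{offset:+6.1f} min  {path}")
```

With this listing, the cpcdd core precedes the alarm by about three minutes and the two bbe-smgd tarballs follow it. The sketch only orders the files; it does not say which process caused the switchover.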

     

    I hope this helps. Please mark this post "Accept as solution" if this answers your query.

     

    Kudos are always appreciated! 🙂

     

    Best Regards,

    Lingabasappa H



  • 8.  RE: alarm

     
    Posted 05-13-2020 02:52

    Thanks for the response.

    I couldn't find any KB article about the following reboot reason. Is this the point to start troubleshooting?

     

    0x4000:VJUNOS reboot
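The part before the colon is a hexadecimal code. A minimal Python sketch that splits such a string (the helper name split_reboot_reason is hypothetical) shows that 0x4000 is 16384, the same value that sysctl reports as hw.re.reboot_reason later in this thread; the separate VM host reason in that output (0x1:power cycle/failure) decodes the same way. The code identifies the kind of reboot, not what triggered it.

```python
def split_reboot_reason(reason):
    """Split a reason string such as '0x4000:VJUNOS reboot' into (code, text)."""
    code, _, text = reason.partition(":")
    return int(code, 16), text.strip()  # int() accepts the 0x prefix with base 16


code, text = split_reboot_reason("0x4000:VJUNOS reboot")
print(code, text)                                      # 16384 VJUNOS reboot
print(split_reboot_reason("0x1:power cycle/failure"))  # (1, 'power cycle/failure')
```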

     



  • 9.  RE: alarm

    Posted 05-13-2020 03:16

    Hi Arix,

     

    The logs indicate multiple cores generated after which there was a mastership switchover. So, in order to understand why RE 0 was rebooted and Backup RE is currently active, we might have to take a look at the cores generated as well as detailed log analysis.

     

    There are several reasons that could eventually lead to a VJUNOS reboot. In most cases it is a self-recovery reboot initiated to recover from a fault. So the key to solving this problem is analysing what led to the reboot (Root Cause Analysis), to prevent it from happening again. For this, as Lingu suggested, you will need to open a JTAC case for detailed analysis.

     

    If you would like to understand the reboot reason reported for a Routing Engine, you can refer to the following:

    https://www.juniper.net/documentation/en_US/junos/topics/reference/command-summary/show-chassis-routing-engine.html

     

    Hope this helps 🙂

     

    Please mark this "Accepted Solution" if this helps you in addressing your query.

    Kudos would be much appreciated too 🙂

     



  • 10.  RE: alarm

     
    Posted 05-13-2020 05:22

    Hi all, thanks for replies...

    I investigated further based on the following KB. Can you give some ideas? The disks show up in "show chassis hardware detail", so why are they not listed under the machdep boot-device entries?

    https://kb.juniper.net/InfoCenter/index?page=content&id=KB35691&act=login

     

     

    sysctl -a | grep boo
    kern.boottime: { sec = 1587105767, usec = 812218 } Mon Marc 17 19:22:37 2020
    kern.bootfile: /packages/sets/active/boot/os-kernel/kernel
    kern.panic_reboot_wait_time: 15
    1 PART vtbd0p1 16384 512 i 1 o 4096 ty freebsd-boot xs GPT xt 83bd6b9d-7f41-11dc-be0b-001560b84f0f
                <type>freebsd-boot</type>
    kern.vt.kbd_reboot: 1
    kern.cam.boot_delay: 0
    vm.boot_pages: 64
    debug.boothowto_string: SERIAL
    debug.bootverbose: 0
    debug.boothowto: 4096
    hw.acpi.handle_reboot: 0
    hw.acpi.disable_on_reboot: 0
    hw.bcmsdk_boot_mode: 0
    hw.product.pvi.config.chasd.bootp.c24.grub_cfg: smpc_grub.cfg
    hw.product.pvi.config.chasd.bootp.c24.imgargs_str.1: turbotx.elf
    hw.product.pvi.config.chasd.bootp.c24.imgargs_str.0: smpc.elf
    hw.product.pvi.config.chasd.bootp.c24.images_count: 2
    hw.product.pvi.config.chasd.bootp.c24.bootcommand:     :td=/usr/share/pfe/:bf="grub.efi":\
    hw.product.pvi.config.chasd.bootp.c24.require_bootp: 1
    hw.product.pvi.config.chasd.bootp.c23.grub_cfg: smpc_grub.cfg
    hw.product.pvi.config.chasd.bootp.c23.imgargs_str.1: turbotx.elf
    hw.product.pvi.config.chasd.bootp.c23.imgargs_str.0: smpc.elf
    hw.product.pvi.config.chasd.bootp.c23.images_count: 2
    hw.product.pvi.config.chasd.bootp.c23.bootcommand:     :td=/usr/share/pfe/:bf="grub.efi":\
    hw.product.pvi.config.chasd.bootp.c23.require_bootp: 1
    hw.product.pvi.config.chasd.lc.c23.part_no.midplane_bootstring: 0
    hw.product.pvi.config.chasd.lc.c24.part_no.midplane_bootstring: 0
    hw.product.pvi.config.chasd.lc.c07.part_no.midplane_bootstring: 0
    hw.product.pvi.config.chasd.lc.c06.part_no.midplane_bootstring: 0
    hw.product.pvi.config.chasd.lc.c05.part_no.midplane_bootstring: 0
    hw.product.pvi.config.chasd.lc.c04.part_no.midplane_bootstring: 0
    hw.re.vmhost_reboot_reason_string: 0x1:power cycle/failure
    hw.re.vmhost_reboot_reason: 1
    hw.re.booted_up: 1
    hw.re.reboot_on_diskfail: 0
    hw.re.dualroot.booted_from:
    hw.re.reboot_reason_string: 0x4000:VJUNOS reboot
    hw.re.reboot_reason: 16384
    hw.re.jnx_rebooting: 0
    hw.re.fast_boot: 0
    hw.usb.no_boot_wait: 0
    machdep.currbootdev:
    machdep.bootsuccess: 0
    machdep.nextbootdev:
    machdep.bootdevs: 
    machdep.bootmethod: BIOS
    show chassis hardware detail
    Hardware inventory:
    Item             Version  Part number  Serial number     Description
    Chassis                                JN1xxxxxxxx      MX240
    Midplane         REV 39   750-047865   AMwee2          Enhanced MX240 Backplane
    FPM Board        REV 04   760-059207   kng6079          Front Panel Display
    PEM 0            Rev 01   740-063046   QCXXXGDWZ       PS 1.4-2.52kW; 90-264V AC in
    PEM 1            Rev 01   740-063046   QCXXXGDZ        PS 1.4-2.52kW; 90-264V AC in
    Routing Engine 0 REV 19   750-054758   CAKM56085          RE-S-2X00x6
      vtbd0 17408 MB                                         Virtio Block Disk-------------->
      vtbd1 15360 MB                                         Virtio Block Disk------------->
      ada0    511 MB  QEMU HARDDISK        QM00002           Emulated IDE Disk------------->
      usb0 (addr 0.1) XHCI root HUB 0      0x8086            uhub0
    Routing Engine 1 REV 19   750-054758   CAMZ8100          RE-S-2X00x6
      vtbd0 17408 MB                                         Virtio Block Disk-------------->
      vtbd1 15360 MB                                         Virtio Block Disk-------------->
      ada0    511 MB  QEMU HARDDISK        QM00002           Emulated IDE Disk-------------->
      usb0 (addr 0.1) XHCI root HUB 0      0x8086            uhub0
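Rather than scanning the full sysctl -a dump by eye, the few keys discussed in this thread can be pulled out programmatically. A minimal Python sketch, assuming "name: value" lines as shown; it converts the epoch seconds in kern.boottime to UTC (useful for lining the boot up against alarm and core-dump times) and lists the machdep boot-device keys that are empty. The helper names are hypothetical. Post 12 in this thread shows the same empty values on a healthy RE-S-2X00x6 lab device, so an empty list is not a fault by itself.

```python
import re
from datetime import datetime, timezone

# A few lines from the 'sysctl -a' output above.
SAMPLE = """\
kern.boottime: { sec = 1587105767, usec = 812218 } Mon Marc 17 19:22:37 2020
machdep.currbootdev:
machdep.bootsuccess: 0
machdep.nextbootdev:
machdep.bootdevs:
machdep.bootmethod: BIOS
"""


def parse_sysctl(text):
    """Map 'name: value' lines to a dict; indented or free-form lines are skipped."""
    values = {}
    for line in text.splitlines():
        m = re.match(r"([A-Za-z0-9_.]+):(.*)$", line)
        if m:
            values[m.group(1)] = m.group(2).strip()
    return values


def boot_time_utc(values):
    """Convert the epoch seconds in kern.boottime to an aware UTC datetime."""
    m = re.search(r"\bsec = (\d+)", values.get("kern.boottime", ""))
    return datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc) if m else None


def empty_boot_device_keys(values):
    """machdep boot-device keys that are present but empty."""
    keys = ("machdep.currbootdev", "machdep.nextbootdev", "machdep.bootdevs")
    return [k for k in keys if k in values and not values[k]]


values = parse_sysctl(SAMPLE)
print(boot_time_utc(values))  # 2020-04-17 06:42:47+00:00
print(empty_boot_device_keys(values))
```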
    
    
    

     

     



  • 11.  RE: alarm

     
    Posted 05-13-2020 05:47

    Hi Arix,

     

    Looking at the output, all the boot devices are missing from the boot sequence. Please perform the next steps in the KB to add the boot devices manually, then reboot the RE and check whether the missing partitions were added back.

     

    machdep.nextbootdev:
    machdep.bootdevs: 
    machdep.bootmethod: BIOS

     

     sysctl -w machdep.bootdevs=compact-flash,disk1,disk2,lan

     

     

     

    If this solves your problem, please mark this post as "Accepted Solution" so we can help others too.

     

    Kudos are appreciated too 

     

     

    Regards,

    Nadeem

     



  • 12.  RE: alarm
    Best Answer

     
    Posted 05-13-2020 07:18

    Hello Arix,

     

    Looking at the output, I see that Routing Engines of model RE-S-2X00x6 have only one disk [AD0].

    From your output, I can see ada0 on both REs:

    Routing Engine 0 REV 19   750-054758   CAKM56085          RE-S-2X00x6
      vtbd0 17408 MB                                         Virtio Block Disk-------------->
      vtbd1 15360 MB                                         Virtio Block Disk------------->
      ada0    511 MB  QEMU HARDDISK        QM00002           Emulated IDE Disk------------->
      usb0 (addr 0.1) XHCI root HUB 0      0x8086            uhub0
    Routing Engine 1 REV 19   750-054758   CAMZ8100          RE-S-2X00x6
      vtbd0 17408 MB                                         Virtio Block Disk-------------->
      vtbd1 15360 MB                                         Virtio Block Disk-------------->
      ada0    511 MB  QEMU HARDDISK        QM00002           Emulated IDE Disk-------------->
      usb0 (addr 0.1) XHCI root HUB 0      0x8086            uhub0

    This is expected. I checked the same on one of our lab devices; the outputs are below:

     

    Routing Engine 0 REV 05 750-072925 XXXXXX RE-S-2X00x6
    vtbd0 17408 MB Virtio Block Disk
    vtbd1 15360 MB Virtio Block Disk
    ada0 511 MB QEMU HARDDISK QM00002 Emulated IDE Disk
    usb0 (addr 0.1) XHCI root HUB 0 0x8086 uhub0
    Routing Engine 1 REV 05 750-072925 XXXXXXX RE-S-2X00x6
    vtbd0 17408 MB Virtio Block Disk
    vtbd1 15360 MB Virtio Block Disk
    ada0 511 MB QEMU HARDDISK QM00002 Emulated IDE Disk
    usb0 (addr 0.1) XHCI root HUB 0 0x8086 uhub0
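To compare the disk rows of both REs, or your output against the lab output, without reading column-aligned text by eye, here is a minimal Python sketch. It assumes disk rows look like "name size MB ..." beneath a "Routing Engine N" row (leading spaces are optional, so it also reads the unindented lab output); the helper name disks_per_re is hypothetical, and it lists what the CLI reports without judging disk health.

```python
import re

# The Routing Engine rows from the 'show chassis hardware detail' output above.
SAMPLE = """\
Routing Engine 0 REV 19   750-054758   CAKM56085          RE-S-2X00x6
  vtbd0 17408 MB                                         Virtio Block Disk
  vtbd1 15360 MB                                         Virtio Block Disk
  ada0    511 MB  QEMU HARDDISK        QM00002           Emulated IDE Disk
  usb0 (addr 0.1) XHCI root HUB 0      0x8086            uhub0
Routing Engine 1 REV 19   750-054758   CAMZ8100          RE-S-2X00x6
  vtbd0 17408 MB                                         Virtio Block Disk
  vtbd1 15360 MB                                         Virtio Block Disk
  ada0    511 MB  QEMU HARDDISK        QM00002           Emulated IDE Disk
  usb0 (addr 0.1) XHCI root HUB 0      0x8086            uhub0
"""


def disks_per_re(text):
    """Return {slot: [(device, size_mb), ...]} for each Routing Engine block."""
    result, current = {}, None
    for line in text.splitlines():
        re_row = re.match(r"\s*Routing Engine (\d+)\b", line)
        if re_row:
            current = result.setdefault(int(re_row.group(1)), [])
            continue
        disk = re.match(r"\s*(\w+)\s+(\d+) MB\b", line)  # usb0 rows do not match
        if current is not None and disk:
            current.append((disk.group(1), int(disk.group(2))))
    return result


disks = disks_per_re(SAMPLE)
print(disks[0])              # [('vtbd0', 17408), ('vtbd1', 15360), ('ada0', 511)]
print(disks[0] == disks[1])  # True
```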

     

    labroot@jtac-mx480-r2032-re0> start shell user root
    Password:
    root@jtac-mx480-r2032-re0:/var/home/labroot # sysctl -a | grep bootdev
    machdep.currbootdev:
    machdep.nextbootdev:
    machdep.bootdevs:

     

    The values are empty, and this is expected.

    From the above outputs, I see the issue is not related to a failed or missing hard disk. We need a detailed analysis to find the root cause of this issue.

     

    We need to decode the core files and extract some information from that to know the root cause. For core-dump and log analysis, you need to open a JTAC ticket.

     

    Let me know if you need any more information on this issue.

     

    I hope this helps. Please mark this post "Accept as solution" if this answers your query.

     

    Kudos are always appreciated! 🙂

     

    Best Regards,

    Lingabasappa H

     

     

     



  • 13.  RE: alarm

     
    Posted 05-12-2020 19:00

    Hello Arix,

     

    A mastership switchover from one RE to the other can happen for many reasons:

     

    1. High CPU utilization on the master RE before the mastership switchover.

    2. File system corruption.

    3. A sudden reboot of the master RE before the switchover.

    4. A hard-disk failure on the master RE before the switchover.

     

    Please share the outputs of the following commands:

    > show chassis alarms

    > show system core-dumps

    > show chassis hardware detail

    > show system processes extensive

    > show chassis routing-engine no-forwarding

     

    For troubleshooting high CPU, please refer to the KB below:

    https://kb.juniper.net/InfoCenter/index?page=content&id=KB26261

     

    For checking the hard disk and other boot-list components:

    https://kb.juniper.net/InfoCenter/index?page=content&id=KB35691

     

    These are the basic troubleshooting steps for finding the root cause of the mastership switchover.

     

    I hope this helps. Please mark this post "Accept as solution" if this answers your query.

     

    Kudos are always appreciated! 🙂

     

    Best Regards,

    Lingabasappa H