Routing

alarm

1. alarm

0 Recommend
Arix
Posted 05-12-2020 16:58

Hi all,

I'm seeing this alarm on an MX. Any idea what it means?

show system alarms
1 alarms currently active
Alarm time               Class  Description
2020-04-11 09:11:21      Minor  Backup RE Active
2. RE: alarm

1 Recommend
satyank
Posted 05-12-2020 17:10

Hi,

This alarm is raised when the Routing Engine that is expected to be master (by configuration or by default) is not the current master, because the RE designated as backup has taken mastership.

Take my router output as an example:

1. The alarm is active.

2. RE0 is the default master when nothing is configured. You can override this by configuring which RE should be master.

3. RE1 is the master here, hence the alarm is active.

labroot@usm> show chassis alarms
2 alarms currently active
Alarm time Class Description
2020-04-24 18:32:39 UTC Minor Backup RE Active

labroot@usm> show configuration chassis redundancy

labroot@usm> show chassis routing-engine
Routing Engine status:
Slot 0:
Current state Backup. >>>>
Election priority Master (default). >>>>>
<>

Routing Engine status:
Slot 1:
Current state Master. >>>>>
Election priority Backup (default).
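
A minimal Python sketch of the check described above, assuming the text layout of "show chassis routing-engine" shown in this post. The names parse_routing_engines and find_mastership_mismatch are hypothetical helpers written for illustration, not part of Junos or any Juniper tool. It flags any slot whose current state differs from its election priority, which, per the explanation above, is the situation behind "Backup RE Active".

```python
import re


def parse_routing_engines(text):
    """Return {slot: {"state": ..., "priority": ...}} parsed from CLI text."""
    engines = {}
    slot = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Slot (\d+):", line)
        if m:
            slot = int(m.group(1))
            engines[slot] = {}
            continue
        if slot is None:
            continue
        m = re.match(r"Current state\s+(Master|Backup)", line)
        if m:
            engines[slot]["state"] = m.group(1)
            continue
        m = re.match(r"Election priority\s+(Master|Backup)", line)
        if m:
            engines[slot]["priority"] = m.group(1)
    return engines


def find_mastership_mismatch(text):
    """Return the slots whose current state differs from election priority."""
    return sorted(
        slot
        for slot, info in parse_routing_engines(text).items()
        if "state" in info and "priority" in info
        and info["state"] != info["priority"]
    )


# Sample text modelled on the output in this post (annotations removed).
SAMPLE = """\
Routing Engine status:
Slot 0:
Current state Backup
Election priority Master (default)
Routing Engine status:
Slot 1:
Current state Master
Election priority Backup (default)
"""

print(find_mastership_mismatch(SAMPLE))  # [0, 1]
```

An empty list would mean every RE is in the role its election priority calls for.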

Recovery procedure:

1. Configure the current master RE (RE1 here) as master, the other RE as backup, and commit.

labroot@usm# set chassis redundancy routing-engine 1 master

[edit]
labroot@usm# set chassis redundancy routing-engine 0 backup

labroot@usm# commit

labroot@usm# run show chassis alarms
1 alarms currently active
Alarm time Class Description
3. RE: alarm

0 Recommend
satyank
Posted 05-12-2020 17:36

Hi Arix,

If my procedure solves your problem, please flag it as the Accepted Solution for the benefit of others.

Thanks.
4. RE: alarm

0 Recommend
Arix
Posted 05-12-2020 17:48

Thanks for that. But shouldn't the reason be found? Why did the mastership change?
5. RE: alarm

1 Recommend
JoelNovans
Posted 05-12-2020 20:02

Hi Arix,

Many things could have caused the mastership switch. Note that this is not a vulnerable condition in itself, since the other RE is equally capable of handling all functions.

Some reasons I can think of: loss of keepalives between the master RE and the backup RE, after which the backup RE assumes mastership even though the master has no visible damage; high memory utilization on the master RE, leaving very little free memory; hardware failure; or the master being unable to communicate with the chassis manager.

To figure out the real reason, we need log snippets starting from 5 minutes before the issue was recorded.

> show log messages (make sure the output covers the timestamp shown in "show system alarms" and the 5 minutes before it)

If the messages file does not contain logs for the required timestamp, keep looking in the older rotated files, named messages.0.gz, messages.1.gz and so on.
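
If the log files have been copied off the router, the search across messages, messages.0.gz, messages.1.gz and so on can be scripted. The following is only a sketch under stated assumptions: each log line starts with a "Mon DD HH:MM:SS" timestamp without a year (so the year is taken from the alarm time), month names are in English, and the helper names open_log and lines_before_alarm are hypothetical. The demo log lines are synthetic, not taken from any real router.

```python
import gzip
import tempfile
from datetime import datetime, timedelta
from pathlib import Path


def open_log(path):
    """Open a plain or gzip-compressed log file as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    return open(path, "rt", errors="replace")


def lines_before_alarm(log_dir, alarm_time, minutes=5):
    """Collect log lines from the `minutes` leading up to alarm_time."""
    start = alarm_time - timedelta(minutes=minutes)
    hits = []
    for path in sorted(Path(log_dir).glob("messages*")):
        with open_log(path) as fh:
            for line in fh:
                # Assumption: the first 15 characters are "Mon DD HH:MM:SS".
                try:
                    stamp = datetime.strptime(
                        f"{alarm_time.year} {line[:15]}", "%Y %b %d %H:%M:%S"
                    )
                except ValueError:
                    continue  # not a timestamped line
                if start <= stamp <= alarm_time:
                    hits.append((stamp, path.name, line.rstrip("\n")))
    return sorted(hits)


# Demo with synthetic files; the alarm time is the one from the first post.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "messages").write_text(
    "Apr 11 09:30:00 mx example[1]: synthetic line after the alarm\n"
)
with gzip.open(demo_dir / "messages.0.gz", "wt") as fh:
    fh.write("Apr 11 09:09:10 mx example[1]: synthetic line inside window\n")
    fh.write("Apr 11 07:00:00 mx example[1]: synthetic line too early\n")

for stamp, name, line in lines_before_alarm(
    demo_dir, datetime(2020, 4, 11, 9, 11, 21)
):
    print(name, line)
```

Only the line from messages.0.gz falls inside the five-minute window in this demo, which mirrors the case where the current messages file has already rotated past the event.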

Hope this helps 🙂

Please mark this as "Accepted Solution" if this addresses your query.

Kudos would be much appreciated too 🙂
6. RE: alarm

0 Recommend
Arix
Posted 05-13-2020 02:07

Hi all,

The messages logs from that time are gone; I only have the chassis output:

show chassis routing-engine no-forwarding
Routing Engine status:
Slot 0:
Current state Backup
Election priority Master
Temperature 39 degrees C / 102 degrees F
CPU temperature 52 degrees C / 125 degrees F
DRAM 49105 MB (49152 MB installed)
Memory utilization 7 percent
5 sec CPU utilization:
User 0 percent
Background 0 percent
Kernel 1 percent
Start time 2020-05-01 19:33:20
Uptime 10 hours, 5 minutes, 35 seconds----------------->>>>>>>>>>>>>>>>>>>
Last reboot reason 0x4000:VJUNOS reboot
Load averages: 1 minute 5 minute 15 minute
               0.27     0.19     0.32
7. RE: alarm

0 Recommend
shlinga
Posted 05-13-2020 02:28

Hello Arix,

I see that there are 3 core-dumps during the time of RE switchover. The last reboot reason is:

Last reboot reason 0x4000:VJUNOS reboot

> show system alarms
3 alarms currently active
Alarm time               Class  Description
2020-05-13 09:12:53 EUPE Minor  Backup RE Active

> show system core-dumps re0
re0:
--------------------------------------------------------------------------
/var/crash/*core*: No such file or directory
-rw-rw---- 1 root wheel 11593830 May 13 09:22 /var/tmp/bbe-smgd.core-tarball.3.tgz
-rw-rw---- 1 root wheel 24194 May 13 09:24 /var/tmp/bbe-smgd.core-tarball.4.tgz
-rw-r--r-- 1 root wheel 37761024 May 13 09:10 /var/tmp/cpcdd_jtac.core
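
To illustrate how the core files line up with the alarm time, here is a small Python sketch. It assumes the "ls -l" style lines of the core-dump listing above, takes the year from the alarm timestamp because the listing does not show one, and uses an arbitrary 15-minute window. The helper name cores_near is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Matches "ls -l" style lines: mode, links, owner, group, size, date, path.
LS_LINE = re.compile(
    r"^[-d][-rwxsStT]{9}\s+\d+\s+\S+\s+\S+\s+(?P<size>\d+)\s+"
    r"(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2})\s+(?P<path>\S+)$"
)


def cores_near(listing, alarm_time, window_minutes=15):
    """Return (time, path, size) for files dated within the window."""
    near = []
    for line in listing.splitlines():
        m = LS_LINE.match(line.strip())
        if not m:
            continue  # header, separator, or error line
        # Assumption: the file was written in the same year as the alarm.
        stamp = datetime.strptime(
            f"{alarm_time.year} {m['stamp']}", "%Y %b %d %H:%M"
        )
        if abs(stamp - alarm_time) <= timedelta(minutes=window_minutes):
            near.append((stamp, m["path"], int(m["size"])))
    return sorted(near)


# Listing copied from the output in this thread.
LISTING = """\
re0:
--------------------------------------------------------------------------
/var/crash/*core*: No such file or directory
-rw-rw---- 1 root wheel 11593830 May 13 09:22 /var/tmp/bbe-smgd.core-tarball.3.tgz
-rw-rw---- 1 root wheel 24194 May 13 09:24 /var/tmp/bbe-smgd.core-tarball.4.tgz
-rw-r--r-- 1 root wheel 37761024 May 13 09:10 /var/tmp/cpcdd_jtac.core
"""

ALARM = datetime(2020, 5, 13, 9, 12, 53)
for stamp, path, size in cores_near(LISTING, ALARM):
    print(stamp.strftime("%H:%M"), path, size)
```

With the listing from this thread, all three files fall inside the window, and cpcdd_jtac.core (09:10) is the only one dated before the alarm.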

I recommend opening a JTAC ticket, as finding the root cause of the RE mastership switchover requires analysis of the core dumps and log files.

I hope this helps. Please mark this post "Accept as solution" if this answers your query.

Kudos are always appreciated!

Best Regards,

Lingabasappa H
8. RE: alarm

0 Recommend
Arix
Posted 05-13-2020 02:52

Thanks for the response.

I couldn't find any KB about the following, if this is the place to start troubleshooting:

0x4000:VJUNOS reboot
9. RE: alarm

1 Recommend
JoelNovans
Posted 05-13-2020 03:16

Hi Arix,

The logs indicate that multiple core files were generated, after which the mastership switchover occurred. To understand why RE0 rebooted and why the backup RE is currently active, we need to look at the generated core files and do a detailed log analysis.

A VJUNOS reboot can have several causes. In most cases it is a self-recovery reboot initiated to overcome some fault condition. The key to solving this problem is therefore to analyse what led to the reboot (root cause analysis) so that it can be prevented in future. For this, as Lingu suggested, you will need to open a JTAC case for detailed analysis.

If you would like to understand the possible reasons for a VJUNOS reboot, you can refer to the link below:

https://www.juniper.net/documentation/en_US/junos/topics/reference/command-summary/show-chassis-routing-engine.html

Hope this helps 🙂

Please mark this "Accepted Solution" if this helps you in addressing your query.

Kudos would be much appreciated too 🙂

10. RE: alarm

Arix

Posted 05-13-2020 05:22

Hi all, thanks for the replies.

I investigated further based on the following KB. Can you give me some ideas? The disks are listed in "show chassis hardware detail", but they do not appear in the machdep boot device entries?

https://kb.juniper.net/InfoCenter/index?page=content&id=KB35691&act=login

sysctl -a | grep boo
kern.boottime: { sec = 1587105767, usec = 812218 } Mon Marc 17 19:22:37 2020
kern.bootfile: /packages/sets/active/boot/os-kernel/kernel
kern.panic_reboot_wait_time: 15
1 PART vtbd0p1 16384 512 i 1 o 4096 ty freebsd-boot xs GPT xt 83bd6b9d-7f41-11dc-be0b-001560b84f0f
            <type>freebsd-boot</type>
kern.vt.kbd_reboot: 1
kern.cam.boot_delay: 0
vm.boot_pages: 64
debug.boothowto_string: SERIAL
debug.bootverbose: 0
debug.boothowto: 4096
hw.acpi.handle_reboot: 0
hw.acpi.disable_on_reboot: 0
hw.bcmsdk_boot_mode: 0
hw.product.pvi.config.chasd.bootp.c24.grub_cfg: smpc_grub.cfg
hw.product.pvi.config.chasd.bootp.c24.imgargs_str.1: turbotx.elf
hw.product.pvi.config.chasd.bootp.c24.imgargs_str.0: smpc.elf
hw.product.pvi.config.chasd.bootp.c24.images_count: 2
hw.product.pvi.config.chasd.bootp.c24.bootcommand:     :td=/usr/share/pfe/:bf="grub.efi":\
hw.product.pvi.config.chasd.bootp.c24.require_bootp: 1
hw.product.pvi.config.chasd.bootp.c23.grub_cfg: smpc_grub.cfg
hw.product.pvi.config.chasd.bootp.c23.imgargs_str.1: turbotx.elf
hw.product.pvi.config.chasd.bootp.c23.imgargs_str.0: smpc.elf
hw.product.pvi.config.chasd.bootp.c23.images_count: 2
hw.product.pvi.config.chasd.bootp.c23.bootcommand:     :td=/usr/share/pfe/:bf="grub.efi":\
hw.product.pvi.config.chasd.bootp.c23.require_bootp: 1
hw.product.pvi.config.chasd.lc.c23.part_no.midplane_bootstring: 0
hw.product.pvi.config.chasd.lc.c24.part_no.midplane_bootstring: 0
hw.product.pvi.config.chasd.lc.c07.part_no.midplane_bootstring: 0
hw.product.pvi.config.chasd.lc.c06.part_no.midplane_bootstring: 0
hw.product.pvi.config.chasd.lc.c05.part_no.midplane_bootstring: 0
hw.product.pvi.config.chasd.lc.c04.part_no.midplane_bootstring: 0
hw.re.vmhost_reboot_reason_string: 0x1:power cycle/failure
hw.re.vmhost_reboot_reason: 1
hw.re.booted_up: 1
hw.re.reboot_on_diskfail: 0
hw.re.dualroot.booted_from:
hw.re.reboot_reason_string: 0x4000:VJUNOS reboot
hw.re.reboot_reason: 16384
hw.re.jnx_rebooting: 0
hw.re.fast_boot: 0
hw.usb.no_boot_wait: 0
machdep.currbootdev:
machdep.bootsuccess: 0
machdep.nextbootdev:
machdep.bootdevs: 
machdep.bootmethod: BIOS
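
As a cross-check of the values above, the numeric and string forms of the reboot reason can be compared, and the empty machdep boot entries listed. This is a minimal Python sketch that assumes the "key: value" layout of the sysctl output in this post; parse_sysctl and decode_reason are hypothetical helper names, and the sample contains only lines copied from that output.

```python
def parse_sysctl(text):
    """Parse "key: value" lines into a dict, skipping anything else."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key and " " not in key:
            values[key] = value.strip()
    return values


def decode_reason(values, prefix="hw.re.reboot_reason"):
    """Return (decimal code, code from the hex string, label)."""
    code = int(values[prefix])
    hex_part, _, label = values[prefix + "_string"].partition(":")
    return code, int(hex_part, 16), label


# Lines copied from the sysctl output in this post.
SAMPLE = """\
hw.re.vmhost_reboot_reason_string: 0x1:power cycle/failure
hw.re.vmhost_reboot_reason: 1
hw.re.reboot_reason_string: 0x4000:VJUNOS reboot
hw.re.reboot_reason: 16384
machdep.currbootdev:
machdep.bootsuccess: 0
machdep.nextbootdev:
machdep.bootdevs:
machdep.bootmethod: BIOS
"""

values = parse_sysctl(SAMPLE)
print(decode_reason(values))  # (16384, 16384, 'VJUNOS reboot')
print(decode_reason(values, "hw.re.vmhost_reboot_reason"))

# machdep entries that are present but have no value
empty = sorted(k for k, v in values.items() if k.startswith("machdep.") and not v)
print(empty)
```

The decimal value 16384 is simply 0x4000, so hw.re.reboot_reason and hw.re.reboot_reason_string describe the same code.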

show chassis hardware detail
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                JN1xxxxxxxx      MX240
Midplane         REV 39   750-047865   AMwee2          Enhanced MX240 Backplane
FPM Board        REV 04   760-059207   kng6079          Front Panel Display
PEM 0            Rev 01   740-063046   QCXXXGDWZ       PS 1.4-2.52kW; 90-264V AC in
PEM 1            Rev 01   740-063046   QCXXXGDZ        PS 1.4-2.52kW; 90-264V AC in
Routing Engine 0 REV 19   750-054758   CAKM56085          RE-S-2X00x6
  vtbd0 17408 MB                                         Virtio Block Disk-------------->
  vtbd1 15360 MB                                         Virtio Block Disk------------->
  ada0    511 MB  QEMU HARDDISK        QM00002           Emulated IDE Disk------------->
  usb0 (addr 0.1) XHCI root HUB 0      0x8086            uhub0
Routing Engine 1 REV 19   750-054758   CAMZ8100          RE-S-2X00x6
  vtbd0 17408 MB                                         Virtio Block Disk-------------->
  vtbd1 15360 MB                                         Virtio Block Disk-------------->
  ada0    511 MB  QEMU HARDDISK        QM00002           Emulated IDE Disk-------------->
  usb0 (addr 0.1) XHCI root HUB 0      0x8086            uhub0

11. RE: alarm

1 Recommend
Loki
Posted 05-13-2020 05:47

Hi Arix,

Looking at the output, all the boot devices are missing from the boot sequence. Please follow the next steps in the KB to add the boot devices manually, then reboot the RE and check whether the missing partitions were added back.

machdep.nextbootdev:
machdep.bootdevs:
machdep.bootmethod: BIOS

sysctl -w machdep.bootdevs=compact-flash,disk1,disk2,lan

If this solves your problem, please mark this post as "Accepted Solution" so we can help others too

Kudos are appreciated too

Regards,

Nadeem
12. RE: alarm
Best Answer

1 Recommend
shlinga
Posted 05-13-2020 07:18

Hello Arix,

Looking at the output, Routing Engines of model RE-S-2X00x6 have only one disk [ada0].

From your output, I can see ada0 on both REs:

Routing Engine 0 REV 19 750-054758 CAKM56085 RE-S-2X00x6
vtbd0 17408 MB Virtio Block Disk-------------->
vtbd1 15360 MB Virtio Block Disk------------->
ada0 511 MB QEMU HARDDISK QM00002 Emulated IDE Disk------------->
usb0 (addr 0.1) XHCI root HUB 0 0x8086 uhub0
Routing Engine 1 REV 19 750-054758 CAMZ8100 RE-S-2X00x6
vtbd0 17408 MB Virtio Block Disk-------------->
vtbd1 15360 MB Virtio Block Disk-------------->
ada0 511 MB QEMU HARDDISK QM00002 Emulated IDE Disk-------------->
usb0 (addr 0.1) XHCI root HUB 0 0x8086 uhub0

This is expected. I tried the same on one of our lab devices, and the outputs are below:

Routing Engine 0 REV 05 750-072925 XXXXXX RE-S-2X00x6
vtbd0 17408 MB Virtio Block Disk
vtbd1 15360 MB Virtio Block Disk
ada0 511 MB QEMU HARDDISK QM00002 Emulated IDE Disk
usb0 (addr 0.1) XHCI root HUB 0 0x8086 uhub0
Routing Engine 1 REV 05 750-072925 XXXXXXX RE-S-2X00x6
vtbd0 17408 MB Virtio Block Disk
vtbd1 15360 MB Virtio Block Disk
ada0 511 MB QEMU HARDDISK QM00002 Emulated IDE Disk
usb0 (addr 0.1) XHCI root HUB 0 0x8086 uhub0

labroot@jtac-mx480-r2032-re0> start shell user root
Password:
root@jtac-mx480-r2032-re0:/var/home/labroot # sysctl -a | grep bootdev
machdep.currbootdev:
machdep.nextbootdev:
machdep.bootdevs:

These entries are empty, which is expected.

From the above outputs, the issue is not related to a failed or missing hard disk. A detailed analysis is needed to find the root cause.

We need to decode the core files and extract information from them to determine the root cause. For core-dump and log analysis, you need to open a JTAC ticket.

Let me know if you need any more information on this issue.

I hope this helps. Please mark this post "Accept as solution" if this answers your query.

Kudos are always appreciated!

Best Regards,

Lingabasappa H
13. RE: alarm

1 Recommend
shlinga
Posted 05-12-2020 19:00

Hello Arix,

A mastership switchover from one RE to the other can happen for many reasons, including:

1. High CPU utilization on the Master RE before the mastership switch over.

2. File system corruption.

3. The sudden reboot of Master RE before the switchover.

4. Hard-disk failure on the Master RE before the switchover.

Please share the outputs below:

>show chassis alarms

>show system core-dumps

>show chassis hardware detail

>show system processes extensive

>show chassis routing-engine no-forwarding

For troubleshooting high CPU, please refer to the KB below:

https://kb.juniper.net/InfoCenter/index?page=content&id=KB26261

For checking the hard disk and other boot-list components:

https://kb.juniper.net/InfoCenter/index?page=content&id=KB35691

The above are the basic troubleshooting steps for knowing the root cause of the mastership switchover.

I hope this helps. Please mark this post "Accept as solution" if this answers your query.

Kudos are always appreciated!

Best Regards,

Lingabasappa H
