Junos Space timeout on EX Devices w/ JTASK_SCHED_SLIP_KEVENT - triggers 'nodeDown' Alerts

1. Junos Space timeout on EX Devices w/ JTASK_SCHED_SLIP_KEVENT - triggers 'nodeDown' Alerts

ERICK MOYERS
Posted 02-16-2021 11:54

Hey friends,

Trying to troubleshoot an issue, was hoping to get some insight or suggestions here.

I'm new(ish) to Juniper, getting my feet wet fast.

I recently stood up a Junos Space VM, and have pretty much everything configured and running "smoothly".

All of the devices are managed and in sync.

However, I have a handful of devices that constantly report as 'Down', and then go back 'Up' pretty much immediately.
I have the OpenNMS 'nodeDown' alert configured.

This is a campus network, with many weird and confusing things going on, and I assumed that the timeouts were just a result of strict SNMP parameters coupled with some latency on the network, so I widened the net quite a bit:
60 second timeout, 5 retries.

Well, this didn't have an effect, so I checked the logs.

These devices are all littered with 'JTASK_SCHED_SLIP_KEVENT' messages, for the following processes:

dot1xd, mcsnoopd, sflowd, l2cpd, rpd
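To see which of these daemons logs the most slips on a given switch, the messages can be tallied per process. The following is a minimal sketch, assuming the log lines follow the common syslog layout `Mon DD HH:MM:SS host process[pid]: TAG: text`; the host name, PIDs, and message text in `SAMPLE` are hypothetical placeholders, not output captured from these devices.

```python
import re
from collections import Counter

# Assumed syslog layout: "Mon DD HH:MM:SS host process[pid]: TAG: text".
# Only the process name and the JTASK_SCHED_SLIP_KEVENT tag are inspected.
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+"
    r"(?P<proc>[\w.-]+)(?:\[\d+\])?:\s+JTASK_SCHED_SLIP_KEVENT"
)


def count_slips(lines):
    """Return a Counter mapping process name to number of slip messages."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group("proc")] += 1
    return counts


# Hypothetical sample lines; the message bodies are placeholders.
SAMPLE = [
    "Feb 16 11:40:02 ex-sw1 dot1xd[1801]: JTASK_SCHED_SLIP_KEVENT: (placeholder text)",
    "Feb 16 11:40:02 ex-sw1 rpd[1750]: JTASK_SCHED_SLIP_KEVENT: (placeholder text)",
    "Feb 16 11:40:05 ex-sw1 dot1xd[1801]: JTASK_SCHED_SLIP_KEVENT: (placeholder text)",
    "Feb 16 11:41:10 ex-sw1 sshd[2200]: Accepted password for admin",
]

if __name__ == "__main__":
    for proc, n in count_slips(SAMPLE).most_common():
        print(f"{proc}: {n}")
```

To run it against real data, save the switch's messages log to a local file and pass the open file object to `count_slips`, adjusting the regular expression if the actual line layout differs.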

Would anyone happen to have any experience with this, and have suggestions on a troubleshooting path?

We have hardware repair/replace, but no Service with JTAC. So they'll replace it, but I have 12 devices and I know it's not the hardware.

This affects 12 devices on 3 platforms, in different geographical areas with different gateways:
EX2200-C-12P-2G
EX2300-24P
EX2300-C-12P

Any comments are appreciated.

------------------------------
ERICK MOYERS
------------------------------
2. RE: Junos Space timeout on EX Devices w/ JTASK_SCHED_SLIP_KEVENT - triggers 'nodeDown' Alerts

raviky
Posted 02-16-2021 20:56

Hi Erick,

In general, the troubleshooting steps would be as follows:
1. Check whether the timing of the SCHED_SLIP messages matches the time when the device is reported down.
2. If yes, check whether this is true for all the devices exhibiting the problem.
3. Check whether you have high CPU at the time of the issue (show chassis routing-engine and show system processes extensive) to see what is causing the scheduler slip.
4. Narrow down the issue as much as possible.
5. Open a JTAC case for further investigation.
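Steps 1 and 2 can be scripted once the timestamps are collected. The sketch below is only an illustration, assuming both the slip messages and the 'nodeDown' events have been exported as syslog-style timestamps (`Mon DD HH:MM:SS`); the function names, the 120-second window, and the sample times are hypothetical and should be adapted to however the events are actually exported.

```python
from datetime import datetime, timedelta


def parse_ts(text, year=2021):
    """Parse a syslog-style timestamp; syslog omits the year, so it is supplied.

    Note: %b matches English month abbreviations under the default C locale.
    """
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def correlate(down_times, slip_times, window=timedelta(seconds=120)):
    """Map each nodeDown time to the slip times within +/- window of it."""
    return {
        down: [slip for slip in slip_times if abs(slip - down) <= window]
        for down in down_times
    }


# Hypothetical timestamps for one device.
SLIPS = [parse_ts(t) for t in (
    "Feb 16 11:40:02",
    "Feb 16 11:40:05",
    "Feb 16 14:10:30",
)]
DOWNS = [parse_ts(t) for t in (
    "Feb 16 11:40:30",
    "Feb 16 16:00:00",
)]

if __name__ == "__main__":
    for down, nearby in correlate(DOWNS, SLIPS).items():
        status = f"{len(nearby)} slip(s) nearby" if nearby else "no slips nearby"
        print(f"nodeDown at {down}: {status}")
```

If most 'nodeDown' events on every affected device have slips inside the window, the two are probably related; if not, the alerts likely have a different cause. Make sure the switch clocks and the Junos Space server are time-synchronized before comparing, otherwise the window will give misleading results.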

Hope this helps

3. RE: Junos Space timeout on EX Devices w/ JTASK_SCHED_SLIP_KEVENT - triggers 'nodeDown' Alerts

ERICK MOYERS
Posted 02-17-2021 09:10

Ah, thank you! I'll check these items and see what I can gather.
I'm diving face-first into a lot of uncharted territory, so it's nice to have this forum :)

------------------------------
ERICK MOYERS
------------------------------
