Switching

Junos Space timeout on EX Devices w/ JTASK_SCHED_SLIP_KEVENT - triggers 'nodeDown' Alerts

  • 1.  Junos Space timeout on EX Devices w/ JTASK_SCHED_SLIP_KEVENT - triggers 'nodeDown' Alerts

    Posted 16 days ago

    Hey friends,

    Trying to troubleshoot an issue, was hoping to get some insight or suggestions here.

    I'm new(ish) to Juniper, getting my feet wet fast.

    I recently stood up a Junos Space VM, and have pretty much everything configured and running "smoothly".

    All of the devices are managed and in sync.

    However, I have a handful of devices that constantly report as 'Down', and then go back 'Up' pretty much immediately.
    I have the OpenNMS 'nodeDown' alert configured.

    This is a campus network, with many weird and confusing things going on, and I assumed that the timeouts were just a result of strict SNMP parameters coupled with some latency on the network, so I widened the net quite a bit:
    60 second timeout, 5 retries.

    Well, this didn't have an effect, so I checked the logs.

    These devices are all littered with 'JTASK_SCHED_SLIP_KEVENT' messages, for the following processes:

    dot1xd, mcsnoopd, sflowd, l2cpd, rpd

    Would anyone happen to have any experience with this, and have suggestions on a troubleshooting path?

    We have hardware repair/replace, but no Service with JTAC. So they'll replace it, but I have 12 devices and I know it's not the hardware.

    This is (12) devices, on (3) platforms, in different geographical areas with different gateways.
    EX2200-C-12P-2G
    EX2300-24P
    EX2300-C-12P

    Any comments are appreciated.



    ------------------------------
    ERICK MOYERS
    ------------------------------


  • 2.  RE: Junos Space timeout on EX Devices w/ JTASK_SCHED_SLIP_KEVENT - triggers 'nodeDown' Alerts

     
    Posted 15 days ago
    Hi Erick,

    In general, the troubleshooting steps would be as follows:
    1. Check whether the timing of the SCHED_SLIP messages matches the time when the device is reported down.
    2. If yes, check whether this is true for all the devices exhibiting the problem.
    3. Check whether CPU is high at the time of the issue (show chassis routing-engine) and (show system processes extensive) to see what is causing the scheduler slip.
    4. Narrow down the issue as much as possible.
    5. Open a JTAC case for further investigation.
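
    For steps 1 and 2, a rough Python sketch like the one below might save some manual eyeballing. It assumes you have copied the switch's messages log and a list of nodeDown times off the boxes yourself. The function names, the 120-second window, and the sample lines are all made up for illustration. The only thing it relies on is that each log line starts with a classic syslog timestamp such as "Jan 12 10:15:32"; adjust the parsing if your logs use a different format.

```python
from datetime import datetime, timedelta

# Tag to look for in the device log (taken from the original post).
SLIP_TAG = "JTASK_SCHED_SLIP_KEVENT"


def parse_slip_times(log_lines, year):
    """Return timestamps of log lines that mention the scheduler-slip tag.

    Assumption: each line starts with 'Mon DD HH:MM:SS' (no year),
    so the year has to be supplied by the caller.
    """
    times = []
    for line in log_lines:
        if SLIP_TAG not in line:
            continue
        stamp = " ".join(line.split()[:3])
        times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def correlate(down_times, slip_times, window_seconds=120):
    """Map each nodeDown time to the slips within +/- window_seconds of it."""
    window = timedelta(seconds=window_seconds)
    return {
        down: [s for s in slip_times if abs(s - down) <= window]
        for down in down_times
    }


if __name__ == "__main__":
    # Hypothetical sample data; the message text after the tag is invented.
    sample_log = [
        "Jan 12 10:15:32 sw1 rpd[1234]: JTASK_SCHED_SLIP_KEVENT: (sample text)",
        "Jan 12 10:20:00 sw1 mgd[2222]: (unrelated sample line)",
        "Jan 12 14:02:10 sw1 dot1xd[1300]: JTASK_SCHED_SLIP_KEVENT: (sample text)",
    ]
    sample_down = [
        datetime(2024, 1, 12, 10, 16, 5),
        datetime(2024, 1, 12, 12, 0, 0),
    ]
    slips = parse_slip_times(sample_log, 2024)
    for down, nearby in correlate(sample_down, slips).items():
        print(down, "->", len(nearby), "slip(s) nearby")
```

    If most nodeDown times have a slip nearby on every affected switch, the slips are a likely suspect. If they do not line up, the polling path itself is probably worth a closer look.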

    Hope this helps.


  • 3.  RE: Junos Space timeout on EX Devices w/ JTASK_SCHED_SLIP_KEVENT - triggers 'nodeDown' Alerts

    Posted 15 days ago
    Ah, thank you! I will check these items and see what I can gather.
    I'm diving face-first into a lot of uncharted territory, so it's nice to have this forum :)

    ------------------------------
    ERICK MOYERS
    ------------------------------