Junos OS

High routing engine CPU because of snmp / mib2d process

  • 1.  High routing engine CPU because of snmp / mib2d process

    Posted 05-01-2020 02:05

    Hello,

     

    I have an MX5 router with a significant number of interfaces (IFLs).

    That could be why my NMS reports a constant CPU utilization of 100%.

    Regardless of the SNMP filter I'm using (shown below; it leaves only the physical interfaces visible), there are times when an interface description SNMP MIB walk takes forever.

     

    The routing engine is not always at a high percentage; the NMS only shows it that way because it polls every 5 minutes.

     

    'show chassis routing-engine' consistently shows User at 74% and Kernel at 22%.

     

    What can I check or fix?

     

    beelze@ams-nik-er2> show configuration snmp 
    location "[52.355980, 4.950350] // Nikhef, Science Park 105, Amsterdam, the Netherlands";
    filter-interfaces {
        interfaces {
            cbp0;
            demux0;
            gre0;
            tap;
            gre;
            ipip;
            pime;
            pimd;
            mtun;
            pip0;
            dsc;
            irb;
            pp0;
            lsi;
            ip-*;
            lt-*;
            mt-*;
            pe-*;
            pfe-*;
            pfh-*;
            ut-*;
            vt-*;
            ".*\.32767";
            ".*\.16384";
            pd-*;
            lc-*;
            ".*\.32768";
            ".*\.16386";
            ".*\.16385";
            jsrv.*;
            esi;
            fxp0;
            "!(ge-.*/[0-9]$|ge-.*/1[0-9]$|xe-.*/[0-9]$|xe-.*/1[0-9]$|ae[0-9]$|lo0$|^ge-1/0/6.287|^ge-1/0/6.288)";
        }
        all-internal-interfaces;
    }
    filter-duplicates;
    community "ComSav3311!!" {
        authorization read-only;
        client-list-name MANAGEMENT;
    }
    
    beelze@ams-nik-er2> show system processes extensive | except 0.00    
    last pid: 33774;  load averages:  4.08,  3.79,  3.53  up 762+06:50:43    11:01:54
    165 processes: 8 running, 129 sleeping, 28 waiting
    
    Mem: 1413M Active, 168M Inact, 227M Wired, 63M Cache, 112M Buf, 111M Free
    Swap: 2821M Total, 2821M Free
    
    
      PID USERNAME         THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
     1739 root               1  76    0   125M   110M RUN    4556.7 45.17% mib2d
    33620 root               1  76    0   132M 11052K RUN      0:15 11.74% mgd
    33614 root               1  76    0  8684K  3580K RUN      0:07  5.00% sshd
     1817 root               1   4    0   467M   418M kqread 490.4H  2.49% rpd
     1924 root               1  43    0 94364K 52012K select 193.5H  1.95% dcd
     1422 root               1  70    0 16012K  8804K RUN    558.5H  1.90% eventd
     1836 root               3  63    0   118M 65276K sigwai 268.2H  1.86% jpppd
       11 root               1 -56 -159     0K    16K WAIT   252.5H  1.03% swi2: netisr 0
    33618 oxidized-comsav    1  57    0 53944K 42800K select   0:03  0.98% cli
     1669 root              11  42    0 21196K 11452K ucond  582.2H  0.98% clksyncd
     1859 root               1  58    0 37220K 29888K select 1343.6  0.83% snmpd
    15662 root               3  60    0 95796K 56572K sigwai  77.8H  0.15% pppoed
     1663 root               1  43    0 52200K 44724K select 228.2H  0.10% ppmd
     1832 root               1  43    0 82516K 32756K select 142.5H  0.05% jdhcpd
     1682 root               1  41    0 16076K  9696K select  23.8H  0.05% license-check
    33082 root               1  42    0  8680K  3612K select   0:01  0.05% sshd
    beelze@ams-nik-er2> show chassis routing-engine    
    Routing Engine status:
        Temperature                 42 degrees C / 107 degrees F
        CPU temperature             51 degrees C / 123 degrees F
        DRAM                      2048 MB (2048 MB installed)
        Memory utilization          92 percent
        CPU utilization:
          User                      74 percent
          Background                 0 percent
          Kernel                    22 percent
          Interrupt                  4 percent
          Idle                       0 percent
        Model                          RE-MX5-T
        Serial ID                      S/N CABS8179
        Start time                     2018-03-31 04:11:41 CEST
        Uptime                         762 days, 6 hours, 51 minutes, 19 seconds
        Last reboot reason             Router rebooted after a normal shutdown.
        Load averages:                 1 minute   5 minute  15 minute
                                           4.52       4.06       3.66


  • 2.  RE: High routing engine CPU because of snmp / mib2d process

     
    Posted 05-01-2020 09:19

    Hi,

     

    Greetings

    Can you share the output of 'show snmp stats-response-statistics'? Also, see whether you can limit polling to the IFDs only (physical interfaces), not all the IFLs.

    The MX5 is a PPC (PowerPC) model, so polling all the IFLs at scale will definitely drive up CPU during SNMP polling.
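    One way to limit polling to IFDs is to extend the existing filter-interfaces stanza so that every logical unit is filtered out. This is only a sketch, assuming the NMS needs physical-interface counters only: interface names matching a filter-interfaces regex are hidden from SNMP, and any name containing a dot is an IFL.

    ```
    [edit snmp filter-interfaces interfaces]
    ".*\..*";    /* hide all logical units, e.g. ge-1/0/6.287 */
    ```

    Note this would also hide the IFLs the original filter deliberately kept visible, so adjust the pattern to taste.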

     

    https://www.juniper.net/documentation/en_US/junos/topics/reference/command-summary/show-snmp-stats-response-statistics.html

     

    Thanks



  • 3.  RE: High routing engine CPU because of snmp / mib2d process

    Posted 05-04-2020 00:49

    Here you go:

     

    beelze@ams-nik-er2> show snmp stats-response-statistics    
    
    Average response time statistics:
    Stats                Stats                    Average
    Type                 Responses                Response
                                                  Time (ms)
    ifd(non ae)          99547                    110.00
    ifd(ae)              10956                    24.00
    ifl(non ae)          10770                    37.56
    ifl(ae)              46226                    142.69
    firewall             677                      38562.82
    
    Bucket statistics:
    Bucket               Stats
    Type(ms)             Responses
    0 - 10               148282               
    11 - 50              15085                
    51 - 100             4074                 
    101 - 200            605                  
    201 - 500            86                   
    501 - 1000           17                   
    1001 - 2000          2                    
    2001 - 5000          2                    
    More than 5001       23                   
    
    Bad responses:
    Response        Request                Stats          Key
                    Time                   Type
    (ms)            (UTC)
    10320.26        2020-05-01 14:34:03    firewall       25Mb
    10319.96        2020-05-01 14:34:03    firewall       5Mb
    10319.71        2020-05-01 14:34:03    firewall       ACCEPT-PPPOE-ONLY-IN
    10319.45        2020-05-01 14:34:03    firewall       ACCEPT-PPPOE-ONLY-OUT
    10311.83        2020-05-01 14:34:03    firewall       PROTECT-ROUTER-v4
    10301.85        2020-05-01 14:34:03    firewall       PartnerCNE-Data_Down
    10301.59        2020-05-01 14:34:03    firewall       PartnerCNE-Data_Up
    10301.34        2020-05-01 14:34:03    firewall       PartnerCNE-Voice_Down
    10301.11        2020-05-01 14:34:03    firewall       PartnerCNE-Voice_Up
    10296.41        2020-05-01 14:34:03    firewall       SilverMobilityKatwijk
    10291.72        2020-05-01 14:34:03    firewall       l3vpn-horizon
    10291.47        2020-05-01 14:34:03    firewall       l3vpn-libernet
    10291.22        2020-05-01 14:34:03    firewall       l3vpn-mica
    10290.99        2020-05-01 14:34:03    firewall       police-2M-xe-1/3/0.11427-i
    10290.75        2020-05-01 14:34:03    firewall       police-2M-xe-1/3/0.11427-o
    10290.52        2020-05-01 14:34:03    firewall       police-48M-xe-1/3/0.10427-i
    10290.28        2020-05-01 14:34:03    firewall       police-48M-xe-1/3/0.10427-o
    10286.56        2020-05-01 14:34:03    firewall       urpf-filter4
    10286.15        2020-05-01 14:34:03    firewall       urpf-filter6
    10285.91        2020-05-01 14:34:03    firewall       __default_bpdu_filter__

     

    Is there a way to restrict the NMS from polling firewall filters?

    That seems to be the bottleneck here.



  • 4.  RE: High routing engine CPU because of snmp / mib2d process

    Posted 05-05-2020 05:39

    Bump.

     

    I tried to block part of the SNMP tree with the view configuration below.

    It does not really do much for the routing engine CPU percentage.

     

    view test {
        oid .1 include;
        oid 1.3.6.1.4.1.2636.3.5 exclude;
    }
    community "abc123456789" {
        view test;
    }
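    A note for anyone landing here: a Junos SNMP view only takes effect for the community it is attached to, so attaching the exclusion to a new community will not change what the NMS sees if it still polls the original community. A sketch that attaches the same OID exclusion to the community from the first post (assumption: the NMS polls with that community):

    ```
    [edit snmp]
    view no-firewall {
        oid .1 include;
        oid 1.3.6.1.4.1.2636.3.5 exclude;    /* jnxFirewalls subtree */
    }
    community "ComSav3311!!" {
        view no-firewall;
        authorization read-only;
        client-list-name MANAGEMENT;
    }
    ```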


  • 5.  RE: High routing engine CPU because of snmp / mib2d process

    Posted 06-04-2020 21:27

    Hello Beeelzebub,

     

    Good day!
    I happened to come across this post. I just wanted to understand the following:

     

    1. Are any bulk SNMP requests being sent? If yes, limit each poll to a smaller number of OIDs rather than bulk requests. Please check this out: https://kb.juniper.net/InfoCenter/index?page=content&id=KB30713&cat=EX3300&actp=LIST

     

    2. Do you see any continuous SNMP-related messages in the logs?

     

    3. Is the mib2d process continuously high? You can run the following command to check whether it keeps using ~45% of the CPU:
    show system processes extensive | refresh 10

     

    Can you try deactivating the SNMP configuration on the device to see how the CPU behaves?
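    Deactivating SNMP for a test window can be sketched with configuration-mode commands (the commit comment is illustrative; rollback restores the stanza afterwards):

    ```
    [edit]
    user@router# deactivate snmp
    user@router# commit comment "SNMP test - watch RE CPU"
    user@router# rollback 1
    user@router# commit
    ```

    After committing the deactivate, watch 'show chassis routing-engine' for a while before rolling back.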



  • 6.  RE: High routing engine CPU because of snmp / mib2d process

     
    Posted 06-05-2020 00:27

    Hello Beeelzebub,

     

    beelze@ams-nik-er2> show system processes extensive | except 0.00    
    last pid: 33774;  load averages:  4.08,  3.79,  3.53  up 762+06:50:43    11:01:54
    165 processes: 8 running, 129 sleeping, 28 waiting
    
    Mem: 1413M Active, 168M Inact, 227M Wired, 63M Cache, 112M Buf, 111M Free
    Swap: 2821M Total, 2821M Free
    
    
      PID USERNAME         THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
     1739 root               1  76    0   125M   110M RUN    4556.7 45.17% mib2d
    33620 root               1  76    0   132M 11052K RUN      0:15 11.74% mgd

     

    From the above output, I see that mib2d is using 45.17% of the CPU, and that free RAM is down to 111M (with all 2821M of swap still free).

     

     On the routing engine, I see the CPU has 0% idle:

     

        CPU utilization:
          User                      74 percent
          Background                 0 percent
          Kernel                    22 percent
          Interrupt                  4 percent
          Idle                       0 percent

    Check the below Troubleshooting Checklist - Routing Engine High CPU:

    https://kb.juniper.net/InfoCenter/index?page=content&id=KB26261 

     

    SNMP is configured on the device, and servicing the polls may be what is consuming the CPU.

     

    Can you disable SNMP monitoring and manually check the CPU usage for a while?

     

    A bulk SNMP walk is bound to increase CPU on this router, and polling a lot of data in a short time can spike the mgd or CLI processes as well.  It's better to probe only for critical alarms/events (interface downs, chassis/system alarms, etc.), work out a less aggressive polling interval (if you poll every 5 minutes, try 10 minutes, for example), and limit each poll to fewer OIDs rather than bulk requests.  Please check this out: 

    https://kb.juniper.net/InfoCenter/index?page=content&id=KB30713&cat=EX3300&actp=LIST

     

    The best thing to do is stay on the recommended Junos release, but a CPU utilization spike is something to expect during a bulk SNMP walk.  If polling is limited to what's critical, you should be alright:

    https://kb.juniper.net/InfoCenter/index?page=content&id=KB21476&actp=METADATA#ex_series

     

    I hope this helps. Please mark this post "Accept as solution" if this answers your query.

     

    Kudos are always appreciated!

     

    Best Regards,

    Lingabasappa H


    #SNMP
    #process
    #CPUutilization


  • 7.  RE: High routing engine CPU because of snmp / mib2d process

    Posted 06-05-2020 04:49

    Hi Beeelzebub,

     

    Greetings !!

    If bulk requests are being sent and received between the agent and the NMS, a high CPU spike can be observed on the device.

    You can check all requests sent and received in the logs if traceoptions are enabled under SNMP.

    You can check the log and the doc below:

    show log snmpd

    https://kb.juniper.net/InfoCenter/index?page=content&id=KB30713&cat=EX3300&actp=LIST
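    Enabling SNMP tracing so that the requests actually land in that log can be sketched as follows (standard snmp traceoptions flags; the file name and sizes are illustrative):

    ```
    [edit snmp]
    traceoptions {
        file snmpd size 1m files 3;
        flag pdu;                   /* log each SNMP request/response PDU */
        flag protocol-timeouts;
    }
    ```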

     

    Meanwhile, from the output I can see mib2d is using around 45.17% of the CPU:

    beelze@ams-nik-er2> show system processes extensive | except 0.00    
    last pid: 33774;  load averages:  4.08,  3.79,  3.53  up 762+06:50:43    11:01:54
    165 processes: 8 running, 129 sleeping, 28 waiting
    
    Mem: 1413M Active, 168M Inact, 227M Wired, 63M Cache, 112M Buf, 111M Free
    Swap: 2821M Total, 2821M Free
    
    
      PID USERNAME         THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
     1739 root               1  76    0   125M   110M RUN    4556.7 45.17% mib2d

     Check the troubleshooting steps below:

    https://kb.juniper.net/InfoCenter/index?page=content&id=KB26261 

    Meanwhile, kindly disable SNMP monitoring and observe the CPU for a while.

     

     

    If this solves your problem, please mark this post as "Accepted Solution".
    If you think my answer was helpful, please give some Kudos.

     

    Regards,

    Deeksha P