Junos OS

Expand all | Collapse all

High CPU usage/Process question (4x 4300-48t and 1x 4300-32f non mixed VC)

Jump to Best Answer
  • 1.  High CPU usage/Process question (4x 4300-48t and 1x 4300-32f non mixed VC)

    Posted 06-27-2019 01:20

    Hey community,

     

    we have a non-mixed VC with 5x EX4300 series switches. Since roughly 20 days the cpu usage is higher than it should be. In this timeframe we had a migration from one NetApp to another and we thought this would cause the additional utilization.

     

    Unfortunately the migration is completed and the high cpu usage is still there.

     

    After some research i found the following KBs:

    https://kb.juniper.net/InfoCenter/index?page=content&id=KB31605&cat=&actp=LIST

    https://kb.juniper.net/InfoCenter/index?page=content&id=KB26261

     

    All the VC members are connected via DAC cables and there are currently only 3 SFP (2x 10g, 1x 1g) connections on the EX4300-32f.

     

     

    Here are some informations about the REs:

     

    root@vx-core-b6-01> show chassis routing-engine
    Routing Engine status:
    Slot 0:
    Current state Master
    Temperature 63 degrees C / 145 degrees F
    CPU temperature 63 degrees C / 145 degrees F
    DRAM 2048 MB
    Memory utilization 70 percent
    5 sec CPU utilization:
    User 65 percent
    Background 0 percent
    Kernel 27 percent
    Interrupt 8 percent
    Idle 0 percent
    Model EX4300-48T
    Serial ID PE3717480060
    Start time 2019-04-15 06:16:00 UTC
    Uptime 73 days, 1 hour, 57 minutes, 26 seconds
    Last reboot reason Router rebooted after a normal shutdown.
    Load averages: 1 minute 5 minute 15 minute
    1.55 1.52 1.25
    Routing Engine status:
    Slot 1:
    Current state Backup
    Temperature 56 degrees C / 132 degrees F
    CPU temperature 56 degrees C / 132 degrees F
    DRAM 2048 MB
    Memory utilization 51 percent
    5 sec CPU utilization:
    User 15 percent
    Background 0 percent
    Kernel 13 percent
    Interrupt 0 percent
    Idle 72 percent
    Model EX4300-48T
    Serial ID PE3717480078
    Start time 2019-04-15 06:16:00 UTC
    Uptime 73 days, 1 hour, 57 minutes, 26 seconds
    Last reboot reason Router rebooted after a normal shutdown.
    Load averages: 1 minute 5 minute 15 minute
    0.52 0.41 0.35

     

     

    After checking the usage of the individual processes via "top", i got the following output:

     

    last pid: 72354; load averages: 1.11, 0.97, 0.99 up 73+01:55:52 08:11:22
    71 processes: 2 running, 68 sleeping, 1 zombie
    CPU states: 34.6% user, 0.0% nice, 20.0% system, 4.0% interrupt, 41.4% idle
    Mem: 1070M Active, 88M Inact, 144M Wired, 324M Cache, 112M Buf, 235M Free
    Swap:

    PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
    1787 root 2 -52 -52 564M 209M select 549.0H 38.09% pfex_junos
    72354 root 1 44 0 23784K 2280K RUN 0:00 3.00% top
    1781 root 1 43 0 67352K 32676K select 36.9H 0.98% chassisd
    1844 root 1 42 0 54488K 36532K select 20.6H 0.00% l2ald
    1865 root 1 40 0 196M 99344K select 751:58 0.00% authd
    1842 root 1 40 0 59072K 38708K select 552:06 0.00% mib2d
    1338 root 1 41 0 28700K 14288K select 411:25 0.00% eventd
    1876 root 2 4 0 14064K 3276K select 406:15 0.00% repd
    1861 root 1 4 0 65700K 37552K kqread 363:04 0.00% l2cpd
    1847 root 1 40 0 41232K 23148K select 349:52 0.00% pfed
    1864 root 2 40 0 108M 28624K select 341:01 0.00% jdhcpd
    1877 root 1 4 0 77560K 31632K kqread 294:27 0.00% mcsnoopd
    1788 root 1 4 0 36024K 14444K kqread 276:43 0.00% chassism
    1878 root 1 41 0 26456K 14864K select 246:50 0.00% license-check
    1841 root 1 40 0 35760K 25056K select 177:30 0.00% snmpd
    1873 root 1 4 0 75288K 31036K kqread 129:59 0.00% dot1xd-secure
    1852 root 1 40 0 25448K 14652K select 113:30 0.00% ppmd
    1786 root 1 4 -20 57788K 23800K kqread 93:34 0.00% vccpd
    1870 root 1 40 0 21508K 8208K select 66:33 0.00% shm-rtsdbd
    1867 root 1 40 0 19724K 6224K select 64:56 0.00% bdbrepd
    1850 root 2 40 0 34244K 19512K select 39:06 0.00% cosd
    1843 root 2 4 0 155M 58804K kqread 30:22 0.00% rpd
    1836 root 1 40 0 2752K 1280K select 30:07 0.00% bslockd
    1838 root 1 40 0 24552K 14424K select 27:23 0.00% alarmd
    1785 root 1 40 0 12856K 5404K select 20:16 0.00% irsd
    1860 root 1 40 0 28476K 15748K select 20:10 0.00% lacpd
    1846 root 1 40 0 3476K 1500K select 15:32 0.00% tnp.sntpd
    1869 root 1 40 0 27044K 10016K select 13:05 0.00% smid
    1857 root 1 40 0 22096K 12032K select 11:19 0.00% bfdd
    1889 root 1 40 0 64788K 24164K select 10:39 0.00% dcd
    1882 root 1 40 0 23152K 12384K select 10:18 0.00% pkid
    1863 root 1 40 0 22992K 12980K select 9:52 0.00% lfmd
    1862 root 1 40 0 21740K 11848K select 5:33 0.00% oamd
    1840 root 1 40 0 9792K 8844K select 5:07 0.00% ntpd
    1851 root 1 40 0 35036K 16556K select 4:59 0.00% kmd
    1859 root 1 40 0 61360K 18220K select 4:50 0.00% dfcd
    1849 root 1 40 0 28256K 14760K select 4:16 0.00% rmopd
    1848 root 1 40 0 21432K 11936K select 3:59 0.00% ilmid
    1845 root 1 25 -15 21572K 11832K select 3:54 0.00% apsd
    1839 root 1 40 0 21436K 11088K select 3:51 0.00% craftd
    1853 root 1 40 0 13228K 5644K select 2:56 0.00% lmpd
    1783 root 1 42 0 7168K 2596K select 2:06 0.00% inetd
    1782 root 1 40 0 58424K 28672K select 1:44 0.00% mgd
    2141 root 1 40 0 74804K 41744K select 1:03 0.00% cli
    1855 root 1 40 0 42212K 25888K select 0:59 0.00% dfwd
    1623 root 1 8 0 2324K 684K nanslp 0:22 0.00% cron
    2152 root 1 40 0 59232K 13352K select 0:11 0.00% mgd
    1789 root 1 8 0 2724K 1292K nanslp 0:10 0.00% getty
    1790 root 1 8 0 2724K 1304K nanslp 0:10 0.00% getty
    1835 root 1 40 0 2328K 952K select 0:08 0.00% watchdog
    1858 root 1 40 0 16336K 6376K select 0:06 0.00% rdd
    1911 root 1 40 0 2596K 1412K select 0:02 0.00% rlogind
    1910 root 1 5 0 2400K 1276K ttyin 0:01 0.00% rlogin
    71957 root 1 40 0 11692K 5540K select 0:00 0.00% sshd
    1913 root 1 8 0 8908K 3336K wait 0:00 0.00% login
    1868 root 1 4 0 14476K 5272K select 0:00 0.00% sendd
    1866 root 1 4 0 47956K 5176K select 0:00 0.00% mplsoamd
    71964 root 1 40 0 4888K 3180K pause 0:00 0.00% csh
    1914 root 1 4 0 2400K 300K sbwait 0:00 0.00% rlogin

     

     

    In addition to that top output:

     

    root@vx-core-b6-01> show system processes extensive
    last pid: 72363; load averages: 1.58, 1.15, 1.06 up 73+01:56:56 08:12:26
    180 processes: 6 running, 152 sleeping, 22 waiting

    Mem: 1101M Active, 87M Inact, 145M Wired, 324M Cache, 112M Buf, 203M Free
    Swap:


    PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND
    1787 root 76 0 564M 209M RUN 549.0H 39.01% pfex_junos
    1787 root -52 -52 564M 209M ll_blo 549.0H 29.44% pfex_junos
    1844 root 72 0 54488K 36532K RUN 20.6H 5.18% l2ald
    45 root 12 0 0K 16K sleep 44.5H 4.25% netdaemon
    10 root 155 52 0K 16K RUN 895.6H 3.37% idle
    72358 root 41 0 74716K 41332K select 0:01 2.25% cli
    35 root -60 -163 0K 16K WAIT 26.9H 1.76% swi1: ipfwd
    1781 root 48 0 67352K 32676K RUN 36.9H 1.07% chassisd
    11 root -56 -159 0K 16K ll_blo 17.8H 0.93% swi2: netisr 0
    2 root -16 0 0K 16K jfe_jo 973:19 0.00% jfe_job_0_0
    1865 root 40 0 196M 99344K select 751:59 0.00% authd
    2135 root 4 0 0K 16K select 714:08 0.00% ppt_07_80000005
    1842 root 41 0 59072K 38708K select 552:06 0.00% mib2d
    23 root -68 -171 0K 16K WAIT 516:06 0.00% irq41: besw0
    12 root -36 -139 0K 16K RUN 476:38 0.00% swi7: clock
    1338 root 41 0 28700K 14288K select 411:26 0.00% eventd
    1876 root 40 0 14064K 3276K select 406:15 0.00% repd
    1876 root 4 0 14064K 3276K select 406:15 0.00% repd
    1861 root 4 0 65700K 37552K kqread 363:05 0.00% l2cpd
    1847 root 40 0 41232K 23148K select 349:52 0.00% pfed
    1864 root 40 0 108M 28624K select 341:01 0.00% jdhcpd
    1864 root 40 0 108M 28624K select 341:01 0.00% jdhcpd
    1877 root 4 0 77560K 31632K kqread 294:27 0.00% mcsnoopd
    1788 root 4 0 36024K 14444K kqread 276:43 0.00% chassism
    1878 root 42 0 26456K 14864K select 246:50 0.00% license-check
    1841 root 40 0 35772K 25068K select 177:30 0.00% snmpd
    14 root -16 0 0K 16K - 149:52 0.00% rand_harvestq
    8 root 8 0 0K 16K - 133:23 0.00% thread taskq
    1873 root 4 0 75288K 31036K kqread 129:59 0.00% dot1xd-secure
    1852 root 40 0 25448K 14652K select 113:30 0.00% ppmd
    1918 root 4 0 0K 16K select 106:36 0.00% ppt_11_80000013
    36 root -16 0 0K 16K client 101:42 0.00% ifstate notify
    1919 root 4 0 0K 16K select 96:56 0.00% ppt_11_80000014
    1786 root 4 -20 57788K 23800K kqread 93:34 0.00% vccpd
    1940 root 4 0 0K 16K select 91:50 0.00% ppt_11_80000010
    1922 root 4 0 0K 16K select 79:59 0.00% ppt_11_80000011
    1921 root 4 0 0K 16K select 74:58 0.00% ppt_11_80000012
    1870 root 40 0 21508K 8208K select 66:33 0.00% shm-rtsdbd
    1867 root 40 0 19724K 6224K select 64:56 0.00% bdbrepd
    1850 root 40 0 34244K 19512K select 39:06 0.00% cosd
    1850 root 40 0 34244K 19512K select 39:06 0.00% cosd
    3 root -16 0 0K 16K jfe_jo 35:18 0.00% jfe_job_1_0
    1843 root 4 0 155M 58804K kqread 30:22 0.00% rpd
    1843 root 4 0 155M 58804K kqread 30:22 0.00% rpd
    1836 root 40 0 2752K 1280K select 30:07 0.00% bslockd
    1838 root 40 0 24552K 14424K select 27:23 0.00% alarmd
    1785 root 40 0 12856K 5404K select 20:16 0.00% irsd
    1860 root 40 0 28476K 15748K select 20:10 0.00% lacpd
    1846 root 40 0 3476K 1500K select 15:32 0.00% tnp.sntpd
    63 root 12 0 0K 16K - 14:29 0.00% schedcpu
    41 root -4 0 0K 16K syncer 14:13 0.00% syncer
    38 root 155 52 0K 16K pgzero 13:54 0.00% pagezero
    1869 root 40 0 27044K 10016K select 13:05 0.00% smid
    42 root 40 0 0K 16K vnlrum 11:59 0.00% vnlru_mem
    1857 root 40 0 22096K 12032K select 11:19 0.00% bfdd
    1889 root 40 0 64788K 24164K select 10:39 0.00% dcd
    1882 root 40 0 23152K 12384K select 10:18 0.00% pkid
    1863 root 40 0 22992K 12980K select 9:52 0.00% lfmd
    57 root -16 0 0K 16K psleep 8:16 0.00% vmkmemdaemon
    4 root -8 0 0K 16K - 5:53 0.00% g_event
    5 root -8 0 0K 16K - 5:48 0.00% g_up
    1862 root 40 0 21740K 11848K select 5:33 0.00% oamd
    1840 root 40 0 9792K 8844K select 5:07 0.00% ntpd
    1851 root 40 0 35036K 16556K select 4:59 0.00% kmd
    6 root -8 0 0K 16K - 4:54 0.00% g_down
    1859 root 40 0 61360K 18220K select 4:50 0.00% dfcd
    1849 root 40 0 28256K 14760K select 4:16 0.00% rmopd
    1848 root 40 0 21432K 11936K select 3:59 0.00% ilmid
    1845 root 25 -15 21572K 11832K select 3:54 0.00% apsd
    1839 root 40 0 21436K 11088K select 3:51 0.00% craftd
    43 root -16 0 0K 16K sdflus 3:41 0.00% softdepflush
    27 root -84 -187 0K 16K WAIT 3:28 0.00% irq96: fman0
    21 root -68 -171 0K 16K WAIT 3:01 0.00% irq38: i2c0 i2c1
    1853 root 40 0 13228K 5644K select 2:56 0.00% lmpd
    1783 root 52 0 7168K 2596K select 2:06 0.00% inetd
    9 root 8 0 0K 16K - 1:57 0.00% kqueue taskq
    1782 root 40 0 58424K 28672K select 1:44 0.00% mgd
    54 root 40 0 0K 16K select 1:38 0.00% jsr_kkcm
    50 root 4 0 0K 16K select 1:34 0.00% devrt_kthread
    28 root -80 -183 0K 16K WAIT 1:24 0.00% irq44: ehci0
    34 root -52 -155 0K 16K WAIT 1:07 0.00% swi3: ip6opt ipopt
    2141 root 40 0 74804K 41744K select 1:03 0.00% cli
    1855 root 40 0 42212K 25888K select 0:59 0.00% dfwd
    19 root -44 -147 0K 16K WAIT 0:52 0.00% swi5: cambio
    40 root -4 0 0K 16K vlruwt 0:34 0.00% vnlru
    1 root 8 0 1632K 736K wait 0:33 0.00% init
    53 root -16 0 0K 16K psleep 0:32 0.00% vmuncachedaemon
    39 root -16 0 0K 16K psleep 0:31 0.00% bufdaemon
    1623 root 8 0 2324K 684K nanslp 0:22 0.00% cron
    64 root 12 0 0K 16K no_rs 0:22 0.00% rtimeshare_thr
    65 root -8 0 0K 16K mdwait 0:17 0.00% md0
    490 root -8 0 0K 16K mdwait 0:12 0.00% md22
    2152 root 40 0 59232K 13352K select 0:11 0.00% mgd
    1789 root 8 0 2724K 1292K nanslp 0:10 0.00% getty
    1790 root 8 0 2724K 1304K nanslp 0:10 0.00% getty
    327 root -8 0 0K 16K mdwait 0:08 0.00% md14
    1835 root 40 0 2328K 952K select 0:08 0.00% watchdog
    37 root -16 0 0K 16K psleep 0:07 0.00% pagedaemon
    1823 root 4 0 0K 16K select 0:07 0.00% ppt_11_80000010
    1858 root 40 0 16336K 6376K select 0:06 0.00% rdd
    58 root 8 0 0K 16K ifscli 0:03 0.00% ifsclientclosed
    1911 root 40 0 2596K 1412K select 0:02 0.00% rlogind
    356 root -8 0 0K 16K mdwait 0:02 0.00% md16
    1910 root 5 0 2400K 1276K ttyin 0:01 0.00% rlogin
    182 root -8 0 0K 16K mdwait 0:01 0.00% md4
    32 root -8 0 0K 16K usbevt 0:01 0.00% usb1
    29 root -8 0 0K 16K usbevt 0:01 0.00% usb0
    153 root -8 0 0K 16K mdwait 0:00 0.00% md2
    454 root -8 0 0K 16K mdwait 0:00 0.00% md21
    5736 root -8 0 0K 16K mdwait 0:00 0.00% md34
    641 root -8 0 0K 16K mdwait 0:00 0.00% md29
    317 root -8 0 0K 16K mdwait 0:00 0.00% md13
    22 root -64 -167 0K 16K WAIT 0:00 0.00% swi0: uart uart
    71957 root 40 0 11704K 5552K select 0:00 0.00% sshd
    269 root -8 0 0K 16K mdwait 0:00 0.00% md10
    1913 root 8 0 8908K 3336K wait 0:00 0.00% login
    211 root -8 0 0K 16K mdwait 0:00 0.00% md6
    1868 root 4 0 14476K 5272K select 0:00 0.00% sendd
    1866 root 4 0 47956K 5176K select 0:00 0.00% mplsoamd
    72359 root 41 0 58464K 6160K select 0:00 0.00% mgd
    71964 root 40 0 4888K 3180K pause 0:00 0.00% csh
    1914 root 4 0 2400K 300K sbwait 0:00 0.00% rlogin
    2103 root 64 0 5120K 3056K pause 0:00 0.00% csh
    1837 root 4 0 4812K 2020K select 0:00 0.00% tnetd
    415 root -8 0 0K 16K mdwait 0:00 0.00% md20
    1856 root 4 0 8332K 2752K select 0:00 0.00% rtspd
    1791 root 8 0 2956K 1492K wait 0:00 0.00% sh
    346 root -8 0 0K 16K mdwait 0:00 0.00% md15
    654 root -8 0 0K 16K mdwait 0:00 0.00% md30
    0 root 12 0 0K 0K WAIT 0:00 0.00% swapper
    5726 root -8 0 0K 16K mdwait 0:00 0.00% md33
    628 root -8 0 0K 16K mdwait 0:00 0.00% md28
    386 root -8 0 0K 16K mdwait 0:00 0.00% md18
    288 root -8 0 0K 16K mdwait 0:00 0.00% md11
    72363 root 44 0 23560K 1944K RUN 0:00 0.00% top
    240 root -8 0 0K 16K mdwait 0:00 0.00% md8
    589 root -8 0 0K 16K mdwait 0:00 0.00% md25
    376 root -8 0 0K 16K mdwait 0:00 0.00% md17
    667 root -8 0 0K 16K mdwait 0:00 0.00% md31
    3223 root 40 0 58432K 2804K select 0:00 0.00% mgd
    201 root -8 0 0K 16K mdwait 0:00 0.00% md5
    405 root -8 0 0K 16K mdwait 0:00 0.00% md19
    680 root -8 0 0K 16K mdwait 0:00 0.00% md32
    602 root -8 0 0K 16K mdwait 0:00 0.00% md26
    230 root -8 0 0K 16K mdwait 0:00 0.00% md7
    1854 root 8 0 2400K 1228K nanslp 0:00 0.00% smartd
    563 root -8 0 0K 16K mdwait 0:00 0.00% md23
    143 root -8 0 0K 16K mdwait 0:00 0.00% md1
    576 root -8 0 0K 16K mdwait 0:00 0.00% md24
    172 root -8 0 0K 16K mdwait 0:00 0.00% md3
    615 root -8 0 0K 16K mdwait 0:00 0.00% md27
    259 root -8 0 0K 16K mdwait 0:00 0.00% md9
    298 root -8 0 0K 16K mdwait 0:00 0.00% md12
    51 root 4 0 0K 16K select 0:00 0.00% if_pic_listen0
    56 root 12 0 0K 16K condsl 0:00 0.00% delayedexecd
    31 root -80 -183 0K 16K WAIT 0:00 0.00% irq45: ehci1
    52 root 4 0 0K 16K purge_ 0:00 0.00% kern_pir_proc
    25 root -84 -187 0K 16K WAIT 0:00 0.00% irq16: bman0 qman0+
    7 root 8 0 0K 16K - 0:00 0.00% mastership taskq
    49 root 4 0 0K 16K pfenoc 0:00 0.00% if_pfe_listen
    48 root 4 0 0K 16K dump_r 0:00 0.00% kern_dump_proc
    59 root 8 0 0K 16K - 0:00 0.00% nfsiod 0
    1449 root 8 0 0K 16K crypto 0:00 0.00% crypto
    60 root 8 0 0K 16K - 0:00 0.00% nfsiod 1
    61 root 8 0 0K 16K - 0:00 0.00% nfsiod 2
    62 root 8 0 0K 16K - 0:00 0.00% nfsiod 3
    1450 root 8 0 0K 16K crypto 0:00 0.00% crypto returns
    55 root -20 0 0K 16K jsr_js 0:00 0.00% jsr_jsm
    30 root 8 0 0K 16K usbtsk 0:00 0.00% usbtask
    17 root -28 -131 0K 16K WAIT 0:00 0.00% swi9: Giant taskq
    18 root -28 -131 0K 16K WAIT 0:00 0.00% swi9: task queue
    16 root -32 -135 0K 16K WAIT 0:00 0.00% swi8: +
    13 root -40 -143 0K 16K WAIT 0:00 0.00% swi6: vm
    33 root -48 -151 0K 16K WAIT 0:00 0.00% swi4: ip6_mismatch+
    44 root -56 -159 0K 16K WAIT 0:00 0.00% swi2: ndpisr-E
    46 root -56 -159 0K 16K WAIT 0:00 0.00% swi2: ndpisr-I
    20 root -68 -171 0K 16K WAIT 0:00 0.00% irq25: syspld0
    15 root -84 -187 0K 16K WAIT 0:00 0.00% mpfe_drv_taskq16: +
    24 root -84 -187 0K 16K WAIT 0:00 0.00% irq105: bman0
    26 root -84 -187 0K 16K WAIT 0:00 0.00% irq104: qman0

     

     

    Can someone clarify why there are two pfex_junos processes running and what the state "ll_blo" means - I found nothing clarifying that? Is there maybe a hung process, which is causing the additional utilization?

     

    Thanks in advance,

    Julian



  • 2.  RE: High CPU usage/Process question (4x 4300-48t and 1x 4300-32f non mixed VC)

     
    Posted 06-27-2019 07:59

    Hi,

     

    PFEX is the Packet Forwarding Engine Manager process on EX4300. EX4300 is a microkernel CPU based platform, and require this process make BSD OS calls. The cause of the CPU spike in this process requires further troubleshooting, perhaps a JTAC case is needed.

     

    Following symptomts data could be checked - 

    1. Any MAC move notifications on the system?

    2. Any scheduler slips?

    3. Non Juniper/unsupported optics? 

           - request pfe execute command "show sfp list" target fpc<no>

           -  show chassis pic fpc-slot <> pic-slot <>

    4. Check uKernel CPU util for all FPCs in VC

          - request pfe execute command "show threads" target fpc<no>

          - request pfe execute command "show sched" target fpc<no>

          - request pfe execute command "show sfp list" target fpc<no>



  • 3.  RE: High CPU usage/Process question (4x 4300-48t and 1x 4300-32f non mixed VC)

    Posted 06-28-2019 01:01

    Hello pranavs,

     

    thank you for your answer!

     

     

    1)

    I have configured this VC to send syslog messages to our graylog. This is the current configuration:

     

    root@vx-core-b6-01# show system syslog
    user * {
    any emergency;
    }
    host 192.168.1.100 {
    authorization error;
    daemon error;
    ftp none;
    security error;
    kernel error;
    user none;
    dfc error;
    external error;
    firewall none;
    pfe error;
    conflict-log error;
    change-log any;
    interactive-commands none;
    source-address 192.168.1.1;
    }
    file messages {
    any notice;
    authorization info;
    }
    file interactive-commands {
    interactive-commands any;
    }

     

    In order to see MAC move notifications in our graylog, do I have to explicit configure the following or will the VC send strange MAC movements nevertheless to our graylog:

     

    [edit switch-options]
    user@switch# set mac-notification
     
    Currently there is nothing regarding mac move notifications configured and I see no messages regarding mac move in our graylog.
     
     
    2)
    Regarding scheduler slips - there are no syslog entries regarding scheduler slips. Can you maybe clarify what to check?
     
     
    3)
    After checking - there are indeed unsupported optics in use. There are even multiple syslog messages regarding that:
     
    vx-core-b6-01 fpc1 [EX-BCM PIC] ex_bcm_pic_optics_diags: Failed to read eeprom of Optic 1 of Pic 2
     
    and 
     
    vx-core-b6-01 fpc1 Link 51 FAILED
     
    But the device, which is on that optic/sfp, is reachable and functional - Should I change the optic as soon as possible?
     
    Here are some informations on that sfp:

    {master:0}

    root@vx-core-b6-01> show interfaces diagnostics optics ge-1/2/1

    (no output here, because the VC cannot read data from the unsupported sfp?)

     

    {master:0}
    root@vx-core-b6-01> show interfaces ge-1/2/1
    Physical interface: ge-1/2/1, Enabled, Physical link is Up
    Interface index: 857, SNMP ifIndex: 813
    Description: L12/08
    Link-level type: Ethernet, MTU: 1514, LAN-PHY mode, Speed: 1000mbps, Duplex: Full-Duplex, BPDU Error: None, Loop Detect PDU Error: None, Ethernet-Switching Error: None,
    MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled, Flow control: Enabled, Auto-negotiation: Enabled, Remote fault: Online, Media type: Fiber
    Device flags : Present Running
    Interface flags: SNMP-Traps Internal: 0x0
    Link flags : None
    CoS queues : 12 supported, 12 maximum usable queues
    Current address: c0:42:d0:c0:d2:78, Hardware address: c0:42:d0:c0:d2:78
    Last flapped : 2019-04-15 06:20:36 UTC (10w4d 00:15 ago)
    Input rate : 83192 bps (66 pps)
    Output rate : 2520680 bps (254 pps)
    Active alarms : None
    Active defects : None
    PCS statistics Seconds
    Bit errors 0
    Errored blocks 0
    Ethernet FEC statistics Errors
    FEC Corrected Errors 0
    FEC Uncorrected Errors 0
    FEC Corrected Errors Rate 0
    FEC Uncorrected Errors Rate 0
    Interface transmit statistics: Disabled

    Logical interface ge-1/2/1.0 (Index 776) (SNMP ifIndex 814)
    Flags: Up SNMP-Traps 0x0 Encapsulation: Ethernet-Bridge
    Input packets : 2774878
    Output packets: 333076
    Protocol eth-switch, MTU: 1514

     
    Here is some additional output regarding the pic-slot 2 fpc-slot 1:
     

    root@vx-core-b6-01> show chassis pic pic-slot 2 fpc-slot 1
    FPC slot 1, PIC slot 2 information:
    Type 4x 1G/10G SFP/SFP+
    State Online
    PIC version 3.2
    Uptime 74 days, 18 minutes, 28 seconds

    PIC port information:
    Fiber Xcvr vendor Wave- Xcvr JNPR
    Port Cable type type Xcvr vendor part number length Firmware Rev
    0 10GBASE SR MM OEM GBIC-SFP+-SR-JEX 850 nm 0.0 REV 01
    1 GIGE 1000LX10 SM FINISAR CORP. FTRJ1319P1BTL-PT 1310 nm 0.0
    2 GIGE 1000SX MM Juniper OEM SFP-GE-SX-J 850 nm 0.0 REV 01
    3 GIGE 1000SX MM Juniper OEM SFP-GE-SX-JEX 850 nm 0.0 REV 01

     

    The "faulty" sfp should be the one from "FINISAR CORP." - I have read in another thread, that some "LX10" Gbics are not supported and could cause trouble, but Iam unable to find the thread right now.

     

    4)

    I am not able to use these commands. Can you please clarify?

    Our VC uses Junos with ELS.

     

    Greetings,

    Julian_V



  • 4.  RE: High CPU usage/Process question (4x 4300-48t and 1x 4300-32f non mixed VC)

     
    Posted 06-28-2019 07:38

    Based on the data you have shared, it makes sense to atleast temporarily remove the optics on ge-1/2/1 and see whether CPU util subsides.

     

    For the "show sched" and "show thread" commands, what is the exact command you are using? If you suspect FPC1 contributing to the high CPU, you want to use the following -

     

    labroot@router> start shell pfe network fpc1
    
    
    BSD platform (QorIQ P202H processor, 0MB memory, 0KB flash)
    
    (vty)# show threads
    PID PR State     Name                   Stack Use  Time (Last/Max/Total) cpu
    --- -- -------   ---------------------  ---------  ---------------------
      1 H  asleep    Maintenance            896/50272  0/8/8 ms  0%
      2 L  running   Idle                  1976/50280  2/726/3561366458 ms 84%
      3 H  asleep    Timer Services        1944/50280  0/76/5849591 ms  0%
    <...>
    
    (vty)# show threads    
    PID PR State     Name                   Stack Use  Time (Last/Max/Total) cpu
    --- -- -------   ---------------------  ---------  ---------------------
      1 H  asleep    Maintenance            896/50272  0/8/8 ms  0%
      2 L  running   Idle                  1976/50280  0/726/3561489801 ms 84
    <...>
    
    (vty)# show sfp list    
    SFP Toolkit summary:
    wakeup count: 4188089, debug: 0, periodic enabled: 1, diagnostics enabled: 1
      thread: 0x049ec378, itable: 0x049df0f8, itable iterator: 0, sem: 0x049f96f8
      polling interval delay:  1000 ms, polling max cpu:  100 ms
      poll for diags every  3 wakeups , SFPs polled for diags last time:  4
      last periodic CPU time:    1 ms, maximum periodic CPU time:     21 ms
      max SFPs to poll per interval: 0
    SFP Toolkit syslog throttling parameters:
      period: 120 samples , disable threshold:  10, enable threshold   0
      SFP periodic diagnostics polling count: 65535
    
                                                                             diag
    Index   Name             Presence     ID Eprom  PNO         SNO          calibr   Toxic
    -----   --------------   ----------   --------  ----------  ----------- -------  -------
        1   Uplink SFP+ PIC(0)      Present   Complete  740-031980  AA1745300XB     int    Unknown
        2   Uplink SFP+ PIC(1)      Present   Complete  740-021308  ANM1PUP         int    Unknown
        3   Uplink SFP+ PIC(2)      Present   Complete  740-031980  MUC2CL8         int    Unknown
        4   Uplink SFP+ PIC(3)      Present   Complete  740-031980  ALP12X4         int    Unknown
    
    I2C Acceleration table 
    Index   Name             Presence     ID Eprom     Reg ID    I2C Master    I2C Group
    -----   --------------   ----------   --------    --------    ------------   -------
    <...>
    
    From the above output, get the index for ge-1/2/1, and run "show sfp <index> info", like following example- 
    
    (vty)# show sfp 1 info      
      index:                  0x01
      sfp name:               Uplink SFP+ PIC
      pic context:            0x05B6CE30
      id mem scanned:         true
      linkstate:              Up
      sfp_present:            true
      sfp_changed:            false
      i2c failure count:      0x0
      diag polling count:     0x5BC1
      no diag polling from RE:0x0
      run_periodic:           false
    
    

     

     

    Regarding you questy about the sched slips, one of the symptoms that occur with high kernel CPU usage are messages with RPD_SCHED_SLIP in the logs, that is what you could grep on.

    show log messages | match RPD_SCHED_SLIP

    Jul 30 12:24:11 m10i-a-re0 rpd[1304]: RPD_SCHED_SLIP: 7 sec scheduler slip, user: 1 sec 339119 usec, system: 0 sec, 0 usec       Jul 30 12:25:29 m10i-a-re0 rpd[1304]: RPD_SCHED_SLIP: 4 sec scheduler slip, user: 1 sec 431622 usec, system: 0 sec, 0 usec       Jul 30 12:25:37 m10i-a-re0 rpd[1304]: RPD_SCHED_SLIP: 4 sec scheduler slip, user: 1 sec 528918 usec, system: 0 sec, 74784 usec 
    Jul 30 17:47:55 m10i-a-re0 rpd[1304]: RPD_SCHED_SLIP: 4 sec scheduler slip, user: 0 sec 526784 usec, system: 0 sec, 4608 usec

    Jul 30 17:48:03 m10i-a-re0 rpd[1304]: RPD_SCHED_SLIP: 4 sec scheduler slip, user: 1 sec 285283 usec, system: 0 sec, 19077 usec

     



  • 5.  RE: High CPU usage/Process question (4x 4300-48t and 1x 4300-32f non mixed VC)

    Posted 06-30-2019 23:53

    Hello pranvas,

     

    thank you very much for your insightful and very detailed reply.

     

    I could collect the following informations:

     

     

    root@vx-core-b6-01> start shell pfe network fpc1

     

    BSD platform (QorIQ P202H processor, 0MB memory, 0KB flash)

    (vty)# show threads
    PID PR State Name Stack Use Time (Last/Max/Total) cpu
    --- -- ------- --------------------- --------- ---------------------
    1 H asleep Maintenance 904/50272 0/0/0 ms 0%
    2 L running Idle 41864/50280 1/859/5810875810 ms 87%
    3 H asleep Timer Services 2880/50272 0/119/7273751 ms 0%
    5 L asleep Ukern Syslog 832/52328 0/0/0 ms 0%
    6 L asleep Sheaf Background 2008/50272 0/54/13518 ms 0%
    7 M asleep mac_db 808/56416 0/0/0 ms 0%
    8 M asleep QSFP 41568/52328 0/231/2530975 ms 0%
    9 M asleep SFP 6744/52320 0/96/232068 ms 0%
    10 L asleep Firmware Upgrade 832/52320 0/0/0 ms 0%
    11 L asleep Syslog 3456/52328 0/48/3257 ms 0%
    12 M asleep Periodic 1928/56416 0/115/192475 ms 0%
    13 H asleep TTP Receive 3608/52328 0/142/14222264 ms 0%
    14 H asleep TTP Transmit 808/52320 1/1/1 ms 0%
    15 L asleep DFW Alert 2336/52320 0/49/59387 ms 0%
    16 M asleep L2ALM Manager 43904/56424 0/222/4657739 ms 0%
    17 M asleep ARP Proxy 2136/56424 0/104/713219 ms 0%
    18 M asleep RPM Msg thread 816/56416 0/0/0 ms 0%
    19 H asleep Pfesvcsor 1936/56424 0/118/41958 ms 0%
    20 L asleep NH Probe Service 816/52320 0/0/0 ms 0%
    21 L asleep ICMP6 Input 1080/52320 2/2/2 ms 0%
    22 L asleep IP6 Option Input 3360/52328 1/1/1 ms 0%
    23 L asleep ICMP Input 3384/52328 2/2/2 ms 0%
    24 L asleep IP Option Input 3344/52320 0/0/0 ms 0%
    25 M asleep IGMP Input 3344/52328 0/0/0 ms 0%
    26 M asleep PRIVATE PFE Manager 4568/64608 0/46/1759 ms 0%
    27 M asleep PPM Manager 6712/56416 0/100/103003 ms 0%
    28 M asleep PPM Data thread 2080/64608 0/111/498920 ms 0%
    29 M asleep VRRP Manager 3384/56424 8/8/8 ms 0%
    30 L ready Virtual Console 3648/64616 0/24/45 ms 0%
    31 L asleep Virtual Console (private) 2560/64616 0/0/0 ms 0%
    33 L asleep Console 3624/64616 4/4/4 ms 0%
    34 M asleep PFE Manager 54776/64608 0/521/26630505 ms 0%
    35 L asleep SFlow sample deamon 2008/56424 0/117/257284 ms 0%
    36 M asleep bcmDPC 976/57440 0/0/0 ms 0%
    37 M asleep socdmadesc.0 2184/57440 0/120/4403265 ms 0%
    38 M asleep bcmIbodSync.0 2104/57448 0/116/2243868 ms 0%
    40 M asleep bcmXGS3AsyncTX 1080/57440 2/2/2 ms 0%
    41 M asleep bcmL2MOD.0 2520/57448 0/119/1835539 ms 0%
    42 L asleep pfe_bcm_age 6320/64616 0/148/878017 ms 0%
    43 L asleep pfe_bcm_sync 1592/56416 0/117/114075 ms 0%
    44 M asleep PIC Periodic 2992/56424 1/678/22693966 ms 0%
    45 M asleep PIC 808/52320 0/0/0 ms 0%
    46 M asleep bcmLINK.0 43472/106600 2/729/383510049 ms 5%
    47 M asleep bcmCNTR.0 42152/57448 0/540/36695080 ms 0%
    48 M asleep CMVC 42376/64608 1/15/33 ms 0%
    49 M asleep TNPC CM 45512/64608 0/120/1850993 ms 0%
    50 M asleep PFE Statistics 3584/64616 1/122/6608168 ms 0%
    51 L asleep AS Packet Process 2912/56416 7/7/12 ms 0%
    52 M asleep bulkget Manager 6952/56416 0/58/9020 ms 0%
    53 L asleep Cattle-Prod Daemon 3744/64608 0/0/0 ms 0%
    54 L asleep Virtual Console 2560/64616 0/0/0 ms 0%

     


    (vty)# show sfp list
    SFP Toolkit summary:
    wakeup count: 6645201, debug: 0, periodic enabled: 1, diagnostics enabled: 1
    thread: 0x4e2fe80, itable: 0x4e22bc0, itable iterator: 0, sem: 0x4e30250
    polling interval delay: 1000 ms, polling max cpu: 100 ms
    poll for diags every 3 wakeups , SFPs polled for diags last time: 4
    last periodic CPU time: 1 ms, maximum periodic CPU time: 96 ms
    max SFPs to poll per interval: 0
    SFP Toolkit syslog throttling parameters:
    period: 120 samples , disable threshold: 10, enable threshold 0
    SFP periodic diagnostics polling count: 65535

    diag
    Index Name Presence ID Eprom PNO SNO calibr Toxic
    ----- -------------- ---------- -------- ---------- ----------- ------- -------
    1 Uplink SFP+ PIC(0) Present Complete 740-021308 M4301605 int Unknown
    2 Uplink SFP+ PIC(1) Present Complete NON-JNPR P8R11ST EXT** Unknown
    3 Uplink SFP+ PIC(2) Present Complete 740-011613 1171324439 int Unknown
    4 Uplink SFP+ PIC(3) Present Complete 740-011613 1174522093 int Unknown

    I2C Acceleration table
    Index Name Presence ID Eprom Reg ID I2C Master I2C Group
    ----- -------------- ---------- -------- -------- ------------ -------


    This should be a supported Gbic:

    (vty)# show sfp 1 info
    index: 0x01
    sfp name: Uplink SFP+ PIC
    pic context: 0x5f370e8
    id mem scanned: true
    linkstate: Up
    sfp_present: true
    sfp_changed: false
    i2c failure count: 0x0
    diag polling count: 0xE198
    no diag polling from RE:0x0
    run_periodic: false


    This should be the unsupported one:

    (vty)# show sfp 2 info
    index: 0x02
    sfp name: Uplink SFP+ PIC
    pic context: 0x5f370e8
    id mem scanned: true
    linkstate: Up
    sfp_present: true
    sfp_changed: false
    i2c failure count: 0x1
    diag polling count: 0xCCB1
    no diag polling from RE:0x0
    run_periodic: false

     

    (vty)# show sched
    Total uptime 77+00:18:38, (6653918068 ms), 3496177384 thread dispatches
    CPU load is 26% (5 second), 12% (1 minute)
    Total network interrupt time 0 (usec)

    CPU Name Time(ms)
    87% Idle 5811219219
    12% Threads 842698849
    0% ISR 0


    root@vx-core-b6-01> show log messages | match RPD_SCHED_SLIP

    {master:0}

    No output here.

     

    Can you see something strange or do you think the Gbic could be the cause? Right now Iam not able to unplug the gbic because there is a storage behind it which is in use.

     

    And again, thank you for your time,

    Julian_V



  • 6.  RE: High CPU usage/Process question (4x 4300-48t and 1x 4300-32f non mixed VC)
    Best Answer

     
    Posted 07-01-2019 07:02

    From this output, i2c failure count = 0x1, points to failure in I2C bus - which is used to provide temperature, voltage and IDEEPROM info. Rather than unsupported optic, this optic may have a minor HW fault. When its feasible, it would be worthwile to try and un-plug this optic and monitor the CPU usage and PFEX process usage to confirm or rule out this being the root cause.

     

     

    (vty)# show sfp 2 info
    index: 0x02
    sfp name: Uplink SFP+ PIC
    pic context: 0x5f370e8
    id mem scanned: true
    linkstate: Up
    sfp_present: true
    sfp_changed: false
    i2c failure count: 0x1
    diag polling count: 0xCCB1
    no diag polling from RE:0x0
    run_periodic: false