We have recently had an issue where a line card rebooted a few times on different Juniper MXs.
Every time, we see the following messages before the LC goes down:
Jun 16 11:11:59.000 CAMGW-R01 : %PFE-5: fpc4 user.notice logger: /usr/bin/pfe-app-wrapper: Starting pfe application /var/app/smpc.elf
Jun 16 11:12:06.524 CAMGW-R01 chassisd: %DAEMON-5-CHASSISD_SNMP_TRAP10: SNMP trap generated: Fru Offline (jnxFruContentsIndex 7, jnxFruL1Index 5, jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName FPC: MPC9E 3D @ 4/*/*, jnxFruType 3, jnxFruSlot 4, jnxFruOfflineReason 23, jnxFruLastPowerOff 2928237, jnxFruLastPowerOn 1065858324)
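If the same trap has to be compared across several routers, the fields can be pulled out of the chassisd line with a few lines of Python. This is only a sketch: the function name `parse_fru_trap` and the regular expression are my own, and the pattern is written against the exact message text quoted above, so other Junos releases may need a different one.

```python
import re

# Pattern written against the chassisd message quoted in this thread
# (assumption: other releases may format the trap differently).
TRAP_RE = re.compile(r"SNMP trap generated: (?P<event>[^(]+)\((?P<fields>.*)\)")


def parse_fru_trap(line):
    """Return the trap event and its jnxFru* fields as a dict, or None."""
    match = TRAP_RE.search(line)
    if match is None:
        return None
    parsed = {"event": match.group("event").strip()}
    # Fields are separated by ", "; each one is "<name> <value>".
    for item in match.group("fields").split(", "):
        key, _, value = item.partition(" ")
        parsed[key] = int(value) if value.isdigit() else value
    return parsed


sample = (
    "Jun 16 11:12:06.524 CAMGW-R01 chassisd: %DAEMON-5-CHASSISD_SNMP_TRAP10: "
    "SNMP trap generated: Fru Offline (jnxFruContentsIndex 7, jnxFruL1Index 5, "
    "jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName FPC: MPC9E 3D @ 4/*/*, "
    "jnxFruType 3, jnxFruSlot 4, jnxFruOfflineReason 23, "
    "jnxFruLastPowerOff 2928237, jnxFruLastPowerOn 1065858324)"
)

trap = parse_fru_trap(sample)
print(trap["event"], trap["jnxFruSlot"], trap["jnxFruOfflineReason"])
```

The slot and jnxFruOfflineReason values are the ones worth collecting per event; the numeric reason still has to be looked up in the Juniper MIB or KB.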
Do you see any core dumps generated on the box? Also check the NVRAM logs from the FPC shell to get more clues.
Below are the commands to check
show system core-dumps <<< will show if there are any new core dumps generated
To check the FPC NVRAM logs:
start shell pfe network fpc<>
#show nvram <<< this will give us some clue on why the fpc rebooted
#show syslog messages
You may refer to this KB to find the reason for the FPC reboot; here the reason is reconnect.
We might need to check the logs to confirm the exact reason for the FRU reconnect.
Looks like a segmentation fault issue. Please get the FPC syslog and NVRAM outputs:
start shell pfe network fpc<fpc number>
NVRAM does show a segmentation fault, and there is an FPC core too, along with CPU spikes. We have asked our partner team to get a JTAC case opened. Model: MX2020 Junos: 15.1F5-S4.6
Ukern boot
[LOG] jdid_main: JDID mode check failed
[LOG] iffpc_jam_core_module_init: Registering valid jam vectors
[LOG] Set the IP IRI for table #1 to 0x80000014
[LOG] IPV4 Init: Set the IP IRI to 0x80000014
[LOG] ddos_issu_helper_register: issu state is 0 at ddos startup
[LOG] ddos sock support init ...
[LOG] ddos sock connection proto in use 7 ...
[LOG] RSMON rsmon_msg_thread_init
[LOG] if_module_init: Zeroing out jam vectors
--------------------------------------
Segmentation Fault!
/usr/bin/pfe-app-wrapper: Starting pfe application /var/app/smpc.elf
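When the NVRAM output has been saved from several FPCs, a small script can show what was logged just before each fault. A minimal sketch, assuming the output has been captured as plain text; the helper name `find_fault_context` and its defaults are hypothetical, and the marker string is taken from the output above.

```python
def find_fault_context(nvram_text, marker="Segmentation Fault!", context=3):
    """Return, for each line containing the marker, that line plus the
    `context` lines logged immediately before it."""
    lines = nvram_text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if marker in line:
            start = max(0, index - context)
            hits.append(lines[start:index + 1])
    return hits


# Tail of the NVRAM output posted in this thread.
sample = "\n".join([
    "[LOG] RSMON rsmon_msg_thread_init",
    "[LOG] if_module_init: Zeroing out jam vectors",
    "--------------------------------------",
    "Segmentation Fault!",
    "/usr/bin/pfe-app-wrapper: Starting pfe application /var/app/smpc.elf",
])

for block in find_fault_context(sample, context=2):
    print("\n".join(block))
```

This only narrows down where to look. The lines before the marker are the last things logged, not necessarily the cause of the fault.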
So this segmentation fault is a memory issue on the FPC.
A segmentation fault is raised by the hardware whenever a program or process tries to read or write a restricted memory location. The hardware reports the fault to the OS kernel, which signals the offending process; unless the process handles that signal, it is terminated, typically leaving a core dump. So in this case the segmentation fault looks to be the effect and not the cause. Since you say there was high CPU, the process that caused the high CPU probably also caused the crash/segfault. FPC core analysis would show which process was at fault; however, a core is best analyzed in a JTAC case.
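The general mechanism can be seen on any POSIX host. The sketch below is a generic illustration, not a picture of how the FPC software works internally: the child process sends itself SIGSEGV instead of performing a real invalid memory access, and the helper name `run_and_report` is my own. It shows that a process which does not handle the signal is killed by it, and that the parent can see which signal was responsible.

```python
import signal
import subprocess
import sys


def run_and_report(code):
    """Run `code` in a child Python interpreter and describe how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a negative return code means the child was killed by a signal.
    if proc.returncode < 0:
        return "killed by " + signal.Signals(-proc.returncode).name
    return f"exited with status {proc.returncode}"


# The child delivers SIGSEGV to itself; with no handler installed, the
# default action terminates the process.
CRASH = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

print(run_and_report(CRASH))
print(run_and_report("pass"))
```

Depending on the host's core dump settings, the crashing child may also leave a core file, which is the equivalent of the FPC core that JTAC would analyze.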
MPC9E with 15.1F5-S4.6 code: I believe your MXs are seeing a lot of route churn, which is causing the high CPU. Perhaps you can take a look at this PR:
There can be multiple reasons for the reboot. You need to provide more info to isolate it further.
> Check for core dumps using 'show system core-dumps'; any file found has to be analysed.
> Check for any errors on the link between the CB and the FPC using "show chassis ethernet-switch statistics"
> Check the NVRAM logs using request pfe execute target fpc4 command "show nvram"
> The issue could be improper seating of the card in slot 4
> We cannot rule out a hardware problem if this is happening repeatedly.
I have seen this condition with a memory issue in the FPC. Check for any DDRIF memory issue or any UCODE data error in the messages log or the syslog messages of the FPC. Also check the thread usage, CPU graph, and memory fragmentation of the FPC. Any core dump for this FPC? Any sort of DDOS violation for this FPC? We have seen software issues related to thread usage where a core dump was generated. Please let us know the version involved.
/usr/bin/pfe-app-wrapper: Starting pfe application /var/app/smpc.elf
[LOG] jdid_main: JDID mode check failed
[LOG] iffpc_jam_core_module_init: Registering valid jam vectors
[LOG] Set the IP IRI for table #1 to 0x22000080
[LOG] IPV4 Init: Set the IP IRI to 0x22000080
[LOG] ddos_issu_helper_register: issu state is 0 at ddos startup
[LOG] ddos sock support init ...
[LOG] ddos sock connection proto in use 7 ...
[LOG] RSMON rsmon_msg_thread_init
[LOG] if_module_init: Zeroing out jam vectors
Yes, I saw similar NVRAM messages:
This is not an issue to be discussed in the J-Net forum. Please open a JTAC ticket.
Normally, for an FPC card rebooting issue, we need to check more information from the PFE shell, starting with "show syslog messages" and "show nvram". There could also be a core dump file generated. It is better for you to open a JTAC ticket with the RSI and /var/log uploaded to the case.