This message was posted by a user wishing to remain anonymous
Hi all,
I encountered a strange issue on an MX480 and would like to ask whether anyone has seen something similar.
Device information:
Platform: MX480
MPC: MPC10E
Junos: 23.4R2-S3.9
Issue summary:
At around May 19 14:32:04, I committed a configuration change related to an IPv4 firewall filter. The change was only to reorder terms in the firewall filter using insert; it was not a large-scale rule addition.
Shortly after the commit, both FPC0 and FPC1 crashed. Rolling back the configuration and manually restarting the FPCs did not resolve the issue. The device recovered only after I manually shut down the BGP neighbor 10.0.0.1.
I am not sure whether the trigger was the IPv4 firewall filter change, the BGP neighbor behavior, or a combination of both.
Relevant logs are below.
From the messages log:
May 19 14:32:04 MX480_23.4R2-S3.9 mgd[81708]: UI_COMMIT: User 'ADJohn' requested 'commit' operation (comment: none)
May 19 14:32:10 MX480_23.4R2-S3.9 ffp: "dynamic-profiles": No change to profiles
May 19 14:32:10 MX480_23.4R2-S3.9 mgd[81708]: UI_COMMIT_CONFIRMED_REMINDER: 'commit confirmed' must be confirmed within 5 minutes
May 19 14:32:10 MX480_23.4R2-S3.9 mgd[81708]: UI_COMMIT_COMPLETED: : commit complete
May 19 14:32:11 MX480_23.4R2-S3.9 rpd[21812]: bgp_handle_notify:5250: NOTIFICATION received from 10.0.0.1 (External AS 44324): code 6 (Cease) subcode 6 (Other Configuration Change)
May 19 14:32:11 MX480_23.4R2-S3.9 kernel: jsr_unreplicate: INFO: unreplicating handle 0x100027a0000001c, laddr 10.1.29.1, lport 179, faddr 10.0.0.1, fport 52977, rtb_idx 17, due to error 0, msg Application asked for unreplication
May 19 14:32:39 MX480_23.4R2-S3.9 rpd[21812]: BGP_NLRI_MISMATCH: bgp_process_caps: mismatch NLRI with 10.0.0.1 (External AS 44324): peer: <inet-unicast inet6-unicast>(17) us: <inet-unicast>(1) (instance Standard)
May 19 14:32:39 MX480_23.4R2-S3.9 kernel: jsr_iha_pri_sock_init: INFO: Invoked for pri handle 0x100039000000001, sec handle 0xffffffffffffffff, laddr 10.1.29.1, lport 51721, faddr 10.0.0.1, fport 179, rtb_idx 17
May 19 14:32:40 MX480_23.4R2-S3.9 fpc0 user.err ztchip-luss: ppe_error_interrupt(8377): ZT[0:0].slice[0]_PPE 0 Errors thread timeout error
May 19 14:32:40 MX480_23.4R2-S3.9 fpc0 user.err ztchip-luss: ppe_error_interrupt(8377): ZT[0:0].slice[0]_PPE 1 Errors thread timeout error
May 19 14:32:40 MX480_23.4R2-S3.9 fpc0 user.err ztchip-luss: ppe_error_interrupt(8377): ZT[0:0].slice[0]_PPE 2 Errors thread timeout error
May 19 14:32:40 MX480_23.4R2-S3.9 fpc0 user.err ztchip-luss: ppe_error_interrupt(8377): ZT[0:0].slice[0]_PPE 3 Errors thread timeout error
May 19 14:32:40 MX480_23.4R2-S3.9 fpc0 user.err ztchip-luss: ppe_error_interrupt(8377): ZT[0:0].slice[0]_PPE 4 Errors thread timeout error
May 19 14:32:40 MX480_23.4R2-S3.9 fpc0 user.err ztchip-luss: ppe_error_interrupt(8377): ZT[0:0].slice[0]_PPE 5 Errors thread timeout error
May 19 14:32:40 MX480_23.4R2-S3.9 fpc0 user.err ztchip-luss: ppe_error_interrupt(8377): ZT[0:0].slice[0]_PPE 6 Errors thread timeout error
May 19 14:32:40 MX480_23.4R2-S3.9 fpc0 user.err ztchip-luss: ppe_error_interrupt(8377): ZT[0:0].slice[0]_PPE 7 Errors thread timeout error
...
From the chassisd log:
May 19 13:56:48 ch_gencfg_chassis_startup_time_blob_set: Adding blob for chassis startup time for key aaaaaaaa keylen 4 , 1736521775.754294, blob pointer bf29a88
May 19 13:56:48 ch_gencfg_update_startup_time_blob: Updated hw.chassis.startup_time to 1736521775.754294 (RE)
May 19 13:56:51 ch_gencfg_chassis_startup_time_handler: master_re: true, GENCFG_CHASSIS_STARTUP_TIME, minor_type: 8
May 19 14:32:40 CHASSISD_FPC_ASIC_ERROR: <FPC 0> ASIC Error detected errorno 0x00040098 (null)
May 19 14:32:47 CHASSISD_FPC_ASIC_ERROR: <FPC 1> ASIC Error detected errorno 0x00040098 (null)
May 19 14:32:50 CHASSISD_FPC_ASIC_ERROR: <FPC 0> ASIC Error detected errorno 0x00040098 (null)
May 19 14:33:10 CHASSISD_FPC_ASIC_ERROR: <FPC 0> ASIC Error detected errorno 0x00040098 (null)
May 19 14:33:50 CHASSISD_FPC_ASIC_ERROR: <FPC 0> ASIC Error detected errorno 0x00040098 (null)
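To make the per-slot picture easier to see, here is a minimal Python sketch that tallies the CHASSISD_FPC_ASIC_ERROR lines by FPC slot and error number. It only looks at the five lines pasted above; the regular expression and the function name count_asic_errors are my own illustration, not anything provided by Junos, and the pattern assumes the line layout shown in this excerpt.

```python
import re
from collections import Counter

# The CHASSISD_FPC_ASIC_ERROR lines copied from the chassisd excerpt above.
CHASSISD_LINES = [
    "May 19 14:32:40 CHASSISD_FPC_ASIC_ERROR: <FPC 0> ASIC Error detected errorno 0x00040098 (null)",
    "May 19 14:32:47 CHASSISD_FPC_ASIC_ERROR: <FPC 1> ASIC Error detected errorno 0x00040098 (null)",
    "May 19 14:32:50 CHASSISD_FPC_ASIC_ERROR: <FPC 0> ASIC Error detected errorno 0x00040098 (null)",
    "May 19 14:33:10 CHASSISD_FPC_ASIC_ERROR: <FPC 0> ASIC Error detected errorno 0x00040098 (null)",
    "May 19 14:33:50 CHASSISD_FPC_ASIC_ERROR: <FPC 0> ASIC Error detected errorno 0x00040098 (null)",
]

# Assumed line layout: "<Mon> <day> <HH:MM:SS> CHASSISD_FPC_ASIC_ERROR: <FPC n> ... errorno 0x..."
ASIC_ERROR_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+CHASSISD_FPC_ASIC_ERROR:\s+"
    r"<FPC (?P<slot>\d+)>.*errorno (?P<errno>0x[0-9a-fA-F]+)"
)


def count_asic_errors(lines):
    """Return a Counter keyed by (fpc_slot, errorno) for matching lines."""
    counts = Counter()
    for line in lines:
        match = ASIC_ERROR_RE.match(line)
        if match:
            counts[(int(match.group("slot")), match.group("errno"))] += 1
    return counts


if __name__ == "__main__":
    for (slot, errno), n in sorted(count_asic_errors(CHASSISD_LINES).items()):
        print(f"FPC {slot}: {n} x errorno {errno}")
```

In this excerpt that gives four hits for FPC 0 and one for FPC 1, all with the same errorno.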
Timeline:
May 19 14:32:04
Committed an IPv4 firewall filter change. The main change was reordering terms in the firewall filter using insert.
May 19 14:32:10
Commit completed.
May 19 14:32:11
Received a BGP NOTIFICATION from peer 10.0.0.1:
code 6 Cease, subcode 6 Other Configuration Change.
May 19 14:32:39
BGP_NLRI_MISMATCH appeared:
the peer advertised <inet-unicast inet6-unicast>, while the local side had <inet-unicast>.
May 19 14:32:40
FPC0 started reporting ztchip-luss PPE thread timeout errors, followed by CHASSISD_FPC_ASIC_ERROR errorno 0x00040098.
May 19 14:32:47
FPC1 also reported the same CHASSISD_FPC_ASIC_ERROR errorno 0x00040098.
May 19 14:32:50 to 14:33:50
FPC0 continued reporting 0x00040098 ASIC errors.
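For reference, the same timeline can be expressed as offsets from the commit request. This is a small Python sketch under stated assumptions: the event labels are my own summaries of the log lines above, and ASSUMED_YEAR is only a placeholder so the timestamps parse (the log lines carry no year, and the offsets do not depend on it). It also assumes an English locale for the month abbreviation.

```python
from datetime import datetime

ASSUMED_YEAR = 2000  # placeholder only; the syslog timestamps carry no year

# (timestamp, label) pairs taken from the timeline above; labels are summaries.
EVENTS = [
    ("May 19 14:32:04", "commit requested"),
    ("May 19 14:32:10", "commit complete"),
    ("May 19 14:32:11", "BGP NOTIFICATION from 10.0.0.1 (Cease / Other Configuration Change)"),
    ("May 19 14:32:39", "BGP_NLRI_MISMATCH with 10.0.0.1"),
    ("May 19 14:32:40", "FPC0 PPE thread timeout errors, first ASIC error 0x00040098"),
    ("May 19 14:32:47", "FPC1 ASIC error 0x00040098"),
    ("May 19 14:33:50", "last FPC0 ASIC error in the excerpt"),
]


def parse_ts(ts):
    """Parse a year-less syslog timestamp using the placeholder year."""
    return datetime.strptime(f"{ASSUMED_YEAR} {ts}", "%Y %b %d %H:%M:%S")


def offsets_from_first(events):
    """Return (seconds_since_first_event, label) for each event."""
    start = parse_ts(events[0][0])
    return [
        (int((parse_ts(ts) - start).total_seconds()), label)
        for ts, label in events
    ]


if __name__ == "__main__":
    for seconds, label in offsets_from_first(EVENTS):
        print(f"+{seconds:>4d}s  {label}")
```

Seen this way, the first PPE errors on FPC0 appear about 36 seconds after the commit request and one second after the BGP_NLRI_MISMATCH message, which is part of why I cannot tell which of the two events is the trigger.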
Recovery attempts:
Rolled back the firewall filter configuration change, but the issue persisted.
Manually restarted the FPCs, but the issue still persisted.
The device recovered only after I manually shut down BGP neighbor 10.0.0.1.
What I find confusing:
The firewall filter change was only a term reorder. Why would this trigger PPE thread timeout errors or FPC crashes?
The BGP peer 10.0.0.1 sent a Cease / Other Configuration Change immediately after the commit, and then BGP_NLRI_MISMATCH appeared. Could this BGP behavior have contributed to the FPC/PPE issue?
Is errorno 0x00040098 a known ZT / LUSS / PPE timeout issue on MPC10E?
Since both FPC0 and FPC1 reported 0x00040098, does this suggest that the issue is less likely to be a single FPC hardware fault and more likely related to software, configuration, or a specific traffic/route-update trigger?
Why would the rollback and the manual FPC restart not clear the issue, while shutting down the BGP peer 10.0.0.1 did?
Has anyone seen similar ztchip-luss PPE thread timeout errors or CHASSISD_FPC_ASIC_ERROR errorno 0x00040098 on MX480 with MPC10E, especially after an IPv4 firewall filter change or during BGP capability/NLRI mismatch events?
Any suggestions on possible root cause or troubleshooting direction would be appreciated.