Good evening, Luke and Tgreaser,
I am pursuing the possibility raised in PR1491905 (https://prsearch.juniper.net/InfoCenter/index?page=prcontent&id=PR1491905). Here is what I found today:
The old switch is still running after being disconnected and having its traffic moved to the new switch. So the failed switch is still spitting out messages and refusing to talk to anybody except through the console.
The PR mentions:
To check if the device has high CPU load due to this issue, the administrator can issue the following command:
user@host> show chassis routing-engine
Routing Engine status:
..
Idle 2 percent
The "Idle" value shows as low (2 percent in the example above). The following command can also be used:
user@host> show system processes summary
..
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND
11639 root 52 0 283M 11296K select 12:15 44.97% eventd
11803 root 81 0 719M 239M RUN 251:12 31.98% fxpc{FXPC}
The eventd and fxpc processes might show a high WCPU percentage (44.97% and 31.98% respectively in the example above).
Here is what I found on my healthy switch:
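The two checks described in the PR (low "Idle" plus busy eventd/fxpc) can be scripted against saved CLI output, which helps when comparing many switches. Below is a minimal Python sketch, assuming the output of both commands has already been captured as text. The thresholds `IDLE_MIN_PERCENT` and `WCPU_MAX_PERCENT` and all function names are hypothetical; the PR gives example values, not cut-offs.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical thresholds: the PR shows example values, not official cut-offs.
IDLE_MIN_PERCENT = 10.0
WCPU_MAX_PERCENT = 15.0
WATCHED_PROCESSES = ("eventd", "fxpc")

IDLE_RE = re.compile(r"^\s*Idle\s+(\d+)\s+percent", re.MULTILINE)


def idle_percent(routing_engine_output: str) -> Optional[float]:
    """Return the first 'Idle N percent' value in 'show chassis routing-engine' output."""
    match = IDLE_RE.search(routing_engine_output)
    return float(match.group(1)) if match else None


def busy_watched_processes(processes_output: str) -> List[Tuple[str, float]]:
    """Return (COMMAND, WCPU) for eventd/fxpc rows whose WCPU exceeds WCPU_MAX_PERCENT."""
    hits = []
    for line in processes_output.splitlines():
        fields = line.split()
        # Data rows end with "<WCPU>% <COMMAND>"; the header row ends with "WCPU COMMAND".
        if len(fields) < 2 or not fields[-2].endswith("%"):
            continue
        try:
            wcpu = float(fields[-2].rstrip("%"))
        except ValueError:
            continue
        command = fields[-1]
        # Ignore thread suffixes such as "fxpc{fxpc}" or "fxpc{FXPC}".
        if command.split("{")[0] in WATCHED_PROCESSES and wcpu > WCPU_MAX_PERCENT:
            hits.append((command, wcpu))
    return hits


def matches_pr_symptoms(routing_engine_output: str, processes_output: str) -> bool:
    """True when the RE is nearly saturated and eventd/fxpc are among the busy processes."""
    idle = idle_percent(routing_engine_output)
    return (
        idle is not None
        and idle < IDLE_MIN_PERCENT
        and bool(busy_watched_processes(processes_output))
    )


# Sample values copied from the PR excerpt above.
PR_RE = "Routing Engine status:\n  Idle   2 percent\n"
PR_PROCS = """PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND
11639 root 52 0 283M 11296K select 12:15 44.97% eventd
11803 root 81 0 719M 239M RUN 251:12 31.98% fxpc{FXPC}
"""
print(matches_pr_symptoms(PR_RE, PR_PROCS))
print(busy_watched_processes(PR_PROCS))
```

Both conditions are required because a single busy fxpc thread on an otherwise idle Routing Engine (as on the healthy switch) is not the pattern the PR describes.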
root@ReseauBiblio-SitePrincipal> show system processes summary
...
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND
11 root 155 ki31 0K 16K RUN 129.6H 81.98% idle
10156 root -52 r0 727M 248M select 311:55 4.98% fxpc{fxpc}
10156 root 52 0 727M 248M select 387:50 3.96% fxpc{fxpc}
21 root -16 - 0K 16K - 179:28 0.98% rand_harvestq
10177 root 20 0 486M 131M select 64:30 0.98% authd
{master:0}
root@ReseauBiblio-SitePrincipal> show chassis routing-engine
Routing Engine status:
...
5 sec CPU utilization:
User 6 percent
Background 0 percent
Kernel 9 percent
Interrupt 1 percent
Idle 83 percent
...
Now here are the readings from the sick switch:
root@ReseauBiblio-SitePrincipal> show system processes summary
last pid: 11709; load averages: 3.33, 3.49, 3.47 up 5+16:26:53 15:28:45
...
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND
4894 root -52 r0 727M 249M select 56.9H 51.95% fxpc{fxpc}
4894 root 78 0 727M 249M RUN 24.9H 21.97% fxpc{fxpc}
4869 root 78 0 287M 12144K RUN 23.3H 19.97% eventd
21 root -16 - 0K 16K - 162:25 0.98% rand_harvestq
{master:0}
root@ReseauBiblio-SitePrincipal> show chassis routing-engine
Routing Engine status:
...
5 sec CPU utilization:
User 69 percent
Background 0 percent
Kernel 30 percent
Interrupt 1 percent
Idle 0 percent
...
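To line up the healthy and sick readings side by side (or across many switches), the 5-second CPU block can be parsed into a dictionary. Below is a minimal Python sketch, assuming the `show chassis routing-engine` output has been saved as text. The function name is hypothetical; the field names come from the output above.

```python
import re
from typing import Dict

# Matches rows such as "User 69 percent" inside the CPU utilization block.
CPU_ROW_RE = re.compile(r"^\s*(User|Background|Kernel|Interrupt|Idle)\s+(\d+)\s+percent\s*$")


def cpu_utilization(routing_engine_output: str) -> Dict[str, int]:
    """Parse the first '5 sec CPU utilization' block of 'show chassis routing-engine'."""
    values: Dict[str, int] = {}
    in_block = False
    for line in routing_engine_output.splitlines():
        if "5 sec CPU utilization" in line:
            in_block = True
            continue
        if not in_block:
            continue
        match = CPU_ROW_RE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
        elif values:
            break  # the block ends at the first non-matching row
    return values


# Readings copied from the healthy and sick switches above.
HEALTHY = """5 sec CPU utilization:
User 6 percent
Background 0 percent
Kernel 9 percent
Interrupt 1 percent
Idle 83 percent
...
"""
SICK = """5 sec CPU utilization:
User 69 percent
Background 0 percent
Kernel 30 percent
Interrupt 1 percent
Idle 0 percent
...
"""

healthy, sick = cpu_utilization(HEALTHY), cpu_utilization(SICK)
for key in ("User", "Kernel", "Idle"):
    print(f"{key:<8} healthy={healthy[key]:>3}%  sick={sick[key]:>3}%")
```

On the sick switch, User and Kernel time together account for 99 percent of the CPU, which is consistent with the high fxpc and eventd WCPU values in its process list.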
I'll send this to JTAC later to see what they think. Right now, I am hoping for this switch to go down like all the others. I'll then install 18.2R3S5 on the old one, move the wiring back and, hopefully, call the problem solved.
Then all I have to do is install the image on the other 229 switches... :-(
Michel
------------------------------
Michel Lapointe
------------------------------
Original Message:
Sent: 12-03-2020 15:32
From: Unknown User
Subject: Ex2300 keeps switching but ...fpc0 (buf alloc) failed allocating packet buffer
I had this happen once. It's a good example of the clear separation between the RE (control plane) and the PFE (data plane).
In my case, there was no fancy fix. I rebooted the switch, then I put the JTAC recommended Junos on it.
Haven't had the problem recur since.
Original Message:
Sent: 12-02-2020 15:02
From: Michel Lapointe
Subject: Ex2300 keeps switching but ...fpc0 (buf alloc) failed allocating packet buffer
Hello everybody,
I have 200+ EX2300 switches running smoothly at various client sites. Each one simply passes traffic between two interfaces so I can keep an eye on it.
Recently, I lost remote contact with one switch. All traffic kept flowing, but I had no access through my management VLAN.
Plugging into the console, I found the messages log file completely filled with this:
Nov 18 13:05:01 ReseauBiblio-SitePrincipal fpc0 brcm_pkt_buf_alloc:393 (buf alloc) failed allocating packet buffer
Nov 18 13:05:01 ReseauBiblio-SitePrincipal fpc0 (buf alloc) failed allocating packet buffer
Nov 18 13:05:01 ReseauBiblio-SitePrincipal fpc0 brcm_pkt_buf_alloc:393 (buf alloc) failed allocating packet buffer
Nov 18 13:05:01 ReseauBiblio-SitePrincipal fpc0 (buf alloc) failed allocating packet buffer
Nov 18 13:05:01 ReseauBiblio-SitePrincipal fpc0 brcm_pkt_buf_alloc:393 (buf alloc) failed allocating packet buffer
Nov 18 13:05:01 ReseauBiblio-SitePrincipal fpc0 (buf alloc) failed allocating packet buffer
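When a copy of the messages file has lost its line breaks (as happened when these entries were pasted), a short script can split the entries back apart on their syslog timestamps and count the buffer-allocation failures per second, which gives a rough idea of how fast the log is flooding. Below is a minimal Python sketch, assuming the `Mon DD HH:MM:SS` timestamp format shown above; the function names are hypothetical. On the switch itself, `show log messages | match "buf alloc"` gives a quicker look.

```python
import re
from collections import Counter
from typing import List

# Zero-width split point in front of each "Mon DD HH:MM:SS " syslog timestamp.
TIMESTAMP_RE = re.compile(r"(?=[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} )")
FAILURE_TEXT = "failed allocating packet buffer"


def split_entries(blob: str) -> List[str]:
    """Split log text whose entries were concatenated without newlines."""
    return [entry.strip() for entry in TIMESTAMP_RE.split(blob) if entry.strip()]


def failures_per_second(blob: str) -> Counter:
    """Count buffer-allocation failures, keyed by their 15-character timestamp."""
    counts: Counter = Counter()
    for entry in split_entries(blob):
        if FAILURE_TEXT in entry:
            counts[entry[:15]] += 1  # "Nov 18 13:05:01" is 15 characters
    return counts


# The pasted sample: three pairs of entries with no separators between them.
SAMPLE = (
    "Nov 18 13:05:01 ReseauBiblio-SitePrincipal fpc0 brcm_pkt_buf_alloc:393 "
    "(buf alloc) failed allocating packet buffer"
    "Nov 18 13:05:01 ReseauBiblio-SitePrincipal fpc0 (buf alloc) failed allocating packet buffer"
) * 3

for entry in split_entries(SAMPLE):
    print(entry)
print(failures_per_second(SAMPLE))
```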
So I replaced switch A with switch B, using the same configuration.
Switch A is back to normal in the lab, but switch B in the field crashed after 2 days, with traffic still flowing. The messages log is filled with the same lines.
The client won't let me swap the switch because all traffic keeps flowing perfectly :-)
At this point, I welcome any idea as to what could put a switch in such a state.
The configuration is basically the same on all 200+ switches.
Two switches failed with the same problem at the same site.
I opened a case with JTAC; they mentioned a "memory leak" but have not found a cause yet.
Any clue?
Michel