Ex2300 keeps switching but ...fpc0 (buf alloc) failed allocating packet buffer


1. Ex2300 keeps switching but ...fpc0 (buf alloc) failed allocating packet buffer

LapointeMichel
Posted 12-02-2020 15:03

Hello everybody,
I have 200+ EX2300 switches running smoothly at various client sites. Each one simply passes traffic between two interfaces so that I can keep an eye on it.
Recently, I lost remote contact with one switch. All traffic kept flowing, but I had no access through my management VLAN.
When I plugged into the console, I found the messages log file completely filled with this:

Nov 18 13:05:01 ReseauBiblio-SitePrincipal fpc0 brcm_pkt_buf_alloc:393 (buf alloc) failed allocating packet buffer
Nov 18 13:05:01 ReseauBiblio-SitePrincipal fpc0 (buf alloc) failed allocating packet buffer
Nov 18 13:05:01 ReseauBiblio-SitePrincipal fpc0 brcm_pkt_buf_alloc:393 (buf alloc) failed allocating packet buffer
Nov 18 13:05:01 ReseauBiblio-SitePrincipal fpc0 (buf alloc) failed allocating packet buffer
Nov 18 13:05:01 ReseauBiblio-SitePrincipal fpc0 brcm_pkt_buf_alloc:393 (buf alloc) failed allocating packet buffer
Nov 18 13:05:01 ReseauBiblio-SitePrincipal fpc0 (buf alloc) failed allocating packet buffer

So I replaced switch A with switch B, using the same configuration.
Switch A, now in the lab, is back to normal, and switch B in the field crashed after 2 days... with traffic still flowing. Its messages log is filled with the same lines.
The client won't let me change the switch because all traffic keeps flowing perfectly :-)

At this point, I welcome any idea as to what can put a switch in such a state.

The configuration is basically the same on all 200+ switches.
2 switches failed with the same problem on the same site.

I opened a case with JTAC; they mention a "memory leak" but have not found a cause yet.

Any clue?
Michel

------------------------------
Michel Lapointe
------------------------------
2. RE: Ex2300 keeps switching but ...fpc0 (buf alloc) failed allocating packet buffer

F1ght3r
Posted 12-02-2020 16:17

Hello Michel,

Although I do not know if this is a known issue (it looks like a clear memory leak), I would suggest upgrading to a current Junos release (e.g. 19.4R3) and checking whether the issue still occurs. Sometimes this saves you from JTAC cases, which can take a very long time.

------------------------------
------------------------------
If my answer provides the solution, please mark my post as "Accepted Solution".
If you think my answer helps, please spend some Kudos
------------------------------

3. RE: Ex2300 keeps switching but ...fpc0 (buf alloc) failed allocating packet buffer

LapointeMichel
Posted 12-02-2020 17:11

Hello F1ght3r,

I am running 18.2R3-S3 on all my 200+ switches. None of them has behaved like this since the deployment started in February,
so an image upgrade is on hold for now.
I used the same config file on switches A and B; it is a carbon copy of the other switches' config, except for the descriptions and the irb interface IP address.
2 switches acting like this must have something in common: either the environment or the config files.
But what kind of environment or config would bring a switch into such a state?
The log messages are useless since they are filled with the same line.
Michel

------------------------------
Michel Lapointe
------------------------------
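
When the messages file is flooded like this, one way to see whether anything else was logged is to copy the file off the switch and filter it offline. The following is a minimal Python sketch, not a Juniper tool. It assumes the log uses the "Mon DD HH:MM:SS hostname message" layout shown in the first post; the function names and the last sample line are hypothetical and only there for illustration.

```python
import re
from collections import Counter

# Assumed syslog prefix: "Nov 18 13:05:01 hostname " followed by the message.
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")

def summarize(lines):
    """Count each distinct message after stripping timestamp and hostname."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[PREFIX.sub("", line)] += 1
    return counts

def without_noise(lines, noise="failed allocating packet buffer"):
    """Keep only the non-empty lines that do not contain the flooding message."""
    return [line for line in lines if line.strip() and noise not in line]

HOST = "ReseauBiblio-SitePrincipal"
SAMPLE = [
    f"Nov 18 13:05:01 {HOST} fpc0 brcm_pkt_buf_alloc:393 (buf alloc) failed allocating packet buffer",
    f"Nov 18 13:05:01 {HOST} fpc0 (buf alloc) failed allocating packet buffer",
    f"Nov 18 13:05:01 {HOST} fpc0 brcm_pkt_buf_alloc:393 (buf alloc) failed allocating packet buffer",
    f"Nov 18 13:05:01 {HOST} fpc0 (buf alloc) failed allocating packet buffer",
    # Hypothetical placeholder line, NOT taken from the switch in this thread.
    f"Nov 18 13:05:02 {HOST} example: placeholder message",
]

if __name__ == "__main__":
    for message, count in summarize(SAMPLE).most_common():
        print(f"{count:6d}  {message}")
    print("Lines left after filtering:")
    for line in without_noise(SAMPLE):
        print(line)
```

To run it against a real file, pass the lines of the saved log to summarize() and without_noise() instead of SAMPLE.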

4. RE: Ex2300 keeps switching but ...fpc0 (buf alloc) failed allocating packet buffer

F1ght3r
Posted 12-03-2020 04:03

Hello Michel,

This highly depends on the customer traffic. It is entirely possible that one customer on each switch sends specific packets that trigger the EX memory leak.
The long-term resolution is to solve this together with JTAC. As a short-term attempt, you can install a current Junos release, which includes the newest software fixes, and check whether the issue still appears.

------------------------------
------------------------------
If my answer provides the solution, please mark my post as "Accepted Solution".
If you think my answer helps, please spend some Kudos
------------------------------

5. RE: Ex2300 keeps switching but ...fpc0 (buf alloc) failed allocating packet buffer

tgreaser
Posted 12-03-2020 08:43

Something like this has been posted before in the community:
https://community.juniper.net/communities/community-home/digestviewer/viewthread?MID=73003
I will look in my emails for a more detailed version, but in the past we had this issue with 18, where, just as you described, the PFE / data plane kept running but the control plane did not. A reboot fixes this temporarily.
I've been running 19.4R on my 2300s, standalone and in VC, with no issues. Uptimes since the last upgrade are 276 to 294 days across 10 sites.

SIDE NOTE: Due to the recent PR https://prsearch.juniper.net/InfoCenter/index?page=prcontent&id=PR1491905
I've been upgrading mine, as we have had years of seeing mcast kill our 2300s in the way it describes. We think we have been bypassing this issue by having our wireless controller drop mcasts at the AP. We thought it was just due to the low capacity of the 2300s. But here is to hoping this shows me I'm wrong and the 2300 can do a lot more.

6. RE: Ex2300 keeps switching but ...fpc0 (buf alloc) failed allocating packet buffer

LapointeMichel
Posted 12-03-2020 09:28
Edited by LapointeMichel 12-03-2020 09:30

Hello tgreaser,
thanks for the answer.
I checked the first link, and while scrolling down it rang a bell. Then I saw myself in the conversation from 2019! It turns out that thread was related to the infamous PR1442376 https://prsearch.juniper.net/InfoCenter/index?page=prcontent&id=PR1442376&actp=SUBSCRIPTION, which causes 2300 switches to go zombie: absolutely no communication whatsoever, while traffic keeps flowing. This is not the case here, since I have console access and the selector on the right side of the switch still works, both of which were affected under PR1442376. For a moment I was afraid that PR1442376 had raised its ugly head and that I would be forced to act on the promise I made at the time and kill myself :-)

The second link looks very promising, though: it turns out I am running 18.2R3S3, which is affected by the issue. The problem occurs at a specific site, on a specific small network. We moved the traffic to another switch yesterday but left the old one running, and the same error message keeps coming out (see video). The PR mentions very low idle CPU as a symptom, so I'll go and check it this afternoon.

If the new switch goes berserk again, I'll upgrade the old one to 18.2R3S5 (the recommended EX2300-C image) and move the wiring back to it... and keep you posted.

A video is worth a thousand words: I attached a clip of the old switch reacting to a monitor start messages command.
Thanks again.
Michel

------------------------------
Michel Lapointe
------------------------------


7. RE: Ex2300 keeps switching but ...fpc0 (buf alloc) failed allocating packet buffer

Luke Robertson
Posted 12-03-2020 15:33

I had this happen once. It's a good example of the clear separation between the RE (control plane) and the PFE (data plane).

In my case, there was no fancy fix. I rebooted the switch, then I put the JTAC recommended Junos on it.
Haven't had the problem reoccur since.

8. RE: Ex2300 keeps switching but ...fpc0 (buf alloc) failed allocating packet buffer

LapointeMichel
Posted 12-03-2020 16:19

Good evening Luke and tgreaser,
I am pursuing the https://prsearch.juniper.net/InfoCenter/index?page=prcontent&id=PR1491905 possibility. Here is what I found today:
The old switch is still running after being disconnected, with the traffic now connected to the new switch. So I have the down switch still spitting out messages and refusing to talk to anybody except through the console.
The PR mentions...

To check if the device has high CPU load due to this issue, the administrator can issue the following command:

user@host> show chassis routing-engine
Routing Engine status:
..
Idle 2 percent

the "Idle" value shows as low (2 % in the example above), and also the following command:

user@host> show system processes summary
..
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND
11639 root 52 0 283M 11296K select 12:15 44.97% eventd
11803 root 81 0 719M 239M RUN 251:12 31.98% fxpc{FXPC}

the eventd and the fxpc processes might use higher WCPU percentage (respectively 44.97% and 31.98% in the above example).
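
With 200+ switches, the idle check the PR describes could be scripted against saved command output. The following is a minimal Python sketch under stated assumptions: it only parses text that has already been captured from show chassis routing-engine, the function names are hypothetical, and the 10 percent threshold is an arbitrary illustration rather than a value from the PR.

```python
import re

# Matches a line such as "Idle   2 percent" in "show chassis routing-engine" output.
IDLE_RE = re.compile(r"^\s*Idle\s+(\d+)\s+percent", re.MULTILINE)

def idle_percent(output):
    """Return the first reported Idle percentage, or None if there is none."""
    match = IDLE_RE.search(output)
    return int(match.group(1)) if match else None

def looks_overloaded(output, threshold=10):
    """True when the first Idle value is below the (assumed) threshold."""
    idle = idle_percent(output)
    return idle is not None and idle < threshold

# Text shaped like the example quoted from the PR.
PR_EXAMPLE = """Routing Engine status:
..
    Idle                       2 percent
"""

if __name__ == "__main__":
    print("Idle:", idle_percent(PR_EXAMPLE))
    print("Overloaded:", looks_overloaded(PR_EXAMPLE))
```

Real output lists several averaging windows; this sketch reads only the first Idle line it finds, so adjust it if another window matters more to you.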

here is what I found on my good switch ...

root@ReseauBiblio-SitePrincipal> show system processes summary
...
  PID USERNAME PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
   11 root     155 ki31     0K    16K RUN    129.6H 81.98% idle
10156 root     -52   r0   727M   248M select 311:55   4.98% fxpc{fxpc}
10156 root      52    0   727M   248M select 387:50   3.96% fxpc{fxpc}
   21 root     -16    -     0K    16K -      179:28   0.98% rand_harvestq
10177 root      20    0   486M   131M select  64:30   0.98% authd

{master:0}
root@ReseauBiblio-SitePrincipal> show chassis routing-engine
Routing Engine status:
....
    5 sec CPU utilization:
      User                       6 percent
      Background                 0 percent
      Kernel                     9 percent
      Interrupt                  1 percent
      Idle                      83 percent
...
now here are the readings from the sick switch

root@ReseauBiblio-SitePrincipal> show system processes summary
last pid: 11709; load averages: 3.33, 3.49, 3.47 up 5+16:26:53    15:28:45
...
  PID USERNAME PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
 4894 root     -52   r0   727M   249M select  56.9H 51.95% fxpc{fxpc}
 4894 root      78    0   727M   249M RUN     24.9H 21.97% fxpc{fxpc}
 4869 root      78    0   287M 12144K RUN     23.3H 19.97% eventd
   21 root     -16    -     0K    16K -      162:25   0.98% rand_harvestq

{master:0}
root@ReseauBiblio-SitePrincipal> show chassis routing-engine
Routing Engine status:
...
    5 sec CPU utilization:
      User                      69 percent
      Background                 0 percent
      Kernel                    30 percent
      Interrupt                  1 percent
      Idle                       0 percent
...
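
Because fxpc shows up as two threads in each listing, it can help to total WCPU per process name before comparing the good and the sick switch. The following is a minimal Python sketch, assuming the 10-column row layout shown in the output above; the function name is hypothetical, and the rows are copied from the two listings in this post.

```python
from collections import defaultdict

def wcpu_by_command(output):
    """Sum the WCPU column per command name, merging threads like fxpc{fxpc}."""
    totals = defaultdict(float)
    for line in output.splitlines():
        fields = line.split()
        # Process rows start with a numeric PID and carry WCPU as "NN.NN%".
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if not fields[-2].endswith("%"):
            continue
        name = fields[-1].split("{")[0]  # "fxpc{fxpc}" -> "fxpc"
        totals[name] += float(fields[-2].rstrip("%"))
    return dict(totals)

GOOD = """PID USERNAME PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
   11 root     155 ki31     0K    16K RUN    129.6H 81.98% idle
10156 root     -52   r0   727M   248M select 311:55   4.98% fxpc{fxpc}
10156 root      52    0   727M   248M select 387:50   3.96% fxpc{fxpc}
   21 root     -16    -     0K    16K -      179:28   0.98% rand_harvestq
10177 root      20    0   486M   131M select 64:30   0.98% authd
"""

SICK = """last pid: 11709; load averages: 3.33, 3.49, 3.47 up 5+16:26:53    15:28:45
PID USERNAME PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
4894 root     -52   r0   727M   249M select 56.9H 51.95% fxpc{fxpc}
4894 root      78    0   727M   249M RUN     24.9H 21.97% fxpc{fxpc}
4869 root      78    0   287M 12144K RUN     23.3H 19.97% eventd
   21 root     -16    -     0K    16K -      162:25   0.98% rand_harvestq
"""

if __name__ == "__main__":
    for label, text in (("good", GOOD), ("sick", SICK)):
        totals = wcpu_by_command(text)
        ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
        print(label, [(name, round(value, 2)) for name, value in ranked])
```

On these samples, the two fxpc threads add up to roughly 9% on the good switch and roughly 74% on the sick one, with eventd near 20% on the sick switch only, which matches the pattern the PR describes.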

I'll send this to JTAC later to see what they think. Right now, I am hoping for this switch to go down like all the others. I'll then install 18.2R3S5 on the old one, transfer the wiring back and, hopefully, call the problem solved.

Then all I have to do is install the image on the other 229 switches... :-(

Michel

------------------------------
Michel Lapointe
------------------------------
