Anybody seen this behaviour before:
We have 10x SRX300 in 7 locations (3 in cluster configuration). The 2 clusters running version 19.1R1.6 are randomly crashing with the kernel panic below. Support asks for an RMA, but it sounds to me like the version could be the cause and not the hardware. The frequency of the crashes is increasing, so the suggestion to replace them sounds reasonable.
xhci_process_cmd_event+0x124 (0xc6d02800,0xc6d51110,0x18411603,0x13c0401) ra 0x8016d420 sz 64
xhci_scan_ring_event+0xd8 (0xc6d02800,0xc6d51110,0x18411603,0x13c0401) ra 0x8016ee8c sz 64
xhci_intr+0x2a0 (0xc6d02800,0xc6d51110,0x18411603,0x13c0401) ra 0x80aeaabc sz 32
mips_handle_this_interrupt+0x8c (0xc6d02800,0xc6d51110,0x18411603,0x13c0401) ra 0x80aeab48 sz 40
mips_handle_interrupts+0x58 (0xc6d02800,0xc6d51110,0x18411603,0x13c0401) ra 0x80aeaf6c sz 48
mips_interrupt+0x224 (0xc6d02800,0xc6d51110,0x18411603,0x13c0401) ra 0x80e4bf5c sz 32
MipsUserIntr+0x1a8 (0xc6d02800,0xc6d51110,0x18411603,0x4038a298) ra 0 sz 0
pid 2057, process: flowd_octeon_hm
cpu:0-Trap cause = 3 (TLB miss (store) - kernel mode)
badvaddr = 0x1010, pc = 0x8016c234, ra = 0x8016d420, sr = 0x508008e3
panic: trap
cpuid = 0
KDB: stack backtrace:
0x4038a298+0x0 (0,0,0,0) ra 0x4038b6d0 sz 0
0x4038b67c+0x54 (0,0,0,0) ra 0x4027fb94 sz 48
0x4027fa68+0x12c (0,0,0,0) ra 0 sz 0
pid 2057, process: flowd_octeon_hm
Uptime: 10d13h13m33s
Cannot dump. No dump device defined.
Anybody seen this behaviour before?
Yes, I expect this is due to defective flash storage in SRX300 series devices shipped before June 2019. The flash chips weren't of good enough quality and fail too quickly. All RMA replacements and new devices since June 2019 have shipped with an updated, more durable flash storage chip.
I just returned an SRX300 for RMA this morning with the exact same error.
Even though you are on Junos 19.x, I will still mention that all RMA devices have to run at least 15.1X49-D150, 17.4R3, 18.2R2 or 18.3R1, as per https://kb.juniper.net/InfoCenter/index?page=content&id=TSB17581
Thanks for the clarification. It's a shame we then have to wait with 10x SRX300 until the flash dies before we can open an RMA and replace the SRX300 😞
Is it possible to just replace the flash drive? That would save a lot of time and effort.
The flash chip on the SRX300 series is soldered, so there is no way to replace it easily.
I suggest reaching out to your local Juniper account manager and asking them to help arrange a proactive exchange of your devices instead of handling them case by case. The SE can support the dialog with JTAC about this.
Hello, these messages appear to point to a hardware problem, but did you try downgrading the software on this box to see if the messages go away? That would be a quicker and easier test, provided it's not in active production.
Was any core dump generated?
show system core-dumps
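Since the panic output ends with "Cannot dump. No dump device defined.", there may be no core file to look at. In that case the panic text itself can still be compared between boxes. Below is a rough sketch in Python, assuming the console output was saved as text with one record per line. The function name and the field names are made up for illustration and are not anything from Junos.

```python
import re
from datetime import timedelta

# A few lines taken from the panic posted in this thread.
SAMPLE = """\
xhci_process_cmd_event+0x124 (0xc6d02800,0xc6d51110,0x18411603,0x13c0401) ra 0x8016d420 sz 64
pid 2057, process: flowd_octeon_hm
cpu:0-Trap cause = 3 (TLB miss (store) - kernel mode)
badvaddr = 0x1010, pc = 0x8016c234, ra = 0x8016d420, sr = 0x508008e3
panic: trap
Uptime: 10d13h13m33s
"""

def summarize_panic(text):
    """Pull a few comparable fields out of a saved panic message.

    Hypothetical helper: the patterns only cover the layout seen in
    the panic above, other Junos releases may print it differently.
    """
    summary = {}

    # First named frame of the trace, e.g. "xhci_process_cmd_event".
    frame = re.search(r"^\s*([A-Za-z_]\w*)\+0x[0-9a-fA-F]+ \(", text, re.MULTILINE)
    if frame:
        summary["top_frame"] = frame.group(1)

    # Trap number plus its description (greedy up to the last ")" on the line).
    trap = re.search(r"Trap cause = (\d+) \((.*)\)", text)
    if trap:
        summary["trap_cause"] = int(trap.group(1))
        summary["trap_text"] = trap.group(2)

    # Process that was running when the trap hit.
    proc = re.search(r"pid (\d+), process: (\S+)", text)
    if proc:
        summary["pid"] = int(proc.group(1))
        summary["process"] = proc.group(2)

    # Uptime at the time of the crash, assumed to be in the form 10d13h13m33s.
    up = re.search(r"Uptime: (\d+)d(\d+)h(\d+)m(\d+)s", text)
    if up:
        d, h, m, s = (int(x) for x in up.groups())
        summary["uptime"] = timedelta(days=d, hours=h, minutes=m, seconds=s)

    return summary

if __name__ == "__main__":
    for key, value in summarize_panic(SAMPLE).items():
        print(f"{key}: {value}")
```

Running this over the saved panics of each device makes it easier to see whether every crash hits the same top frame and how the uptime between crashes changes.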
It's a hardware issue and, as Jonas said, the only solution is an RMA to replace the SRX300.
JTAC still keeps the PR hidden as confidential. I suppose a proactive recall of all SRX300s globally is too expensive.
It would be nice to know when the storage is reaching the end of its lifetime.
We have now replaced 6 SRX300s in 3 clusters / locations with 6 RMAs.
Really bad for Juniper's reputation: no proactive attitude at JTAC.