SRX

random kernel panic srx300 with 19.1R1.6

  • 1.  random kernel panic srx300 with 19.1R1.6

    Posted 12-20-2019 00:24

    Hi

    Anybody seen this behaviour before:

    We have 10 SRX300s in 7 locations (3 in cluster configuration). The 2 clusters running version 19.1R1.6 are crashing at random with the kernel panic below. Support asks for an RMA, but it sounds to me like the version could be the cause and not the hardware. The crashes are becoming more frequent, though, so the suggestion to replace the devices sounds reasonable.

     

    xhci_process_cmd_event+0x124 (0xc6d02800,0xc6d51110,0x18411603,0x13c0401) ra 0x8016d420 sz 64
    xhci_scan_ring_event+0xd8 (0xc6d02800,0xc6d51110,0x18411603,0x13c0401) ra 0x8016ee8c sz 64
    xhci_intr+0x2a0 (0xc6d02800,0xc6d51110,0x18411603,0x13c0401) ra 0x80aeaabc sz 32
    mips_handle_this_interrupt+0x8c (0xc6d02800,0xc6d51110,0x18411603,0x13c0401) ra 0x80aeab48 sz 40
    mips_handle_interrupts+0x58 (0xc6d02800,0xc6d51110,0x18411603,0x13c0401) ra 0x80aeaf6c sz 48
    mips_interrupt+0x224 (0xc6d02800,0xc6d51110,0x18411603,0x13c0401) ra 0x80e4bf5c sz 32
    MipsUserIntr+0x1a8 (0xc6d02800,0xc6d51110,0x18411603,0x4038a298) ra 0 sz 0
    pid 2057, process: flowd_octeon_hm
    cpu:0-Trap cause = 3 (TLB miss (store) - kernel mode)
    badvaddr = 0x1010, pc = 0x8016c234, ra = 0x8016d420, sr = 0x508008e3
    panic: trap
    cpuid = 0
    KDB: stack backtrace:
    0x4038a298+0x0 (0,0,0,0) ra 0x4038b6d0 sz 0
    0x4038b67c+0x54 (0,0,0,0) ra 0x4027fb94 sz 48
    0x4027fa68+0x12c (0,0,0,0) ra 0 sz 0
    pid 2057, process: flowd_octeon_hm
    Uptime: 10d13h13m33s
    Cannot dump. No dump device defined.

     

    Anybody seen this behaviour before?
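
When comparing crashes across several boxes, it can help to reduce each console dump to its headline fields. The following is a minimal sketch, not a Juniper tool: the function name `summarize_panic` and the chosen fields are assumptions for illustration, and the sample text is an excerpt of the dump quoted above.

```python
import re

# Excerpt of the console output quoted in the post above.
PANIC_LOG = """\
pid 2057, process: flowd_octeon_hm
cpu:0-Trap cause = 3 (TLB miss (store) - kernel mode)
badvaddr = 0x1010, pc = 0x8016c234, ra = 0x8016d420, sr = 0x508008e3
panic: trap
Uptime: 10d13h13m33s
Cannot dump. No dump device defined.
"""


def summarize_panic(log: str) -> dict:
    """Pull the headline fields out of a kernel panic dump (hypothetical helper)."""
    patterns = {
        "process": r"process:\s*(\S+)",
        "trap_cause": r"Trap cause = \d+ \((.*)\)\s*$",
        "badvaddr": r"badvaddr = (0x[0-9a-fA-F]+)",
        "pc": r"\bpc = (0x[0-9a-fA-F]+)",
        "uptime": r"Uptime:\s*(\S+)",
    }
    summary = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, log, flags=re.MULTILINE)
        summary[name] = match.group(1) if match else None
    # The kernel prints "Cannot dump" when no dump device is configured,
    # in which case no core file should be expected afterwards.
    summary["dump_saved"] = "Cannot dump" not in log
    return summary


if __name__ == "__main__":
    for field, value in summarize_panic(PANIC_LOG).items():
        print(f"{field}: {value}")
```

If the same process, trap cause and faulting address show up on every affected unit, that is worth including in the support case.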



  • 2.  RE: random kernel panic srx300 with 19.1R1.6

    Posted 12-20-2019 02:25

    Yes, I expect this is caused by defective flash storage in SRX300 series devices shipped before June 2019. The flash chips were not of good enough quality and fail too quickly. All RMA and new devices since June 2019 have shipped with an updated, more durable flash storage chip.

     

    I just returned an SRX300 for RMA this morning with the exact same error.

     

    Even though you are on Junos 19.x, I will still mention that all RMA devices have to run at least 15.1X49-D150, 17.4R3, 18.2R2 or 18.3R1, as per https://kb.juniper.net/InfoCenter/index?page=content&id=TSB17581
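
The per-train minimums listed above can be checked with a few lines of Python before loading an image onto a replacement unit. This is a rough sketch under stated assumptions: the function `meets_minimum` is hypothetical, service-release suffixes such as `-S1` are ignored, trains newer than 18.3 are assumed to qualify, and trains that the post does not list (for example 18.1) are treated as not qualifying. The TSB itself remains the authority.

```python
import re

# Minimum release number per train, taken from the versions quoted above.
MINIMUMS = {
    "15.1X49": 150,  # 15.1X49-D150
    "17.4": 3,       # 17.4R3
    "18.2": 2,       # 18.2R2
    "18.3": 1,       # 18.3R1
}

# Matches e.g. "15.1X49-D150", "17.4R3", "19.1R1.6".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:X(\d+)-D(\d+)|R(\d+))")


def meets_minimum(version: str) -> bool:
    """Return True if a Junos version string satisfies the listed minimums."""
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognized Junos version: {version!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    if m.group(3) is not None:
        # X-release such as 15.1X49-D150: compare the D number.
        train = f"{major}.{minor}X{m.group(3)}"
        number = int(m.group(4))
    else:
        # Standard release such as 17.4R3: compare the R number.
        train = f"{major}.{minor}"
        number = int(m.group(5))
    if train in MINIMUMS:
        return number >= MINIMUMS[train]
    # Assumption: any train newer than 18.3 qualifies, unlisted older ones do not.
    return (major, minor) > (18, 3)


if __name__ == "__main__":
    for v in ("19.1R1.6", "17.4R2-S1", "15.1X49-D150"):
        print(v, meets_minimum(v))
```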

     

     



  • 3.  RE: random kernel panic srx300 with 19.1R1.6

    Posted 12-20-2019 05:44

    Hi Jonas,

     

    Thanks for the clarification. It's a shame that, with 10 SRX300s, we have to wait until the flash dies before we can open an RMA and replace each SRX300 😞



  • 4.  RE: random kernel panic srx300 with 19.1R1.6

    Posted 12-20-2019 06:00

    Is it possible to just replace the flash drive? That would save a lot of time and effort.



  • 5.  RE: random kernel panic srx300 with 19.1R1.6

    Posted 12-20-2019 11:00

    The flash chips on the SRX300 series are soldered, so there is no easy way to replace them.

     

    I suggest reaching out to your local Juniper account manager and asking them to help arrange a proactive exchange of your devices instead of handling them case by case. The SE can support the dialog with JTAC about this.



  • 6.  RE: random kernel panic srx300 with 19.1R1.6

    Posted 12-31-2019 16:33

    Hello, these messages appear to indicate a hardware problem, but did you try downgrading the software on this box to see if the messages go away? That would be a quicker and easier test, provided it's not in active production.



  • 7.  RE: random kernel panic srx300 with 19.1R1.6

    Posted 12-31-2019 16:34

    Is there any core dump generated?

    show system core-dumps



  • 8.  RE: random kernel panic srx300 with 19.1R1.6

    Posted 01-16-2020 11:14

    It's a hardware issue and, as Jonas said, the only solution is an RMA to replace the SRX300.

    JTAC still keeps the PR hidden as confidential. I suppose a proactive recall of all SRX300s globally is too expensive.

     

    It would be nice to know when the storage will reach the end of its lifetime.

    We have now replaced 6 SRX300s in 3 clusters/locations through 6 RMAs.

     

    Really bad for Juniper's reputation; no proactive attitude at JTAC.