SRX

  • 1.  Restoring SRX boot

    Posted 12-02-2020 06:59
    I've encountered some branch SRXes over the past years that would not boot correctly after a power failure. Most of the time they boot from the secondary partition, and a "clean" reboot is enough to fix this. However, sometimes a unit will not boot at all and gets stuck at the bootloader with the following message:

    can't load '/kernel'
    can't load '/kernel.old'
    Press Enter to stop auto bootsequencing and to enter loader prompt.

    Now, the first thing most people do is search for solutions, and so did I. There are a ton of posts on this, but none of them gave me satisfying results.

    So I want to share what I have tried on various occasions and what doesn't work. At the moment I have two separate SRX300s with this problem, and I've decided to tackle it and get the procedure right once and for all. The SRX in question is running junos-srxsme-15.1X49-D170.4-domestic.

    I started by formatting a 16GB USB stick as FAT32 and copying junos-srxsme-15.1X49-D170.4-domestic.tgz onto it. I then booted the SRX300 into the bootloader and tried:

    install file:///junos-srxsme-15.1X49-D170.4-domestic.tgz

    It starts the installation and looks promising, until:

    Starting JUNOS installation:
        Source Package: disk1:/junos-srxsme-15.1X49-D170.4-domestic.tgz
        Target Media  : internal
        Product       : srx300
    Computing slice and partition sizes for /dev/da0 ...
    awk: division by zero
     input record number 1, file
     source line number 3
    The target media /dev/da0 (0 bytes) is too small.
    The installation cannot proceed
    ERROR: Target media is too small

    Now, some suggest adding --format to the command, but this won't work, and I've read somewhere that this isn't supported on (branch?) SRX devices. On the other faulty SRX300 I got this message instead:

    Target device selected for installation: internal media
    cannot open package (error 22)


    I'll get back to this after I fix the current one.
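
    Before retrying on that second unit, one cheap thing to rule out is a damaged copy of the package on the stick. The following is only a minimal sketch, assuming the stick is mounted on a workstation and that a reference checksum for the image is at hand (assumed here to be SHA-256). The function names are made up for illustration, and nothing in this thread confirms that a bad copy is what causes error 22.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so a large image does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(path, expected_hex):
    """Compare the file's digest with a reference digest (case and whitespace ignored)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

    Usage would be along the lines of copy_is_intact("/mnt/usb/junos-srxsme-15.1X49-D170.4-domestic.tgz", reference_digest), where both the mount point and reference_digest are placeholders.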

    There are some posts that suggest making a bootable USB stick using a healthy SRX and booting from that. I reformatted my 16GB USB stick as FAT32, plugged it into a healthy SRX300 and used the following command: "run request system snapshot media usb"

    When it finished after a while, I plugged the stick into the faulty one and rebooted:

    SPI stage 1 bootloader (Build time: May  3 2016 - 23:48:30)
    early_board_init: Board type: SRX_300
    
    U-Boot 2013.07-JNPR-3.1 (Build time: May 03 2016 - 23:48:31)
    
    SRX_300 board revision major:1, minor:9, serial #: CV3417AF1962
    OCTEON CN7020-AAP pass 1.2, Core clock: 1200 MHz, IO clock: 600 MHz, DDR clock: 667 MHz (1334 Mhz DDR)
    Base DRAM address used by u-boot: 0x10fc00000, size: 0x400000
    DRAM: 4 GiB
    Clearing DRAM...... done
    Using default environment
    
    SF: Detected MX25L6405D with page size 256 Bytes, erase size 64 KiB, total 8 MiB
    Found valid SPI bootloader at offset: 0x90000, size: 1481840 bytes
    
    
    U-Boot 2013.07-JNPR-3.1 (Build time: May 03 2016 - 23:50:19)
    
    Using DRAM size from environment: 4096 MBytes
    checkboard siege
    SATA0: not available
    SATA1: not available
    SATA BIST STATUS = 0x0
    SRX_300 board revision major:1, minor:9, serial #: CV3417AF1962
    OCTEON CN7020-AAP pass 1.2, Core clock: 1200 MHz, IO clock: 600 MHz, DDR clock: 667 MHz (1334 Mhz DDR)
    Base DRAM address used by u-boot: 0x10f000000, size: 0x1000000
    DRAM: 4 GiB
    Clearing DRAM...... done
    SF: Detected MX25L6405D with page size 256 Bytes, erase size 64 KiB, total 8 MiB
    PCIe: Port 0 link active, 1 lanes, speed gen2
    PCIe: Link timeout on port 1, probably the slot is empty
    PCIe: Port 2 not in PCIe mode, skipping
    Net:   octeth0
    Interface 0 has 1 ports (SGMII)
    Type the command 'usb start' to scan for USB storage devices.
    
    Boot Media: eUSB usb
    Found TPM SLB9660 TT 1.2 by Infineon
    TPM initialized
    Hit any key to stop autoboot:  0
    SF: Detected MX25L6405D with page size 256 Bytes, erase size 64 KiB, total 8 MiB
    SF: 1048576 bytes @ 0x200000 Read: OK
    ## Starting application at 0x8f0000a0 ...
    Consoles: U-Boot console
    Found compatible API, ver. 3.1
    USB1:
    Starting the controller
    USB XHCI 1.00
    scanning bus 1 for devices... 2 USB Device(s) found
    USB0:
    Starting the controller
    USB XHCI 1.00
    scanning bus 0 for devices... 2 USB Device(s) found
           scanning usb for storage devices... Device NOT ready
       Request Sense returned 02 3A 00
    2 Storage Device(s) found
    
    FreeBSD/MIPS U-Boot bootstrap loader, Revision 2.8
    (slt-builder@svl-ssd-build-vm06.juniper.net, Tue Feb 10 00:32:30 PST 2015)
    Memory: 4096MB
    SF: Detected MX25L6405D with page size 256 Bytes, erase size 64 KiB, total 8 MiB
    [2]Booting from usb slice 1
    \
    can't load '/kernel'
    can't load '/kernel.old'
    Press Enter to stop auto bootsequencing and to enter loader prompt.
    

    So, no joy there. After some more searching I found a post and tried something else:

    I stopped autoboot and got into the U-Boot shell, whose prompt is "Octeon srx_300_ram#":

    Octeon srx_300_ram# env print

    autoload=n
    baudrate=9600
    boardname=srx_300
    boot.btsq.len=0x00010000
    boot.btsq.start=0x007e0000
    boot.current=primary
    boot.devlist=eUSB:usb
    boot.env.size=0x00002000
    boot.env.start=0x007f0000
    boot.upgrade.loader=0x00200000
    boot.upgrade.loader.data=0x00200000
    boot.upgrade.loader.hdr=0x002fffc0
    boot.upgrade.uboot=0x00000000
    boot.upgrade.uboot.data=0x00000100
    boot.upgrade.uboot.hdr=0x00000030
    boot.upgrade.uboot.maxsize=0x00200000
    boot.upgrade.uboot.secondary=0x00000000
    boot.upgrade.ushell=0x00300000
    boot.ver=3.1
    bootcmd=sf probe; sf read 0x100000 $(boot.upgrade.loader) 0x100000; bootelf 0x100000
    bootdelay=5
    disk.install=disk1
    dram_size_mbytes=4096
    ethact=octeth0
    ethaddr=58:00:bb:a0:f1:00
    loadaddr=0x20000000
    loaddev=disk0:
    numcores=2
    octeon_failsafe_mode=0
    octeon_ram_mode=1
    serial#=CVXXXXXXXXXX
    stderr=serial
    stdin=serial
    stdout=serial
    ver=U-Boot 2013.07-JNPR-3.1 (Build time: May 03 2016 - 23:50:19)
    wmem_selector=9448
    
    Environment size: 1010/8188 bytes
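
    For anyone comparing this dump between a healthy and a faulty unit, a captured "env print" can be turned into a dictionary so the boot-related variables are easy to pick out. This is a minimal sketch: the sample is a subset of the output above, and reading boot.devlist as a colon-separated boot order is an assumption, not something documented in this thread.

```python
def parse_uboot_env(text):
    """Parse `env print` output (one key=value per line) into a dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the trailing "Environment size: ..." summary.
        if not line or "=" not in line or line.startswith("Environment size"):
            continue
        # Split on the first '=' only, in case a value contains one.
        key, value = line.split("=", 1)
        env[key] = value
    return env


SAMPLE = """\
autoload=n
boot.current=primary
boot.devlist=eUSB:usb
bootcmd=sf probe; sf read 0x100000 $(boot.upgrade.loader) 0x100000; bootelf 0x100000
disk.install=disk1
loaddev=disk0:
Environment size: 1010/8188 bytes
"""

env = parse_uboot_env(SAMPLE)
# Assumption: boot.devlist lists the boot media in the order they are tried.
boot_order = env["boot.devlist"].split(":")
print(env["boot.current"], boot_order)
```

    With two such dictionaries, the keys whose values differ between a good and a bad box can be listed with a simple comparison.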

    Octeon srx_300_ram# setenv boot.current alternate
    Octeon srx_300_ram# boot

    This time it booted all the way, but the funny thing is that it booted from the USB stick that was still plugged in. The faulty SRX came up with the config of the healthy SRX I used to create the USB snapshot.

    Every few minutes the following messages show up on the console:

    (da0:umass-sim0:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
    (da0:umass-sim0:0:0:0): CAM Status: SCSI Status Error
    (da0:umass-sim0:0:0:0): SCSI Status: Check Condition
    (da0:umass-sim0:0:0:0): NOT READY asc:3a,0
    (da0:umass-sim0:0:0:0): Medium not present
    (da0:umass-sim0:0:0:0): Unretryable error
    Opened disk da0 -> 6
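
    The "NOT READY asc:3a,0" here and the "Request Sense returned 02 3A 00" line in the loader output earlier look like the same SCSI condition, reported for a device that is also named da0 in the installer's "0 bytes" message. The following is a minimal sketch for decoding such a triple, assuming the three hex bytes are sense key, ASC and ASCQ in that order. The lookup tables hold only a few standard codes and are far from complete.

```python
# A few standard SCSI sense keys and one ASC/ASCQ pair (deliberately incomplete).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}
ADDITIONAL_SENSE = {
    (0x3A, 0x00): "MEDIUM NOT PRESENT",
}


def decode_sense(triple):
    """Decode a 'KEY ASC ASCQ' hex triple such as '02 3A 00'."""
    key, asc, ascq = (int(part, 16) for part in triple.split())
    key_name = SENSE_KEYS.get(key, "unknown sense key 0x%X" % key)
    detail = ADDITIONAL_SENSE.get(
        (asc, ascq), "unlisted ASC/ASCQ %02X/%02X" % (asc, ascq)
    )
    return key_name, detail


print(decode_sense("02 3A 00"))  # ('NOT READY', 'MEDIUM NOT PRESENT')
```

    In other words, the device answers on the bus but reports that it has no medium, which would be consistent with the installer seeing a 0-byte target.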

    So, where do I go from here? Also, running setenv boot.current alternate and then rebooting with the USB stick removed results in the kernel.old message again.

    Kind regards,

    Jeroen R