EX4200 VC won't commit after a member core dump

  • 1.  EX4200 VC won't commit after a member core dump

     
    Posted 10-22-2020 09:43

    I've got a 3 member virtual-chassis EX4200 that won't commit its config. This morning, member0 rebooted and dumped core immediately after a commit was performed. Console showed a lot of this, repeatedly:

     

    I=16105
    UNEXPECTED SOFT UPDATE INCONSISTENCY

    CLEAR? yes

    along with:

     

    DIRECTORY CORRUPTED I=8205 OWNER=0 MODE=40755
    SIZE=512 MTIME=Jun 14 02:58 2013
    DIR=?

    UNEXPECTED SOFT UPDATE INCONSISTENCY

    SALVAGE? yes

    There were also some FIPS-related errors before the boot finished. Now when I attempt to commit, I get:

     

    # commit check
    fpc1:
    configuration check succeeds
    fpc0:
    2020-10-22 11:57:46 EDT: Running FIPS Self-tests
    veriexec: /boot/loader: No such file or directory
    /sbin/kats/file_integrity: cannot open /boot/loader: No such file or directory
    Failed SHA1 checksum of /boot/loader
    @ 1603382266 [2020-10-22 11:57:46] fips-error[1690]: FIPS Error 1: File integrity test failed
    Abort trap (core dumped)
    2020-10-22 11:57:47 EDT: FIPS Self-tests Failed
    error: configuration check-out failed
    fpc1:
    error: remote commit-configuration failed on fpc0
    fpc2:
    configuration check succeeds
    fpc1:
    error: configuration check-out failed
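    When a virtual chassis commit fails, the per-member output is interleaved, which makes it easy to misread which member is at fault. Below is a minimal Python sketch that groups each error: line under the fpcN: header that precedes it and then lists the members that others report as the remote cause. It assumes the output format shown above is representative (inferred from this one post, not from Juniper documentation); the function names are hypothetical.

```python
import re
from collections import defaultdict

# A member header looks like "fpc0:" on a line by itself (format assumed from the post).
HEADER = re.compile(r"^(fpc\d+):\s*$")
# Errors such as "remote commit-configuration failed on fpc0" name the member at fault.
REMOTE = re.compile(r"remote commit-configuration failed on (fpc\d+)")


def commit_errors_by_member(output):
    """Map each fpcN to the list of 'error:' messages printed under its header."""
    errors = defaultdict(list)
    member = None
    for raw in output.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            member = match.group(1)  # later lines belong to this member
        elif line.startswith("error:") and member is not None:
            errors[member].append(line[len("error:"):].strip())
    return dict(errors)


def members_blamed_remotely(errors):
    """Return the set of members that other members name as the remote failure."""
    blamed = set()
    for messages in errors.values():
        for message in messages:
            match = REMOTE.search(message)
            if match:
                blamed.add(match.group(1))
    return blamed


COMMIT_SAMPLE = """\
# commit check
fpc1:
configuration check succeeds
fpc0:
2020-10-22 11:57:46 EDT: Running FIPS Self-tests
2020-10-22 11:57:47 EDT: FIPS Self-tests Failed
error: configuration check-out failed
fpc1:
error: remote commit-configuration failed on fpc0
fpc2:
configuration check succeeds
fpc1:
error: configuration check-out failed
"""

if __name__ == "__main__":
    errs = commit_errors_by_member(COMMIT_SAMPLE)
    for member, messages in sorted(errs.items()):
        print(member, messages)
    print("blamed:", sorted(members_blamed_remotely(errs)))
```

    On this output, fpc1 reports two errors, but one of them names fpc0 as the remote cause, which points at the member that dumped core rather than at fpc1.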

    I also ran a disk check and I'm seeing 'bad read' errors:

     

    % nand-mediack -C
    Media check on da0 on ex platforms
    Zone 06 Block 0186 Addr 18ba00 : Bad read
    Zone 06 Block 0502 Addr 19f600 : Bad read
    Zone 06 Block 0523 Addr 1a0b00 : Bad read
    Zone 06 Block 0700 Addr 1abc00 : Bad read
    Zone 06 Block 0807 Addr 1b2700 : Bad read
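    To track whether the bad-read list grows between checks, the media check output can be turned into structured data. Below is a minimal Python sketch, assuming each failing block is reported on its own line in the "Zone NN Block NNNN Addr HEX : Bad read" form shown above. That format is inferred from this output only, and the function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Matches lines like "Zone 06 Block 0186 Addr 18ba00 : Bad read" (format assumed from the post).
BAD_READ = re.compile(
    r"Zone\s+(?P<zone>\d+)\s+Block\s+(?P<block>\d+)\s+"
    r"Addr\s+(?P<addr>[0-9a-fA-F]+)\s*:\s*Bad read"
)


def parse_bad_reads(output: str) -> list[tuple[int, int, int]]:
    """Return (zone, block, address) for every 'Bad read' line; the address is parsed as hex."""
    hits = []
    for line in output.splitlines():
        match = BAD_READ.search(line)
        if match:
            hits.append((int(match["zone"]), int(match["block"]), int(match["addr"], 16)))
    return hits


def bad_reads_per_zone(output: str) -> Counter:
    """Count bad reads per zone so repeated checks can be compared."""
    return Counter(zone for zone, _, _ in parse_bad_reads(output))


MEDIACK_SAMPLE = """\
Media check on da0 on ex platforms
Zone 06 Block 0186 Addr 18ba00 : Bad read
Zone 06 Block 0502 Addr 19f600 : Bad read
Zone 06 Block 0523 Addr 1a0b00 : Bad read
Zone 06 Block 0700 Addr 1abc00 : Bad read
Zone 06 Block 0807 Addr 1b2700 : Bad read
"""

if __name__ == "__main__":
    for zone, block, addr in parse_bad_reads(MEDIACK_SAMPLE):
        print(f"zone {zone:02d} block {block:04d} addr 0x{addr:x}")
    print(dict(bad_reads_per_zone(MEDIACK_SAMPLE)))
```

    If the count rises between runs, that is consistent with the failing-media diagnosis later in this thread.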

    I am pretty sure I need to re-install Junos from USB and format the disk in the process, but is there any safe way to commit pending changes without doing this in the meantime? 



  • 2.  Re: EX4200 VC won't commit after a member core dump
    Best Answer

     
    Posted 10-22-2020 11:14

    Hello Evt,

     

    - Unfortunately, configuration changes cannot be committed as long as this member reports a hardware failure.

    - Reinstalling Junos with a format is the mandatory next step before this switch can be declared dead and an RMA raised.

     

    You can use the steps described in this KB for recovery:

    https://kb.juniper.net/InfoCenter/index?page=content&id=KB29069

     

    Best wishes 

    Bemwa 



  • 3.  Re: EX4200 VC won't commit after a member core dump

     
    Posted 10-22-2020 11:17

    Hi evt,

     

    I don't think the commit will go through until you fix that member, but you can try the following:

    # commit full force (this makes all daemons check the applied configuration)

     

    Besides that, you can try removing the member that is causing the problem. Here it looks like fpc0 is the one blocking the commit, so you can disconnect the VC cables from this member and commit again; the commit will then be applied to the remaining members. If fpc0 is the master, switch mastership first to avoid issues:

    > request chassis routing-engine master switch

     

    Regards,

    Jeff



  • 4.  Re: EX4200 VC won't commit after a member core dump

     
    Posted 10-22-2020 12:14

    Thanks for the response. What about taking a snapshot onto the alternate slice, rebooting, then snapshotting back onto the primary slice? Is that worth a shot, or should I go straight to installing from USB via the loader?



  • 5.  Re: EX4200 VC won't commit after a member core dump

     
    Posted 10-23-2020 01:09

    Hello Evt,

     

    I am afraid that since bad-block recovery commands such as nand-mediack and fsck didn't help, your switch has real bad blocks.

     

    However, snapshotting from the alternate slice is worth a try as a last resort.

     

    Please refer to KB https://kb.juniper.net/InfoCenter/index?page=content&id=KB23180 to restore the main slice.

    1) While booted from the backup slice, take a snapshot onto the main slice (from the backup slice, "alternate" refers to the main slice):
    request system snapshot media internal slice alternate

    2) Check the result:
    show system snapshot media internal

    3) Reboot from the main slice:
    request system reboot slice alternate media internal

     

    But I would recommend being ready to install from USB in case this fails.

     

     

    Hope this helps.  😎

    Please mark "Accept as solution" if this answers your query.  Kudos are appreciated too! 

     

    Regards,

    Bemwa 



  • 6.  Re: EX4200 VC won't commit after a member core dump

    Posted 10-23-2020 01:18

    Hi evt,

     

    On legacy EX switches, the file system check (fsck) runs with the -C option, which skips the corruption check if the partition was marked clean during the boot-time "nand-media" check. As a result, there have been multiple cases of file system issues on partitions that had been shut down cleanly.

     

    Reference:

    https://kb.juniper.net/InfoCenter/index?page=content&id=KB33996&actp=RSS

     

    In the rare case that file system corruption persists after fsck completes, you need to perform a format install. This formats the file system and removes all corruption, along with any previous logs and configuration. For the format-install procedure, refer to

    https://kb.juniper.net/InfoCenter/index?page=content&id=KB20643

     

    HTH

     

    Mark this as an "Accepted Solution" so that it can help others.



  • 7.  Re: EX4200 VC won't commit after a member core dump

     
    Posted 10-27-2020 05:37

    Unfortunately, booting to the alternate slice did not help. I ended up installing from USB and formatting the disk in the process:

     

    https://kb.juniper.net/InfoCenter/index?page=content&id=KB20643

     

    This process was relatively painless and very simple.