Last week I had to reroute some power cables on some devices in the server room, and our Juniper SRX100H accidentally lost power (my fault). When I noticed it was no longer powered up and reapplied power, it didn't come alive for about 20 minutes or so. After some panicking, it finally had all its port lights on, but after logging back into it I discovered I had corrupted the device by pulling the power cord.
I am new to this particular device, and I'm very nervous about attempting the repair without any background in the Junos software. I'm afraid I'll reset the system to factory defaults and bring the entire network to a halt.
How do you recommend I approach this type of repair? Thank you.
Use that link for information on how to do it.
Thank you, Alexander.
Alexander, must I be the root user? The company that managed this device simply made me an account to log in with, but I'm not sure whether my admin account has the same privileges as the actual root user.
Thank you again.
You are good to go. You can use the command >request system snapshot slice alternate to repair the primary partition. The system is working as designed with the dual root partitions; they exist for exactly this sort of situation. That said, I like to use the method outlined in the article below to make sure the device is properly formatted and installed, and after that I can use the snapshot method cleanly. First, save the complete configuration file. Installing from the USB drive will re-create the dual root partitions, and you can then simply load the config again. There is also an option to create a specific file and add it to the USB drive along with the config, so that it formats the device and copies the config automatically.
To see whether your user account has admin privileges, run #show system login (from configuration mode).
Here is an example of the command; it completed before I finished typing this reply:
lab@srxF-2# run request system snapshot slice alternate
Formatting alternate root (/dev/da0s2a)...
Copying '/dev/da0s1a' to '/dev/da0s2a' .. (this may take a few minutes)
The following filesystems were archived: /
You can view the results with any of the following:

lab@srxF-2# run show system snapshot slice alternate media internal
Information for snapshot on internal (/dev/da0s1a) (primary)
Creation date: Feb 26 11:21:00 2016
JUNOS version on snapshot:
  junos : 12.1X46-D15.3-domestic
Information for snapshot on internal (/dev/da0s2a) (backup)
Creation date: Mar 7 17:47:03 2016
JUNOS version on snapshot:
  junos : 12.1X46-D15.3-domestic

lab@srxF-2# run show system snapshot slice 1 media internal
Information for snapshot on internal (/dev/da0s1a) (primary)
Creation date: Feb 26 11:21:00 2016
JUNOS version on snapshot:
  junos : 12.1X46-D15.3-domestic
Information for snapshot on internal (/dev/da0s2a) (backup)
Creation date: Mar 7 17:47:03 2016
JUNOS version on snapshot:
  junos : 12.1X46-D15.3-domestic

lab@srxF-2# run show system snapshot slice 2 media internal
Information for snapshot on internal (/dev/da0s1a) (primary)
Creation date: Feb 26 11:21:00 2016
JUNOS version on snapshot:
  junos : 12.1X46-D15.3-domestic
Information for snapshot on internal (/dev/da0s2a) (backup)
Creation date: Mar 7 17:47:03 2016
JUNOS version on snapshot:
  junos : 12.1X46-D15.3-domestic
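If you would rather check that output with a script than by eye, here is a minimal Python sketch. It assumes you have copied the snapshot output as plain text (for example, from an SSH session) and that its layout matches the output shown above; `parse_snapshots` and `dual_root_healthy` are hypothetical helper names, not anything built into Junos. Note that matching versions on both slices is only a sanity check, not proof that a slice is free of corruption.

```python
import re
from typing import Dict, List

# Sample text modeled on the snapshot output above (assumed layout).
SAMPLE = """\
Information for snapshot on internal (/dev/da0s1a) (primary)
Creation date: Feb 26 11:21:00 2016
JUNOS version on snapshot:
  junos : 12.1X46-D15.3-domestic
Information for snapshot on internal (/dev/da0s2a) (backup)
Creation date: Mar 7 17:47:03 2016
JUNOS version on snapshot:
  junos : 12.1X46-D15.3-domestic
"""

# "Information for snapshot on <media> (<device>) (<role>)"
HEADER_RE = re.compile(r"Information for snapshot on\s+\S+\s+\((\S+)\)\s+\((\w+)\)")
# "Creation date: <date>"
DATE_RE = re.compile(r"Creation date:\s*(.+)")
# Lowercase "junos : <version>" package line (not the uppercase "JUNOS version" label)
VERSION_RE = re.compile(r"junos\s*:\s*(\S+)")


def parse_snapshots(text: str) -> List[Dict[str, str]]:
    """Group the output into one dict per slice: device, role, date, version."""
    slices: List[Dict[str, str]] = []
    for line in text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            slices.append({"device": header.group(1), "role": header.group(2)})
            continue
        if not slices:
            continue  # ignore anything before the first header, e.g. the CLI prompt
        date = DATE_RE.search(line)
        if date:
            slices[-1]["date"] = date.group(1).strip()
            continue
        version = VERSION_RE.search(line)
        if version:
            slices[-1]["version"] = version.group(1)
    return slices


def dual_root_healthy(slices: List[Dict[str, str]]) -> bool:
    """True if both a primary and a backup slice report the same Junos version."""
    by_role = {s["role"]: s for s in slices}
    if not {"primary", "backup"}.issubset(by_role):
        return False
    primary = by_role["primary"].get("version")
    backup = by_role["backup"].get("version")
    return primary is not None and primary == backup


if __name__ == "__main__":
    for entry in parse_snapshots(SAMPLE):
        print(entry)
    print("dual root healthy:", dual_root_healthy(parse_snapshots(SAMPLE)))
```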
You can reboot after this to get it to use the primary partition again.
Thank you, Lyndidon.
I'm so nervous even attempting this. I've only just started here, and this device controls everything, including our phones, so I'm a bit apprehensive about messing it up right now. I wish I had a spare one to at least practice with. 🙂
Please run the snapshot command as soon as possible. It is NOT service-affecting.
AND YOU ARE AT RISK NOW UNTIL YOU DO.
The dual boot partitions are there so that if a power failure corrupts the disk, the SRX can still boot. That is what happened to your SRX: it booted from the backup copy.
Now you have one working copy and one corrupted copy. If you had a second failure, your SRX would NOT boot.
By running the snapshot repair you will restore full dual boot resiliency.
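To make the risk concrete, here is a toy Python model of the fallback logic described above. It is a simplification for illustration only, not how the SRX boot loader is actually implemented: slot 0 stands for the primary root partition, slot 1 for the backup, and `first_bootable` is a hypothetical name.

```python
from typing import Optional, Sequence

# Toy model only: slot 0 = primary root partition, slot 1 = backup.
# True means the partition is intact and bootable.


def first_bootable(partitions: Sequence[bool]) -> Optional[int]:
    """Return the first intact partition the loader would fall back to, or None."""
    for slot, intact in enumerate(partitions):
        if intact:
            return slot
    return None


print(first_bootable([True, True]))    # healthy: boots from slot 0
print(first_bootable([False, True]))   # after the power event: falls back to slot 1
print(first_bootable([False, False]))  # second failure before the repair: None
```

Running the snapshot repair is what moves you from the second case back to the first.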
Thank you, spuluka. Forgive my inexperience: is the snapshot repair done through the GUI or the CLI? Thank you.
I ran the System Snapshot to an external USB drive.
It looks like the command for the alternate partition is not available in the web interface. You can only run it from the CLI:
request system snapshot slice alternate
This repairs the alternate boot partition on the internal flash that is currently corrupted from the power event.
Thank you so much for your help. I issued the command through the CLI, and I'm assuming I need to reboot for it to take effect. I'm going to do that first thing in the morning. The company is so busy that, even though people are already here at 6 a.m., that's my best time to reboot with as little impact as possible. Will the SRX simply see the repaired slice now, or do I need to run a special reboot command to mount it?
Thank you, spuluka, for all your help and the solution. And thanks to everyone who left helpful comments; I'm sure they would all have worked as well, but with my limited knowledge right now, this was the best solution for me.
It's all happy now. The chassis warning light is also out now after the autorecover repair.
Very strange occurrence today. Everything appeared to be working after the slice repair on Friday, but today I discovered the Juniper was not allowing external traffic in. We had our ISP on site to swap out one of their switches, and after rebooting we had Internet, but our phones did not work.
I had the phone technician here, and we did some probing; everything appeared right. Then we found out we were not receiving emails. The servers checked out OK, so the only thing I could think of was the Juniper, and sure enough it wasn't allowing traffic in. There were no indications of an issue. Because I had taken a snapshot to a USB key last week, we ended up booting the Juniper from the external USB drive, and it worked. I have no idea why the internal media is not working correctly.
Very odd issue. From the description, I doubt this is media-related at all. It sounds more like some kind of bug or resource exhaustion; you might see this behavior if the session table were full and no more resources were available. A normal reboot probably would have cleared it.
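To judge whether the session table is close to full, compare the active session count with the maximum. On SRX devices, show security flow session summary should report both figures. The Python sketch below only does the arithmetic: the 0.9 threshold and the example numbers are assumptions for illustration, not readings from this SRX100H, and the function names are hypothetical.

```python
def session_utilization(active: int, maximum: int) -> float:
    """Fraction of the session table currently in use."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return active / maximum


def session_table_near_full(active: int, maximum: int, threshold: float = 0.9) -> bool:
    """Flag the table as near exhaustion once utilization reaches `threshold` (assumed 90%)."""
    return session_utilization(active, maximum) >= threshold


# Illustrative numbers only.
print(f"{session_utilization(16384, 32768):.0%}")  # 50%
print(session_table_near_full(32000, 32768))       # True
```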
Were you able to log into the device during the incident?
How did the performance metrics look?
Any chance you looked at the session table size?
Until I rebooted from the USB drive I had web access, but I no longer do. I can SSH into the system, but being new to this device, that's a bit much for me at the moment. I have had an email from one of our VPN users saying they can no longer connect, so I'm trying to get the person who set up the system to come and help me out if he can.
I can't afford another outage like yesterday's, so I'm treading lightly, because I don't know whether my accidental power loss has ruined the entire unit. Thanks.