@LapointeMichel wrote:
Hello Dante and RJTaylor,
So as of today, Sep 4, 2019: 236 of my EX2300 are running on image 15.1X53-D591.1 (released in May 2019). 100 have been running with no problem for the last 4 days. I have set an arbitrary target of 5 days with no zombies to call this a success. It's based on my previous experience since last June. I'll update if a single one goes down.
===================================================================================================
Hello again Dante, RJTaylor and all,
So here we are, Monday Sept 9: 236 EX2300-C have now been running for at least 5 days (130 of them for more than 5 days) and not a single one of them has gone zombie on me. Since last June, having more than 100 switches running anything higher (18.1, 18.3, 19.1...) was a sure recipe for an average of one switch a day going down. "Going down" meaning losing the ability to connect to the switch: no console, no SSH, impossible to connect anything other than what was already connected. Because some switches in production went down, we also found that traffic was still going through, seemingly unaffected. The only way to regain control of the switch was disconnecting AC power.
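For anyone who wants to watch for the same symptom, here is a minimal sketch (not something we used, just an illustration) of a management-plane check in Python. Since traffic kept flowing through a zombie switch, checking hosts behind it tells you nothing; this probes the switch itself by opening TCP port 22 and waiting for an SSH banner. The function names, the timeout and the idea that a zombie switch sends no banner are all assumptions for illustration; a switch could in principle still answer on the port while being otherwise unmanageable.

```python
import socket
from typing import Callable, Iterable, List


def ssh_banner_ok(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if the host accepts a TCP connection and sends an SSH banner.

    Hypothetical helper: it only checks that something resembling an SSH
    identification string ("SSH-...") arrives in the first read; it does
    not log in or run any command.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            banner = conn.recv(256)
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
    return b"SSH-" in banner


def find_unresponsive(hosts: Iterable[str],
                      probe: Callable[[str], bool] = ssh_banner_ok) -> List[str]:
    """Return the hosts for which the probe failed, in input order."""
    return [host for host in hosts if not probe(host)]
```

Run from a scheduler once or twice a day against the management addresses, something like find_unresponsive(["192.0.2.11", "192.0.2.12"]) (placeholder addresses) would return the switches worth checking by hand before the next one goes down unnoticed.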
After I sent numerous log files and RSI (Request Support Information) files to JTAC, they said they could not find any indication of the cause. In early July I was supplied with a beta image that, when I installed it, seemed to solve the problem: not a single switch equipped with this image went down in 2 months of testing.
Since this image is not in production, no release date is available, and even an official release would need testing to make sure it works, we plan to go back to 15.1X53-D591.1, which I understand is the latest version of the image the switches were delivered with last year. Next steps include checking with Juniper how long they plan to support 15.1, closing the case I opened in June with JTAC, and making sure the QinQ config we designed for the switches using images above 15 is still compatible through the downgrade. So far we found that only a jsd disable command does not survive the downgrade (irrelevant to our purpose), and we have to remember that going above 15 may mean losing SSH root access if system services ssh root-login allow is not set (or something like that, I have to check my notes...).
When we were about to deploy 250 switches, I thought it was a good idea to upgrade them to the latest release. I still think it was, but it wasn't! When I want to scare myself, I think of what would have happened had we deployed the switches to 250 different sites last June and lost access to them one after the other.
Please comment and suggest, but that case is now closed for me and I'll get on with my life 🙂 Thanks to Dante and RJTaylor for having opened and commented on this issue on the forum.
Michel Lapointe