I have configured a chassis cluster with two SRX240s, and now I need to run both of them as active/active. How do I set them up as active/active? If anyone has the same scenario, please help.
Hi Shyan,
If you put all the reth interfaces in one redundancy group, the cluster is active/passive (A/P).
To get active/active, put some reth interfaces in redundancy group 1 and the others in redundancy group 2, then give redundancy group 1 the higher priority on node 0 and redundancy group 2 the higher priority on node 1.
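A minimal sketch of that layout, written in Python so it can be checked: a hypothetical helper, active_active_config, that prints Junos set commands giving each data-plane redundancy group its higher priority on a different node. The reth names, RG numbers and the 200/100 priority values are illustrative assumptions; member (child) interfaces, RG0 and options such as preempt are not generated and still have to be configured separately.

```python
# Hypothetical helper (not a Juniper tool): emits Junos "set" commands for an
# active/active layout with two data-plane redundancy groups. Reth names, RG
# numbers and priority values are illustrative assumptions.

def active_active_config(reth_to_rg, preferred_node, high=200, low=100):
    """Build set commands for a two-node chassis cluster.

    reth_to_rg     -- e.g. {"reth0": 1, "reth2": 2}: which RG each reth belongs to
    preferred_node -- e.g. {1: 0, 2: 1}: the node (0 or 1) each RG should be active on
    """
    # reth-count N creates reth0 .. reth(N-1), so size it from the highest index used.
    highest = max(int(name[len("reth"):]) for name in reth_to_rg)
    lines = [f"set chassis cluster reth-count {highest + 1}"]

    # Give each RG the higher priority on its preferred node, the lower on the peer.
    for rg, node in sorted(preferred_node.items()):
        peer = 1 - node
        lines.append(f"set chassis cluster redundancy-group {rg} node {node} priority {high}")
        lines.append(f"set chassis cluster redundancy-group {rg} node {peer} priority {low}")

    # Bind each reth to its RG; member (child) links are NOT generated here.
    for reth, rg in sorted(reth_to_rg.items()):
        lines.append(f"set interfaces {reth} redundant-ether-options redundancy-group {rg}")
    return lines


if __name__ == "__main__":
    layout = {"reth0": 1, "reth1": 1, "reth2": 2, "reth3": 2}
    for line in active_active_config(layout, preferred_node={1: 0, 2: 1}):
        print(line)
```

With reth0/reth1 in RG1 (preferring node 0) and reth2/reth3 in RG2 (preferring node 1), RG1 ends up active on node 0 and RG2 on node 1, which is the active/active split described above.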
Thanks for your quick reply.
I had already put different reth interfaces in different redundancy groups, but I had given the higher priority to the same node for all of them. I have now changed the priorities so that each group prefers a different node. But how do I configure all the interfaces as active/active, so that every interface stays reachable even in the worst case?
Let's say you assigned 2 reth interfaces to RG1, which is active (higher priority) on node 0, and 2 reth interfaces to RG2, which is active (higher priority) on node 1.
That gives you an active/active scenario, because at the same time RG1 is active on node 0 and RG2 is active on node 1.
Can the same RG be active/active on both devices? If not, is there any solution for this case, or any way to make the same RG active/active?
If I assign the same priority to an RG on both nodes, is it active/active on both devices?
"If I assign the same priority to an RG on both nodes, is it active/active on both devices?"
No. If the priorities are the same, other criteria are used to elect the active node (node ID, which device came up first, ...).
So the result is that the RG is active on only one node.
To achieve active/active, you can have one RG active on node 0 and a different RG active on node 1. You cannot have one RG active on both nodes at the same time.
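The point that one RG can only ever be active on one node can be illustrated with a toy election model. This is a simplification under stated assumptions, not Juniper's algorithm: each data-plane RG elects exactly one primary, the higher priority wins, and a tie goes to the lower node ID as a stand-in for the other tie-breakers mentioned above. The function names are hypothetical.

```python
# Toy model, not Juniper's full election logic: each data-plane redundancy
# group independently elects ONE primary. Higher priority wins; on a tie we
# assume the lower node ID wins, standing in for the other tie-breakers.

def elect_primary(priorities):
    """priorities: {node_id: priority} for a single RG -> id of the primary node."""
    # Sort key: priority first, then prefer the lower node ID (hence -node).
    return max(priorities, key=lambda node: (priorities[node], -node))


def cluster_state(groups):
    """groups: {rg: {node_id: priority}} -> {rg: primary node id}."""
    return {rg: elect_primary(prio) for rg, prio in groups.items()}


def is_active_active(state):
    """True when the RGs' primaries are spread over more than one node."""
    return len(set(state.values())) > 1


if __name__ == "__main__":
    split = cluster_state({1: {0: 200, 1: 100}, 2: {0: 100, 1: 200}})
    print(split, is_active_active(split))   # RG1 on node 0, RG2 on node 1
    tie = cluster_state({1: {0: 100, 1: 100}})
    print(tie, is_active_active(tie))       # still only one primary for RG1
```

Because elect_primary returns a single node per RG, equal priorities never make one RG active on both nodes; active/active only appears when different RGs prefer different nodes.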
Thank you very much for your efforts. I hope you will keep helping in the future too.
As far as I understand, an SRX240 cluster has only one routing group (on RG0) and therefore cannot run in true active/active mode (in fact, that's the same for all the "branch" SRX Series devices).
(Active/active meaning that each machine in the cluster uses the other one as its fallback; in this case, the routing group fallback (failover) only works one way.)
Correct me if I misunderstood this ;)
What it comes down to is this question: Why active/active?
In 10 years of working in IT security, I have come across three possible answers:
a) To boost throughput
b) To resolve issues with asymmetric routing
c) So the 2nd unit "doesn't just sit there"
c) can be knocked out quickly. Unless there is a measurable advantage to active/active, having the 2nd unit "just sit there" is perfectly fine. HA is implemented to provide failover, and because HA with NBD (next-business-day) service is often more advantageous in the long run than a single unit with 4-hr service. This isn't about some nebulous, feel-good advantage of active/active; it's about clear, measurable benefits that flow down to the bottom line.
b) can be a legitimate workaround. Ultimately, though, I much prefer to resolve the asymmetric routing situation itself. Active/active is harder to troubleshoot than active/passive, and asymmetric routing doesn't make it any easier. From a TCO perspective, resolving asymmetric routing issues is preferable.
a) needs to be examined very closely, along several vectors. Any gain has to be measurable and make a positive contribution to the company's bottom line.
a1) Are we truly boosting network throughput? The Juniper design of active/active means you could, as long as the ingress and egress ports are on the same unit. Once traffic has to traverse the fabric link, you lose that theoretical advantage. Also, if one unit can handle all the traffic you are throwing at it, then there's no need for active/active.
a2) Is it acceptable to run at the speed of one unit during a failure scenario? For how long? Is NBD acceptable, or do we now need 4-hr service for both units? (Higher cost: it may be more advantageous to buy higher-performance units and stay with NBD service.)
a3) Was the intent to boost UTM/IDP throughput? By how much are we boosting it? What does that mean in a failure scenario (a2 all over again)? And is that even supported in active/active? (Currently: No)
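Points a1) and a2) boil down to simple arithmetic. A minimal sketch, assuming a single peak-load figure and a single per-unit capacity figure (both hypothetical placeholders, not SRX datasheet values); the function name is illustrative, and fabric-link traversal, which erodes any active/active gain further, is ignored.

```python
# Back-of-the-envelope check for a1) and a2). The figures in the example are
# hypothetical placeholders, not SRX datasheet values, and fabric-link
# traversal (which erodes the A/A gain further) is ignored.

def sizing_report(peak_load_mbps, unit_capacity_mbps):
    """Compare the peak load with what ONE unit can carry on its own."""
    # a1): if one unit carries the peak, active/passive is already enough.
    one_unit_enough = peak_load_mbps <= unit_capacity_mbps
    # a2): after a failover the survivor carries everything; the excess is at risk.
    shortfall = max(0.0, float(peak_load_mbps - unit_capacity_mbps))
    at_risk = shortfall / peak_load_mbps if peak_load_mbps > 0 else 0.0
    return {
        "active_passive_sufficient": one_unit_enough,
        "failover_shortfall_mbps": shortfall,
        "fraction_at_risk_during_failure": at_risk,
    }


if __name__ == "__main__":
    # Hypothetical: 1500 Mbit/s peak, 1000 Mbit/s per unit.
    print(sizing_report(1500, 1000))
```

With those placeholder numbers, a third of the peak load has nowhere to go while one unit is down, which is exactly the trade-off a2) asks the business to either accept or pay for with bigger boxes.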
Then you need to think carefully about the possible drawbacks of active/active:
- Which features become unsupported, and did we need those features? (IDP and UTM, others?)
- Is the added complexity of troubleshooting worth the measurable benefit of active/active?
- What is the impact on TCO? Consider not just the possible added time spent troubleshooting, but also the skill level of your network engineers. Will you need to hire more costly resources to support this infrastructure? And what about expanding the infrastructure over time while making sure the benefits of active/active survive that expansion?
I'll spare you the head-scratching and come right out with it: I have yet to see an environment where active/active is the right answer. I've seen it implemented only for ill-defined reasons such as c), and have yet to see anyone implement active/active for a clear, measurable benefit.
Of course, active/active can be the right answer, as long as the lower performance in the failure state is acceptable and that failure state is less of a drag on the bottom line for its duration than just buying "the bigger box" would be. I just haven't seen an environment where that was the right answer, once people got over c). And boy, do I hear c) a lot. 🙂
What a good post. As an engineer for a reseller I deal with this question pretty frequently. I think you did a great job of summarizing the issues. Thanks for sharing!
Thank you for the kudos. I should add:
That discussion was SRX-specific, but it can be adapted to any device that sits in the traffic path and is clustered at L2 (same subnet for both members of the cluster). Even for out-of-band devices, some of these considerations apply.
The one exception to the rule is Juniper SSL VPNs. If they are deployed on separate subnets and the customer wants stateful failover, then active/active can be the right answer. There are product-specific caveats there, too: the internal link has to be stable, low latency (well under 100 ms) and high throughput (over 2 Mbit/s). WAN links are officially unsupported, but with MPLS these days they will work. An external load balancer that can perform health checks is needed. That could be as simple as round-robin DNS, or it could be an F5 GTM or similar.
But even there, more often than not, a simple landing page for end users (SSL VPN East/West, EMEA, APAC) is the better choice, and doesn't require external load balancers.
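The external load balancer's job can be sketched in a few lines. A minimal sketch, assuming a pool of SSL VPN nodes and a health-check callable (both hypothetical): rotate through the nodes round-robin but skip any node that fails its check. Note that plain round-robin DNS does not check health by itself; that skip step is what a GSLB such as the F5 GTM mentioned above, or DNS with monitoring, adds.

```python
from itertools import cycle


class HealthCheckedRoundRobin:
    """Round-robin over SSL VPN nodes, skipping any node whose health check fails.

    Node names and the health-check callable are illustrative assumptions,
    not part of any Juniper or F5 API.
    """

    def __init__(self, nodes, is_healthy):
        self._nodes = list(nodes)
        self._ring = cycle(self._nodes)
        self._is_healthy = is_healthy

    def next_node(self):
        # Try each node at most once per call, so an all-down pool returns None.
        for _ in range(len(self._nodes)):
            node = next(self._ring)
            if self._is_healthy(node):
                return node
        return None


if __name__ == "__main__":
    down = {"vpn-west"}  # hypothetical: pretend the west node failed its check
    lb = HealthCheckedRoundRobin(["vpn-east", "vpn-west"], lambda n: n not in down)
    print([lb.next_node() for _ in range(3)])
```

Keeping the health check as an injected callable keeps the rotation logic separate from how health is actually probed (HTTPS check, ICMP, and so on).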
That sounds like a pre-sales kind of statement 🙂
I'd really appreciate it if you could describe a real-world use case of asymmetric routing with an active/passive design (using SRX).
An active/active scenario would be helpful if we have a geographically stretched cluster of 2 SRX devices (I think this is a supported configuration; at least I read something like that a year ago), or in the most common situation, where we have more than one ISP and a dynamic routing protocol between the ISPs and the SRX HA cluster.
DC1 | ISP1 - SRX1.Cluster1 | --- long-distance dark fiber (DF) or Layer 2 link between data centers --- | SRX2.Cluster1 - ISP2 | DC2
In this example, can I run dynamic routing with both ISPs at the same time, to advertise my APP-SRV IP (sitting behind the SRX) to both of them?
Or another one:
Same DC and same rack: 2x SRX in an HA cluster setup + 2 ISPs. Where should I terminate the links from ISP1 and ISP2 to avoid a single device being a point of failure? ISP1 on node0 and ISP2 on node1? What happens to my dynamic routing then (as node1 will not terminate any traffic)? Adding at least 2 (EX?) switches in VC/VCF/whatever_stacking and terminating the Internet links there is too costly a solution.
Not sure if this will be of any benefit or not.
I have just finished configuring and testing 2 x SRX1500 for A/P and A/A. My results may surprise you, though apparently they are not uncommon.
Given that there are two points of protection, the Control Plane and the Data Plane, I have found that the SRX is actually always in an A/A state from a Data Plane perspective. When I disconnected the Control Plane link, the HA link failed, as it should, and the cluster immediately placed the secondary into "Ineligible" for 3 minutes; if the Data Plane is not lost in that time, it then places the secondary into "Disabled". This was because the primary still sees itself as operational, so it assumes all responsibility and removes the secondary from the equation. If we also disconnect the Data Plane within those three minutes, the secondary becomes primary, because it assumes the original primary is lost.
Now, what is important to understand here is that the Control Plane is purely for the chassis (hence removing the HA connection only affects chassis failover), while the Fabric Ports carry the RTO (Real Time Object) synchronisation (VPNs, NAT, etc.). The reth ports carry the actual data.
What I found was that, unless I disconnected everything, the data ports were always UP, even with the HA port disconnected. That's why I believe it runs A/A from a data perspective even if you believe it is A/P: it is only truly A/P from a Control Plane perspective.
Obviously, I can only come to these conclusions from the testing I have just completed and the results and behaviour of the SRX from those tests.
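The failover behaviour reported above can be summarised as a small state function. This is a simplified model of what was observed in this one SRX1500 test, not Juniper's complete state machine: the "Ineligible"/"Disabled" states and the 3-minute window come from the post, while the function and parameter names are hypothetical, and it assumes a node that has reached Disabled stays there.

```python
# Simplified model of the behaviour reported in this test, NOT Juniper's full
# state machine. State names and the 3-minute window come from the post;
# function/parameter names are hypothetical. Assumes "disabled" is sticky.

INELIGIBLE_WINDOW_S = 180  # the "3 minutes" observed after control-link loss


def secondary_state(control_link_up, elapsed_s=0, fabric_lost_at_s=None):
    """State of the secondary node.

    elapsed_s        -- seconds since the control link went down
    fabric_lost_at_s -- seconds after control-link loss at which the fabric
                        link also went down (None = fabric still up)
    """
    if control_link_up:
        return "secondary"
    fabric_lost = fabric_lost_at_s is not None and fabric_lost_at_s <= elapsed_s
    # Both links lost inside the window: secondary assumes the primary is gone.
    if fabric_lost and fabric_lost_at_s < INELIGIBLE_WINDOW_S:
        return "primary"
    # Control link lost, fabric still up: sidelined, then disabled after the window.
    if elapsed_s < INELIGIBLE_WINDOW_S:
        return "ineligible"
    return "disabled"


if __name__ == "__main__":
    print(secondary_state(False, elapsed_s=60))                       # control link only
    print(secondary_state(False, elapsed_s=60, fabric_lost_at_s=30))  # both links lost
```

The second case is the split-brain risk implied by the post: with both links gone inside the window, each node believes it is primary.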
If it would help, let me know and I'll post the config I used 🙂