I am posting to seek advice and recommendations about HA design on the QFX5110. I have multiple QFX5110s across multiple data centers, connected in a ring topology. There are two solutions that I am aware of:
2. Virtual Chassis
Which one should I go for and why?
You may consider an MC-LAG scenario as well.
Can you be a bit more specific regarding number of datacenters and number of QFX5110's per datacenter?
In general I would not recommend stretching a virtual chassis across datacenters, as it creates one logical failure domain (a software error can bring down multiple sites).
MC-LAG is only for a pair of switches - but if they are placed in "sets of two" it can be a feasible solution.
If you have two datacenters with two QFX5110's per site, I would go with either a virtual chassis or an MC-LAG pair per site and then interconnect them via a LAG of at least two links. Then you can upgrade one site without fear of downtime on the opposite site.
But please explain your topology a bit deeper and I will try to give a recommendation.
Dear Ajo and Jonas,
Thanks for your reply first of all.
Please refer to the diagram below for the connectivity. I have 4 QFX5110s across 3 different data centers, and they are connected in a ring topology.
I believe VC is not a good option here, but I still need your advice and recommendation for a better solution.
Thanks and feel free to let me know if more info is needed.
In this scenario I think that ERPS is the best overall solution.
You can consider configuring the two QFX5110's at DC-B as a virtual chassis to simplify configuration on that site. That will require two links between QFX5110-2 and QFX5110-3 to properly form the virtual chassis.
Note: If you want to go with ERPS and virtual chassis, then the ring links have to be configured as ae's to function properly in case of an RE failover. Each of these can be just a single 10G link in a LAG (Ref: https://kb.juniper.net/InfoCenter/index?page=content&id=KB31481).
You can get the same functionality with MC-LAG at DC-B instead of virtual chassis but two links between the switches will be needed.
If you go with virtual chassis you will have one configuration for DC-B, and you can create LAGs across both switches. With MC-LAG you will have an individual configuration per switch and must decide which ae's should span the two switches.
I hope this helps you decide the right design.
I really appreciate your suggestion and advice about this. Each option has its own pros and cons. I prefer to go with ERPS and do virtual chassis at DC-B, but I am not clear about your point "Note: If you want to go with ERPS and virtual-chassis then the links has to be configured as ae's to function properly in case of RE failover." Are you referring to the link between QFX5110-2 and QFX5110-3, or to all interconnect links in the ring topology?
Thanks and Regards,
The links between members in a virtual chassis are called "vcp links". You should have two vcp links between QFX5110-2 and QFX5110-3 to ensure they do not go into a split-brain scenario in case of a single interface error. Also remember the 'virtual-chassis no-split-detection' statement when doing a two-member VC.
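A rough sketch of what that could look like on the two DC-B switches. The port numbers (pic-slot 0, ports 48 and 49) are only assumptions for illustration; use whichever ports you actually cable between QFX5110-2 and QFX5110-3.

```
# Operational mode, on each member: convert two ports into VCPs
# (pic-slot/port values are placeholders)
request virtual-chassis vc-port set pic-slot 0 port 48
request virtual-chassis vc-port set pic-slot 0 port 49

# Configuration mode: disable split detection for the two-member VC
set virtual-chassis no-split-detection

# Operational mode: verify members and VCP status
show virtual-chassis
show virtual-chassis vc-port
```

With two VCPs in place, a single failed port leaves the other vcp link carrying the inter-member traffic.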
The ae's I'm mentioning are the links between QFX5110-1<->QFX5110-2 and QFX5110-3<->QFX5110-4. There, the east- and west-bound interfaces in the ERPS ring should be LAG interfaces even though each contains only one link. This is because, in the event of an RE failover, the logical interface name has to stay the same to avoid stability issues. That is what is mentioned in KB31481.
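To illustrate, a minimal sketch of a one-member LAG used as a ring interface. The interface names (xe-0/0/0, ae0, ae1), the ring name ring1 and control VLAN 100 are all assumptions. The ERPS statements are trimmed down to the interface references only (RPL owner, data channels and so on are left out), so check them against the ERPS documentation for your Junos release, and check platform support for ERPS on a virtual chassis before relying on this.

```
# Allow aggregated interfaces on the switch (count is a placeholder)
set chassis aggregated-devices ethernet device-count 2

# One physical 10G link as the only member of ae0
set interfaces xe-0/0/0 ether-options 802.3ad ae0
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 unit 0 family ethernet-switching interface-mode trunk
set interfaces ae0 unit 0 family ethernet-switching vlan members all

# ae1 is built the same way on the other ring-facing port.
# The ring then references the ae logical interfaces, not the xe ports:
set protocols protection-group ethernet-ring ring1 east-interface control-channel ae0.0 vlan 100
set protocols protection-group ethernet-ring ring1 west-interface control-channel ae1.0 vlan 100
```

The point is only that ERPS refers to ae0.0 and ae1.0, so the name ERPS sees does not depend on which physical port sits underneath.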
On a different note, I thought of a somewhat different approach: make all links between the QFX5110's Layer 3, enable OSPF and full-mesh iBGP, and run all VLANs as EVPN-VXLAN across the switches. It's somewhat more complicated than the ERPS ring and VC and will also require PFL licenses on all QFX5110's. Anyway, just wanted to mention a completely different approach 🙂
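To give an idea of the amount of configuration involved, here is a minimal per-switch sketch of that approach. Every value in it is an assumption for illustration: the addresses (10.0.12.0/31, 192.0.2.x loopbacks), AS 65000, the group name overlay, and VLAN 100 mapped to VNI 10100. It is not a complete or validated design.

```
# Underlay: routed ring link plus loopback, advertised in OSPF
set interfaces xe-0/0/0 unit 0 family inet address 10.0.12.0/31
set interfaces lo0 unit 0 family inet address 192.0.2.1/32
set routing-options router-id 192.0.2.1
set routing-options autonomous-system 65000
set protocols ospf area 0.0.0.0 interface xe-0/0/0.0 interface-type p2p
set protocols ospf area 0.0.0.0 interface lo0.0 passive

# Overlay: iBGP between loopbacks carrying EVPN routes
# (one neighbor statement per remote switch for a full mesh)
set protocols bgp group overlay type internal
set protocols bgp group overlay local-address 192.0.2.1
set protocols bgp group overlay family evpn signaling
set protocols bgp group overlay neighbor 192.0.2.2
set protocols bgp group overlay neighbor 192.0.2.3
set protocols bgp group overlay neighbor 192.0.2.4

# EVPN-VXLAN: VTEP source, route targets, and VLAN-to-VNI mapping
set protocols evpn encapsulation vxlan
set protocols evpn extended-vni-list all
set switch-options vtep-source-interface lo0.0
set switch-options route-distinguisher 192.0.2.1:1
set switch-options vrf-target target:65000:1
set vlans v100 vlan-id 100
set vlans v100 vxlan vni 10100
```

Each further stretched VLAN adds a vlan-id and vni pair like the last two lines on every switch that needs it.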
That's pretty clear! Thank you very much, I really appreciate your input.
I have gone through your reply, it is pretty useful for me.
I would like to build a layer 2 network without STP to achieve sub-second failover, and would like to use ERPS and MC-LAG. I have attached my topology; would you please have a look at it and let me know whether this topology works?
First of all - next time please create a new thread to avoid discussing several issues on top of each other. It makes it easier for people to search for the right information later on, instead of looong threads 🙂
For your question: I think you will have problems with this design, as ERPS relies on defining east/west physical interfaces, and link-down on either of these triggers a notification to the other members of the ring. With MC-LAG you create an aggregated ethernet link with at least two physical ports between your QFX switches, so that the inter-chassis link is never down. That works against the ERPS design.
Furthermore, ERPS is not supported on QFX5000 series or EX4600 in virtual chassis - so doing a virtual chassis per site and then ERPS between the VC's won't be an option either. My guess is that this isn't supported because of the architecture, with a Junos RE VM running on top of the physical hardware and a risk of too high latency in ERPS keepalives/notifications.
The only viable design I can think of with the proposed hardware and physical design would be to have layer 3 links between the sites and then do your stretched layer 2 via EVPN-VXLAN. It requires licenses on the QFX5110's and quite a bit more configuration on each switch, but it would do the job.