I'm working on a set of QFX 10008s running VRRP over IRB interfaces to provide a gateway for the connected VLANs. We have an issue where some random traffic is getting dropped. While troubleshooting with ATAC, we noticed that both nodes seem to respond to ARP for the virtual MAC address. For instance, in a lab, if I SSH to the gateway of the network I am on, the node that responds is always the first node the traffic hits via the LAG. So if I remove the link to Node 1 (the VRRP master), Node 2 responds to the SSH request. I'm a little fuzzy on how this active-active VRRP over IRB config is supposed to work, but does that sound right to anyone?
Also, we have more than 255 routed VLANs on here, and VRRP limits us to 255 groups. We are currently using group 10 for all VRRP config. Is that OK, considering they are all in different broadcast domains?
Thanks for your help!!
Not sure I understand your concern regarding the ARPs, as both devices should respond with the same [well-known] MAC for the VRRP gateway address. The client should have no issue handling this.
As for VRRP, the group number only has local significance within the VLAN, so yes, the same number can be used. People generally use different numbers (for example, mapped to the VLAN number) to aid troubleshooting. The group number does affect the well-known MAC, which has the format 00:00:5E:00:01:xx, where xx is the group number in hex.
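To make the relationship between group number and MAC concrete, here is a minimal Python sketch. It assumes the standard IPv4 VRRP virtual MAC layout from RFC 5798 (fixed prefix 00:00:5E:00:01, last octet equal to the group number). The function name and the VLAN IDs are made up for illustration and are not part of any Junos tooling.

```python
def vrrp_virtual_mac(group_id: int) -> str:
    """Return the IPv4 VRRP virtual MAC for a group (VRID), per RFC 5798."""
    if not 1 <= group_id <= 255:
        raise ValueError("VRRP group must be in the range 1-255")
    # Fixed prefix 00:00:5e:00:01; the last octet is the group number in hex.
    return f"00:00:5e:00:01:{group_id:02x}"


# Hypothetical VLAN IDs, all using group 10 as described in the question.
for vlan in (100, 200, 300):
    print(f"VLAN {vlan}: group 10 -> {vrrp_virtual_mac(10)}")
```

Every VLAN that uses group 10 ends up with the same virtual MAC. Since each VLAN is its own broadcast domain, that should not cause a conflict, which is why reusing one group number works.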
Harold, are you with Met Museum in NYC?
Just wondering, . . .
Yes, I am.
Hi Harold, as the name suggests, in active-active mode either peer can respond to ARP and forward the traffic.
If you are confused because the VRRP state on the device is backup but it is still responding to ARP and forwarding traffic, note that in a VRRP-based Layer 3 solution, even the VRRP backup node forwards traffic. VRRP comes in handy for assigning a VIP and VMAC, but unlike plain VRRP, the ARP entries are synced across both MC-LAG peers, which is what makes active-active forwarding possible.
PS: Please accept my response as the solution if it answers your query. Kudos are appreciated too!
This makes perfect sense.
Also, one more question about this setup.
A-TAC is recommending we place iccp-peer-down prefer-status-control-active on both nodes. However, many documents say not to do this. Is this a best practice?
Hi Harold, I'm not familiar with the context of your conversation with the TAC team, so I can't comment on that. In general, however, active is configured on only one of the peers, so that traffic is forwarded via one peer and a split-brain scenario doesn't arise when the ICCP peer goes down.
Thanks for the info on this. We removed that line and all appears to be stable now.
I have always liked this document; it explains very well some of the concerns mentioned in your post. If you haven't already, you should take a look.
Best Practices and Usage Notes