This should be a simple question, but hours of googling and reading reference designs still haven't given me a decent answer. Could anyone enlighten me a bit? Basically the question is: how do you implement (preferably in-band) management access to an EVPN/VXLAN based network with QFX switches?
We are migrating a legacy L2 datacenter network to a newer design, pretty much based on the Juniper bridged overlay reference design: EBGP as the underlay, IBGP with the spines as route reflectors as the overlay. The actual production traffic routing is not done on the switches but on SRX devices connected to a pair of leaf switches.
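For context, here's roughly what a leaf's BGP configuration looks like in that kind of bridged overlay design. This is just an illustrative sketch; the AS numbers, addresses, and policy names are made up, not from our actual fabric:

```
protocols {
    bgp {
        group underlay {                 /* EBGP underlay, per-link peering */
            type external;
            export export-loopback;      /* policy advertising lo0.0 into the underlay */
            neighbor 172.16.0.0 {        /* spine-facing fabric link */
                peer-as 65001;
            }
        }
        group overlay {                  /* IBGP overlay, spines as route reflectors */
            type internal;
            local-address 10.0.0.11;     /* leaf lo0.0 */
            family evpn {
                signaling;
            }
            neighbor 10.0.0.1;           /* spine 1 (RR) */
            neighbor 10.0.0.2;           /* spine 2 (RR) */
        }
    }
}
```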
The legacy network switches have their management addresses in one subnet in a management VLAN, and that's pretty much what I would like to do in the new network as well, at least during the migration period. The new network will span two datacenters (not really independent ones though; routing is done only in the primary one at this phase, so the second can be treated as just a set of slightly more remote racks). Some of the racks will be physically located further away, in another datacenter tunnel from the spine switches, so at least for those switches I would prefer to avoid extra out-of-band management cabling.
My first logical approach would have been to add an IRB interface on a switch with an IP address in the management subnet, set it up as the L3 interface of a VLAN/VNI, and tag that VLAN on the trunk link connected to the legacy network. That doesn't seem to work so easily: I can see the MAC addresses of the router and the switch in the respective ARP tables, but no traffic passes through, and the switch is adding EVPN/7 routes to the inet.0 routing table for all the IP addresses it sees on that VLAN. Later on I also realized this approach wouldn't work very well anyway, because I guess we wouldn't want to bring the overlay VLANs (and VTEPs) to the spine switches.
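For clarity, this is roughly what I tried (VLAN IDs, VNI, addresses, and interface names here are illustrative, not our real values):

```
vlans {
    mgmt {
        vlan-id 100;
        l3-interface irb.100;
        vxlan {
            vni 10100;
        }
    }
}
interfaces {
    irb {
        unit 100 {
            family inet {
                address 192.168.100.11/24;   /* address in the management subnet */
            }
        }
    }
    xe-0/0/47 {                              /* trunk towards the legacy network */
        unit 0 {
            family ethernet-switching {
                interface-mode trunk;
                vlan {
                    members mgmt;
                }
            }
        }
    }
}
```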
Another option I can think of would be managing the switches via the overlay loopback addresses, but that has quite a few problems too. First of all, if the overlay network is down, management is down too. We will have a separate backup management route via the em0.0 interfaces anyway, so maybe that's not a huge problem. The second problem is that we would have to somehow connect the routing device (SRX) to the overlay to actually be able to reach the management addresses from the management workstations.
Yet another option would be to use entirely separate switches and do the management out of band via a separate network. For the datacenter interconnect that wouldn't be such a huge problem; it could just run in a VLAN. The lone switch pairs in physically distant locations (another tunnel) would be an annoyance though, since connecting them would require extra cabling. We will be building a separate backup management infrastructure for the critical switches anyway, but I would love to have the primary management done in-band, with the out-of-band connection reserved for backup use.
This might be a bit of a confusing explanation, but I hope someone can point me in the right direction. The reference designs don't really seem to clarify how management access is handled.
Update to this: I managed to clear my thoughts a bit with some help. First of all, the IRB interface not working was my own mistake. The subnet was actually in a different VLAN than the one I was trying to use, and the connection to the legacy network via a multihomed trunk caused part of the problem as well.
Then to the actual solution, or at least one kind of solution: the loopback addresses are not actually in the overlay but exported to the underlay, and they seem to be a good way to do the in-band management. I ended up joining a routing instance on the SRX to the underlay. There's a subnet between a VLAN interface on the SRX and an IRB interface on two leaf switches, and the underlay loopback addresses are advertised to the SRX over that link. It should be sufficient for the migration, and later on the IRBs will be replaced by direct physical interface connections to the SRX.
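A sketch of what that looks like on the SRX side (again, instance name, interface, addresses, and AS numbers are made-up examples): a routing instance holds the VLAN interface towards the two leaves and peers EBGP with them, learning the underlay loopbacks over that shared subnet.

```
routing-instances {
    mgmt-underlay {
        instance-type virtual-router;
        interface reth0.90;              /* VLAN interface facing the leaf IRBs */
        protocols {
            bgp {
                group underlay-mgmt {
                    type external;
                    neighbor 10.99.0.2 { /* leaf 1 irb */
                        peer-as 65011;
                    }
                    neighbor 10.99.0.3 { /* leaf 2 irb */
                        peer-as 65012;
                    }
                }
            }
        }
    }
}
```

With the leaves exporting their lo0.0 routes into the underlay EBGP (as they already do for VTEP reachability), the SRX picks up the fabric loopbacks in this instance and can route management traffic to them.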
The out-of-band management network will also be built, but it will be used for backup purposes and may not be connected to every switch in the fabric, only to the ones that are critical and can be cabled with reasonable effort.
In case someone runs into the same problem later: I'm not saying this is the best solution, but at least it works. Comments and better suggestions are welcome.
"The loopback addresses are not actually on the overlay, but exported to the underlay and seems like they are a good way to do the inband management." => BINGO!!!
Just took some time (and a hint) to get my brain around it. 🙂