Hello everyone,
We are experiencing a very strange problem inside our network. We are a tier 3 ISP with several locations where we run our routers. These routers are currently connected to each other over Dark Fibre. In our data centre we have two main routers where we peer with our ISPs. All routers are connected to each other using OSPF. We are also using MPLS and L3VPN. This setup has worked for us for years.
About four months ago, our Dark Fibre supplier decided to replace one of their switches. They are now running an Extreme switch and EVPN. All of our locations that are connected to this new switch have connection problems (such as connection interruptions and poor speeds). The other locations are working just fine.
When we run a test (such as a file copy over SCP) at one of the locations that experience this problem, the copy stops at 2% and sometimes even gets stuck at 0%. Our internal services run in different zones which are attached to VRFs. We have WAN instances on our data centre routers for these VRFs.
We have tested many things, but now comes the weird part. When we add an extra push label, the issue is gone! We did this by disabling all the VRFs on the second router in our data centre. This way, anything that wants to reach the internal services has to go to the first router (which still has its VRFs enabled and running) and then through the second router (which has everything the first router has, such as OSPF, LDP, MPLS and BGP, except for the VRFs). We verified this by checking the MPLS and BGP L3VPN routing tables.
How can this happen? Is there a solution for this problem other than disabling the VRFs to create the extra hop? And how can the new switch of our Dark Fibre supplier cause this?
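To make the difference between the two paths concrete, here is a rough sketch in Python of how the Ethernet frame size on the Dark Fibre link changes with the number of MPLS labels. The 1500-byte IP packet and the label counts are assumptions for illustration only, not values measured on our routers, and the function name frame_bytes is made up for this example. Whether frame size is related to the problem at all is an open question; the sketch only shows what an extra label changes on the wire.

```python
# Rough per-frame byte count on an Ethernet link for a given MPLS label stack depth.
# All inputs below are illustrative assumptions, not measurements from the network.

ETH_HEADER = 14  # destination MAC + source MAC + EtherType, untagged
ETH_FCS = 4      # frame check sequence
VLAN_TAG = 4     # one 802.1Q tag
MPLS_LABEL = 4   # each MPLS label stack entry is 32 bits


def frame_bytes(ip_packet_bytes: int, labels: int, vlan_tags: int = 0) -> int:
    """Return the Ethernet frame size, including FCS, for one IP packet."""
    if ip_packet_bytes < 0 or labels < 0 or vlan_tags < 0:
        raise ValueError("sizes and counts must be non-negative")
    return (
        ETH_HEADER
        + VLAN_TAG * vlan_tags
        + MPLS_LABEL * labels
        + ip_packet_bytes
        + ETH_FCS
    )


if __name__ == "__main__":
    # Assumed full-size 1500-byte IP packet with 1, 2 or 3 labels on the stack.
    for labels in (1, 2, 3):
        print(f"{labels} label(s): {frame_bytes(1500, labels)} bytes on the wire")
```

Each additional label adds 4 bytes to the frame, so the detour through the second router produces larger frames than the direct path, not smaller ones.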
Thanks in advance!
Best regards,
------------------------------
Mohammad Ayash
------------------------------