We have a network setup between two locations, which are connected to each other through a transit network provider. We are running MPLS, LDP, BGP and L3VPN. Whenever we copy something as a test over our L3VPN VRF from location A to location B (a file, the type doesn't matter), the transfer fails. The failure is random and there is no pattern to where it happens: sometimes it fails at around 2% and sometimes at 99%. When we run the same test from location B to location A, it succeeds. Strange, isn't it?
So we did a Wireshark capture on two virtual machines connected directly to the routers at both locations (so no firewalling, to rule out anything that could block the traffic). When we run the test from location A to B, we see a lot of TCP DUP ACKs, then fast retransmissions and eventually spurious retransmissions, and that is the point where the copy fails. When we run the same test from location B to A, we only see DUP ACKs, and the test succeeds. This is also over the VRFs. We also did a layer 2 test, without VRFs or VPNs, and that succeeded as well. So the problem is with the L3VPN: our provider appears to be doing something to the packets that we don't understand, and the provider was not really willing to help, claiming they are doing "nothing". (PS: this started happening after they replaced one of their switches with a new one, an Extreme switch.)
So we decided to dig deeper and found that when we issue "ping mpls l3vpn prefix xxx.123.123.2 test-vrf.inet.0 sweep" from location A to B, the sweep reports an MTU size of 0, which means the traffic is dropped. When we add bottom-label-ttl 2 to the ping command, the sweep succeeds as expected. We did the same from location B to A WITHOUT bottom-label-ttl and it succeeded; this is also the direction in which the copy test works. How could something change the TTL of the L3VPN label? We tested no-propagate-ttl under MPLS on router A, and also at VRF level, and it didn't seem to change anything. Maybe we are doing this in the wrong place or on the wrong router? Any ideas?
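One way to reason about the bottom-label TTL observation is a toy model of a two-label stack crossing the transit path. This is only a sketch for discussion, not Junos or provider behaviour: the `Label` class, the `walk` function, the hop lists and the TTL values are all hypothetical. It assumes every transit hop decrements the TTL of whatever label is on top, and that a pop does not copy the TTL to the label underneath. Under those assumptions, a bottom label sent with TTL 1 survives a normal path but expires if one extra device label-switches the packet after the transport label has been popped, while TTL 2 gets through.

```python
from dataclasses import dataclass


@dataclass
class Label:
    name: str
    ttl: int


def walk(stack, transit_hops):
    """Carry a label stack across transit hops ('swap' or 'pop').

    Toy assumptions (not vendor behaviour): each transit hop decrements
    the TTL of the top label and drops the packet if it reaches 0; a pop
    does not copy the TTL to the next label.
    """
    labels = [Label(l.name, l.ttl) for l in stack]  # work on a copy
    for index, action in enumerate(transit_hops):
        top = labels[0]
        top.ttl -= 1
        if top.ttl <= 0:
            return f"dropped at transit hop {index} ({top.name} label TTL expired)"
        if action == "pop":
            labels.pop(0)
            if not labels:
                break
    return "delivered to egress"


# Hypothetical paths: the normal one pops the transport label at the
# penultimate hop; the second adds one extra label-switching device.
normal_path = ["swap", "pop"]
extra_hop_path = ["swap", "pop", "swap"]

for bottom_ttl in (1, 2):
    stack = [Label("transport", 255), Label("vpn", bottom_ttl)]
    print(bottom_ttl, "normal:", walk(stack, normal_path))
    print(bottom_ttl, "extra hop:", walk(stack, extra_hop_path))
```

If this model is anywhere near what happens in the A-to-B direction, the thing to look for would be a device on the provider side that handles the VPN label as the top label before it reaches router B.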