

MTU issue from VXLAN/EVPN to legacy network via VC

  • 1.  MTU issue from VXLAN/EVPN to legacy network via VC

    Posted 02-12-2020 03:48

    Could anyone share a hint on this?

     

    I had a direct L2 connection between a legacy L2-only network and a newer EVPN-VXLAN implementation, shown in the first diagram below. Everything worked fine until we advanced the deployment to the next phase, shown in the second diagram: adding a virtual chassis between the networks to provide redundancy. (Long story short: a QFX in EVPN mode drops STP packets, but the VC can handle STP to provide redundancy on the legacy side; on the new side we connect to the VC via an ESI LAG for redundancy.)

     

    Now a standard Windows server in the new network (left side) cannot ping, or otherwise reach, some hosts on the right side with packets larger than 1422 bytes. Some hosts in the same VLAN it can reach, others not. It doesn't depend on the hardware: for example, it cannot ping the management interface of one firewall or a VM running on XenServer, while another VM on VMware, or in another XenServer pool, answers just fine. The strange part is that everything worked before adding the VC, and nothing else changed in the old or new network.

    I did a packet capture (also below) on a right-side virtual machine running on XenServer: it does in fact receive the large packets just fine and answers them, but the return packets are lost. At or below a 1422-byte ping size the return packets are not lost.

     

    First phase, everything works. Left side host can ping anything in right side:

     

    [Diagram: Untitled Diagram (2).png]

     

    Second phase: the left-side host cannot ping some hosts in the right-side network if the packet size is over 1422 bytes:

     

    [Diagram: Untitled Diagram.png]

     

    Packet capture on the 172.16.4.2 host. 172.16.4.37 is a physical Windows server on the left side behind EVPN/VXLAN; 172.16.4.2 is on the right side, a Windows VM on a XenServer host. First, 1400-byte packets, whose replies make it all the way through; then 1500-byte packets, which are clearly received and replied to, yet 172.16.4.37 only sees a timeout:

    [Image Pasted at 2020-2-12 12-59.png]

    C:\Users\Administrator.MGMT>ping -l 1400 172.16.4.2
    Pinging 172.16.4.2 with 1400 bytes of data:
    Reply from 172.16.4.2: bytes=1400 time=2ms TTL=128
    Reply from 172.16.4.2: bytes=1400 time=2ms TTL=128
    
    C:\Users\Administrator.MGMT>ping -l 1500 172.16.4.2
    Pinging 172.16.4.2 with 1500 bytes of data:
    Request timed out.

     

    However, 172.16.4.37 can ping another similar Windows VM running on VMware in the right-side network just fine:

    C:\Users\Administrator.MGMT>ping -l 1500 172.16.4.3
    Pinging 172.16.4.3 with 1500 bytes of data:
    Reply from 172.16.4.3: bytes=1500 time=11ms TTL=128
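Instead of probing sizes by hand, the exact threshold can be found with a binary search over the payload. A minimal sketch; `max_working_payload` and `probe` are hypothetical names, and in practice `probe` would wrap `ping -l <size> <host>` on Windows and check for a reply:

```python
# Hypothetical helper: binary-search the largest ping payload that
# still gets a reply. `probe(size)` must return True on success.
def max_working_payload(probe, lo=0, hi=9000):
    """Largest payload in [lo, hi] for which probe(size) succeeds."""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if probe(mid):
            lo = mid          # mid works; threshold is at or above mid
        else:
            hi = mid - 1      # mid fails; threshold is below mid
    return lo

# With a mock link that drops anything over a 1422-byte payload,
# the search lands exactly on the threshold seen above:
print(max_working_payload(lambda size: size <= 1422))  # -> 1422
```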

    Relevant configuration from vc1 (leaving the MTU definitions out does not change the behaviour):

    xe-0/2/1 {
        description dc1-Core1-D2;
        mtu 9216;
        unit 0 {
            family ethernet-switching {
                interface-mode trunk;
                vlan {
                    members all;
                }
                storm-control default;
                recovery-timeout 900;
            }
        }
    }
    
    xe-1/2/1 {
        description dc1-Core2-D2;
        mtu 9216;
        unit 0 {
            family ethernet-switching {
                interface-mode trunk;
                vlan {
                    members all;
                }
                storm-control default;
                recovery-timeout 900;
            }
        }
    }
    
    ae0 {
        mtu 9216;
        aggregated-ether-options {
            lacp {
                active;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode trunk;
                vlan {
                    members all;
                }
                storm-control default;
                recovery-timeout 900;
            }
        }
    }
    
    > show configuration vlans 
    vlan_12 {
        description xxxxxx;
        vlan-id 12;
    }

     

    Leaf1 configuration relevant parts:

    ### OLD INTERFACE which worked directly connected ###
    xe-0/0/47 {
        unit 0 {
            family ethernet-switching {
                interface-mode trunk;
                vlan {
                    members VNI_12;
                }
                storm-control default;
                recovery-timeout 3600;
            }
        }
    }
    
    ae0 {
        mtu 9216;
        esi {
            00:00:00:00:00:00:00:00:00:01;
            all-active;
        }
        aggregated-ether-options {
            lacp {
                active;
                system-id 00:00:00:00:00:01;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode trunk;
                vlan {
                    members all;
                }
                storm-control default;
                recovery-timeout 3600;
            }
        }
    }
    
    > show configuration vlans 
    VNI_12 {
        vlan-id 12;
        vxlan {
            vni 12;
        }
    }
    VNI_1_DONOTUSE {
        vlan-id 1;
        vxlan {
            vni 1;
        }
    }

     

    Now the question is: how did the situation change when we added the VC in between? Nothing else was changed in either network or on the hosts, yet the behaviour changed. Neither the ESI LAG nor the VC should be adding any overhead, so what is happening?



  • 2.  RE: MTU issue from VXLAN/EVPN to legacy network via VC
    Best Answer

    Posted 02-12-2020 04:56

    This was resolved: the links between Leaf2 and the spines had MTU 1514, which, combined with the LAG hashing algorithm, caused the packets of some hosts to be dropped.
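For what it's worth, the 1422-byte threshold matches standard VXLAN overhead exactly. A quick sanity check of the numbers, assuming an outer IPv4 underlay with no VLAN tag; the constants are textbook header sizes, not values read from these devices:

```python
# Header sizes (bytes) for a VXLAN-encapsulated ICMP echo,
# assuming IPv4 underlay, untagged outer frame.
ICMP_HDR  = 8    # ICMP echo header
INNER_IP  = 20   # inner IPv4 header
INNER_ETH = 14   # inner Ethernet header carried inside VXLAN
VXLAN_HDR = 8    # VXLAN header
OUTER_UDP = 8    # outer UDP header
OUTER_IP  = 20   # outer IPv4 header
OUTER_ETH = 14   # outer Ethernet header

def outer_frame_size(ping_payload):
    """Underlay Ethernet frame size for a given `ping -l` payload."""
    inner_frame = ping_payload + ICMP_HDR + INNER_IP + INNER_ETH
    return inner_frame + VXLAN_HDR + OUTER_UDP + OUTER_IP + OUTER_ETH

print(outer_frame_size(1422))  # 1514 -- exactly fits an MTU-1514 link
print(outer_frame_size(1423))  # 1515 -- one byte too many, dropped
```

So a 1422-byte ping payload yields exactly a 1514-byte underlay frame, which just fits the Junos default Ethernet interface MTU of 1514 on those spine links; anything larger is dropped.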

     

    However, I wonder why the traffic ever went that route. The 172.16.4.37 host is only connected to another leaf, which in turn connects only to leaf1. Going via leaf2 adds an extra hop through the spine, but that's another story to resolve.

     

    Figured that out too. It's just the hashing algorithm of the LAG: the problem only occurs with reply packets because outgoing packets in this case always go leaf1 -> VC -> legacy, while return packets are distributed across both LAG links. The single-leaf host connection is a temporary configuration that will be dismantled in a month and replaced with dual links, so I won't bother with it further.
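The per-flow pinning can be illustrated with a toy hash. This is not Juniper's actual algorithm, and the MAC addresses are made up; it only demonstrates the principle that each source/destination pair lands deterministically on one member link, so only the hosts whose reply flows hash onto the link whose path crosses the low-MTU spine hops lose their large packets:

```python
import zlib

def lag_member(src_mac, dst_mac, n_links=2):
    """Toy flow hash: deterministically map a source/destination
    pair to one of the LAG member links (NOT Juniper's algorithm)."""
    return zlib.crc32((src_mac + dst_mac).encode()) % n_links

# Hypothetical MACs standing in for the hosts in this thread; note
# that the same pair always maps to the same link, while different
# destinations may map to different links.
src = "00:50:56:00:04:37"
for dst in ("00:0c:29:aa:00:02", "00:0c:29:aa:00:03"):
    print(dst, "-> link", lag_member(src, dst))
```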