
Troubleshooting: SSH/TCP traffic fails in Contrail due to checksum errors

By Erdem posted 09-23-2015 13:56

  

SSH/TCP Traffic Flow Fails Between Virtual Machines

 

In Contrail, SSH/TCP traffic between virtual machines (VMs) hosted on different compute nodes may fail because of checksum errors.

 

SSH/TCP traffic may fail between the following endpoints:

 

  • VMs on different compute nodes
  • Compute nodes and the MX Series router

Use Case Example

 

The following use case shows how SSH traffic may fail between VMs hosted on different compute nodes:

 

Suppose you have two compute nodes:

 

  • Compute node Host1 (VM1 and VM2, both in network A)
  • Compute node Host2 (VM3 and VM4, both in network A)

You observe the following traffic flow behavior between the four VMs:

 

  • VM1 to VM2 - SSH traffic successful
  • VM1 to VM3 - SSH traffic fails
  • VM2 to VM4 - SSH traffic fails
  • VM3 to VM4 - SSH traffic successful

SSH traffic from VM1 to VM2 and from VM3 to VM4 succeeds because each VM pair is hosted on the same compute node: the vRouter on that node switches the traffic locally, so it never passes through the node's physical NIC.

 

However, SSH traffic from VM1 to VM3 and from VM2 to VM4 must pass through the physical NICs of the compute nodes, and this is where traffic is sometimes dropped.

 

Possible causes of the dropped traffic include:

  • Nested VMs
  • VMs running on an ESXi hypervisor
  • NICs that do not support checksum offload correctly and generate incorrect checksums, such as Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe NICs

Checking for Checksum Errors

 

To check for incorrect checksums on the host and in the VM, enter:

 

tcpdump -i <interface> -l -v -nn | grep -i incorrect
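If you want a single number per interface instead of scrolling output, the capture can be bounded and counted. The following is a minimal bash sketch, assuming tcpdump's usual -v wording: TCP/UDP checksum mismatches appear as "cksum 0x... (incorrect -> 0x...)" and bad IPv4 header checksums as "bad cksum". The function names count_bad_cksums and check_iface_cksums are hypothetical, and the capture itself needs root.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read tcpdump -v text on stdin and print how many lines
# report a bad checksum (TCP/UDP "incorrect", IPv4 header "bad cksum").
count_bad_cksums() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the exit status 0.
  grep -c -i -E 'incorrect|bad cksum' || true
}

# Hypothetical helper: capture a fixed number of packets (default 200) on one
# interface and count the ones with bad checksums. Needs root and tcpdump.
check_iface_cksums() {
  local iface=$1 packets=${2:-200}
  # -c stops the capture after N packets, so the pipeline ends on its own.
  tcpdump -i "$iface" -c "$packets" -v -nn 2>/dev/null | count_bad_cksums
}
```

For example, run check_iface_cksums eth1 500 while retrying the failing SSH session from VM1 to VM3. Keep in mind that tcpdump on the sending host can report outgoing packets as incorrect simply because the NIC has not filled in the checksum yet at capture time, so counts taken on the receiving side are usually the more meaningful ones.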

In Contrail, the vRouter adds the encapsulation header to the packet and then relies on the NIC to compute the checksum of the inner packet (checksum offload). However, some NICs do not perform this offload correctly.
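For context, the value being offloaded is the 16-bit one's-complement Internet checksum described in RFC 1071, which IPv4, TCP and UDP all use. The bash sketch below (the function name inet_cksum is hypothetical) computes it over a list of hex bytes. It only illustrates the arithmetic: a real TCP checksum also covers a pseudo-header built from the IP addresses, the protocol number and the TCP length.

```shell
#!/usr/bin/env bash

# Hypothetical helper: RFC 1071 Internet checksum over bytes passed as
# two-digit hex arguments, printed as four hex digits.
inet_cksum() {
  local -a bytes=("$@")
  local sum=0 i
  # An odd number of bytes is padded with a trailing zero byte.
  if (( ${#bytes[@]} % 2 )); then
    bytes+=(00)
  fi
  # Add the data as big-endian 16-bit words.
  for (( i = 0; i < ${#bytes[@]}; i += 2 )); do
    sum=$(( sum + (16#${bytes[i]} << 8) + 16#${bytes[i+1]} ))
  done
  # Fold any carries above bit 15 back into the low 16 bits.
  while (( sum >> 16 )); do
    sum=$(( (sum & 0xffff) + (sum >> 16) ))
  done
  # The checksum is the one's complement of the folded sum.
  printf '%04x\n' $(( ~sum & 0xffff ))
}
```

With the checksum field zeroed, the sample IPv4 header 45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7 yields b861. Running the same function over the header with b8 61 filled in yields 0000, which is how a receiver validates the field; any other result means the checksum does not match the data, which is the condition tcpdump flags.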

 

If the tcpdump output reports any incorrect checksums, disable the tx-checksumming parameter on the compute node's data interface with the ethtool utility.

 

Important: Before disabling tx-checksumming, note the driver name and the hardware currently in use.
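Because ethtool may not be installed yet at this point, one way to record this is to read sysfs directly. The bash sketch below is an example under assumptions: the function name nic_info and the SYSFS_NET override (used only to point it at a fake directory tree for testing) are hypothetical. It prints the kernel driver bound to the interface and the PCI vendor and device IDs, which together identify the NIC model.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the driver and PCI vendor:device IDs behind a
# network interface. SYSFS_NET (hypothetical) overrides /sys/class/net.
nic_info() {
  local iface=$1 base=${SYSFS_NET:-/sys/class/net}
  local dev="$base/$iface/device"
  local driver="unknown" vendor="?" device="?"
  # "driver" is a symlink whose last path component is the driver name.
  if [[ -L "$dev/driver" ]]; then
    driver=$(basename "$(readlink "$dev/driver")")
  fi
  # vendor and device hold hex IDs such as 0x14e4.
  [[ -r "$dev/vendor" ]] && vendor=$(<"$dev/vendor")
  [[ -r "$dev/device" ]] && device=$(<"$dev/device")
  printf '%s driver=%s pci_id=%s:%s\n' "$iface" "$driver" "${vendor#0x}" "${device#0x}"
}
```

For example, nic_info eth1 >> nic-before-tx-off.txt keeps a copy of the result (the file name is arbitrary). Once ethtool is installed, ethtool -i eth1 also reports the driver together with its version and firmware version, and lspci shows the NIC model by name.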

 

By default, the ethtool utility is not installed on the servers, so you must first install it on the compute nodes.

 

  • If the OS is CentOS, install it by entering:

    yum install ethtool

  • If the OS is Ubuntu, install it by entering:

    sudo apt-get install ethtool
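On a mix of CentOS and Ubuntu nodes, the choice can be scripted. This is a minimal bash sketch, assuming each node has /etc/os-release with the standard ID and ID_LIKE fields. The function names ethtool_install_cmd and install_ethtool are hypothetical, and the -y flag (which skips the confirmation prompt in both yum and apt-get) is an addition to the commands above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the command that installs ethtool, chosen from
# the ID / ID_LIKE fields of os-release (path overridable for testing).
ethtool_install_cmd() {
  local release=${1:-/etc/os-release} id id_like
  # Source the file in a subshell so its variables do not leak into ours.
  id=$(. "$release" && echo "${ID:-}")
  id_like=$(. "$release" && echo "${ID_LIKE:-}")
  case " $id $id_like " in
    *" centos "*|*" rhel "*|*" fedora "*) echo "yum install -y ethtool" ;;
    *" ubuntu "*|*" debian "*)            echo "apt-get install -y ethtool" ;;
    *) return 1 ;;  # unknown distribution
  esac
}

# Hypothetical wrapper: install ethtool only if it is not already on the PATH.
install_ethtool() {
  command -v ethtool >/dev/null 2>&1 && return 0
  local cmd
  cmd=$(ethtool_install_cmd) || { echo "unsupported OS" >&2; return 1; }
  sudo $cmd  # unquoted on purpose: split the command into its words
}
```

install_ethtool does nothing on a node where ethtool is already on the PATH.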

After installing ethtool, follow these steps to disable tx-checksumming:

  1. Identify the data interface: find the physical interface on the compute node whose MAC address matches the MAC address of vhost0.
  2. Disable tx-checksumming on that interface. In this example, eth1 is the data interface:

    ethtool -K eth1 tx off

Repeat steps 1 and 2 for each compute node.
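Step 1 can be automated by comparing MAC addresses in sysfs, since vhost0 carries the same MAC as the physical data interface. A minimal bash sketch follows; the function name find_data_iface and the SYSFS_NET override are hypothetical. It may print more than one name (for example, a bond and its member NICs share a MAC), so review the output before passing it to ethtool.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every interface (other than vhost0) whose MAC
# address equals vhost0's. SYSFS_NET (hypothetical) overrides /sys/class/net.
find_data_iface() {
  local base=${SYSFS_NET:-/sys/class/net} vhost_mac path iface mac
  vhost_mac=$(<"$base/vhost0/address") || return 1
  for path in "$base"/*/address; do
    [[ -e $path ]] || continue
    iface=$(basename "$(dirname "$path")")
    [[ $iface == vhost0 ]] && continue
    mac=$(<"$path")
    # Compare case-insensitively in case the MACs differ only in case.
    if [[ ${mac,,} == "${vhost_mac,,}" ]]; then
      echo "$iface"
    fi
  done
}
```

If it prints eth1, the step 2 command is ethtool -K eth1 tx off.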

 

To verify the status of the tx-checksumming parameter, enter:

 

ethtool -k <interface_name> 

The output shows whether tx-checksumming is on or off.
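When checking many nodes, it can help to reduce that output to a single word. The bash sketch below (the function name tx_csum_state is hypothetical) reads ethtool -k output on stdin, prints the state from the top-level tx-checksumming line while ignoring trailing annotations such as [fixed], and exits non-zero if that line is missing.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print "on" or "off" from the tx-checksumming line of
# ethtool -k output read on stdin.
tx_csum_state() {
  # Fields are split on ": ". Sub-features such as tx-checksum-ipv4 are
  # tab-indented, so their first field never equals "tx-checksumming".
  awk -F': ' '
    $1 == "tx-checksumming" { split($2, v, " "); print v[1]; found = 1 }
    END { exit !found }
  '
}
```

For example, ethtool -k eth1 | tx_csum_state should print off once step 2 has been applied.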

 

With tx-checksumming disabled, SSH/TCP traffic flows without issues, including in the previously failing cases.


#TCP
#Contrail
#checksumerrors
#ssh
#How-To
