To troubleshoot reachability issues (ping failures) when network policy is used to exchange routes between virtual networks:
Before doing anything else, check the status of the source and destination virtual machines.
Check the virtual machine status in the Contrail UI:
Check the tap interface status in the HTTP agent introspect:
When the virtual machine status is verified as Up and the tap interface is Active, you can focus on other factors that affect traffic, including routing, network policy, security policy, and service instances with static routes.
The next step is to validate all of the routing and reachability factors.
Use the following troubleshooting guidelines whenever you experience ping failures between virtual networks that are connected by means of network policy.
Check the network policy configuration:
Use the following sequence in the Contrail UI to check policies, attachments, and traffic rules:
Check VN1-VN2 ACL information from the compute node:
Check the virtual network policy configuration with route information:
Check the VN1 route information for VN2 routes:
If a route is missing, ping fails. Flow inspection on the compute node displays Action: D(rop).
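The route check described above can be sketched offline as a longest-prefix lookup. This is only an illustration of the idea, not how the vRouter agent performs the lookup: the function name `find_route` and all prefixes and addresses below are hypothetical, and the route list is assumed to have been copied by hand from the VN1 routing table.

```python
import ipaddress


def find_route(routes, destination):
    """Return the longest prefix in `routes` that covers `destination`, or None."""
    dest = ipaddress.ip_address(destination)
    candidates = [ipaddress.ip_network(prefix) for prefix in routes]
    matches = [net for net in candidates if dest in net]
    if not matches:
        return None  # no covering route: traffic to this address is expected to fail
    return str(max(matches, key=lambda net: net.prefixlen))


# Hypothetical data: prefixes seen in the VN1 routing table, and a VM address in VN2.
vn1_routes = ["10.1.1.0/24", "10.1.1.3/32"]
vn2_vm_ip = "10.2.2.5"

print(find_route(vn1_routes, vn2_vm_ip))                    # VN2 route missing
print(find_route(vn1_routes + ["10.2.2.5/32"], vn2_vm_ip))  # VN2 route present
```

A result of `None` for a VN2 address corresponds to the missing-route case in which ping fails.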
Repeated dropstats commands confirm that packets are being dropped: the Flow Action Drop counter increases with each iteration of dropstats.
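The comparison between successive dropstats runs can be sketched as follows. The sketch assumes each run has been captured as text and that counters appear as a name followed by an integer on one line. The sample strings are illustrative, not captured output, and `parse_counters` and `counter_increasing` are hypothetical helper names.

```python
import re


def parse_counters(text):
    """Parse lines of the assumed form '<counter name>  <integer>' into a dict."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?)\s+(\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def counter_increasing(samples, name="Flow Action Drop"):
    """Return True if `name` strictly increases across successive samples."""
    values = [parse_counters(sample).get(name) for sample in samples]
    if len(values) < 2 or any(value is None for value in values):
        return False  # not enough data to compare
    return all(later > earlier for earlier, later in zip(values, values[1:]))


# Illustrative captures of three successive runs (assumed layout, made-up values).
runs = [
    "Example Other Counter    0\nFlow Action Drop         12\n",
    "Example Other Counter    0\nFlow Action Drop         15\n",
    "Example Other Counter    0\nFlow Action Drop         19\n",
]

print(counter_increasing(runs))
```

A steadily increasing counter while the ping is running suggests that the ping packets are the ones being dropped by flow action.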
Flow and dropstats commands issued at compute node:
To help in debugging flows, you can use the detailed flow query from the agent introspect page for the compute node.
Fields of interest:
Inputs (from flow -l output): src/dest IP, src/dest ports, protocol, and VRF
Output from detailed flow query: short_flow, src_vn, action_str->action…
Flow command output:
Fetching details of a single flow:
Output from FetchFlowRecord shows unresolved IPs:
You can also retrieve information about unresolved flows from the Contrail UI, as shown in the following:
If you are still experiencing reachability issues, check for protocol-specific policy actions, where routes are exchanged but only specific protocols are allowed.
The following shows a sample query on a protocol-specific flow in the agent introspect:
The following shows that although the virtual networks are resolved (not __UNKNOWN__) and the flow is not a short flow (the flow entry exists for a defined aging time), the policy action is clearly displayed as deny.
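The three conditions discussed in this procedure (unresolved virtual network, short flow, and policy deny) can be checked in a fixed order once the fields of a flow record have been copied into a dictionary. The following is a minimal sketch under stated assumptions: the keys `src_vn`, `short_flow`, and `action` mirror the field names listed earlier, but the value types (a boolean for `short_flow`, a plain string for `action`) and the function `triage_flow` are hypothetical, and the network names are made up.

```python
UNKNOWN_VN = "__UNKNOWN__"


def triage_flow(record, vn_fields=("src_vn",)):
    """Classify a flow record (a dict of fields copied by hand) into a likely cause."""
    # 1. Unresolved virtual network: route or policy exchange is incomplete.
    if any(record.get(field) == UNKNOWN_VN for field in vn_fields):
        return "unresolved virtual network: check routes and policy attachment"
    # 2. Short flow: the entry does not persist for the normal aging time.
    if record.get("short_flow"):
        return "short flow: check why the flow entry is not being retained"
    # 3. Resolved and persistent, but the policy action denies the traffic.
    if str(record.get("action", "")).lower() == "deny":
        return "policy deny: check protocol-specific rules in the network policy"
    return "no issue found in these fields"


# Hypothetical record: resolved network, not a short flow, action is deny.
example = {"src_vn": "default-domain:demo:vn1", "short_flow": False, "action": "deny"}
print(triage_flow(example))
```

Pass additional field names through `vn_fields` if the record you copied contains more than one virtual network field.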
This example describes debugging for policy-based routing only. However, in a complex system, a virtual network might combine one or more configuration methods that influence reachability and routing.
For example, a scenario might have a virtual network VN-X configured with policy-based routing to another virtual network VN-Y.
At the same time, there are a few virtual machines in VN-X that have a floating IP to another virtual network VN-Z that is connected to VN-XX via a NAT service instance.
In a complex scenario, you need to debug step-by-step, taking into account all of the features working together.
Additionally, there are other considerations beyond routing and reachability that can affect traffic flow. The rules of network policies and security groups can affect traffic to the destination. Also, if multipath is involved, then equal-cost multipath (ECMP) and reverse path forwarding (RPF) need to be taken into account while debugging.