Ever since the inception of Kubernetes, adoption of containerized applications has grown by leaps and bounds. Kubernetes automates the deployment, scaling, and management of containerized applications. Workloads such as pods run on compute nodes in a cluster, and the process of assigning a pod to a node is called scheduling.
Kubernetes provides a default scheduler that considers CPU and memory usage on nodes to find the best fit for a pod. Modern applications, however, require not just CPU and memory but also network resources. Network resources on a node are limited, and the default scheduler does not take their usage into account. By ignoring the network state, it can place an application on a node that cannot provide the network resources it needs.
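For reference, this is the kind of information the default scheduler works from: a minimal Pod manifest declaring CPU and memory requests, which are the only resources it weighs when filtering and scoring nodes. The image and request values here are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
  - name: app
    image: nginx:1.25        # illustrative image
    resources:
      requests:
        cpu: "500m"          # the default scheduler filters and scores nodes
        memory: "256Mi"      # based on these CPU/memory requests alone
```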
Consider a node that has sufficient CPU and memory to run an application but is experiencing network congestion. Ignoring this, the scheduler could still place pods on that node, further degrading performance for the application.
Juniper's CN2 is a feature-rich CNI with a deep understanding of network metrics and their utilization. Juniper has taken this a step further by extending the Kubernetes default scheduler with plugins, creating a custom scheduler that considers not just the traditional resources but also network metrics to make intelligent decisions when scheduling workloads/pods on nodes. CN2 Network Aware Pod Scheduling inherits all the functionality of the default Kubernetes scheduler, and the two schedulers can run side by side in a cluster.
Let's take a closer look at how the custom scheduler uses network resources efficiently and reduces the chance of failures. Consider a scenario where users want to run their applications on high-performance network nodes featuring DPDK. The default scheduler knows nothing about network resources or their limits and does not consider these metrics when scheduling pods. With the default scheduler alone, pods could be placed on a node whose network resources have already reached their threshold. This results in lost pod connectivity, making workloads unusable, and the issue is challenging to debug. CN2 Network Aware Pod Scheduling solves this by introducing a VMICapacity plugin, which considers the network interfaces and resources on each node and prevents pods from being scheduled on nodes that have already hit the threshold. The plugin balances pods across nodes on the virtual machine interface (VMI) metric: nodes with the least active network usage are always ranked higher for scheduling. If no node is available for scheduling, it reports back with a clear message; blocking scheduling in such a scenario helps preserve the health of the cluster.
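To make that behavior concrete, here is a minimal sketch of a scheduler profile that enables the VMICapacity plugin at the filter and score extension points, using the standard KubeSchedulerConfiguration format. The scheduler name and the vmiThreshold argument are illustrative assumptions, not CN2's actual configuration schema; consult the CN2 documentation for the real field names.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: cn2-network-aware-scheduler  # assumed name; use the name CN2 ships with
  plugins:
    filter:
      enabled:
      - name: VMICapacity    # filters out nodes already at the VMI threshold
    score:
      enabled:
      - name: VMICapacity    # ranks nodes with fewer active VMIs higher
  pluginConfig:
  - name: VMICapacity
    args:
      vmiThreshold: 64       # hypothetical argument: max VMIs per node before filtering
```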
As seen in the diagram above, metrics are collected from individual nodes by a metrics collector and stored in a central collector, aka centralized storage. The custom scheduler uses these metrics to rank nodes and influence scheduling decisions. Metrics collected include active flows, active VMIs, and bandwidth usage. Users can configure the scheduler to use all of these metrics, or any combination of them, to schedule pods.
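Choosing which metrics the scheduler weighs could then be expressed as plugin arguments along these lines. This is purely a sketch: the NetworkMetrics plugin name and the metric/weight fields are hypothetical placeholders for however CN2 exposes this choice.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: cn2-network-aware-scheduler
  pluginConfig:
  - name: NetworkMetrics     # hypothetical plugin name for illustration
    args:
      metrics:               # use all collected metrics or any combination
      - name: activeFlows
        weight: 1
      - name: activeVMIs
        weight: 2
      - name: bandwidthUsage
        weight: 3            # e.g. weight bandwidth usage most heavily
```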
To use Network Aware Pod Scheduling:

- Deploy the CN2 custom scheduler.
- Create pods from manifests, providing the schedulerName configured for the scheduler (see the manifest sketch below).
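For the second step, a pod manifest only needs a schedulerName field pointing at the custom scheduler; everything else remains an ordinary pod spec. The scheduler name and image below are placeholders, and the name must match whatever name the CN2 scheduler was deployed with.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app
spec:
  schedulerName: cn2-network-aware-scheduler  # must match the deployed custom scheduler's name
  containers:
  - name: app
    image: registry.example.com/dpdk-app:1.0  # placeholder image
    resources:
      requests:
        cpu: "2"
        memory: "1Gi"
```

Pods that omit schedulerName keep using the default scheduler, which is how the two schedulers run side by side in the same cluster.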
Now, every time a pod is created with the custom scheduler, the configured network usage metrics of each node are also considered before the pod is bound to a node. This ensures that pods are always scheduled on the nodes with the least network usage from the pool of qualified nodes.