TechPost


Multi-Geo Overlay Procedure

By Steve Onishi posted 09-14-2025 09:51

  


This post describes the procedure for building a multi-geography multi-cluster from three Red Hat OpenShift Container Platform (RHOCP) clusters. The resulting multi-cluster can support Broadband Edge (BBE) cloud-native applications, such as BNG CUPS Controller and Address Pool Manager (APM), in a geo-redundant capacity.

Introduction

A typical Kubernetes cluster consists of at least 3 control-plane sites/nodes and 3 worker sites/nodes. To reduce cost and complexity, worker and control-plane functions may be combined into hybrid nodes at the expense of redundancy. The cluster is accessed from a remote host through the Kubernetes API and the container registry (for pushing images). The remote host is often referred to as a jump or bastion host (see Figure 1).

Figure 1. Logical Representation of a Kubernetes Cluster


Multi-geography redundancy for Juniper Cloud-native Network Functions (CNFs), such as APM and BNG CUPS Controller, requires a topology of three Kubernetes clusters. Each cluster has at least three control-plane sites and three worker sites (or simply three combined control-plane/worker, or hybrid, nodes) to support basic redundancy and availability requirements. Each cluster is in its own geography or availability zone.

One cluster is designated as the Management Cluster and has reachability to the other two clusters which are designated as Workload Clusters. All three clusters are joined into a multi-cluster with the addition and setup of Karmada and Submariner Open Source Software (OSS) packages. A separate jumphost or bastion host with access to all three clusters will be used as the installation platform (see Figure 2).

Figure 2. A Multi-Geo Multi-Cluster


The procedure outlined in this document describes the steps needed to install and deploy the OSS so that a multi-geo multi-cluster is realized. Instructions for both air-gapped and non-air-gapped installation are provided; steps that apply only to air-gapped installations are marked as air-gapped.

Pre-Requisites

Jumphost

A bastion host or jump host serves as a secure location for installing, configuring, and operating the OSS on the K8s clusters that will form the multi-cluster. The jumphost has administrative access to the K8s REST APIs of all three clusters.

Hardware dimensions

  • CPU: 2 cores
  • Memory: 8 GiB
  • Storage: 128 GiB

Software dimensions

  • OS: Ubuntu 22.04 LTS
  • User account with sudo privileges 

Network access

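For non-air-gapped installation, the jumphost needs Internet access to download the OSS components. The kubectl-karmada plug-in, subctl, and the Karmada operator Helm chart and CRDs used in this procedure are all downloaded from github.com.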

The jumphost must be able to access the Container registry and the K8s REST API of the Management Cluster and each of the Workload Clusters.

Management Cluster

The Management Cluster hosts the Karmada OSS in a separate Kubernetes context. The regular context can host CNF application components. The Management Cluster is a redundant cluster (minimum of three hybrid nodes). The dimensions of each node are as follows:

Hardware Dimensions

  • CPU: 8 cores
  • Memory: 24 GiB
  • Storage: 256 GiB

Software Dimensions

  • OS: RHOCP 4.16 or later with 
    • CNI: OVN
    • Registry: OpenShift Image Registry
    • NLB: MetalLB
  • Pod & Service CIDRs: default

Network Access

Internet access will be needed to download OSS components. Specifically, to the following sites:

  • docker.io
  • registry.k8s.io
  • quay.io

The Management Cluster must be able to access the K8s REST API of each Workload Cluster.

Workload Clusters

The Workload Clusters host the bulk of the multi-geo-enabled CNF application micro-services. Each Workload Cluster is a redundant cluster (minimum of three hybrid nodes). The dimensions of each node are as follows:

Hardware Dimensions

  • CPU: 16 cores (if only APM application deployment, this can be scaled down to 8 cores)
  • Memory: 64 GiB
  • Storage: 512 GiB

Software Dimensions

  • OS: RHOCP 4.16 or later with 
    • CNI: OVN
    • Registry: OpenShift Image Registry
    • NLB: MetalLB
    • CSI: Longhorn
  • Pod & Service CIDRs: the Pod and Service CIDRs of each Workload Cluster must not overlap with those of the other Workload Cluster (Submariner joins the internal networks of the two Workload Clusters, so their address spaces must be distinct)
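Two RHOCP clusters installed with the default networks (10.128.0.0/14 for Pods and 172.30.0.0/16 for Services) will collide, so it is worth checking the CIDRs before joining the clusters. The following POSIX sh sketch compares two IPv4 CIDRs; the helper names ip_to_int and cidrs_overlap are hypothetical and not part of any tool:

```shell
# Sketch: report whether two IPv4 CIDR blocks overlap (e.g. the Pod or
# Service CIDRs of workload-a and workload-b). No cluster access needed.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The subshell body keeps the IFS change local to this function.
ip_to_int() (
  IFS=.
  set -- $1            # split "a.b.c.d" into $1..$4
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Print "overlap" (exit 0) or "disjoint" (exit 1) for two CIDRs.
cidrs_overlap() (
  a_len=${1#*/}
  b_len=${2#*/}
  # Two CIDRs overlap iff their network bits agree on the shorter prefix.
  len=$a_len
  if [ "$b_len" -lt "$len" ]; then len=$b_len; fi
  if [ "$len" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  fi
  a=$(ip_to_int "${1%/*}")
  b=$(ip_to_int "${2%/*}")
  if [ $(( a & mask )) -eq $(( b & mask )) ]; then
    echo overlap
    exit 0
  fi
  echo disjoint
  exit 1
)
```

The CIDRs of a cluster can be read from its network configuration and passed to cidrs_overlap; a result of "overlap" means the requirement above is not met:

$ oc --context workload-a get network.config/cluster -o jsonpath='{.spec.clusterNetwork[*].cidr}'
$ oc --context workload-a get network.config/cluster -o jsonpath='{.spec.serviceNetwork[*]}'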

Network Access

Internet access will be needed to download OSS components. Specifically, to the following sites:

  • quay.io

Each Workload Cluster must be able to reach its peer Workload Cluster. Latency between Workload Clusters must be below 200ms.
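One way to sanity-check the 200 ms requirement is to ping a node of the peer Workload Cluster and read the average RTT from ping's summary line. A minimal sketch, assuming ICMP is permitted between the two geographies; avg_rtt_ms and rtt_within_limit are hypothetical helper names:

```shell
# Read ping output on stdin and print the "avg" value (ms) from the summary
# line. Handles "rtt min/avg/max/mdev = a/b/c/d ms" (Linux iputils) and
# "round-trip min/avg/max = a/b/c ms" (BSD/BusyBox).
avg_rtt_ms() {
  sed -n 's|^.* = [0-9.]*/\([0-9.]*\)/.*$|\1|p'
}

# Exit 0 if the average RTT in $1 is below the limit in $2 (default 200 ms);
# exit 2 if no RTT was given (e.g. ping received no replies).
rtt_within_limit() {
  [ -n "$1" ] || return 2
  awk -v a="$1" -v l="${2:-200}" 'BEGIN { exit !(a + 0 < l + 0) }'
}
```

For example (a short ping run only approximates the latency the Submariner tunnel will see):

$ rtt_within_limit "$(ping -c 10 <peerWorkloadNodeIP> | avg_rtt_ms)" && echo "latency OK"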

Preparing the Jumphost

Kubeconfigs

Set up the kubeconfig on the jumphost so that it contains an admin context for each of the three clusters. Because these context names may also be incorporated into pod names to distinguish them in the multi-cluster, it is critical to keep context names simple and restricted to lowercase characters and hyphens (‘-’). The contexts that OpenShift generates upon ‘oc login’ incorporate the login name, namespace, port number, and host name, delimited by ‘/’ and ‘:’ characters; do NOT use these contexts when constructing the multi-cluster. In this procedure, we create contexts named mgmt, workload-a, and workload-b to represent the three clusters. The following best-practice procedure is recommended:

  • 1. Generate a new admin kubeconfig for each of the clusters using the oc config new-admin-kubeconfig command. Store the generated kubeconfigs in $HOME/.kube as mgmt.yaml, workload-a.yaml, and workload-b.yaml respectively, each with 0600 permissions.
  • 2. In each kubeconfig, rename the cluster and context to match the corresponding filename (without the extension), and rename the user to the same name with a suffix of “-admin” (e.g. mgmt-admin). Dumping each context using the associated kubeconfig should then appear as:
root@jumphost:~# oc --kubeconfig .kube/mgmt.yaml config get-contexts
CURRENT   NAME   CLUSTER   AUTHINFO     NAMESPACE
*         mgmt   mgmt      mgmt-admin
root@jumphost:~# oc --kubeconfig .kube/workload-a.yaml config get-contexts
CURRENT   NAME         CLUSTER      AUTHINFO           NAMESPACE
*         workload-a   workload-a   workload-a-admin
root@jumphost:~# oc --kubeconfig .kube/workload-b.yaml config get-contexts
CURRENT   NAME         CLUSTER      AUTHINFO           NAMESPACE
*         workload-b   workload-b   workload-b-admin
  • 3. Merge the three kubeconfigs into the default kubeconfig.
root@jumphost:~# export KUBECONFIG=".kube/mgmt.yaml:.kube/workload-a.yaml:.kube/workload-b.yaml" 
root@jumphost:~# oc config view --raw --flatten >.kube/config
root@jumphost:~# chmod 0600 .kube/config
root@jumphost:~# unset KUBECONFIG     ## to use the merged config by default
root@jumphost:~# oc config get-contexts
CURRENT   NAME         CLUSTER      AUTHINFO           NAMESPACE
*         mgmt         mgmt         mgmt-admin
          workload-a   workload-a   workload-a-admin
          workload-b   workload-b   workload-b-admin
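The naming rule can be checked mechanically after merging. A POSIX sh sketch; valid_context_name and check_context_names are hypothetical helpers, and rejecting a leading or trailing hyphen is an extra assumption (such names are not valid DNS labels):

```shell
# Exit 0 if $1 uses only lowercase letters and hyphens, and does not
# start or end with a hyphen.
valid_context_name() {
  case $1 in
    '' | -* | *-) return 1 ;;   # empty, or starts/ends with a hyphen
    *[!a-z-]*) return 1 ;;      # contains a character other than a-z or '-'
  esac
  return 0
}

# Read context names (one per line) on stdin and report any that break
# the rule; exit 1 if at least one is invalid.
check_context_names() (
  rc=0
  while IFS= read -r name; do
    [ -z "$name" ] && continue
    if ! valid_context_name "$name"; then
      echo "invalid context name: $name" >&2
      rc=1
    fi
  done
  exit "$rc"
)
```

Run it against the merged kubeconfig; any context generated by ‘oc login’ will be reported:

root@jumphost:~# oc config get-contexts -o name | check_context_names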

Utilities

Install the following utilities on the jumphost:

  • Helm v3
  • Kubernetes CLI (kubectl): 1.25 or later
  • OpenShift CLI (oc): 4.16 or later

Karmada kubectl plug-in

Install the Karmada kubectl plug-in with the following commands:

$ # Download the tarball
$ curl -fsSL "https://github.com/karmada-io/karmada/releases/download/v1.13.1/kubectl-karmada-linux-amd64.tgz" -o kubectl-karmada-linux-amd64.tgz
$ # Extract the contents
$ tar xfz kubectl-karmada-linux-amd64.tgz
$ # Install the binary
$ sudo install kubectl-karmada /usr/local/bin

Verify the plug-in version as 1.13.1:

$ kubectl karmada version 
kubectl karmada version: version.Info{GitVersion:"v1.13.1", GitCommit:"7c4af1bc914f4998893ce3e5b69baf7dc619803b", GitTreeState:"clean", BuildDate:"2025-03-29T03:27:17Z", GoVersion:"go1.22.12", Compiler:"gc", Platform:"linux/amd64"}
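To check the version in a script rather than by eye, the GitVersion field can be extracted from the plug-in output. A sketch; karmada_git_version and require_karmada_version are hypothetical helper names:

```shell
# Print the GitVersion field from `kubectl karmada version` output on stdin.
karmada_git_version() {
  sed -n 's/.*GitVersion:"\([^"]*\)".*/\1/p'
}

# Exit 0 only if the installed plug-in reports the expected version
# (default v1.13.1). Requires kubectl and the plug-in on the PATH.
require_karmada_version() (
  want=${1:-v1.13.1}
  got=$(kubectl karmada version | karmada_git_version)
  if [ "$got" != "$want" ]; then
    echo "kubectl-karmada reports '$got', expected '$want'" >&2
    exit 1
  fi
)
```

$ require_karmada_version v1.13.1 && echo "plug-in version OK"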

Submariner

Download v0.20.0 of the Submariner CLI utility, subctl, and install it to /usr/local/bin:

$ # Download the tarball
$ curl -fsSL "https://github.com/submariner-io/releases/releases/download/v0.20.0/subctl-v0.20.0-linux-amd64.tar.xz" -o subctl-v0.20.0-linux-amd64.tar.xz
$ # Extract the contents
$ tar xfJ subctl-v0.20.0-linux-amd64.tar.xz
$ # Install the binary
$ sudo install subctl-v0.20.0/subctl /usr/local/bin

Preparing the Registries (Air-gapped)

This step is only needed for air-gapped installations.

Pull or transfer the following public images to the jumphost:

quay.io/submariner/lighthouse-agent:0.20.0
quay.io/submariner/lighthouse-coredns:0.20.0
quay.io/submariner/nettest:0.20.0
quay.io/submariner/submariner-gateway:0.20.0
quay.io/submariner/submariner-operator:0.20.0
quay.io/submariner/submariner-route-agent:0.20.0
registry.k8s.io/kube-apiserver:v1.31.3
registry.k8s.io/kube-controller-manager:v1.31.3
registry.k8s.io/etcd:3.5.16-0
docker.io/nginxinc/nginx-unprivileged:1.27.5
docker.io/karmada/karmada-aggregated-apiserver:v1.13.1
docker.io/karmada/karmada-controller-manager:v1.13.1
docker.io/karmada/karmada-scheduler:v1.13.1
docker.io/karmada/karmada-webhook:v1.13.1
docker.io/karmada/karmada-operator:v1.13.1
docker.io/karmada/karmada-descheduler:v1.13.1
docker.io/karmada/karmada-scheduler-estimator:v1.13.1

Create the namespaces that Karmada and Submariner will be deployed to on each cluster:

oc --context mgmt create ns karmada-system
oc --context mgmt create ns submariner-operator
oc --context workload-a create ns submariner-operator
oc --context workload-b create ns submariner-operator

Re-tag the following images for the Management Cluster, taking note that the repository for each image must be adjusted to match the associated namespace name:

<mgmt.cluster.registry.address>/submariner-operator/submariner-operator:0.20.0
<mgmt.cluster.registry.address>/karmada-system/kube-apiserver:v1.31.3
<mgmt.cluster.registry.address>/karmada-system/kube-controller-manager:v1.31.3
<mgmt.cluster.registry.address>/karmada-system/etcd:3.5.16-0
<mgmt.cluster.registry.address>/karmada-system/nginx-unprivileged:1.27.5
<mgmt.cluster.registry.address>/karmada-system/karmada-webhook:v1.13.1
<mgmt.cluster.registry.address>/karmada-system/karmada-aggregated-apiserver:v1.13.1
<mgmt.cluster.registry.address>/karmada-system/karmada-controller-manager:v1.13.1
<mgmt.cluster.registry.address>/karmada-system/karmada-scheduler:v1.13.1
<mgmt.cluster.registry.address>/karmada-system/karmada-operator:v1.13.1
<mgmt.cluster.registry.address>/karmada-system/karmada-descheduler:v1.13.1
<mgmt.cluster.registry.address>/karmada-system/karmada-scheduler-estimator:v1.13.1

Re-tag the following images for each of the workload clusters, taking note that the repository for each image must be adjusted to match the associated namespace name:

<workload-a.cluster.registry.address>/submariner-operator/submariner-operator:0.20.0
<workload-a.cluster.registry.address>/submariner-operator/lighthouse-agent:0.20.0
<workload-a.cluster.registry.address>/submariner-operator/lighthouse-coredns:0.20.0
<workload-a.cluster.registry.address>/submariner-operator/nettest:0.20.0
<workload-a.cluster.registry.address>/submariner-operator/submariner-gateway:0.20.0
<workload-a.cluster.registry.address>/submariner-operator/submariner-route-agent:0.20.0
<workload-b.cluster.registry.address>/submariner-operator/submariner-operator:0.20.0
<workload-b.cluster.registry.address>/submariner-operator/lighthouse-agent:0.20.0
<workload-b.cluster.registry.address>/submariner-operator/lighthouse-coredns:0.20.0
<workload-b.cluster.registry.address>/submariner-operator/nettest:0.20.0
<workload-b.cluster.registry.address>/submariner-operator/submariner-gateway:0.20.0
<workload-b.cluster.registry.address>/submariner-operator/submariner-route-agent:0.20.0

Push all the re-tagged images to the cluster registries.
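The re-tag and push step can be scripted. The sketch below only prints podman commands for review; it assumes podman is installed on the jumphost and that each cluster registry is reachable from the jumphost through its external route (the default-route-openshift-image-registry route of each cluster). print_mirror_cmds is a hypothetical helper:

```shell
# Print podman commands that re-tag each public image (one per line on
# stdin) as <registry>/<namespace>/<name>:<tag> and push it.
# Usage: print_mirror_cmds <registryRoute> <namespace> < images.txt
print_mirror_cmds() (
  registry=$1
  namespace=$2
  while IFS= read -r src; do
    case $src in '' | '#'*) continue ;; esac   # skip blank lines and comments
    name=${src##*/}                             # e.g. etcd:3.5.16-0
    echo "podman tag $src $registry/$namespace/$name"
    echo "podman push $registry/$namespace/$name"
  done
)
```

For example, list the Karmada images for the Management Cluster in images.txt (and the Submariner images separately, with submariner-operator as the namespace), log in to the registry route with podman login, then review and run the generated script:

$ print_mirror_cmds <mgmtRegistryRoute> karmada-system < images.txt > mirror.sh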

Installing Submariner 0.20.0

Submariner is a CNCF Sandbox Project. Submariner enables the interconnection of the internal networks of two K8s clusters through an L3 tunnel. The interconnection enables cluster-internal communication between the application workloads on each cluster.

The Submariner Broker will be installed on the Management Cluster, and a Submariner instance will be installed on each of the Workload Clusters. For the Submariner instances to reach the Broker's API, they must be able to resolve the Management Cluster's Kubernetes API hostname (e.g. api.<mgmtClusterName>.<domain>). This name must be resolvable through a proper DNS lookup. Add the following A-records to the DNS server whose IP address was specified during ISO creation:

api.<mgmtClusterName>.<domain>       A    <ClusterMgmtIP>
*.apps.<mgmtClusterName>.<domain>    A    <ClusterMgmtIP>   

From the shell of one of the Workload Cluster nodes (on RHOCP, open a node shell with ‘oc debug node/<nodeName>’), run nslookup against the DNS name to ensure it resolves.

sh-5.1# nslookup api.<mgmtClusterName>.<domain>      
.
.
Name:    api.<mgmtClusterName>.<domain>
Address: <ClusterMgmtIP>

Next, deploy the broker to the Management Cluster context using the management cluster’s kubeconfig:

Air-gapped:

subctl deploy-broker --context mgmt --repository image-registry.openshift-image-registry.svc:5000/submariner-operator

Non-air-gapped:

$ subctl deploy-broker --context mgmt

If successful, the broker deployment will create a file called broker-info.subm. Run subctl --context mgmt show brokers to verify the installation. 

✓ Detecting broker(s)
NAMESPACE               NAME                COMPONENTS                        GLOBALNET   GLOBALNET CIDR   DEFAULT GLOBALNET SIZE   DEFAULT DOMAINS   
submariner-k8s-broker   submariner-broker   service-discovery, connectivity   no          242.0.0.0/8      65536

Now, join each of the WL Clusters to the Broker:

Air-gapped (add the --repository flag to use the cluster's private registry):

$ subctl join --context workload-a --air-gapped --repository image-registry.openshift-image-registry.svc:5000/submariner-operator --natt=false broker-info.subm
$ subctl join --context workload-b --air-gapped --repository image-registry.openshift-image-registry.svc:5000/submariner-operator --natt=false broker-info.subm

Non-air-gapped:

$ subctl join --context workload-a --natt=false broker-info.subm
$ subctl join --context workload-b --natt=false broker-info.subm
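Since the join commands differ only in the context and the air-gapped flags, they can be wrapped in a small function. A sketch; subctl_join, BROKER_INFO, and DRY_RUN are hypothetical names, and only the subctl flags already shown above are used:

```shell
# Join one workload context to the broker. A second argument (a registry
# repository) adds the air-gapped flags. Set DRY_RUN=1 to print the command
# instead of running it; BROKER_INFO overrides the broker-info file path.
subctl_join() (
  ctx=$1
  repo=$2
  broker=${BROKER_INFO:-broker-info.subm}
  set -- subctl join --context "$ctx"
  if [ -n "$repo" ]; then
    set -- "$@" --air-gapped --repository "$repo"
  fi
  set -- "$@" --natt=false "$broker"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
)
```

Preview with DRY_RUN=1, then join both clusters (append the repository argument for air-gapped installs):

$ DRY_RUN=1 subctl_join workload-a
$ for ctx in workload-a workload-b; do subctl_join "$ctx"; done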

If both WL Clusters are joined successfully, verify the Submariner deployment by running the subctl end-to-end verification tests. These tests take about 5 minutes to run. Decline the disruptive verifications when prompted. The following command tests the connectivity from the workload-a Cluster’s K8s context to the workload-b Cluster’s context (see ‘kubectl config get-contexts’):

Air-gapped:

$ # Warnings displayed about "system:authenticated" not being found are expected and can be ignored
$ oc --context workload-a -n submariner-operator policy add-role-to-group system:image-puller system:authenticated
$ oc --context workload-b -n submariner-operator policy add-role-to-group system:image-puller system:authenticated

Verify the installation:

$ subctl verify --context workload-a --tocontext workload-b
? You have specified disruptive verifications (gateway-failover). Are you sure you want to run them? (y/N) N

Currently, there is a known issue with Submariner tests with RHOCP 4.18 (see Connectivity test fails with OCP 4.18). We expect the following test failures:

Summarizing 2 Failures:
    [FAIL] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod when the pod is on a gateway and the remote pod is on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane, basic]
    github.com/submariner-io/shipyard@v0.20.0/test/e2e/tcp/connectivity.go:72
    [FAIL] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote service when the pod is on a gateway and the remote service is on a gateway [It] should have sent the expected data from the pod to the other pod [dataplane, basic]
    github.com/submariner-io/shipyard@v0.20.0/test/e2e/tcp/connectivity.go:72
Ran 17 of 48 Specs in 318.692 seconds
FAIL! -- 15 Passed | 2 Failed | 0 Pending | 31 Skipped

These failures are benign and can be ignored.
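When the known RHOCP 4.18 failures are expected, a quick filter helps confirm that nothing else failed. A sketch that assumes the verify output was saved to a file (for example by copying the console output to verify.log); check_verify_output and KNOWN_FAIL_RE are hypothetical names, and the pattern simply matches the two failure titles shown above:

```shell
# Titles of the two known gateway-to-gateway TCP failures.
KNOWN_FAIL_RE='Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote (pod|service) when the pod is on a gateway and the remote (pod|service) is on a gateway'

# Read `subctl verify` output on stdin. Print any other "[FAIL]" lines and
# exit 1; exit 0 if only the known failures (or none) occurred.
check_verify_output() (
  unexpected=$(grep -F '[FAIL]' | grep -v -E "$KNOWN_FAIL_RE")
  if [ -n "$unexpected" ]; then
    printf '%s\n' "$unexpected"
    exit 1
  fi
  exit 0
)
```

$ check_verify_output < verify.log && echo "only known failures"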

Installing Karmada 1.13.1

Karmada is a CNCF (Cloud Native Computing Foundation) Incubating Project. Karmada enables workload scheduling across multiple K8s clusters and/or clouds.

The Karmada operator will be used to install Karmada on the Management Cluster. In the YAML definitions below, lines marked as needed for air-gapped installs (“# For air-gapped” or “# Air-gapped”) can be omitted otherwise.

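  • 1. Create the ‘karmada-system’ namespace/project on the Management Cluster (skip this step if the namespace was already created while preparing the registries for an air-gapped install)
$ oc new-project karmada-system --context mgmt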
  • 2. Bind the privileged Security Context Constraint (SCC) to the karmada-operator and default ServiceAccounts (does NOT require the karmada-operator ServiceAccount to exist) in the karmada-system namespace on the management context:
$ oc --context mgmt -n karmada-system adm policy add-scc-to-user privileged -z karmada-operator
$ oc --context mgmt -n karmada-system adm policy add-scc-to-user privileged -z default
  • 3. Download the Karmada operator Helm chart and the CRD tarball:
$ curl -fsSL "https://github.com/karmada-io/karmada/releases/download/v1.13.1/karmada-operator-chart-v1.13.1.tgz" -o karmada-operator-chart-v1.13.1.tgz
$ curl -fsSL "https://github.com/karmada-io/karmada/releases/download/v1.13.1/crds.tar.gz" -o 1.13.1-crds.tar.gz
  • 4. Create a values file (values.yaml) for deploying the operator:
global: # For air-gapped
  imageRegistry: "image-registry.openshift-image-registry.svc:5000" 
installCRDs: false
operator:
  image:
    repository: karmada-system/karmada-operator # For air-gapped
    tag: v1.13.1     # Version of container image to pull
  podAnnotations:
    openshift.io/required-scc: privileged # Operator needs to store CRs to container FS
  • 5. Helm-install the operator with the above values file.
$ helm --kube-context mgmt install -n karmada-system karmada-operator --values values.yaml ./karmada-operator-chart-v1.13.1.tgz --wait 
  • 6. Verify the operator reaches a running state
$ kubectl get pods -n karmada-system --context mgmt
NAME                                                  READY   STATUS    RESTARTS   AGE
karmada-operator-6cb799fdfc-792qh                     1/1     Running   0          2m
  • 7. Once installed, the operator introduces several Custom Resource Definitions (CRDs) that can be used to configure the various Karmada components.
$ oc --context mgmt explain karmada.spec.components
GROUP:      operator.karmada.io
KIND:       Karmada
VERSION:    v1alpha1
FIELD: components <Object>
DESCRIPTION:
    Components define all of karmada components.
    not all of these components need to be installed.
.
.

Air-gapped:

  • For air-gapped installs, the CRDs tarball must be hosted in an offline webserver accessible by the operator.
$ # Create a ConfigMap containing the CRD tarball:
$ oc --context mgmt -n karmada-system create cm karmada-crds --from-file=./1.13.1-crds.tar.gz
  • Create a manifest (crd-webserver.yaml) for an Nginx instance serving the CRD tarball via a ClusterIP service inside Karmada's namespace:
# crd-webserver.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karmada-crds
  namespace: karmada-system
  labels:
    app: karmada-crds
spec:
  selector:
    matchLabels:
      app: karmada-crds
  replicas: 1
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/default-container: karmada-crds
      labels:
        app: karmada-crds
    spec:
      containers:
        - name: karmada-crds
          image: image-registry.openshift-image-registry.svc:5000/karmada-system/nginx-unprivileged:1.27.5
          imagePullPolicy: IfNotPresent
          livenessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 5
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3
            periodSeconds: 10
          ports:
            - containerPort: 8080
              name: karmada-crds
          volumeMounts:
            - name: crds
              mountPath: /usr/share/nginx/html
      volumes:
        - configMap:
            defaultMode: 420
            name: karmada-crds
          name: crds
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: karmada-crds
  namespace: karmada-system
spec:
  selector:
    app: karmada-crds
  type: ClusterIP
  ports:
    - name: karmada-crds
      protocol: TCP
      port: 80
      targetPort: 8080
  • Apply the manifest
$ oc --context mgmt -n karmada-system apply -f crd-webserver.yaml
  • Verify that the NGINX instance is in a running state
$ oc --context mgmt -n karmada-system get pods
NAME                            READY   STATUS    RESTARTS       AGE
karmada-crds-667fc8979f-s6jcd   1/1     Running   0              10d
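Kubernetes limits the data in a single ConfigMap to 1 MiB, so serving the CRDs from a ConfigMap only works while the tarball stays under that size. A sketch of a pre-check; fits_in_configmap is a hypothetical helper:

```shell
# Exit 0 if the file is at most 1 MiB (the ConfigMap data limit),
# 1 if it is larger, 2 if it cannot be read.
fits_in_configmap() {
  size=$(wc -c < "$1") || return 2
  [ $(( size )) -le 1048576 ]   # $(( )) strips any padding from wc
}
```

$ fits_in_configmap 1.13.1-crds.tar.gz && oc --context mgmt -n karmada-system create cm karmada-crds --from-file=./1.13.1-crds.tar.gz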
  • 8. Create a custom resource file, karmada-instance.yaml, as follows (replacing the certSANs entry for the karmadaAPIServer and the default storage class placeholder with your own values):
apiVersion: operator.karmada.io/v1alpha1
kind: Karmada
metadata:
  name: karmada
  namespace: karmada-system
spec:
  crdTarball: # Air-gapped
    httpSource: # Air-gapped
      url: http://karmada-crds/1.13.1-crds.tar.gz # Air-gapped
  components:
    karmadaDescheduler:
      imageRepository: image-registry.openshift-image-registry.svc:5000/karmada-system/karmada-descheduler # Air-gapped
      imageTag: v1.13.1
    karmadaAggregatedAPIServer:
      imageRepository: image-registry.openshift-image-registry.svc:5000/karmada-system/karmada-aggregated-apiserver # Air-gapped
      imageTag: v1.13.1
      annotations:
        openshift.io/required-scc: privileged
      featureGates:
        Failover: true
    karmadaControllerManager:
      imageRepository: image-registry.openshift-image-registry.svc:5000/karmada-system/karmada-controller-manager # Air-gapped
      imageTag: v1.13.1
      featureGates:
        Failover: true
    karmadaScheduler:
      imageRepository: image-registry.openshift-image-registry.svc:5000/karmada-system/karmada-scheduler # Air-gapped
      imageTag: v1.13.1
      featureGates:
        Failover: true
    karmadaWebhook:
      imageRepository: image-registry.openshift-image-registry.svc:5000/karmada-system/karmada-webhook # Air-gapped
      imageTag: v1.13.1
    karmadaMetricsAdapter:
      imageRepository: image-registry.openshift-image-registry.svc:5000/karmada-system/karmada-metrics-adapter # Air-gapped
      imageTag: v1.13.1
      annotations:
        openshift.io/required-scc: privileged
    kubeControllerManager:
      imageRepository: image-registry.openshift-image-registry.svc:5000/karmada-system/kube-controller-manager # Air-gapped
      imageTag: v1.31.3
    karmadaAPIServer:
      imageRepository: image-registry.openshift-image-registry.svc:5000/karmada-system/kube-apiserver # Air-gapped
      imageTag: v1.31.3
      serviceType: NodePort # Expose the API server to the WL clusters with external IP
      certSANs: # Add SANs so that the WL Clusters can accept the Mgmt Cluster's certificate
        - <MgmtClusterVIP>  # Mgmt Cluster Mgmt VIP/apiserver
      annotations:
        openshift.io/required-scc: privileged
    #
    # Default storage for etcd will be to use hostPath which 
    # OpenShift will not tolerate so we add a 5Gi PVC
    # against the default storageClass (longhorn)
    #
    etcd:
      local:
        imageRepository: image-registry.openshift-image-registry.svc:5000/karmada-system/etcd # Air-gapped
        imageTag: 3.5.16-0
        replicas: 3
        volumeData:
          volumeClaim:
            spec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 5Gi
              storageClassName: <defaultStorageClass>

Note that for several of the Karmada components, we add an annotation that allows them to run in privileged mode. Without these annotations, OpenShift applies the most restrictive SCC, which curtails most interactions with the OS or the container filesystem.
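Before applying the CR, it can help to confirm that no placeholders such as <MgmtClusterVIP> or <defaultStorageClass> remain. A sketch; check_placeholders is a hypothetical helper, and it assumes the placeholders keep the <name> form used in this post:

```shell
# Print (with line numbers) any <placeholder> left in the file.
# Exit 0 when none remain, 1 when some remain, 2 on a grep error.
check_placeholders() {
  grep -nE '<[A-Za-z][A-Za-z0-9.-]*>' "$1"
  case $? in
    0) return 1 ;;   # placeholders found (printed above)
    1) return 0 ;;   # none left
    *) return 2 ;;   # e.g. file not found
  esac
}
```

The default storage class on the Management Cluster is the one marked “(default)” in the output of oc --context mgmt get storageclass. Then:

$ check_placeholders karmada-instance.yaml && oc --context mgmt apply -n karmada-system -f karmada-instance.yaml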

  • 9. Apply the CR
$ oc --context mgmt apply -n karmada-system -f karmada-instance.yaml 
  • 10. Monitor the Karmada deployment
$ oc get pods -n karmada-system -w --context mgmt
NAME                                                  READY   STATUS    RESTARTS   AGE
karmada-aggregated-apiserver-7ff75d6979-wgk6z      1/1     Running   0          63m
karmada-apiserver-6df7fd8bdc-2dcct                 1/1     Running   0          63m
karmada-controller-manager-648c44f5c8-pglr9        1/1     Running   0          63m
karmada-etcd-0                                     1/1     Running   0          63m
karmada-kube-controller-manager-544f955cbd-79nkd   1/1     Running   0          63m
karmada-metrics-adapter-5bb86d4b9-pj6lr            1/1     Running   0          63m
karmada-metrics-adapter-5bb86d4b9-qjzgx            1/1     Running   0          63m
karmada-scheduler-7bd65745b9-sgkmn                 1/1     Running   0          63m
karmada-webhook-998985fb-trnt8                     1/1     Running   0          63m
karmada-operator-6cb799fdfc-792qh                  1/1     Running   0          145m
  • 11. Once all pods have reached a running state, verify that the health of the Karmada deployment is good
$ oc get karmada -n karmada-system --context mgmt
NAME         READY   AGE
karmada   True    86m 

Preparing the Karmada Context

  • 1. As part of the join process, the WL Clusters will need to access the Karmada API server. Verify that the API server is exposed as a NodePort service and is reachable:
$ oc get services -n karmada-system --context mgmt
NAME                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
karmada-aggregated-apiserver   ClusterIP   172.30.29.244    <none>        443/TCP             91m
karmada-apiserver              NodePort    172.30.11.192    <none>        5443:31463/TCP      91m
karmada-etcd                   ClusterIP   None             <none>        2379/TCP,2380/TCP   92m
karmada-etcd-client            ClusterIP   172.30.62.228    <none>        2379/TCP            92m
karmada-metrics-adapter        ClusterIP   172.30.37.238    <none>        443/TCP             91m
karmada-webhook                ClusterIP   172.30.196.143   <none>        443/TCP             91m
$ nc -zvw2 <mgmtClusterNodeIP> 31463
Connection to <mgmtClusterNodeIP> (10.9.177.57) 31463 port [tcp/*] succeeded!
  • 2. Fetch the kube config for the newly created Karmada context
$ oc -n karmada-system get secrets karmada-admin-config -o jsonpath='{.data.karmada\.config}' --context mgmt | base64 -d > .kube/karmada.yaml
  • 3. Edit the cluster and context names in the karmada kubeconfig to ‘karmada-apiserver’ and change the cluster address to match the Management Cluster’s HA (VIP) address.
  • 4. Add the new kubeconfig to the default kubeconfig (see step 3 in Kubeconfigs for an example of how to flatten the kubeconfigs) 
root@jumphost:~# oc config get-contexts
CURRENT   NAME                CLUSTER             AUTHINFO           NAMESPACE
*         karmada-apiserver   karmada-apiserver   karmada-admin
          mgmt                mgmt                mgmt-admin
          workload-a          workload-a          workload-a-admin
          workload-b          workload-b          workload-b-admin
  • 5. The Management Cluster must be able to resolve each WL Cluster's API server name through DNS. Ensure that the DNS server serving the Management Cluster has entries for the WL Cluster API servers, e.g.:
<WLaClusterMgmtIP> api.<WLaClusterDNSName> 
<WLbClusterMgmtIP> api.<WLbClusterDNSName>
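Steps 1 and 3 above can be tied together: the server address in the Karmada kubeconfig becomes the Management Cluster VIP plus the karmada-apiserver NodePort. A sketch, assuming the WL Clusters and the jumphost reach the Karmada API server through that NodePort on the VIP; nodeport_of and karmada_server_url are hypothetical helpers:

```shell
# Extract the NodePort from a PORT(S) value such as "5443:31463/TCP".
nodeport_of() {
  p=${1#*:}          # drop the service port -> "31463/TCP"
  echo "${p%%/*}"    # drop the protocol     -> "31463"
}

# Build the Karmada API server URL from the VIP and the PORT(S) value.
karmada_server_url() {
  echo "https://$1:$(nodeport_of "$2")"
}
```

After renaming the cluster entry to karmada-apiserver, the address could be set as follows (the NodePort can also be read directly with the jsonpath query shown):

$ oc --kubeconfig .kube/karmada.yaml config set clusters.karmada-apiserver.server "$(karmada_server_url <MgmtClusterVIP> 5443:31463/TCP)"
$ oc --context mgmt -n karmada-system get svc karmada-apiserver -o jsonpath='{.spec.ports[0].nodePort}'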

Joining the Workload Clusters

  • 1. Ensure that the Karmada context is the working context
$ oc config use-context karmada-apiserver 
  • 2. Create a certificate-based kubeconfig for each Workload Cluster:
$ oc --context workload-a config new-admin-kubeconfig >.kube/wla-admin.yaml
$ oc --context workload-b config new-admin-kubeconfig >.kube/wlb-admin.yaml
  • 3. Join Workload Cluster A to the multi-cluster
$ oc karmada join workload-a --cluster-context workload-a 
  • 4. Enable the Karmada Scheduler Estimator for Workload Cluster A
$ oc karmada addons enable karmada-scheduler-estimator --karmada-kubeconfig .kube/config --context mgmt -C workload-a --member-kubeconfig .kube/wla-admin.yaml --member-context admin --karmada-scheduler-estimator-image='image-registry.openshift-image-registry.svc:5000/karmada-system/karmada-scheduler-estimator:v1.13.1'
  • 5. Join Workload Cluster B to the multi-cluster
$ oc karmada join workload-b --cluster-context workload-b
  cluster workload-b is joined successfully
  • 6. Enable the Karmada Scheduler Estimator for Workload Cluster B
$ oc karmada addons enable karmada-scheduler-estimator --karmada-kubeconfig .kube/config --context mgmt -C workload-b --member-kubeconfig .kube/wlb-admin.yaml --member-context admin --karmada-scheduler-estimator-image='image-registry.openshift-image-registry.svc:5000/karmada-system/karmada-scheduler-estimator:v1.13.1'
  • 7. Verify both constituent clusters of the multi-cluster are READY
$ kubectl get clusters --context karmada-apiserver  
NAME               VERSION   MODE   READY   AGE
workload-a   v1.31.6   Push   True    123m
workload-b   v1.31.6   Push   True    123m

and that the scheduler-estimators are in a running state in the Management Cluster’s context, e.g.:

$ kubectl --context mgmt -n karmada-system get pods
NAME                                                      READY   STATUS    RESTARTS       AGE
karmada-aggregated-apiserver-6ddb95dfb5-vr7kx             1/1     Running   12 (24d ago)   24d
karmada-apiserver-6d5498c75c-r6822                        1/1     Running   7 (24d ago)    24d
karmada-controller-manager-6767545f84-5nnkt               1/1     Running   0              9d
karmada-descheduler-56f49d77f5-9dp6w                      1/1     Running   1 (24d ago)    24d
karmada-etcd-0                                            1/1     Running   0              24d
karmada-etcd-1                                            1/1     Running   0              24d
karmada-etcd-2                                            1/1     Running   0              24d
karmada-kube-controller-manager-797dd75f96-xw6zg          1/1     Running   10 (24d ago)   24d
karmada-metrics-adapter-75d478fb7-7964b                   1/1     Running   11 (24d ago)   24d
karmada-metrics-adapter-75d478fb7-pdqjz                   1/1     Running   11 (24d ago)   24d
karmada-operator-7c8b9b8f69-wdlhx                         1/1     Running   1 (24d ago)    24d
karmada-scheduler-55cdf456d9-mxcdz                        1/1     Running   0              10d
karmada-scheduler-estimator-workload-1-7d595559f4-p5sw9   1/1     Running   0              10d
karmada-scheduler-estimator-workload-2-667fc8979f-s6jcd   1/1     Running   0              10d
karmada-webhook-6c775f68c8-blprz                          1/1     Running   1 (24d ago)    24d
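The estimator check can also be scripted. A sketch; estimators_running is a hypothetical helper, and the pod-name prefix karmada-scheduler-estimator-<cluster>- is assumed from the sample output above:

```shell
# Read `get pods` output on stdin; for each cluster name given as an
# argument, require a pod named karmada-scheduler-estimator-<cluster>-...
# whose STATUS column (3rd field) is Running. Exit 1 if any is missing.
estimators_running() (
  pods=$(cat)
  rc=0
  for c in "$@"; do
    if ! printf '%s\n' "$pods" | awk -v pfx="karmada-scheduler-estimator-$c-" \
      'index($1, pfx) == 1 && $3 == "Running" { ok = 1 } END { exit !ok }'; then
      echo "no running scheduler-estimator pod for cluster $c" >&2
      rc=1
    fi
  done
  exit "$rc"
)
```

$ kubectl --context mgmt -n karmada-system get pods | estimators_running workload-a workload-b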

Operational Tips

Application Support

BBE applications that support a multi-geo multi-cluster include:

  • BNG CUPS Controller – release 24.4R2 and later
  • Address Pool Manager (APM) – release 3.4.0 and later

Karmada kubeconfig

Keep track of the Karmada kubeconfig file (.kube/karmada.yaml) that you fetched from the Management Cluster in step 2 of Preparing the Karmada Context. You will need it to generate a secret that allows the application’s Observer micro-service to monitor Karmada scheduling events. See the application installation guide for details on creating the kubeconfig secret.

Template File for Multi-geo

Multi-geo application setup (APM or BNG CUPS Controller) requires the operator to supply many more values. At a minimum, the registry push/pull addresses and the name of the Karmada kubeconfig secret can be placed in a template file and passed to the utility script’s setup step (--template).

In the example below, Workload-a (wf-mg-rh-wla-mdr) is the primary Workload Cluster and Workload-b (wf-mg-rh-wlb-mdr) is the backup. The Management Cluster’s Push FQDN is default-route-openshift-image-registry.apps.wf-mg-rh-kd-mdr.englab.juniper.net, Workload-a’s Push FQDN is default-route-openshift-image-registry.apps.wf-mg-rh-wla-mdr.englab.juniper.net, and Workload-b’s Push FQDN is default-route-openshift-image-registry.apps.wf-mg-rh-wlb-mdr.englab.juniper.net:

   global:
     jnpr:
       karmada:
         registries:
           wf-mg-rh-wlb-mdr: image-registry.openshift-image-registry.svc:5000   # backupClusterName: imagePullTransportAddress
         backup_clusters:
         - wf-mg-rh-wlb-mdr                                                     # - backupClusterName
         primary_cluster: wf-mg-rh-wla-mdr                                      # primaryClusterName
       registry:
         push:
         - default-route-openshift-image-registry.apps.wf-mg-rh-kd-mdr.englab.juniper.net    #MgmtClusterRegistryPushFQDN
         - default-route-openshift-image-registry.apps.wf-mg-rh-wla-mdr.englab.juniper.net   #PrimaryWlClusterRegistryPushFQDN
         - default-route-openshift-image-registry.apps.wf-mg-rh-wlb-mdr.englab.juniper.net   #BackupWlClusterRegistryPushFQDN
     registry:
       pull: image-registry.openshift-image-registry.svc:5000               # primaryClusterRegistryPullTransportAddress
   observer:
     registry:
       pull: image-registry.openshift-image-registry.svc:5000               # managementClusterRegistryPullTransportAddress
     secrets:
       kubeconfig: karmada-kconf                                            # name of the Karmada kubeconfig secret object
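The push FQDNs in this example follow the OpenShift default-route pattern (default-route-openshift-image-registry.apps.<clusterName>.<baseDomain>). Because of this, the same template can be rendered from the three cluster names. The sketch below is a hypothetical convenience and is not part of the utility script. It assumes that all three clusters share one base domain and use the in-cluster pull address image-registry.openshift-image-registry.svc:5000, as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render the multi-geo template from cluster names.
# ASSUMPTIONS: OpenShift default registry route naming, a common base domain,
# and the in-cluster pull address used in the example above.
render_multigeo_template() {
  local mgmt=$1 primary=$2 backup=$3 domain=$4
  local pull=image-registry.openshift-image-registry.svc:5000
  local route=default-route-openshift-image-registry.apps
  cat <<EOF
global:
  jnpr:
    karmada:
      registries:
        ${backup}: ${pull}
      backup_clusters:
      - ${backup}
      primary_cluster: ${primary}
    registry:
      push:
      - ${route}.${mgmt}.${domain}
      - ${route}.${primary}.${domain}
      - ${route}.${backup}.${domain}
  registry:
    pull: ${pull}
observer:
  registry:
    pull: ${pull}
  secrets:
    kubeconfig: ${SECRET_NAME:-karmada-kconf}
EOF
}
```

For example, render_multigeo_template wf-mg-rh-kd-mdr wf-mg-rh-wla-mdr wf-mg-rh-wlb-mdr englab.juniper.net > multigeo-template.yaml writes a file with the same structure as the example. That file can then be passed with --template. The file name is arbitrary.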

Switchover vs Failover

Each application’s utility script includes a ‘multi-cluster switchover’ command. The application’s micro-service charts carry an application-specific cluster toleration with a 1-second toleration period. To trigger a switchover event, the switchover command applies a “NoExecute” taint matching this toleration to the cluster. Micro-services that exist on only one Workload Cluster are recreated on the other Workload Cluster and descheduled from their original Workload Cluster.

Karmada initiates failover when it detects that a Workload Cluster is no longer viable for running workloads. Karmada then re-evaluates the micro-services’ multi-cluster policies and reschedules any micro-service that exists on only one Workload Cluster onto the other Workload Cluster. When the original Workload Cluster becomes reachable again, those micro-services are descheduled from it.
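You may want to confirm which taints are currently set on each Workload Cluster, for example after running the switchover command. Karmada stores these in the spec.taints field of each Cluster object. The sketch below is illustrative, and its function names are hypothetical. It does not show the taint key used by the utility script, because each application defines its own.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read taints from Karmada Cluster objects (spec.taints).
# Prints one line per cluster: "<name><TAB><key>:<effect> <key>:<effect> ..."
list_cluster_taints() {
  kubectl get clusters --context "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.taints[*]}{.key}:{.effect} {end}{"\n"}{end}'
}

# Reads list_cluster_taints output and prints clusters carrying a NoExecute taint
noexecute_clusters() {
  awk -F '\t' 'index($2, ":NoExecute ") { print $1 }'
}

# Usage (placeholder context):
#   list_cluster_taints <karmadaContextName> | noexecute_clusters
```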

Monitoring and Troubleshooting 

Running ‘kubectl get clusters’ against the Karmada context gives an overview of the readiness of the two Workload Clusters that Karmada monitors. Both Workload Clusters should report READY as True.

$ kubectl get clusters --context <karmadaContextName>
NAME              VERSION   MODE   READY   AGE
<wlaClusterName>  v1.31.6   Push   True    64d
<wlbClusterName>  v1.31.6   Push   True    64d
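In scripted checks, such as after a switchover or a maintenance window, it can help to poll until every member cluster reports Ready. The sketch below reads the Ready condition of each Karmada Cluster object with a kubectl JSONPath query. The function names and the default retry count and delay are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: poll Karmada Cluster objects until all are Ready=True.
# Prints "<name><TAB><Ready condition status>" for each member cluster.
cluster_ready_status() {
  kubectl get clusters --context "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
}

# Returns 0 once every cluster is Ready; returns 1 after <tries> failed attempts.
wait_for_clusters_ready() {
  local context=$1 tries=${2:-10} delay=${3:-15}
  local i=1 status not_ready
  while [ "$i" -le "$tries" ]; do
    status=$(cluster_ready_status "$context") || status=""
    # Clusters whose Ready condition is missing or not "True"
    not_ready=$(printf '%s\n' "$status" | awk -F '\t' 'NF && $2 != "True" { print $1 }')
    if [ -n "$status" ] && [ -z "$not_ready" ]; then
      return 0
    fi
    echo "attempt $i/$tries: not ready: ${not_ready:-no clusters returned}" >&2
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

An empty or failed query counts as "not ready", so a broken context does not pass silently.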

Karmada also supports a Prometheus endpoint for access to various alerts and key metrics. The following references are useful for setting up Prometheus monitoring:

Application workloads are tracked by ResourceBinding objects in the application namespace of the Karmada context. The ResourceBinding objects show where each workload is scheduled, along with other useful metadata about the workload. For example,

$ kubectl get ResourceBinding -n jnpr-apm --context <KarmadaContext>
NAME                                                                        SCHEDULED   FULLYAPPLIED   AGE
apm-apmi-<wlaClusterName>-service                                            True        True           30h
apm-apmi-<wlbClusterName>-service                                            True        True           30h
.
.
.

lists the ResourceBindings for APM. Any ResourceBinding with a FULLYAPPLIED status of False is suspect and worth investigating by describing the object. For example, describing the ResourceBinding for the provman deployment shows which Workload Cluster Karmada expects the deployment to be scheduled on.

$ kubectl describe ResourceBinding -n jnpr-apm jnpr-apm-provman-deployment --context <KarmadaContext>
Name:         jnpr-apm-provman-deployment
Namespace:    jnpr-apm
Labels:       propagationpolicy.karmada.io/permanent-id=5c93ec57-e25f-4466-aa54-13f2a6f1289d
              resourcebinding.karmada.io/permanent-id=fca6e395-b54a-4aff-b5ba-bed8dcbfd6da
Annotations:  policy.karmada.io/applied-placement:
              {"clusterAffinities":[{"affinityName":"primary-clusters","clusterNames":["<workload-a>"]},{"affinityName":"backup-clusters","clusterNam...
              propagationpolicy.karmada.io/name: provman-prop
              propagationpolicy.karmada.io/namespace: jnpr-apm
              resourcebinding.karmada.io/dependencies: null
API Version:  work.karmada.io/v1alpha2
Kind:         ResourceBinding
.
.
.
Status:
  Aggregated Status:
    Applied:       true
    Cluster Name:  <workloadClusterName>
    Health:        Healthy
    Status:
      Available Replicas:            1
      Generation:                    1
      Observed Generation:           1
      Ready Replicas:                1
      Replicas:                      1
      Resource Template Generation:  2
      Updated Replicas:              1
  Conditions:
    Last Transition Time:             2025-06-05T14:38:39Z
    Message:                          All works have been successfully applied
    Reason:                           FullyAppliedSuccess
    Status:                           True
    Type:                             FullyApplied
    Last Transition Time:             2025-06-05T14:38:39Z
    Message:                          Binding has been scheduled successfully.
    Reason:                           Success
    Status:                           True
    Type:                             Scheduled
  Last Scheduled Time:                2025-06-05T14:38:39Z
  Scheduler Observed Generation:      3
  Scheduler Observing Affinity Name:  primary-clusters
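The list-then-describe workflow above can be condensed into a single query. It prints each ResourceBinding's FullyApplied condition together with the clusters recorded in its spec.clusters field, and then keeps only the bindings that are not fully applied. This is a sketch. The function names are hypothetical, and jnpr-apm is simply the APM namespace used in the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: summarize ResourceBindings as
# "<name><TAB><FullyApplied status><TAB><target cluster names>".
binding_summary() {
  kubectl get resourcebindings -n "$1" --context "$2" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="FullyApplied")].status}{"\t"}{.spec.clusters[*].name}{"\n"}{end}'
}

# Keeps only bindings whose FullyApplied condition is not "True"
suspect_bindings() {
  awk -F '\t' 'NF && $2 != "True" { printf "%s (FullyApplied=%s) -> %s\n", $1, ($2 == "" ? "unknown" : $2), $3 }'
}

# Usage (placeholder context):
#   binding_summary jnpr-apm <KarmadaContext> | suspect_bindings
```

Any binding this prints is a candidate for kubectl describe, as shown above.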

Conclusion

Building a multi-geography multi-cluster from three separate single-geography Kubernetes clusters enables control-plane redundancy for Broadband Edge applications such as BNG CUPS Controller and Address Pool Manager. Each cluster in the multi-cluster takes one of two roles: Karmada (Management) Cluster or Workload Cluster. The two Workload Clusters run the bulk of the application workloads. Submariner interconnects the cluster-internal networks of the two Workload Clusters with a secure layer-3 tunnel, enabling pod-to-pod communication. The Management Cluster monitors the state of the Workload Clusters and of the workloads (applications) running on them. If one Workload Cluster can no longer support workloads in accordance with the propagation policies defined in the applications’ Helm charts, Karmada attempts to honor those policies on the other Workload Cluster (failover).

Useful Links

Glossary

  • APM – Address Pool Manager
  • BBE – Broadband Edge
  • BNG – Broadband Network Gateway
  • CIDR – Classless Inter-Domain Routing
  • CNF – Cloud-native Network Function
  • CNCF – Cloud Native Computing Foundation
  • CNI – Container Network Interface
  • CR – Custom Resource
  • CRD – Custom Resource Definition
  • CSI – Container Storage Interface
  • CUPS – Control/User Plane Separation
  • FQDN – Fully Qualified Domain Name
  • K8s – Kubernetes
  • NLB – Network Load Balancer
  • OCP – OpenShift Container Platform (see RHOCP)
  • OSS – Open-Source Software
  • REST – Representational State Transfer
  • RHOCP – Red Hat OpenShift Container Platform
  • SCC – Security Context Constraint
  • VIP – Virtual IP
  • WL – Workload
  • YAML – YAML Ain’t Markup Language (a data serialization language)

Acknowledgements

Many thanks to Allen Horine for his expertise in constructing and operating multi-clusters using Karmada and Submariner, Mike Zeimbekakis for his expertise in how application propagation policies work to define workload failover, and the BBE development team for defining and building applications to take advantage of the redundancy offered by a multi-geo multi-cluster.

Comments

If you have comments, feedback, or questions, email us at:

Revision History

Version   Author(s)      Date             Comments
1         Steve Onishi   September 2025   Initial Publication

