Testing Autoscaling Functionality
Warning
As of April 2023, Autoscaling is a feature that has not yet been validated, and it is not intended for production use.
For updates, follow this Nutanix Opendocs document.
Section Flow
```mermaid
stateDiagram
direction LR
state TestScalingEvents {
  direction LR
  ScaleUp --> ScaleDown
}
```
We will run with the following configuration:
- The Management cluster runs the Autoscaler
- The worker nodes run the workloads
- The Autoscaler on the management cluster monitors the resource requirements of the workloads in the workload cluster, and scales the workload cluster's worker nodes up and down accordingly
```mermaid
graph LR
B[Management Cluster - AutoScaler ] -->|"[kubeconfig]"| C[Workload Cluster - Workloads ];
```
Deploy AutoScaler on Management Cluster
- Make sure your Kubernetes context is that of your Management cluster.
- Download the AutoScaler manifest.
- Replace the environment variable in the downloaded manifest with your workload cluster namespace environment variable `${WORKLOAD_CLUSTER_NS}`.
- Set the environment variables for the Autoscaler image you would like to use.
Info
Other versions of Autoscaler images can be found here.
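For reference, the exports might look like the following. The registry path is the upstream Cluster Autoscaler image location, but the tag and the namespace value are only examples; pick the release that matches your Kubernetes version:

```shell
# Example values only; substitute your own namespace and Autoscaler release
export AUTOSCALER_NS=workload-ns
export AUTOSCALER_IMAGE=registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.2
```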
- Modify and add (where necessary) the following highlighted sections of the `deployment.yaml` file (specifically inside the container section):

```yaml
containers:
- image: ${AUTOSCALER_IMAGE}
  name: cluster-autoscaler
  command:
  - /cluster-autoscaler
  args:
  - --cloud-provider=clusterapi
  - --kubeconfig=/mnt/kubeconfig/kubeconfig.yml
  - --clusterapi-cloud-config-authoritative
  - -v=1
  volumeMounts:
  - mountPath: /mnt/kubeconfig
    name: kubeconfig
    readOnly: true
volumes:
- name: kubeconfig
  secret:
    secretName: ${WORKLOAD_CLUSTER_NAME}-kubeconfig # (1)
    items:
    - key: value
      path: kubeconfig.yml
```

1. Make sure to use the name/environment variable of your workload cluster. Another easy way to find the secret name to be used is to run the command `kubectl get secrets -n $AUTOSCALER_NS`.
- Check the manifest to make sure you have included all details, and apply the Autoscaler deployment manifest.
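The apply step itself might look like this; a sketch, assuming the edited manifest was saved as `deployment.yaml` (a hypothetical file name) and `k` is your `kubectl` alias:

```shell
# Hypothetical file name; use the manifest you edited above
k apply -f deployment.yaml -n ${WORKLOAD_CLUSTER_NS}
```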
Confirm that the autoscaler pod is running:

```shell
k get po -n ${WORKLOAD_CLUSTER_NS}
# NAME                                  READY   STATUS    RESTARTS   AGE
# cluster-autoscaler-6dbb469585-4ggtd   1/1     Running   0          7s
```

Warning
Do not proceed if the Autoscaler pod has issues starting; troubleshoot and fix it before moving on to the next section.
We have now set up the Autoscaler to manage resources in our workload cluster.
Capacity Management
As a responsible Solution Architect or Administrator, you need to plan for resources and manage capacity.
For this reason, let us assume that you have decided to provide the following resources to the application team.
| Limit | VMs |
|---|---|
| High Limit | 5 |
| Low Limit | 1 |
To apply this capacity in the Autoscaler, we need to set annotations on the MachineDeployment resource.
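A sketch of locating and editing the resource on the management cluster; the MachineDeployment name shown follows the `-wmd` suffix seen in the outputs later in this section, but it is hypothetical for your environment:

```shell
# List the MachineDeployments in the workload cluster's namespace
k get machinedeployments -n ${WORKLOAD_CLUSTER_NS}

# Open the workload cluster's MachineDeployment in your editor (name is hypothetical)
k edit machinedeployment ${WORKLOAD_CLUSTER_NAME}-wmd -n ${WORKLOAD_CLUSTER_NS}
```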
Edit the MachineDeployment resource and, under its `metadata.annotations` section, paste the following two lines:
```yaml
cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"
cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
```
Your MachineDeployment metadata section would then look something like this:
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5" # (1)
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1" # (2)
```
1. Specifies the maximum number of worker nodes to scale up to
2. Specifies the minimum number of worker nodes
Testing Scaling Events
Now comes the fun part that we have been setting up for.
Let us deploy a test workload on our workload cluster and check if scaling events actually work.
- Apply the following workload manifest:

```shell
k --kubeconfig ${WORKLOAD_CLUSTER_NAME}.cfg apply -f https://k8s.io/examples/application/php-apache.yaml
```

This will start just one pod.
- Scale up this Deployment to 100 pods, which will require more than one worker node's worth of resources.
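The scale-up is a standard `kubectl scale`; a sketch, assuming the workload cluster kubeconfig file used in the previous step:

```shell
k --kubeconfig ${WORKLOAD_CLUSTER_NAME}.cfg scale deployment php-apache --replicas=100
```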
- Watch the AutoScaler in action; note the `PHASE` column in the output.
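One way to watch this from the management cluster is to follow the Cluster API `Machine` objects; their `PHASE` column moves through states such as `Provisioning`, `Provisioned`, and `Running` as nodes are added. A sketch, assuming the namespace variable from earlier:

```shell
# Watch Machine objects as the Autoscaler adds worker nodes
k get machines -n ${WORKLOAD_CLUSTER_NS} -w
```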
- Watch the AutoScaler logs by running the following command:

```shell
k logs cluster-autoscaler-6dbb469585-4ggtd -n ${WORKLOAD_CLUSTER_NS} -f
# I0413 09:32:53.208911  1 scale_up.go:472] Estimated 5 nodes needed in MachineDeployment/kubevip3ns /kubevip3-wmd
# I0413 09:32:53.406155  1 scale_up.go:595] Final scale-up plan: [{MachineDeployment/kubevip3ns /kubevip3-wmd 1->5 (max: 5)}]
# I0413 09:32:53.406215  1 scale_up.go:691] Scale-up: setting group MachineDeployment/kubevip3ns /kubevip3-wmd size to 5
# W0413 09:33:04.902674  1 clusterapi_controller.go:469] Machine "kubevip3-wmd-57fcdf9f7xbgz8z-kcx9h" has no providerID
# W0413 09:33:04.902700  1 clusterapi_controller.go:469] Machine "kubevip3-wmd-57fcdf9f7xbgz8z-m88fl" has no providerID
# W0413 09:33:04.902708  1 clusterapi_controller.go:469] Machine "kubevip3-wmd-57fcdf9f7xbgz8z-mlwb6" has no providerID
# W0413 09:33:04.902713  1 clusterapi_controller.go:469] Machine "kubevip3-wmd-57fcdf9f7xbgz8z-rthbj" has no providerID
```
- You can also see all 100 pods now running.
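A likely command for this, assuming the workload cluster kubeconfig file from earlier (the exact invocation was not captured in the original output):

```shell
k --kubeconfig ${WORKLOAD_CLUSTER_NAME}.cfg get po
```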
```
NAME                          READY   STATUS    RESTARTS   AGE
php-apache-698db99f59-24mqp   1/1     Running   0          17m
php-apache-698db99f59-2pcpz   1/1     Running   0          17m
php-apache-698db99f59-2vt28   1/1     Running   0          17m
php-apache-698db99f59-2x8zs   1/1     Running   0          17m
php-apache-698db99f59-44tf4   1/1     Running   0          17m
php-apache-698db99f59-45vcp   1/1     Running   0          17m
php-apache-698db99f59-4f9j6   1/1     Running   0          17m
```
- Let us check the number of nodes in the workload cluster and see that it has been scaled up to 5:
```shell
k --kubeconfig ${WORKLOAD_CLUSTER_NAME}.cfg get nodes
# NAME                                 STATUS   ROLES           AGE     VERSION
# kubevip3-kcp-jnhf5                   Ready    control-plane   22h     v1.24.11
# kubevip3-kcp-p56j9                   Ready    control-plane   22h     v1.24.11
# kubevip3-kcp-z8pqj                   Ready    control-plane   22h     v1.24.11
# kubevip3-wmd-57fcdf9f7xbgz8z-9mf59   Ready    <none>          7m51s   v1.24.11
# kubevip3-wmd-57fcdf9f7xbgz8z-kcx9h   Ready    <none>          19m     v1.24.11
# kubevip3-wmd-57fcdf9f7xbgz8z-m88fl   Ready    <none>          19m     v1.24.11
# kubevip3-wmd-57fcdf9f7xbgz8z-mlwb6   Ready    <none>          19m     v1.24.11
# kubevip3-wmd-57fcdf9f7xbgz8z-pqp8s   Ready    <none>          22h     v1.24.11
```
- Now we can test a scale-down event to see if the AutoScaler is communicating with Prism Central APIs to delete VMs that are no longer necessary.
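To trigger the scale-down, shrink the Deployment back; a sketch, assuming the same kubeconfig file as before. Once the pods no longer need the extra capacity, the Autoscaler should remove the idle worker nodes down to the minimum of 1 set in the annotations:

```shell
k --kubeconfig ${WORKLOAD_CLUSTER_NAME}.cfg scale deployment php-apache --replicas=1
```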
- Watch the node count, pod count, and Deployment logs as before.
You have now experienced one of many serverless-like compute experiences.
Do come back to this site to check for more updates.