There are three main types of elastic scaling in Kubernetes: HPA, VPA and CA. Here we will focus on Horizontal Pod Autoscaling (HPA).
With the release of Kubernetes v1.23, the HPA API graduated to the stable version autoscaling/v2, which brings:
- Scaling based on custom metrics
- Scaling based on multiple metrics
- Configurable scaling behaviour
From the initial v1 version of HPA, which only supported scaling on CPU and memory utilisation, through the later support for custom metrics and the aggregation layer API, to v1.18, which added configurable scaling behaviour, HPA has become more and more usable and reliable.
Scaling on CPU or memory metrics does not suit every system and is often not reliable enough. For most web backend systems, elastic scaling based on RPS (requests per second) is a more dependable way to handle bursts of traffic.
Prometheus is a popular open-source monitoring system that exposes real-time traffic metrics, so today we will try out elastic scaling based on a custom metric from Prometheus.
Note: scaling to zero currently requires the alpha HPAScaleToZero feature gate to be enabled and an object or external metric to be configured. Even with it enabled, scaling from 0 to 1 still involves scheduling, IP allocation, image pulling and so on, which adds overhead. How to reduce this overhead is not covered here and will be addressed in a subsequent article.
All code used in the article can be downloaded from here.
Overall architecture
The HPA obtains metrics data from Prometheus. To make this possible, the Prometheus Adapter component is introduced; it implements the resource metrics, custom metrics and external metrics APIs and supports the autoscaling/v2 HPA.
Once the metrics data is obtained, the number of replicas of the workload is adjusted according to the predefined rules.
Environment build
K3s
We use the latest 1.23 version of K3s as our Kubernetes environment.
```bash
export INSTALL_K3S_VERSION=v1.23.1+k3s2
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644 --write-kubeconfig ~/.kube/config
```
Example application
We prepare a simple web application that records the number of requests and outputs the metric http_requests_total in Prometheus format via the /metrics endpoint.
```go
package main

import (
	"net/http"
	"strconv"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Counter vector labelled by HTTP status code.
	metrics := prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Number of total http requests",
		},
		[]string{"status"},
	)
	prometheus.MustRegister(metrics)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		path := r.URL.Path
		statusCode := 200
		switch path {
		case "/metrics":
			// Expose the registered metrics in Prometheus format.
			promhttp.Handler().ServeHTTP(w, r)
		default:
			w.WriteHeader(statusCode)
			w.Write([]byte("Hello World!"))
		}
		// Count every request, labelled by its status code.
		metrics.WithLabelValues(strconv.Itoa(statusCode)).Inc()
	})
	http.ListenAndServe(":3000", nil)
}
```
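For reference, the /metrics endpoint returns the counter in the standard Prometheus text format. The output below is only illustrative; the values will differ, and the localhost address assumes you run the binary locally or port-forward to the pod.

```bash
curl -s http://localhost:3000/metrics | grep http_requests_total
# HELP http_requests_total Number of total http requests
# TYPE http_requests_total counter
http_requests_total{status="200"} 42
```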
Deploy the application to the cluster.
```bash
kubectl apply -f kubernetes/sample-httpserver-deployment.yaml
```
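The article does not reproduce the manifest itself, so here is a hypothetical sketch of what kubernetes/sample-httpserver-deployment.yaml could look like. The pod annotations assume the Prometheus chart's default pod scrape configuration; the names and image are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-httpserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-httpserver
  template:
    metadata:
      labels:
        app: sample-httpserver
      annotations:
        prometheus.io/scrape: "true"   # let the default pod scrape job discover this pod
        prometheus.io/port: "3000"     # port serving the /metrics endpoint
    spec:
      containers:
        - name: httpserver
          image: example/sample-httpserver:latest   # placeholder image name
          ports:
            - containerPort: 3000
```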
Prometheus
To install Prometheus with Helm, first add the prometheus-community chart repository.
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
```
Only prometheus-server is used in this test, so the other components are disabled during installation. Also, to make the effect easier to observe, the metric scrape interval is set to 10s.
```bash
# install prometheus with some components disabled
# set scrape interval to 10s
helm install prometheus prometheus-community/prometheus -n default --set alertmanager.enabled=false,pushgateway.enabled=false,nodeExporter.enabled=false,kubeStateMetrics.enabled=false,server.global.scrape_interval=10s
```
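Before moving on, it can help to confirm that the server pod has started; this is just a quick check, not part of the original walkthrough.

```bash
# the release was installed into the default namespace above
kubectl get pods -n default | grep prometheus
```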
Port forwarding lets us access the Prometheus web UI in a browser.
```bash
# port forward
kubectl port-forward svc/prometheus-server 9090:80 -n default
```
Here the RPS of the Pod is queried using the statement sum(rate(http_requests_total[30s])) by (pod).
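The same query can also be run from the command line through the port-forward set up above; this is just a convenience sketch and assumes the port-forward is still active.

```bash
# query per-pod RPS over the last 30s via the Prometheus HTTP API
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=sum(rate(http_requests_total[30s])) by (pod)' | jq .
```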
Prometheus Adapter
The Prometheus Adapter is also installed with Helm, with additional configuration supplied through a values file.
```bash
helm install prometheus-adapter prometheus-community/prometheus-adapter -n default -f kubernetes/values-adapter.yaml
```
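Once installed, you can confirm that the adapter has registered the custom metrics API with the aggregation layer. The API service name below is the one custom metrics adapters normally register, so treat this as a quick sanity check rather than part of the original article.

```bash
kubectl get apiservice v1beta1.custom.metrics.k8s.io
```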
In addition to configuring how the Prometheus server is reached, it is important to configure the rules for computing custom metrics, which tell the adapter how to fetch series from Prometheus and turn them into the metrics we need.
```yaml
rules:
  default: false
  custom:
    - seriesQuery: '{__name__=~"^http_requests.*_total$",container!="POD",namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: { resource: "namespace" }
          pod: { resource: "pod" }
      name:
        matches: "(.*)_total"
        as: "${1}_qps"
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[30s])) by (<<.GroupBy>>)
```
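The same values file also has to tell the adapter where to reach Prometheus. A hypothetical excerpt, assuming the prometheus-server service installed above in the default namespace:

```yaml
prometheus:
  url: http://prometheus-server.default.svc   # service name assumed from the install above
  port: 80
```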
You can refer to the detailed Adapter configuration.
Once the prometheus-adapter pod is running successfully, you can query the custom.metrics.k8s.io API.
```bash
kubectl get --raw '/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_qps' | jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/http_requests_qps"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "default",
        "name": "sample-httpserver-64c495844f-b58pl",
        "apiVersion": "/v1"
      },
      "metricName": "http_requests_qps",
      "timestamp": "2022-01-18T03:32:51Z",
      "value": "100m",
      "selector": null
    }
  ]
}
```
Note: the value here is 100m; the suffix "m" stands for milli-requests per second, so 100m means 0.1 requests per second.
HPA
Finally, the HPA is configured as follows:
- the minimum and maximum numbers of replicas are set to 1 and 10 respectively
- to make the effect easier to observe, the scaling behavior is configured explicitly
- the metric http_requests_qps is specified with type Pods and a target value of 50000m, i.e. an average of 50 RPS per pod. For example, at 300 RPS the desired number of replicas is 300/50 = 6.
```yaml
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
  name: sample-httpserver
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-httpserver
  minReplicas: 1
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_qps
        target:
          type: AverageValue
          averageValue: 50000m
```
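Create the HPA by applying the manifest; the file name here is an assumption about how the example repository is laid out.

```bash
# file name assumed; adjust to wherever you saved the HPA manifest
kubectl apply -f kubernetes/sample-httpserver-hpa.yaml
```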
Testing
The test tool is vegeta, because it can specify the RPS.
First create the NodePort service for the application.
```bash
kubectl expose deploy sample-httpserver --name sample-httpserver-host --type NodePort --target-port 3000
kubectl get svc sample-httpserver-host
NAME                     TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
sample-httpserver-host   NodePort   10.43.66.206   <none>        3000:31617/TCP   12h
```
Requests are initiated at 240, 120 and 40 RPS respectively.
```bash
# 240
echo "GET http://192.168.1.92:31617" | vegeta attack -duration 60s -connections 10 -rate 240 | vegeta report
# 120
echo "GET http://192.168.1.92:31617" | vegeta attack -duration 60s -connections 10 -rate 120 | vegeta report
# 40
echo "GET http://192.168.1.92:31617" | vegeta attack -duration 60s -connections 10 -rate 40 | vegeta report
```
Observe the change in the request rate versus the number of replicas in the Prometheus web interface.
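You can also watch the HPA react in real time from the command line; this is simply a convenient check, not part of the original walkthrough.

```bash
# watch the current metric value and replica count while the load runs
kubectl get hpa sample-httpserver -w
```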
```bash
kubectl describe hpa sample-httpserver
Warning: autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
Name:                          sample-httpserver
Namespace:                     default
Labels:                        <none>
Annotations:                   <none>
CreationTimestamp:             Mon, 17 Jan 2022 23:18:46 +0800
Reference:                     Deployment/sample-httpserver
Metrics:                       ( current / target )
  "http_requests_qps" on pods: 100m / 50
Min replicas:                  1
Max replicas:                  10
Behavior:
  Scale Up:
    Stabilization Window: 0 seconds
    Select Policy: Max
    Policies:
      - Type: Percent  Value: 100  Period: 15 seconds
  Scale Down:
    Stabilization Window: 30 seconds
    Select Policy: Max
    Policies:
      - Type: Percent  Value: 100  Period: 15 seconds
Deployment pods:               1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric http_requests_qps
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age                  From                       Message
  ----    ------             ----                 ----                       -------
  Normal  SuccessfulRescale  25m                  horizontal-pod-autoscaler  New size: 6; reason: pods metric http_requests_qps above target
  Normal  SuccessfulRescale  19m                  horizontal-pod-autoscaler  New size: 4; reason: All metrics below target
  Normal  SuccessfulRescale  12m (x2 over 9h)     horizontal-pod-autoscaler  New size: 4; reason: pods metric http_requests_qps above target
  Normal  SuccessfulRescale  11m                  horizontal-pod-autoscaler  New size: 5; reason: pods metric http_requests_qps above target
  Normal  SuccessfulRescale  9m40s (x2 over 12m)  horizontal-pod-autoscaler  New size: 2; reason: pods metric http_requests_qps above target
  Normal  SuccessfulRescale  9m24s (x4 over 10h)  horizontal-pod-autoscaler  New size: 3; reason: pods metric http_requests_qps above target
  Normal  SuccessfulRescale  7m54s (x3 over 9h)   horizontal-pod-autoscaler  New size: 2; reason: All metrics below target
  Normal  SuccessfulRescale  7m39s (x4 over 9h)   horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
```
Summary
Horizontal scaling of applications based on custom metrics such as requests per second is more reliable than CPU/memory-based scaling and suits most web systems. Prometheus is a popular application monitoring system, and its metrics can be used as scaling signals with the support of the Prometheus Adapter and the aggregation layer API.
HPA's scale-to-zero support is currently in alpha, and the effectiveness of scaling from 0 to N replicas deserves close attention; on the other hand, keeping the minimum number of replicas above 0 still wastes resources for rarely used services. In a follow-up we will look at the 0-to-N performance and the resource consumption issues.