Image credit: “Kubernetes Operators Explained”
A few years ago, I called Kubernetes the de facto standard for service orchestration and container scheduling, and today K8s is the unchallenged “kingpin” of the space. However, while Kubernetes has grown very complex, its original data model, application model, and scaling approach remain valid, and application models and extension mechanisms like the Operator are becoming increasingly popular with developers and operators.
Our platform runs stateful backend services, and deploying and maintaining stateful services is exactly what k8s operators specialize in, so it is time to study the operator.
1. Advantages of Operator
The concept of the kubernetes operator originally came from CoreOS, a container technology company acquired by Red Hat.
CoreOS introduced the operator concept along with the first reference implementations of the operator: the etcd operator and the prometheus operator.
Note: etcd was open-sourced by CoreOS in 2013; Prometheus, the first time-series storage and monitoring system for cloud-native services, was open-sourced by SoundCloud in 2012.
Here is CoreOS’s interpretation of the Operator concept: an Operator represents, in software, the human operational knowledge through which an application can be reliably managed.
Screenshot from the official CoreOS blog archive
The original purpose of Operator was to free up operations staff, and today Operator is becoming increasingly popular with cloud-native operations developers.
So what are the benefits of an operator? The following diagram compares operating with an operator against operating without one.
This diagram gives you an idea of the benefits of operators, even if you don’t know much about them.
We can see that with an operator, a simple command is all it takes to scale a stateful application (a scale-out operation here, but the same holds for other “complex” operations on stateful applications, such as version upgrades). Ops does not need to know how scaling a stateful application works inside k8s.
Without an operator, Ops needs deep knowledge of every step of scaling the stateful application, has to execute the commands one by one in sequence, check each command’s response, and retry on failure until the scaling succeeds.
We can regard an operator as an experienced operations engineer built into k8s: it monitors the state of the target object at all times, keeps the complexity to itself, and exposes only a simple interface to the human operator, thereby also reducing the probability of operational errors caused by human factors.
However, good as operators are, the barrier to developing one is not low. That barrier shows up in at least the following aspects.
- Understanding the operator concept presupposes understanding k8s, which has grown increasingly complex since it was open sourced in 2014 and demands a real investment of time.
- Hand-writing an operator from scratch is very verbose; almost no one does it. Most developers instead learn a development framework or toolset, such as kubebuilder or the Operator Framework SDK.
- Operators also vary in capability: the Operator Framework proposes a capability model with five levels of operator capability, shown in the following figure. Developing operators at the higher capability levels in Go requires a deep understanding of client-go, the official Kubernetes Go client library.
Screenshot from operator framework official website
Of course, among these barriers, understanding the operator concept is both the foundation and the prerequisite, and understanding the operator in turn presupposes a solid grasp of several Kubernetes concepts, especially resources, resource types, the API, controllers, and the relationships between them. Let’s take a quick look at these concepts.
2. Introduction to Kubernetes resource, resource type, API and controller
Kubernetes has evolved to the point where its essence has become apparent.
- Kubernetes is a “database” (the data is actually stored persistently in etcd).
- Its API is a “sql statement”.
- The API is designed in a resource-based Restful style, with the resource type being the endpoint of the API.
- Each type of resource (i.e. Resource Type) is a “table”, and the spec of the Resource Type corresponds to the “table structure” information (schema).
- A row in each “table” is a resource, i.e., an instance of the Resource Type corresponding to that table.
- Kubernetes has many built-in “tables”, such as Pod, Deployment, DaemonSet, ReplicaSet, etc.
The following is a schematic representation of the relationship between the Kubernetes API and RESOURCE.
We see that there are two types of resource types, one namespace-related (namespace-scoped), and we manipulate instances of such resource types by means of an API of the following form.
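The namespace-scoped path takes the following general form (a representative sketch; group, version, and names are placeholders):

```
/apis/<group>/<version>/namespaces/<namespace>/<resource-type-plural>/<resource-name>

# for example, reading a Deployment named "nginx" in the "default" namespace:
GET /apis/apps/v1/namespaces/default/deployments/nginx
```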
The other class is namespace-independent, i.e., cluster-scoped, and we operate on instances of this type of resource type through the following form of API.
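For cluster-scoped resource types, the namespace segment disappears (again a representative sketch):

```
/apis/<group>/<version>/<resource-type-plural>/<resource-name>

# for example, reading a ClusterRole named "admin":
GET /apis/rbac.authorization.k8s.io/v1/clusterroles/admin
```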
We know that Kubernetes is not really just a “database”; it is a platform standard for service orchestration and container scheduling, and its basic scheduling unit is the Pod (also a resource type), a collection of containers. How are Pods created, updated, and deleted? None of this happens without a controller. Each resource type has its own controller. Take the resource type Pod as an example: its Pods are typically managed by a ReplicaSet.
The operating logic of the controller is shown in the following diagram.
Quoted from the article “Kubernetes Operators Explained”
Once the controller starts, it repeatedly fetches the current state of the resource and compares it with the desired state (spec) stored in k8s; if the two do not match, the controller calls the corresponding APIs to adjust, trying to bring the current state into agreement with the desired state. This process is called reconciliation, and its pseudo-code logic is as follows.
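A minimal pseudo-code sketch of that loop (the function names are illustrative, not a real API):

```go
// reconciliation loop (pseudo-code)
for {
    desired := getDesiredState() // the spec recorded in etcd
    current := getCurrentState() // the state observed in the cluster
    if !match(desired, current) {
        // call k8s APIs to converge the current state toward the desired state
        makeChanges(desired, current)
    }
}
```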
Note: k8s also has the concept of an object. So what is an object? It is similar to the Object base class in Java or the Object superclass in Ruby. Not only is a resource (an instance of a resource type) an object, the resource type itself is also an object, an instance of Kubernetes’s type concept.
With the above initial understanding of these concepts of k8s, let’s understand what Operator really is!
3. Operator pattern = operation object (CRD) + control logic (controller)
If operations staff have to work directly against the built-in resource types (deployment, pod, etc.), which is the second case in the earlier “using operator vs. not using operator” comparison chart, they face a very complex situation, and operations become complicated and error-prone.
So how do we define custom resource types instead of facing only the built-in ones? Kubernetes provides the Custom Resource Definition (CRD) for exactly this. (When CoreOS first introduced the operator concept, the predecessor of the CRD was called the Third Party Resource, TPR.)
By our earlier understanding of resource types, defining a CRD is equivalent to creating a new “table” (resource type). Once the CRD is created, k8s automatically generates the corresponding API endpoint for it, and we can manipulate this “table” through yaml or the API. We can “insert” data into the “table”, i.e., create a Custom Resource (CR) based on the CRD, much as creating a Deployment instance inserts data into the Deployment “table”.
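As a small preview using the WebServer type we will define later in this article, creating a CR really is just “inserting a row” into the new “table”:

```yaml
apiVersion: my.domain/v1   # the API group/version created for the CRD
kind: WebServer            # the custom resource type (the "table")
metadata:
  name: webserver-sample   # the "row" being inserted
spec:
  replicas: 3
  image: nginx:1.23.1
```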
As with the native built-in resource types, it is not enough to have a CR that merely stores the state of an object. A native resource type has a corresponding controller responsible for coordinating the creation, scaling, and deletion of its instances, and a CR needs such a “coordinator” too. That is, we also need to define a controller that listens to CR state, manages the creation, scaling, and deletion of CRs, and keeps the desired state (spec) consistent with the current state. This controller is no longer a controller for a native resource type, but one for the CRD’s instances, the CRs.
Putting the two together, a custom operation-object type (CRD) plus a controller for instances of that type, gives us the “Operator pattern”. The controller in the operator pattern is also called the operator; it is the body in the cluster that performs maintenance operations on CRs.
4. Developing a webserver operator with kubebuilder
Assumptions: At this point, your local development environment has everything you need to access your experimental k8s environment, and you can manipulate k8s at will through the kubectl tool.
Rather than explaining the concept in more depth, nothing beats hands-on experience for understanding it, so let’s develop a simple Operator.
As mentioned before, operator development is very verbose, so the community provides development tools and frameworks that simplify the process; the current mainstream options are the Operator Framework SDK and kubebuilder. The former is a toolset open-sourced and maintained by Red Hat, supporting operator development in Go, Ansible, and Helm (of which only Go can produce operators up to capability level 5, while the other two cannot); kubebuilder is an operator development tool maintained by an official Kubernetes SIG (special interest group). Currently, when developing operators in Go with the Operator Framework SDK, the SDK uses kubebuilder under the hood anyway, so here we will use kubebuilder directly.
Measured against the operator capability model, our operator sits at roughly level 2. We define a WebServer resource type representing an nginx-based webserver cluster; our operator supports creating WebServer instances (an nginx cluster), scaling the nginx cluster, and upgrading the nginx version in the cluster.
Let’s use kubebuilder to implement this operator!
1. Installing kubebuilder
Here we use the source code build method to install, and the steps are as follows.
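The steps are roughly as follows (cloning the upstream kubebuilder repo and building; copy the resulting binary onto your PATH afterwards, as noted below):

```shell
$ git clone https://github.com/kubernetes-sigs/kubebuilder.git
$ cd kubebuilder
$ make build
```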
Then just copy bin/kubebuilder to one of the paths in your PATH environment variable.
2. Creating the webserver-operator project
Next, we can use kubebuilder to create the webserver-operator project.
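The init command looks roughly like this; the module path passed to --repo is illustrative (chosen to match the image name used later in this article), so substitute your own:

```shell
$ kubebuilder init --domain my.domain --repo github.com/bigwhite/webserver-operator
```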
Note: --repo specifies the module root path in go.mod; you can define your own module root path.
3. Creating the API and generating the initial CRD
The Operator consists of a CRD and a controller. Here we first create our own CRD, i.e., the custom resource type that will serve as the API endpoint, using the following kubebuilder create command.
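Something along these lines; here the group is left empty so the API group is just the project domain my.domain (check the kubebuilder docs for the exact flags of your version, and answer “y” to both prompts to generate the resource and the controller):

```shell
$ kubebuilder create api --group "" --version v1 --kind WebServer
Create Resource [y/n]
y
Create Controller [y/n]
y
```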
After that, we execute make manifests to generate the yaml file corresponding to the final CRD.
At this moment, the directory file layout of the entire project is as follows.
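At this point, a kubebuilder project typically looks roughly like this (abridged):

```
├── Dockerfile
├── Makefile
├── PROJECT
├── api
│   └── v1
│       ├── groupversion_info.go
│       ├── webserver_types.go
│       └── zz_generated.deepcopy.go
├── config
│   ├── crd
│   │   └── bases
│   │       └── my.domain_webservers.yaml
│   ├── default
│   ├── manager
│   ├── rbac
│   └── samples
├── controllers
│   └── webserver_controller.go
├── go.mod
├── go.sum
└── main.go
```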
4. The basic structure of the webserver-operator
Ignoring things like leader election, auth_proxy, etc. that we don’t care about this time, I’ve organized the main parts of this operator example into the following diagram.
The parts of the diagram are the basic structure of the operator generated using kubebuilder.
The webserver operator consists mainly of a CRD and a controller.
- CRD
The box in the lower left corner of the diagram is the CRD yaml file generated above: config/crd/bases/my.domain_webservers.yaml. The CRD is closely related to api/v1/webserver_types.go: we define the CRD’s spec fields in api/v1/webserver_types.go, after which the make manifests command parses the changes in webserver_types.go and updates the CRD yaml file.
- controller
The right part of the diagram shows the controller itself, deployed as a deployment running in the k8s cluster. It monitors the running state of the CRD’s instances (CRs) and checks in the Reconcile method whether the desired state is consistent with the current state; if not, it performs the relevant actions.
- Other
The top left corner of the figure shows the controller’s permission settings: the controller accesses the k8s API server through a serviceaccount, and its role and permissions are set through role.yaml and role_binding.yaml.
5. Adding fields to the CRD spec
To achieve the functional goals of the webserver operator, we need to add some fields to the CRD spec. As mentioned before, the CRD stays in sync with the api/v1/webserver_types.go file, so we just modify webserver_types.go: we add Replicas and Image fields to the WebServerSpec structure, indicating the number of replicas of the webserver instance and the container image to use, respectively.
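A sketch of the change (the kubebuilder marker comments that normally accompany these fields are omitted; the json tags follow the usual convention):

```go
// api/v1/webserver_types.go

// WebServerSpec defines the desired state of WebServer
type WebServerSpec struct {
	// Replicas is the desired number of webserver pods
	Replicas int32 `json:"replicas,omitempty"`

	// Image is the container image the webserver pods run, e.g. "nginx:1.23.1"
	Image string `json:"image,omitempty"`
}
```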
After saving the changes, execute make manifests to regenerate config/crd/bases/my.domain_webservers.yaml.
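After regeneration, the CRD’s schema should contain the two new properties, roughly like this (excerpt; the surrounding openAPIV3Schema boilerplate is omitted):

```yaml
# config/crd/bases/my.domain_webservers.yaml (excerpt)
spec:
  properties:
    image:
      type: string
    replicas:
      format: int32
      type: integer
  type: object
```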
Once the CRD is defined, we can install it into k8s.
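Installation is a single make target, which applies the CRD yaml with kubectl under the hood:

```shell
$ make install
```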
Check the installation.
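A quick check (the CRD name combines the plural resource name with our domain):

```shell
$ kubectl get crd | grep webservers
# expect to see: webservers.my.domain
```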
6. Modify role.yaml
Before we start the controller development, let’s “pave the way” for the subsequent operation of the controller, i.e., set the appropriate permissions.
In the controller we will create the corresponding deployments and services for the CRD’s instances, which requires the controller to have permission to operate on deployments and services, so we need to modify role.yaml to grant the controller-manager service account the ability to manipulate deployments and services.
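A hedged sketch of the rules we append (the exact verb list can be trimmed to taste):

```yaml
# config/rbac/role.yaml (appended rules)
- apiGroups:
    - apps
  resources:
    - deployments
  verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
- apiGroups:
    - ""
  resources:
    - services
  verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
```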
The modified role.yaml will be placed here first and then deployed to k8s together with the controller.
7. Implementing the controller’s Reconcile (coordination) logic
kubebuilder builds the code framework of the controller for us; we just need to implement the Reconcile method of WebServerReconciler in controllers/webserver_controller.go. Here is a simple flowchart of Reconcile; the code is much easier to follow with this diagram in hand.
The following is the code for the corresponding Reconcile method.
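Below is a condensed sketch of that method, not the full code: it assumes WebServerReconciler embeds client.Client (the kubebuilder default), mydomainv1 is an assumed import alias for our api/v1 package under the illustrative module path, and deploymentForWebserver is a helper constructor we would write alongside it.

```go
// controllers/webserver_controller.go (condensed sketch)
package controllers

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"

	mydomainv1 "github.com/bigwhite/webserver-operator/api/v1" // assumed module path
)

func (r *WebServerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// Fetch the WebServer CR that triggered this reconciliation.
	ws := &mydomainv1.WebServer{}
	if err := r.Get(ctx, req.NamespacedName, ws); err != nil {
		// The CR may have been deleted; nothing to do.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Ensure the Deployment for this CR exists; create it if not.
	dep := &appsv1.Deployment{}
	err := r.Get(ctx, types.NamespacedName{Namespace: ws.Namespace, Name: ws.Name}, dep)
	if apierrors.IsNotFound(err) {
		newDep := r.deploymentForWebserver(ws) // helper: build a Deployment from ws.Spec
		logger.Info("Creating a new Deployment", "Deployment.Namespace", newDep.Namespace, "Deployment.Name", newDep.Name)
		if err := r.Create(ctx, newDep); err != nil {
			return ctrl.Result{}, err
		}
		return ctrl.Result{Requeue: true}, nil
	} else if err != nil {
		return ctrl.Result{}, err
	}

	// (Ensuring the Service exists is analogous and omitted here.)

	// Reconcile replicas: desired (CR spec) vs. current (Deployment spec).
	if *dep.Spec.Replicas != ws.Spec.Replicas {
		logger.Info("Deployment spec.replicas change", "from", *dep.Spec.Replicas, "to", ws.Spec.Replicas)
		replicas := ws.Spec.Replicas
		dep.Spec.Replicas = &replicas
		if err := r.Update(ctx, dep); err != nil {
			return ctrl.Result{}, err
		}
		return ctrl.Result{Requeue: true}, nil
	}

	// Reconcile the container image the same way.
	if dep.Spec.Template.Spec.Containers[0].Image != ws.Spec.Image {
		logger.Info("Deployment spec.template.spec.container[0].image change",
			"from", dep.Spec.Template.Spec.Containers[0].Image, "to", ws.Spec.Image)
		dep.Spec.Template.Spec.Containers[0].Image = ws.Spec.Image
		if err := r.Update(ctx, dep); err != nil {
			return ctrl.Result{}, err
		}
	}

	return ctrl.Result{}, nil
}
```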
Here you may notice: the CRD’s controller ultimately translates the CR into native k8s resources such as a service and a deployment, and state changes of the CR (replicas, image, etc.) are ultimately converted into changes to those native resources. This is the essence of the operator! Once you understand this layer, the operator is no longer an out-of-reach concept.
Some readers may also notice that the flowchart above does not seem to handle cleaning up the deployment and service when the CR instance is deleted, which is true. But for a 7×24 service running in the background, we care far more about changes, scaling, upgrades, and similar operations; deletion is the lowest-priority requirement.
8. Building the controller image
After the controller code is written, we build the controller image. As we saw in the structure diagram earlier, the controller actually runs in k8s as a pod managed by a deployment, so we need to build its image and deploy it to k8s via that deployment.
The operator project created by kubebuilder includes a Makefile, and the controller image can be built with make docker-build. docker-build compiles the controller source code inside a golang builder image, but without a slight modification to the Dockerfile you will have a hard time getting it to compile, because the default GOPROXY is not reachable from China. The easiest fix is to switch to a vendor build; here is the modified Dockerfile.
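A hedged version of the modified Dockerfile, based on the stock kubebuilder Dockerfile and switched to a vendor build (the Go version is illustrative):

```dockerfile
# Build the manager binary using vendored dependencies
FROM golang:1.18 as builder

WORKDIR /workspace
# Copy the entire source tree, including the vendor/ directory,
# so no GOPROXY access is needed during the build.
COPY . .

# -mod=vendor tells the go tool to resolve imports from vendor/.
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -mod=vendor -a -o manager main.go

# Package the static binary in a minimal distroless image.
FROM gcr.io/distroless/static:nonroot
WORKDIR /
COPY --from=builder /workspace/manager .
USER 65532:65532

ENTRYPOINT ["/manager"]
```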
The following are the steps of the build.
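The build steps then amount to populating vendor/ and invoking the make target:

```shell
$ go mod tidy
$ go mod vendor
$ make docker-build
```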
Note: Before executing the make command, change the initial value of the IMG variable in the Makefile to IMG ?= bigwhite/webserver-controller:latest
After a successful build, execute make docker-push to push the image to the image repository (Docker Hub, in this case).
9. Deploying the controller
We have already installed the CRD into k8s with make install; next we deploy the controller, and with that our operator is deployed. Execute make deploy to do so.
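make deploy applies everything under config/default; the resource names below follow kubebuilder’s <project>-system naming convention and are indicative rather than verbatim output:

```shell
$ make deploy
# applies, roughly:
#   namespace/webserver-operator-system
#   customresourcedefinition.apiextensions.k8s.io/webservers.my.domain
#   serviceaccount/webserver-operator-controller-manager
#   role / rolebinding for the controller
#   deployment.apps/webserver-operator-controller-manager
```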
We see that deploy not only installs the controller, serviceaccount, role, and rolebinding, it also creates the namespace and installs the CRD all over again. In other words, make deploy is a complete operator installation command.
Note: Use make undeploy to completely uninstall operator-related resources.
We use kubectl logs to view the controller’s operational logs.
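For example (deployment and namespace names assumed per the kubebuilder defaults above; log lines abridged, timestamps omitted):

```shell
$ kubectl logs -f deployment/webserver-operator-controller-manager -n webserver-operator-system
...
INFO  Starting Controller  {"controller": "webserver"}
INFO  Starting workers     {"controller": "webserver", "worker count": 1}
```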
As you can see, the controller has been successfully started and is waiting for a WebServer CR related event (e.g. creation). Let’s create a WebServer CR!
10. Creating the WebServer CR
There is a CR sample in the webserver-operator project under config/samples; we modify it to include the fields we added to the spec.
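The modified sample looks like this (the file name follows kubebuilder’s sample naming for an empty group; the values are the ones used throughout this article):

```yaml
# config/samples/_v1_webserver.yaml
apiVersion: my.domain/v1
kind: WebServer
metadata:
  name: webserver-sample
spec:
  replicas: 3
  image: nginx:1.23.1
```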
We create this WebServer CR via kubectl.
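Assuming the sample path above:

```shell
$ kubectl apply -f config/samples/_v1_webserver.yaml
```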
Observe the controller’s log.
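The log should contain lines along these lines (timestamps omitted; the names match what our Reconcile sketch creates):

```
INFO  controllers.WebServer  Creating a new Deployment  {"Webserver": "default/webserver-sample", "Deployment.Namespace": "default", "Deployment.Name": "webserver-sample"}
INFO  controllers.WebServer  Creating a new Service     {"Webserver": "default/webserver-sample", "Service.Namespace": "default", "Service.Name": "webserver-sample-service"}
```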
We see that when the CR is created, the controller picks up the relevant events and creates the corresponding Deployment and Service. Let’s look at the Deployment, the three Pods, and the Service created for this CR.
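For example (output abridged; pod-template hashes and cluster IPs will differ):

```shell
$ kubectl get deployment,pod,service
NAME                               READY   UP-TO-DATE   AVAILABLE
deployment.apps/webserver-sample   3/3     3            3

NAME                                   READY   STATUS    RESTARTS
pod/webserver-sample-bc698b9fb-8gq2h   1/1     Running   0
pod/webserver-sample-bc698b9fb-vk6gw   1/1     Running   0
pod/webserver-sample-bc698b9fb-xgrgb   1/1     Running   0

NAME                               TYPE       PORT(S)
service/webserver-sample-service   NodePort   80:30010/TCP
```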
Let’s request the service.
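For a NodePort service, we hit any node’s IP on the service’s node port (the address and port here are the ones that appear later in this article):

```shell
$ curl http://192.168.10.182:30010
# returns the familiar "Welcome to nginx!" page
```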
The service returns a response as expected!
11. Scaling, Changing Versions, and Service Self-Healing
Next, let’s do some common O&M operations on CR.
- Changing the number of replicas from 3 to 4
We change the replicas field of the CR from 3 to 4 to scale out the webserver instances, then make the change take effect with kubectl apply.
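Concretely (sample file path as assumed earlier):

```shell
# edit config/samples/_v1_webserver.yaml: replicas: 3 -> 4
$ kubectl apply -f config/samples/_v1_webserver.yaml
```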
After the above command executes, we observe the operator’s controller log as follows.
```
1.660208962767797e+09  INFO  controllers.WebServer  Deployment spec.replicas change  {"Webserver": "default/webserver-sample", "from": 3, "to": 4}
```
Later, check the number of pods.
The number of webserver pod copies was successfully expanded from 3 to 4.
- Changing the webserver image version
We change the image version in the CR from nginx:1.23.1 to nginx:1.23.0 and then execute kubectl apply to make it take effect.
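Concretely:

```shell
# edit config/samples/_v1_webserver.yaml: image: nginx:1.23.1 -> nginx:1.23.0
$ kubectl apply -f config/samples/_v1_webserver.yaml
```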
We check the response log of the controller as follows.
```
1.6602090494113188e+09  INFO  controllers.WebServer  Deployment spec.template.spec.container[0].image change  {"Webserver": "default/webserver-sample", "from": "nginx:1.23.1", "to": "nginx:1.23.0"}
```
The controller updates the deployment, causing a rolling upgrade of the pods under its jurisdiction.
```shell
$ kubectl get pods
NAME                               READY   STATUS              RESTARTS   AGE
webserver-sample-bc698b9fb-8gq2h   1/1     Running             0          10m
webserver-sample-bc698b9fb-vk6gw   1/1     Running             0          10m
webserver-sample-bc698b9fb-xgrgb   1/1     Running             0          10m
webserver-sample-ffcf549ff-g6whk   0/1     ContainerCreating   0          12s
webserver-sample-ffcf549ff-ngjz6   0/1     ContainerCreating   0          12s
```
After waiting patiently for a short while, all pods in the final list are running the new version.
- Service self-healing: restoring a mistakenly deleted service
Let’s make a deliberate “mistake” and delete webserver-sample-service to see whether the controller can help the service heal itself.
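The “mistake”:

```shell
$ kubectl delete service/webserver-sample-service
```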
View controller logs.
```
1.6602096994710526e+09  INFO  controllers.WebServer  Creating a new Service  {"Webserver": "default/webserver-sample", "Service.Namespace": "default", "Service.Name": "webserver-sample-service"}
```
We see that the controller detected that the service had been deleted and rebuilt a new one!
Request the newly created service.
```shell
$ curl http://192.168.10.182:30010
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
```
You can see that the service has finished healing itself with the help of the controller!
5. Summary
In this article, we have introduced the concept and advantages of Kubernetes Operator, and developed a level 2 operator based on kubebuilder, which is still a long way from perfection.
I believe that after reading this article, you will have a clear understanding of the operator, especially its basic structure, and will be able to develop a simple operator!
The source code for this article can be downloaded from here.
6. Reference
- https://tonybai.com/2022/08/15/developing-kubernetes-operators-in-go-part1/
- Kubernetes Operators 101, Part 1: Overview and key features
- Kubernetes Operators 101, Part 2: How operators work
- Operator SDK: Build Kubernetes Operators
- Kubernetes docs: Custom Resources
- Kubernetes docs: Operator pattern
- Kubernetes docs: API concepts
- Introducing Operators: Putting Operational Knowledge into Software
- CNCF Operator White Paper v1.0
- Best practices for building Kubernetes Operators and stateful apps
- A deep dive into Kubernetes controllers
- Kubernetes Operators Explained
- Book: Kubernetes Operators
- Book: Programming Kubernetes
- Operator SDK Reaches v1.0
- What is the difference between kubebuilder and operator-sdk
- Kubernetes Operators in Depth
- Get started using Kubernetes Operators
- Use Kubernetes operators to extend Kubernetes’ functionality
- memcached operator