Apache Spark is one of the most widely used computational tools for big data analytics today. It excels at batch and real-time stream processing and supports machine learning, artificial intelligence, natural language processing, and data analytics applications. The narrowly defined Hadoop (MR) technology stack is shrinking as Spark becomes more popular and more heavily used. In addition, general opinion and practical experience have shown that Hadoop (YARN) lacks the flexibility to integrate with a broader enterprise technology stack beyond Big Data-related workloads.
JavaScript Data Processing - Mapping Tables
The common data collections in JavaScript are lists (Array) and mapping tables (Plain Object). This article talks about mapping tables. Due to the dynamic nature of JavaScript, an object is itself a mapping table: an object's “property name ⇒ property value” pairs are the mapping table's “key ⇒ value” pairs. To make it easier to use an object as a mapping table, JavaScript even allows property names
Getting Started with Kubernetes - A Knowledge Base
We start with an introduction to basic k8s concepts and common terms.
Master: The Master node, the brain of k8s, is the cluster's control node and is responsible for managing and controlling the cluster. All other Nodes register themselves with the Master and regularly report their own status to it. The following four processes run on the Master node.
Api Server: The service process that exposes the HTTP REST interface; the only entry point for add, delete, modify, and query operations on all resources; and the entry point for cluster control. kubectl talks directly to the Api Server.
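For illustration, here is a minimal client-go sketch (the kubeconfig path is a placeholder and a recent client-go version is assumed); every resource operation, whether issued by kubectl or by code like this, ends up as a REST call to the Api Server:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load credentials from a kubeconfig; the path is a placeholder.
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	// This call is an HTTP GET against the Api Server's REST interface,
	// the same endpoint `kubectl get pods` would hit.
	pods, err := clientset.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d pods in the default namespace\n", len(pods.Items))
}
```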
Django's signaling mechanism
Django describes its signal mechanism as a “signal dispatcher” that lets decoupled applications get notified when actions occur elsewhere in the framework. This article explains how signals work in Django and how to use them, from the perspective of source code analysis. Signal class: the most common scenario for signals is notifications. For example, if your blog has comments, the system will have a notification mechanism to push the comments to you. If you
Getting Started with Kubernetes - Pods Explained
Pods are the basic scheduling unit of k8s and are the key to k8s. This article explains how to publish and manage applications in k8s, starting from Pod usage, control, scheduling, and application configuration. Pod Basic Usage: the requirement for a long-running container is that its main application must keep running in the foreground. kubelet creates a Pod containing this container, then runs its command, and considers
Python Coroutine
A coroutine, also known as a micro-thread, is a lightweight thread. Coroutines can be understood as procedures that collaborate with the caller, consuming values the caller sends in and yielding values back. The advantage over threads is that context switching is cheaper and is under the user's control.
History of development: coroutines in Python have gone through three main phases. Coroutines were first implemented in Python 2.5, evolved from generators and built with keywords like yield/send; yield from was later introduced, allowing complex generators to be refactored into small nested generators; Python 3.
Getting Started with Kubernetes - Networking Explained
Service is the core concept of the k8s networking part; in k8s, Service is mainly responsible for layer-4 load balancing. This article explains the network-related principles of k8s in terms of load balancing, external access, DNS service construction, and the layer-7 routing mechanism of Ingress. Service Explained: Service is the main mechanism through which an application exposes its services to the outside world. As shown in the
Kubernetes Security Mechanisms Explained
In Kubernetes, all access to and changes of resources revolve around the APIServer. For example, kubectl commands and client-side HTTP RESTful requests are all served by calling the APIServer API, so this article focuses on what k8s does for cluster security.
First, the official Kubernetes documentation provides the diagram above. It describes the authentication, authorization, and admission control stages of the APIServer that users must pass through before they can access or change resources.
Kubernetes Job and CronJob
If resources such as Deployment and DaemonSet take on long-running, online workloads for Kubernetes, then scheduled, short-lived, and even one-off offline workloads are what Job and CronJob take on.
Job: a Job is essentially one or more Pods defined to execute a task; when the Pods exit after execution, the Job is done. A Job is therefore also called a Batch Job, i.e. a batch or offline workload.
Job Usage: the YAML definition of a Job is very similar to that of a Deployment.
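As a rough illustration of that similarity, the same object can be built programmatically: a Pod template wrapped in a JobSpec instead of a DeploymentSpec. A minimal sketch loosely modeled on the classic "compute pi" Job from the Kubernetes docs (the name, image, and backoff limit are just example values):

```go
package main

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newPiJob builds the same kind of object the YAML manifest would describe.
func newPiJob() *batchv1.Job {
	backoffLimit := int32(4)
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "pi"}, // example name
		Spec: batchv1.JobSpec{
			BackoffLimit: &backoffLimit, // retries before the Job is marked failed
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					// Job Pods must not restart forever, so the policy is Never (or OnFailure).
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:    "pi",
						Image:   "perl", // example image
						Command: []string{"perl", "-Mbignum=bpi", "-wle", "print bpi(100)"},
					}},
				},
			},
		},
	}
}

func main() {
	fmt.Println("job to submit:", newPiJob().Name)
}
```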
RabbitMQ Consumption Model and Dead Letter Queues
RabbitMQ Model: RabbitMQ follows a producer/consumer model, where producers publish messages to a queue and consumers take messages from the queue, without the two interacting directly. Let’s start by looking at the structure of the RabbitMQ model. In the diagram, we can see that the entire structure consists of a Producer, an Exchange, a Queue, and a Consumer. Among them, the Producer and the Consumer
The OIDC protocol and its use in Kubernetes
Most authentication in K8s is done with ServiceAccounts. Although K8s has the concept of a User, there is no resource that corresponds to a “person”, so user management in K8s is still very difficult. Fortunately, K8s provides an alternative way to manage users: integrating with the OIDC protocol. In this article, we’ll take a look at what the OIDC protocol is and how it is used in K8s.
Kubernetes Service Discovery - coreDNS
Service discovery is an important feature of K8s, and there are two ways to do it: either by injecting the Service ClusterIP into the Pod as an environment variable, or by using DNS; coreDNS has replaced kube-dns as the built-in DNS server since version 1.13. In this article, we will briefly analyze coreDNS.
K8s DNS Policies: there are four types of DNS policies for Pods in Kubernetes.
Default: Pod inherits the DNS configuration on the host where it resides.
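Whichever of the four policies is chosen, it is selected per Pod through the dnsPolicy field of the Pod spec. A minimal Go sketch (the Pod name and image are placeholders):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildPod returns a Pod whose DNS policy is "Default", i.e. the Pod
// inherits the DNS configuration of the node it runs on.
func buildPod() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "dns-example"}, // placeholder name
		Spec: corev1.PodSpec{
			DNSPolicy: corev1.DNSDefault, // other values: ClusterFirst, ClusterFirstWithHostNet, None
			Containers: []corev1.Container{
				{Name: "app", Image: "busybox"}, // placeholder container
			},
		},
	}
}

func main() {
	fmt.Println("dnsPolicy:", buildPod().Spec.DNSPolicy)
}
```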
Unified User Management for Kubernetes with KeyCloak
As we all know, in the K8s permission system, a RoleBinding can bind roles to a ServiceAccount, User, or Group to assign permissions.
We often use a ServiceAccount to restrict the permissions of a Pod; for User and Group, however, there are no concrete resources that correspond to them apart from some special system groups, which makes traditional user management very awkward.
This article talks about how to unify user management in K8s clusters.
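For reference, this is what binding permissions to an external user or group looks like when built programmatically; a minimal sketch in which "alice", "dev-team", and the binding name are made-up placeholders, and the built-in "view" ClusterRole is used as the example role:

```go
package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newViewBinding grants the built-in "view" ClusterRole to an external user
// and a group inside one namespace.
func newViewBinding() *rbacv1.RoleBinding {
	return &rbacv1.RoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: "view-binding", Namespace: "default"},
		Subjects: []rbacv1.Subject{
			{Kind: "User", Name: "alice", APIGroup: "rbac.authorization.k8s.io"},    // placeholder user
			{Kind: "Group", Name: "dev-team", APIGroup: "rbac.authorization.k8s.io"}, // placeholder group
		},
		RoleRef: rbacv1.RoleRef{
			APIGroup: "rbac.authorization.k8s.io",
			Kind:     "ClusterRole",
			Name:     "view", // built-in read-only role
		},
	}
}

func main() {
	fmt.Println("role binding:", newViewBinding().Name)
}
```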
Build and maintain operators using the Operator Framework toolkit
Operator Framework: the Operator Framework is a CoreOS open source toolkit for the rapid development and maintenance of Operators, and contains three main projects.
Operator SDK: quickly build an Operator application without having to understand the complex Kubernetes API features.
Operator Lifecycle Manager (OLM): helps install, update, and manage all Operators (and their associated services) running across a cluster.
Operator Metering: collects Operator metric information and generates reports.
With these three sub-projects,
Creating custom K8s AdmissionWebhooks with Kubebuilder
Kubebuilder can build AdmissionWebhooks in addition to CRD APIs and their Controllers, and this article will analyze how Kubebuilder builds AdmissionWebhooks in detail.
AdmissionWebhooks for K8s: first of all, it is important to know what AdmissionWebhooks are in K8s and what they are for.
Let’s start with a scenario. Suppose we need to modify or validate a Pod's configuration before it is created. With the built-in admission controllers, that logic has to be compiled by the administrator into the ApiServer binary, which is very inconvenient when the changes are custom.
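Whatever framework generates the scaffolding, an AdmissionWebhook boils down to an HTTPS endpoint that receives an AdmissionReview from the ApiServer and answers with a verdict. A minimal hand-rolled sketch (the URL path, port, and certificate paths are placeholders, not what Kubebuilder generates):

```go
package main

import (
	"encoding/json"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
)

// serveValidate decodes the AdmissionReview sent by the ApiServer and replies
// with an allow/deny verdict; real checks on review.Request.Object would go here.
func serveValidate(w http.ResponseWriter, r *http.Request) {
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil || review.Request == nil {
		http.Error(w, "malformed admission review", http.StatusBadRequest)
		return
	}
	review.Response = &admissionv1.AdmissionResponse{
		UID:     review.Request.UID, // the response must echo the request UID
		Allowed: true,
	}
	// The apiVersion/kind decoded from the request are reused in the reply.
	_ = json.NewEncoder(w).Encode(review)
}

func main() {
	http.HandleFunc("/validate", serveValidate)
	// The ApiServer only talks to webhooks over TLS; cert paths are placeholders.
	_ = http.ListenAndServeTLS(":8443", "/path/to/tls.crt", "/path/to/tls.key", nil)
}
```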
Golang Common Concurrency Programming Tips
Golang was one of the first languages to incorporate the principles of CSP into its core and to bring this style of concurrent programming to the general public. CSP stands for Communicating Sequential Processes, where each instruction needs to specify exactly whether it is an output variable (the case of reading a variable from a process), or a destination (in the case of sending input to a
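In Go, this shows up as goroutines that communicate over channels rather than by sharing memory; a minimal sketch of the idea:

```go
package main

import "fmt"

// produce writes values into the channel; in CSP style the sender must name
// the destination (the channel) explicitly for every send.
func produce(out chan<- int) {
	for i := 0; i < 3; i++ {
		out <- i // send: blocks until a receiver is ready (unbuffered channel)
	}
	close(out)
}

func main() {
	ch := make(chan int) // unbuffered channel: sender and receiver rendezvous
	go produce(ch)
	for v := range ch { // receive until the channel is closed
		fmt.Println("received", v)
	}
}
```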
Deployment Controller Workflow
This article is based on reading the source code of Kubernetes v1.16 and describes the workflow of the Deployment Controller through flowcharts, with related code included for reference.
In the previous article, “How Kubernetes Controller Manager Works”, we explained how the Controller Manager manages Controllers, and learned that a Controller only needs to implement its event handlers and does not need to care about the upper-level logic. Building on that article, this one describes what the Deployment Controller does when it receives an event from the Informer.
How Kubernetes Controller Manager works
This article is based on reading the source code of Kubernetes v1.16. It contains some source code, but I will try to describe things clearly with accompanying diagrams. In the Kubernetes Master node, there are three important components: ApiServer, ControllerManager, and Scheduler, which together are responsible for managing the whole cluster. In this article, we try to sort out the workflow and principles of the ControllerManager. What is
Kubernetes container-native workflow engine: Argo
This article is based on Argo v2.7.6, Kubernetes v1.18
Argo is a workflow engine based on Kubernetes CRDs, providing container-native workflows for Kubernetes, i.e. each workflow node runs a task as a container.
Installation: since Argo is implemented on top of Kubernetes CRDs, it consists of CRDs plus a Controller.
Installing Argo is as simple as creating an argo namespace and applying an official yaml.
kubectl create namespace argo
kubectl apply -n argo -f https://raw.
Operating system memory virtualization
A program itself doesn’t need to care where its data and code live, and memory appears to be contiguous and exclusive to the program. This is certainly not the physical reality; it is the operating system that maintains this illusion, known as memory virtualization. This article describes how the operating system implements a virtual memory system.
Address Space: the operating system provides an easy-to-use abstraction of physical memory, the address space.
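One classic, deliberately simplified mechanism behind this abstraction is base-and-bounds translation: every virtual address the program issues is checked against a limit and offset by a base register. A toy sketch of that translation, as an illustration only, not how any real OS is implemented:

```go
package main

import "fmt"

// translate maps a virtual address to a physical one using the base-and-bounds
// scheme: the address must fall inside the process's address space (bounds),
// and is then relocated by the base register.
func translate(virtual, base, bounds uint64) (uint64, error) {
	if virtual >= bounds {
		return 0, fmt.Errorf("segmentation fault: address %d out of bounds %d", virtual, bounds)
	}
	return base + virtual, nil
}

func main() {
	// Assume the process was loaded at physical address 32768 with a 16 KiB address space.
	if phys, err := translate(1000, 32768, 16384); err == nil {
		fmt.Println("virtual 1000 -> physical", phys)
	}
	if _, err := translate(20000, 32768, 16384); err != nil {
		fmt.Println(err)
	}
}
```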