ClickHouse is a column-oriented database management system (DBMS) for online analytical processing (OLAP) of queries.
ClickHouse Features
-
A true column-oriented DBMS
In a true column-oriented DBMS, no extra data is stored alongside the values. Among other things, this means that constant-length values must be supported, to avoid storing their length "number" next to the values. For example, a billion UInt8-type values should consume about 1 GB uncompressed; otherwise this strongly affects CPU use. Even in the uncompressed case, the data must be stored compactly (without any "garbage"), because the speed of decompression (CPU usage) depends largely on the volume of uncompressed data.
ClickHouse is a database management system, not a single database. ClickHouse allows creating tables and databases at runtime, loading data, and running queries without reconfiguring and restarting the server.
-
Data compression
In addition to efficient general-purpose compression codecs with different trade-offs between disk space and CPU consumption, ClickHouse offers specialized codecs for specific types of data, which allows ClickHouse to compete with and outperform more niche databases, such as time series databases.
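As a small, hedged illustration (the table and column names are made up; the CODEC clause is part of ClickHouse's CREATE TABLE syntax), specialized codecs are declared per column:

clickhouse-client -n <<'SQL'
-- Hypothetical table: Delta+ZSTD suits slowly changing timestamps,
-- DoubleDelta suits monotonically growing counters.
CREATE TABLE codec_demo
(
    ts      DateTime CODEC(Delta, ZSTD),
    counter UInt64   CODEC(DoubleDelta),
    note    String   CODEC(ZSTD(3))
)
ENGINE = MergeTree
ORDER BY ts;
SQL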
-
Disk storage of data
Keeping data physically sorted by primary key allows extraction of data for a specific value or range of values with low latency (less than tens of milliseconds). Some column-oriented DBMSs (such as SAP HANA and Google PowerDrill) only work in RAM. This approach encourages allocating a larger hardware budget than is required for real-time analysis.
ClickHouse is designed for common hard drives, which means the cost per GB of data storage is low, but if available, it can also take advantage of SSDs and additional RAM.
-
Multi-core parallel processing
Large queries are naturally parallelized, taking up all necessary resources available on the current server.
-
Distributed processing on multiple servers
In ClickHouse, data can reside on different shards. Each shard can be a group of replicas used for fault tolerance. All shards are used to run a query in parallel, transparently to the user.
-
SQL Support
ClickHouse supports a declarative query language based on SQL that is identical to the ANSI SQL standard in many cases.
Supported queries include GROUP BY, ORDER BY, subqueries in FROM, JOIN clauses, IN operators, window functions and scalar subqueries.
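For illustration only (the tables here are hypothetical), a query combining several of these constructs might look like this:

clickhouse-client -n <<'SQL'
-- Top referers over the last week, restricted to users from a second table
SELECT referer, count() AS hits
FROM (SELECT referer, user_id FROM web_hits WHERE event_date >= today() - 7)
WHERE user_id IN (SELECT user_id FROM vip_users)
GROUP BY referer
ORDER BY hits DESC
LIMIT 10;
SQL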
-
Vector Computing Engine
Data is not only stored by column, but also processed by vectors (part of a column), resulting in high CPU efficiency.
-
Real-time data update
ClickHouse supports tables with a primary key. To quickly run queries over a range of the primary key, the data is sorted incrementally using a merge tree. Thanks to this, data can be continually added to the table, and no locks are taken when new data is ingested.
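A minimal sketch of such a table (all names are examples; the SAMPLE BY clause is included only so the sampling example further below works):

clickhouse-client -n <<'SQL'
CREATE TABLE events
(
    event_date Date,
    user_id    UInt64,
    duration   UInt32,
    action     String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, intHash32(user_id))   -- primary key / sort order
SAMPLE BY intHash32(user_id);

-- Inserts are appended as new parts and merged in the background; reads are not locked
INSERT INTO events VALUES (today(), 42, 120, 'click');
SQL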
-
Primary indexes
Physical sorting of data by primary key allows data to be fetched for specific values or ranges of values with low latency (less than tens of milliseconds).
-
Secondary indexes
Unlike other database management systems, secondary indexes in ClickHouse do not point to specific rows or row ranges. Instead, they let the database know in advance that all rows in certain data parts will not match the query filter conditions, so those parts are not read at all; hence they are called data-skipping indexes.
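For example, a minmax data-skipping index could be added to the hypothetical events table sketched above:

clickhouse-client -n <<'SQL'
-- Parts whose min/max of `duration` cannot match the filter are skipped entirely
ALTER TABLE events ADD INDEX idx_duration duration TYPE minmax GRANULARITY 4;
ALTER TABLE events MATERIALIZE INDEX idx_duration;  -- build the index for existing parts
SQL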
-
Suitable for online queries
Most OLAP database management systems do not aim at online queries with sub-second latency. In alternative systems, report build times of tens of seconds or even minutes are often considered acceptable; sometimes reports are even prepared offline in advance.
In ClickHouse, low latency means that queries can be processed without delay, right while a user-interface page is loading, without trying to prepare the answer in advance. In other words, online.
-
Support for approximate calculations
ClickHouse provides several ways to trade accuracy for performance; a short SQL sketch follows this list.
Aggregate functions for approximate calculation of the number of distinct values, medians, and quantiles.
Running a query on a part (sample) of the data and getting an approximate result. In this case, proportionally less data is retrieved from disk.
Running an aggregation for a limited number of random keys instead of all keys. Under certain conditions of key distribution in the data, this provides fairly accurate results while using fewer resources.
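A hedged sketch of the three approaches against the hypothetical events table from earlier (sampling requires the SAMPLE BY clause in the table definition):

clickhouse-client -n <<'SQL'
-- 1. Approximate aggregate functions
SELECT uniq(user_id), quantile(0.5)(duration) FROM events;

-- 2. Query roughly 10% of the data; proportionally less data is read from disk
SELECT count() FROM events SAMPLE 0.1;

-- 3. Aggregate only a limited number of keys
SELECT user_id, count() FROM events
GROUP BY user_id
SETTINGS max_rows_to_group_by = 100000, group_by_overflow_mode = 'any';
SQL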
-
Adaptive join algorithm
ClickHouse adaptively selects how to JOIN multiple tables by preferring the hash join algorithm and falling back to the merge join algorithm if there is more than one large table.
-
Data replication and data integrity support
ClickHouse uses asynchronous multi-master replication. After being written to any available replica, the data is distributed to all the remaining replicas in the background. The system maintains identical data on different replicas. Recovery after most failures is performed automatically, or semi-automatically in complex cases.
-
Role-based access control
ClickHouse implements user account management using SQL queries and allows role-based access control configurations similar to those found in ANSI SQL standards and popular relational database management systems.
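A brief, hedged example of SQL-driven access control (user, role, and database names are made up; it assumes SQL-driven access management is enabled for the executing user):

clickhouse-client -n <<'SQL'
CREATE ROLE analyst;
GRANT SELECT ON default.* TO analyst;
CREATE USER alice IDENTIFIED WITH sha256_password BY 'S3cret!';
GRANT analyst TO alice;
SQL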
For more features, refer to the official documentation: https://clickhouse.com/docs/en/introduction/performance/
ClickHouse Persistence Configuration
Here we use NFS to persist the ClickHouse data.
Install NFS.
# Here I use a dedicated server for this demo; in practice any server can host NFS (it is recommended to keep it separate from the Kubernetes cluster and use a dedicated machine)
[root@nfs ~]# yum install nfs-utils -y rpcbind
# Next, create the NFS storage directory
[root@nfs ~]# mkdir /data/k8s-volume
[root@nfs ~]# chmod 755 /data/k8s-volume/
# Edit the NFS exports file
[root@nfs ~]# cat /etc/exports
/data/k8s-volume *(rw,no_root_squash,sync)
# Export directory; * allows any host to connect; rw grants read/write; sync writes data to disk and memory synchronously; no_root_squash does not map the remote root user to an anonymous user
# Next, start rpcbind
[root@nfs ~]# systemctl start rpcbind
[root@nfs ~]# systemctl enable rpcbind
[root@nfs ~]# systemctl status rpcbind
● rpcbind.service - RPC bind service
Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2019-12-19 18:44:29 CST; 11min ago
Main PID: 3126 (rpcbind)
CGroup: /system.slice/rpcbind.service
└─3126 /sbin/rpcbind -w
# Because NFS must register with rpcbind, rpcbind needs to be started first
# Start NFS
[root@nfs ~]# systemctl restart nfs
[root@nfs ~]# systemctl enable nfs
[root@nfs ~]# systemctl status nfs
● nfs-server.service - NFS server and services
Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
Drop-In: /run/systemd/generator/nfs-server.service.d
└─order-with-mounts.conf
Active: active (exited) since Thu 2019-12-19 18:44:30 CST; 13min ago
Main PID: 3199 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/nfs-server.service
# Verify that rpcbind and nfs are working properly
[root@nfs ~]# rpcinfo |grep nfs
100003 3 tcp 0.0.0.0.8.1 nfs superuser
100003 4 tcp 0.0.0.0.8.1 nfs superuser
100227 3 tcp 0.0.0.0.8.1 nfs_acl superuser
100003 3 udp 0.0.0.0.8.1 nfs superuser
100003 4 udp 0.0.0.0.8.1 nfs superuser
100227 3 udp 0.0.0.0.8.1 nfs_acl superuser
100003 3 tcp6 ::.8.1 nfs superuser
100003 4 tcp6 ::.8.1 nfs superuser
100227 3 tcp6 ::.8.1 nfs_acl superuser
100003 3 udp6 ::.8.1 nfs superuser
100003 4 udp6 ::.8.1 nfs superuser
100227 3 udp6 ::.8.1 nfs_acl superuser
# Check the export options of the NFS directory
[root@nfs ~]# cat /var/lib/nfs/etab
/data/k8s-volume *(rw,sync,wdelay,hide,nocrossmnt,secure,no_root_squash,no_all_squash,no_subtree_check,secure_locks,acl,no_pnfs,anonuid=65534,anongid=65534,sec=sys,rw,secure,no_root_squash,no_all_squash)
# Verify that the export is visible
[root@nfs ~]# showmount -e 127.0.0.1
Export list for 127.0.0.1:
/data/k8s-volume *
Next, create the NFS client provisioner. The NFS server address is 192.168.31.101 and the shared directory is /data/k8s-volume.
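Before deploying the provisioner, it is worth verifying that every Kubernetes node can reach and mount the export (nfs-utils must be installed on each node that will mount NFS volumes); a quick manual check could look like this:

# On each Kubernetes node (CentOS example)
yum install -y nfs-utils
showmount -e 192.168.31.101
mount -t nfs 192.168.31.101:/data/k8s-volume /mnt
umount /mnt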
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nfs-client-provisioner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-client-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: quay.io/external_storage/nfs-client-provisioner:latest
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: fuseim.pri/ifs
            - name: NFS_SERVER
              value: 192.168.31.101    # NFS server address
            - name: NFS_PATH
              value: /data/k8s-volume  # NFS shared directory
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.31.101
            path: /data/k8s-volume
Next, we need to create a ServiceAccount and bind it to a ClusterRole named nfs-client-provisioner-runner. The ClusterRole declares the permissions the provisioner needs, including create/delete on persistentvolumes, so the ServiceAccount can be used to create PVs automatically.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["create", "delete", "get", "list", "watch", "patch", "update"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
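Save the Deployment and the RBAC objects to files and apply them (the file names here are just examples):

kubectl apply -f nfs-client-provisioner.yaml
kubectl apply -f nfs-client-rbac.yaml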
Check that the provisioner pod is running.
[root@k8s-01 nfs]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nfs-client-provisioner-7995946c89-n7bsc 1/1 Running 0 13m
Create the StorageClass. Here we declare a StorageClass object named managed-nfs-storage.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: fuseim.pri/ifs # or choose another name, must match the deployment's PROVISIONER_NAME env value
Check Status.
[root@k8s-01 nfs]# kubectl get storageclasses.storage.k8s.io
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
managed-nfs-storage fuseim.pri/ifs Delete Immediate false 104d
Create a PVC for ClickHouse.
First, create a namespace to hold the ClickHouse-related resources.
$ kubectl create ns test
The contents of the PVC YAML file are as follows.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clickhouse-pvc
  namespace: test
spec:
  resources:
    requests:
      storage: 10Gi               # requested size
  accessModes:
    - ReadWriteMany               # PVC access mode
  storageClassName: "managed-nfs-storage"   # StorageClass name
Check Status.
[root@k8s-01 clickhouse]# kubectl get pvc -n test
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
clickhouse-pvc Bound pvc-ee8a47fc-a196-459f-aca4-143a8af58bf3 10Gi RWX managed-nfs-storage 25s
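Since the provisioner creates the PV dynamically, a matching PV bound to test/clickhouse-pvc should now exist as well:

# The PV name matches the VOLUME column shown above
kubectl get pv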
ClickHouse Installation
Since we need to modify the users.xml configuration and tune some parameters, I download users.xml here, modify it, and mount it into the pod using a ConfigMap.
# You can download my configuration directly here, or start ClickHouse once and copy users.xml out of the container to modify it
wget https://d.frps.cn/file/kubernetes/clickhouse/users.xml
[root@k8s-01 clickhouse]# kubectl create cm -n test clickhouse-users --from-file=users.xml # you can skip this step if you do not need to persist the configuration
configmap/clickhouse-users created
[root@k8s-01 clickhouse]# kubectl get cm -n test
NAME DATA AGE
clickhouse-users 1 5s
The ClickHouse YAML file is as follows.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: clickhouse
  name: clickhouse
  namespace: test
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: clickhouse
  template:
    metadata:
      labels:
        app: clickhouse
    spec:
      containers:
        - image: clickhouse/clickhouse-server
          imagePullPolicy: IfNotPresent
          name: clickhouse
          ports:
            - containerPort: 8123
              protocol: TCP
          resources:
            limits:
              cpu: 1048m
              memory: 2Gi
            requests:
              cpu: 1048m
              memory: 2Gi
          volumeMounts:
            - mountPath: /var/lib/clickhouse
              name: clickhouse-volume
            - mountPath: /etc/clickhouse-server/users.xml
              subPath: users.xml
              name: clickhouse-users
      volumes:
        - name: clickhouse-users
          configMap:
            name: clickhouse-users
            defaultMode: 511
        - name: clickhouse-volume
          persistentVolumeClaim:
            claimName: clickhouse-pvc
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: clickhouse
  namespace: test
spec:
  ports:
    - port: 8123
      protocol: TCP
      targetPort: 8123
  selector:
    app: clickhouse
  type: ClusterIP
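Apply the manifest and watch the pod come up (the file name is an assumption); the Events output in the next step comes from kubectl describe:

kubectl apply -f clickhouse.yaml
kubectl -n test get pod -w
kubectl -n test describe pod -l app=clickhouse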
Check if the service is working.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned test/clickhouse-bd6cb4f4b-8b6lx to k8s-02
Normal Pulling 6m17s kubelet, k8s-02 Pulling image "clickhouse/clickhouse-server"
Normal Pulled 4m25s kubelet, k8s-02 Successfully pulled image "clickhouse/clickhouse-server"
Normal Created 4m20s kubelet, k8s-02 Created container clickhouse
Normal Started 4m17s kubelet, k8s-02 Started container clickhouse
Check pod and svc status.
[root@k8s-01 clickhouse]# kubectl get pod -n test
NAME READY STATUS RESTARTS AGE
clickhouse-bd6cb4f4b-8b6lx 1/1 Running 0 7m4s
[root@k8s-01 clickhouse]# kubectl get svc -n test
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
clickhouse ClusterIP 10.100.88.207 <none> 8123/TCP 7m23s
Test calling ClickHouse from inside the pod.
[root@k8s-01 clickhouse]# kubectl exec -it -n test clickhouse-bd6cb4f4b-8b6lx bash # enter the container
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
root@clickhouse-bd6cb4f4b-8b6lx:/# clickhouse-client # connect with the client
ClickHouse client version 21.12.3.32 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 21.12.3 revision 54452.
clickhouse-bd6cb4f4b-8b6lx :) show databases; # list databases
SHOW DATABASES
Query id: d89a782e-2fb5-47e8-a4e0-1ab3aa038bdf
┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ default │
│ information_schema │
│ system │
└────────────────────┘
4 rows in set. Elapsed: 0.003 sec.
clickhouse-bd6cb4f4b-8b6lx :) create database abcdocker # create a test database
CREATE DATABASE abcdocker
Query id: 3a7aa992-9fe1-49fe-bc54-f537e0f4a104
Ok.
0 rows in set. Elapsed: 3.353 sec.
clickhouse-bd6cb4f4b-8b6lx :) show databases;
SHOW DATABASES
Query id: c53996ba-19de-4ffa-aa7f-2f3c305d5af5
┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ abcdocker │
│ default │
│ information_schema │
│ system │
└────────────────────┘
5 rows in set. Elapsed: 0.006 sec.
clickhouse-bd6cb4f4b-8b6lx :) use abcdocker;
USE abcdocker
Query id: e8302401-e922-4677-9ce3-28c263d162b1
Ok.
0 rows in set. Elapsed: 0.002 sec.
clickhouse-bd6cb4f4b-8b6lx :) show tables
SHOW TABLES
Query id: 29b3ec6d-6486-41f5-a526-28e80ea17107
Ok.
0 rows in set. Elapsed: 0.003 sec.
clickhouse-bd6cb4f4b-8b6lx :)
Next, we create a busybox container and test whether the service can be reached directly by its svc name.
$ kubectl run -n test --generator=run-pod/v1 -i --tty busybox --image=busybox --restart=Never -- sh
/ # telnet clickhouse 8123
Connected to clickhouse
# from another namespace you would need to use clickhouse.test.svc.cluster.local
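Besides telnet, the HTTP interface can be checked from the same busybox pod; ClickHouse answers Ok. on /ping:

/ # wget -qO- http://clickhouse.test.svc.cluster.local:8123/ping
Ok.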
External access to ClickHouse
Inside Kubernetes we call the service by its svc name; external access can be provided via a NodePort.
# The external NodePort svc YAML is as follows
apiVersion: v1
kind: Service
metadata:
  name: clickhouse-node
  namespace: test
spec:
  ports:
    - port: 8123
      protocol: TCP
      targetPort: 8123
  selector:
    app: clickhouse
  type: NodePort

[root@k8s-01 clickhouse]# kubectl get svc -n test
NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
clickhouse        ClusterIP   10.100.88.207   <none>        8123/TCP         33m
clickhouse-node   NodePort    10.99.147.187   <none>        8123:32445/TCP   8s

# If you are on Alibaba Cloud managed Kubernetes, you can use an Alibaba Cloud LoadBalancer directly
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id: "xxxx"
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-force-override-listeners: "true"
  name: clickhouse-ck
  namespace: test
spec:
  ports:
    - port: 8123
      protocol: TCP
      targetPort: 8123
  selector:
    app: clickhouse
  type: LoadBalancer
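With the NodePort service in place, a quick external sanity check can be run from any machine that can reach a node (replace <node-ip> with the IP of a Kubernetes node; 32445 is the NodePort shown above):

curl http://<node-ip>:32445/ping
# Expected response: Ok.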
First, download the desktop client tool (DBeaver):
https://dbeaver.io/download/
Next, install the downloaded package, connect to ClickHouse, and check whether the database we created exists.
Add clickhouse connection
Here you can already see the database we created, which is just an empty database.
If we need to set a password for ClickHouse, we need to modify the ConfigMap we mounted.
root@clickhouse-bd6cb4f4b-8b6lx:/etc/clickhouse-server# cat users.xml |grep pass
<!-- See also the files in users.d directory where the password can be overridden.
If you want to specify password in plaintext (not recommended), place it in 'password' element.
Example: <password>qwerty</password>.
If you want to specify SHA256, place it in 'password_sha256_hex' element.
Example: <password_sha256_hex>65e84be33532fb784c48129675f9eff3a682b27168c0ea744b2cf58ee02337c5</password_sha256_hex>
If you want to specify double SHA1, place it in 'password_double_sha1_hex' element.
Example: <password_double_sha1_hex>e395796d6546b1b65db9d665cd43f0e858dd4303</password_double_sha1_hex>
place 'kerberos' element instead of 'password' (and similar) elements.
How to generate decent password:
In first line will be password and in second - corresponding SHA256.
In first line will be password and in second - corresponding double SHA1.
<password></password> # an empty password element means passwordless access
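As a hedged sketch of the workflow: generate a SHA256 hex digest for the new password, put it into users.xml in the ConfigMap as password_sha256_hex for the default user, and restart the deployment so the new file is picked up.

# Generate a password and its SHA256 hex digest (this command also appears in the users.xml comments)
PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'

# Replace <password></password> with <password_sha256_hex>...digest...</password_sha256_hex>
kubectl -n test edit configmap clickhouse-users

# Restart the deployment so the pod mounts the updated users.xml
kubectl -n test rollout restart deployment clickhouse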