Today I saw a piece of code that exposes metrics with Prometheus’ client_golang. Instead of simply Inc()-ing the corresponding metric counter, it implements a rather strange logic of its own:
- When the program needs to increment a counter, it does not touch the metric directly. Instead it wraps the increment in a message format of its own and sends that object to a channel, with one channel per metric.
- At startup the program launches a single, globally unique worker goroutine that is responsible for all metrics: it receives messages from the different channels, unpacks them, finds the metric each one belongs to, and only then performs the actual addition.
The real code is even more convoluted: it first creates a MetricsBuilder, the MetricsBuilder exposes an Add() function, and Add() in turn sends a message into the channel, which is eventually read out and turned into a metric increment through a series of cascading calls.
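As far as I can reconstruct it, the pattern has roughly the shape below. Apart from the channel-plus-worker idea itself, the type and field names are my own paraphrase, not the original code:

```go
package metricsdemo

import "github.com/prometheus/client_golang/prometheus"

// metricEvent is the message that gets "packaged" instead of touching the
// counter directly.
type metricEvent struct {
	name  string
	value float64
}

// MetricsBuilder wraps a channel; in the original design there is one such
// channel per metric.
type MetricsBuilder struct {
	ch chan metricEvent
}

// Add does not increment anything; it only enqueues a message.
func (b *MetricsBuilder) Add(name string, v float64) {
	b.ch <- metricEvent{name: name, value: v}
}

// runWorker is the single global goroutine: it drains the channel, looks up
// the real Prometheus counter, and only then performs the addition.
func (b *MetricsBuilder) runWorker(counters map[string]prometheus.Counter) {
	for ev := range b.ch {
		counters[ev.name].Add(ev.value)
	}
}
```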
It feels like something that should be a one-line metrics.Add() call, so why make it so complicated? After thinking about it, the only explanation I could come up with is that this is an extremely loaded system and the author wanted the metrics operation to be asynchronous so that it does not eat into business-processing time. But going through a channel also means packing and unpacking messages, so is it actually faster?
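For contrast, this is what the direct, "one-line" version looks like with client_golang (the metric name and the HTTP handler here are made up for illustration):

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// promauto registers the counter on the default registry.
var requestsTotal = promauto.NewCounter(prometheus.CounterOpts{
	Name: "myapp_requests_total",
	Help: "Total number of handled requests.",
})

func handler(w http.ResponseWriter, r *http.Request) {
	requestsTotal.Inc() // the entire "metrics pipeline"
	w.Write([]byte("ok"))
}

func main() {
	http.HandleFunc("/", handler)
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```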
At first I thought a channel might be a high-performance lock-free construct, but after reading the channel code in the Go runtime I found there is a lock in there too: if multiple goroutines write to the same channel at the same time, they contend on it.
Prometheus’ client_golang, by contrast, implements Add with a single atomic operation: atomic.AddUint64(&c.valInt, ival). Atomic operations are not free either (the non-integer path is even a CAS loop), but intuitively I did not believe a channel could be faster than an atomic add.
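For reference, that code path looks roughly like this (a simplified paraphrase of client_golang's counter, not the verbatim source): whole-number increments take the atomic-add fast path quoted above, and only fractional values fall back to a CAS loop on the float bits.

```go
package counterdemo

import (
	"math"
	"sync/atomic"
)

// Simplified paraphrase of client_golang's internal counter type.
type counter struct {
	valBits uint64 // float64 bits, used for fractional increments
	valInt  uint64 // fast path for whole-number increments
}

func (c *counter) Add(v float64) {
	if v < 0 {
		panic("counter cannot decrease in value")
	}
	ival := uint64(v)
	if float64(ival) == v {
		// Inc() and integer Add() end up here: one atomic add, no locks.
		atomic.AddUint64(&c.valInt, ival)
		return
	}
	// Fractional increments: CAS loop on the float64 bit pattern.
	for {
		oldBits := atomic.LoadUint64(&c.valBits)
		newBits := math.Float64bits(math.Float64frombits(oldBits) + v)
		if atomic.CompareAndSwapUint64(&c.valBits, oldBits, newBits) {
			return
		}
	}
}
```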
I wrote two pieces of code to compare the two approaches (the test code and instructions for running it are here: atomic_or_channel).
Version 1: increment directly with atomic.
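The actual benchmark lives in the repository linked above; the sketch below captures the idea under the same parameters (100 goroutines, 1 million increments each), with names of my own choosing:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

const (
	clients   = 100       // simulated parallel connections
	perClient = 1_000_000 // increments per connection
)

func main() {
	var counter uint64
	var wg sync.WaitGroup

	start := time.Now()
	for i := 0; i < clients; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perClient; j++ {
				atomic.AddUint64(&counter, 1) // same operation client_golang uses
			}
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadUint64(&counter), time.Since(start))
}
```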
Version 2: simulate the pattern above, with a single worker goroutine reading from a channel and doing the additions.
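A corresponding sketch of the channel version, where the worker goroutine owns the counter and every increment travels through the channel as a message:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	clients   = 100
	perClient = 1_000_000
)

func main() {
	ch := make(chan uint64, 1024) // buffered, like a typical "async" pipeline
	done := make(chan struct{})

	var counter uint64
	go func() { // the single global worker
		for v := range ch {
			counter += v
		}
		close(done)
	}()

	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < clients; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perClient; j++ {
				ch <- 1 // every increment is a message
			}
		}()
	}
	wg.Wait()
	close(ch)
	<-done
	fmt.Println(counter, time.Since(start))
}
```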
The parameters simulate 100 parallel connections, each incrementing the counter 1 million times.
The results were just as I expected: atomic is about 15 times faster than the channel approach.
With atomic, 100 simulated clients each adding 1 million times finish in about 2 s: 100 × 1,000,000 = 100 million operations, i.e. roughly 50 million increments per second, and that is just on my laptop. The same workload takes 30-40 s with the channel approach described above, about 15 times slower.
Although I have said elsewhere that atomic is slow, this throughput is more than adequate for a metrics-counting scenario like this.