Grafana Phlare is an open source project for aggregating continuous profiling data. Grafana Phlare integrates fully with Grafana, allowing you to correlate profiles with your other observability signals.
What is continuous profiling?
The concept is valuable: profiling helps you understand the resource usage of your programs, which in turn helps you optimize their performance and cost. However, the shift to distributed, cloud-native architectures complicates this, creating the need for continuous profiling, where information about resource usage is automatically collected across the computing infrastructure at regular intervals, then compressed and stored as time-series data. This lets you visualize changes over time and zoom in on the profiles that match a time period of interest, for example, to see where CPU time was spent during a utilization spike.
In terms of the value it brings, continuous profiling has been called the fourth pillar of observability (after metrics, logs, and traces).
At Grafana Labs, we are beginning to use continuous profiling to understand the performance of the software that powers Grafana Cloud, including Grafana Loki, Grafana Mimir, Grafana Tempo, and Grafana itself. For example, if we are paged about a slow query in Mimir, we might use profiles to see where in the Mimir codebase that query spends the most time. If we see Grafana crashing repeatedly due to out-of-memory errors, we would look at a memory profile to see which objects were consuming the most memory before the crash.
While there are open source projects for storing and querying continuous profiling data, after some investigation we struggled to find one that met the scalability, reliability, and performance requirements needed to support continuous profiling at the scale Grafana Labs requires. A group of engineers prototyped the project during a company-wide hackathon, which demonstrated the value of profiling data when correlated with metrics, logs, and traces, and further increased our desire to roll out continuous profiling across all of our environments.
As a result, we set out to create a database for continuous profiling telemetry, based on the design principles that have made our other open source observability backends, Loki, Tempo, and Mimir, so successful: a horizontally scalable architecture and the use of object storage.
Core Features
Grafana Phlare provides horizontally scalable, highly available, long-term storage and querying of profiling data. Like Prometheus, it is easy to install: a single binary with no additional dependencies. Because Phlare uses object storage, you can keep as much history as you need without spending a lot of money. Its native multi-tenancy and isolation features let you run one database for multiple independent teams or business units. The core features of Grafana Phlare are listed below.
- Easy to install: Using its monolithic mode, only one binary is needed to get Grafana Phlare up and running, with no additional dependencies. On Kubernetes, the Helm chart supports the different deployment modes.
- Horizontal scalability: Running Grafana Phlare across multiple machines makes it easy to scale the database to handle the volume of profiles generated by your workloads.
- High availability: Grafana Phlare replicates incoming profiles, ensuring that no data is lost in the event of a machine failure. This means you can roll out updates without interrupting profile ingestion and analysis.
- Cheap, durable profile storage: Grafana Phlare uses object storage for long-term data storage, letting it take advantage of this ubiquitous, cost-effective, and highly durable technology. It is compatible with multiple object storage implementations, including AWS S3, Google Cloud Storage, Azure Blob Storage, OpenStack Swift, and any S3-compatible object storage.
- Native multi-tenancy: Grafana Phlare's multi-tenant architecture lets you isolate the data and queries of separate teams or business units, so these groups can share the same database.
Architecture
Grafana Phlare has a microservices-based architecture with multiple horizontally scalable components that can run individually and in parallel. Grafana Phlare's microservices are called components. The code for all components is compiled into a single binary, and the -target parameter controls which components that binary runs as, following the same pattern as Grafana Loki. For users who want to get going quickly, Grafana Phlare can also run in monolithic mode, with all components running simultaneously in a single process.
Most of Grafana Phlare's components are stateless and do not need to retain any data between process restarts. Some components are stateful and rely on non-volatile storage to avoid losing data between process restarts. Grafana Phlare includes a set of components that interact to form a cluster: the Distributor, the Ingester, and the Querier.
Monolithic Mode
Monolithic mode runs all required components in a single process and is the default mode of operation, which you can set explicitly with the -target=all parameter. Monolithic mode is the easiest way to deploy Grafana Phlare and is useful if you want to get started quickly or run Grafana Phlare in a development environment. To see the list of components that run when -target is set to all, run Grafana Phlare with the -modules flag, as shown below.
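For example, assuming the compiled binary is simply called phlare (adjust the path to your installation):

```bash
# Print the list of components bundled into the "all" target
./phlare -modules
```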
Microservices Mode
In microservices mode, components are deployed as separate processes. Scaling happens per component, which allows greater flexibility in scaling and more granular failure domains. Microservices mode is the preferred approach for production deployments, but it is also the most complex.
In microservices mode, each Grafana Phlare process is invoked with its -target parameter set to a specific component (for example, -target=ingester or -target=distributor). To get a working Grafana Phlare instance, you must deploy every required component. If you want to deploy Grafana Phlare in microservices mode, using Kubernetes is highly recommended.
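As a rough sketch of what that looks like outside Kubernetes (the -config.file flag and the config path here are assumptions for illustration):

```bash
# Each component runs as its own process, sharing one configuration file
./phlare -target=distributor -config.file=/etc/phlare/config.yaml &
./phlare -target=ingester    -config.file=/etc/phlare/config.yaml &
./phlare -target=querier     -config.file=/etc/phlare/config.yaml &
```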
Deployment
Here we deploy Grafana Phlare to a Kubernetes cluster using its Helm chart. This assumes an available Kubernetes cluster with kubectl and helm already configured.
First we create a namespace called phlare-test and deploy the entire application into that namespace.
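```bash
# Create the namespace that everything below will be deployed into
kubectl create namespace phlare-test
```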
Next, add the Helm repository that contains the Phlare chart.
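The Phlare chart is published in the main Grafana Helm repository:

```bash
# Add the Grafana Helm repository and refresh the local chart index
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```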
Then we can use Helm to do the installation.
If you just want the default monolithic installation, a single command is enough.
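A minimal sketch, assuming the release name phlare and the phlare-test namespace created above:

```bash
# Install Phlare in its default monolithic mode
helm install phlare grafana/phlare -n phlare-test
```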
If instead you want to install Grafana Phlare in microservices mode, first fetch the officially provided default values file.
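For example (a sketch; the file's path inside the Phlare repository may change between versions):

```bash
# Download the officially provided values file for microservices mode
curl -Lo values-micro-services.yaml \
  https://raw.githubusercontent.com/grafana/phlare/main/operations/phlare/helm/phlare/values-micro-services.yaml
```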
We use the values file above to install Grafana Phlare, and you can adjust the configuration to match your cluster. For example, the ingester is configured with resource requests of memory: 6Gi and cpu: 1; my cluster does not have enough resources for that, so I lowered the requests and set the number of replicas to 1 for now (for testing only), otherwise the Pods could not be scheduled.
The installation can then be started using the following command.
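```bash
# Install Phlare in microservices mode with the (possibly tuned) values file
helm install phlare grafana/phlare -n phlare-test --values values-micro-services.yaml
```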
After deployment, check whether the Pods are healthy.
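```bash
# List the Phlare Pods and check their status
kubectl get pods -n phlare-test
```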
Once all Pods are in the Running or Completed state, the deployment is complete.
Usage
Now we can configure Grafana to query the profile data. Here we install Grafana into the same Kubernetes cluster as Phlare, again with a single Helm command.
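A sketch based on the Phlare getting-started guide; the feature toggle and profiling settings shown here are assumptions you may need to adapt to your Grafana version:

```bash
# Install Grafana with the flame graph feature enabled, its own pprof endpoint
# exposed, and the Phlare scrape annotations set on the Grafana pod
helm install grafana grafana/grafana -n phlare-test \
  --set env.GF_FEATURE_TOGGLES_ENABLE=flameGraph \
  --set env.GF_DIAGNOSTICS_PROFILING_ENABLED=true \
  --set env.GF_DIAGNOSTICS_PROFILING_ADDR=0.0.0.0 \
  --set env.GF_DIAGNOSTICS_PROFILING_PORT=6060 \
  --set-string 'podAnnotations.phlare\.grafana\.com/scrape=true' \
  --set-string 'podAnnotations.phlare\.grafana\.com/port=6060'
```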
After deployment, the Pods in the phlare-test namespace can be listed as follows.
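```bash
# All Phlare components plus Grafana should now appear in the listing
kubectl get pods -n phlare-test
```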
We can use the following command to forward the Grafana service locally.
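Assuming the Helm release above created a Service named grafana listening on port 80:

```bash
# Forward local port 3000 to the in-cluster Grafana service
kubectl port-forward -n phlare-test svc/grafana 3000:80
```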
Then open http://localhost:3000 in your browser to access the Grafana service.
Click Configuration -> Data Sources on the left side of the page to add a data source for profiles, and select the data source type phlare. Set the URL of the data source to http://phlare-querier.phlare-test.svc.cluster.local.:4100/.
Click Save & Test to save it. Once the data source is added, you can query profiles in Grafana Explore in much the same way as with Loki and Prometheus; for example, we can query the CPU profile of the Grafana application.
Phlare integrates natively with Grafana, allowing you to visualize profile data alongside metrics, logs, and traces and get a full view of your entire stack. We have also added a flame graph panel to Grafana, so you can build dashboards that display profiling data next to data from the hundreds of other data sources Grafana can visualize.
Phlare's Helm chart configures its agent, by default, to scrape pods as long as they carry the correct annotations, using relabel_config and kubernetes_sd_config blocks that will look familiar if you have worked with Prometheus or Grafana Agent configurations.
For Phlare to scrape a pod, you must add the following annotations to it.
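In a pod spec the annotations look like this (the port value 8080 is only an example):

```yaml
metadata:
  annotations:
    phlare.grafana.com/scrape: "true"  # enable scraping for this pod
    phlare.grafana.com/port: "8080"    # example: the port serving /debug/pprof/
```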
Here phlare.grafana.com/port should be set to the port on which your pod serves the /debug/pprof/ endpoints. Note that the values of phlare.grafana.com/scrape and phlare.grafana.com/port must be enclosed in double quotes to ensure they are parsed as strings.
We set these two annotations when we installed Grafana above, so Phlare continuously scrapes the Grafana application's profiles; that is why we were able to query Grafana's profile data earlier.
Ref
https://github.com/grafana/phlare
https://grafana.com/blog/2022/11/02/announcing-grafana-phlare-oss-continuous-profiling-database/