Gatus: a recommended tool for automated monitoring of website services

Gatus

After deploying a service, it is important that the team can see at a glance whether it is running. For example, GitHub provides a status page that tracks common operations such as Git Operations, Webhooks, and GitHub Actions. When developers run into problems, they can check the status of these services in real time and take appropriate action. Many hosted services offer this, such as Atlassian’s Statuspage or PingPong, and more free options are listed in awesome-status-pages. This article introduces Gatus, an open source alternative that is written in Go and is very lightweight.

What is Gatus

Gatus provides a lightweight health dashboard for your services. It checks each service over HTTP, ICMP, TCP, and other protocols, and judges health from the status code, response time, and body content of the response. When a check fails, it can send alerts through Slack, Email, Teams, Discord, Telegram, and other common messaging tools. You can see a live example of the dashboard at this link.
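
As a rough sketch of what these checks look like in a Gatus configuration file, the fragment below defines one endpoint per protocol. The endpoint names, hosts, port, and thresholds are placeholders chosen for illustration and are not part of the setup described in this article. The condition placeholders ([STATUS], [RESPONSE_TIME], [BODY], [CONNECTED]) are the ones documented in the Gatus README.

```yaml
endpoints:
  # HTTP check: status code, latency, and one field of a JSON body
  - name: example-http            # hypothetical name
    url: "https://example.org/health"
    interval: 60s
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 500"   # milliseconds
      - "[BODY].status == UP"     # assumes a JSON body like {"status": "UP"}

  # ICMP check: a plain ping
  - name: example-ping            # hypothetical name
    url: "icmp://example.org"
    interval: 60s
    conditions:
      - "[CONNECTED] == true"

  # TCP check: can a connection be opened on this port?
  - name: example-tcp             # hypothetical name
    url: "tcp://example.org:6379"
    interval: 60s
    conditions:
      - "[CONNECTED] == true"
```

An endpoint is reported as healthy only when all of its conditions pass.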

Why choose Gatus

The official documentation answers this question directly:

Why would I use Gatus when I can just use Prometheus, Alertmanager, Cloudwatch or even Splunk?

The first point for developers to consider is how to monitor the whole service proactively, instead of learning about a problem only after a customer runs into it. Gatus lets you configure checks for each feature from the customer’s point of view, so the team can watch the important services and interfaces in real time and learn about a problem before the customer does.

The second point is whether Prometheus sets too high a bar to start with: does the team really have the time and manpower to build complete monitoring? Wiring Prometheus and Alertmanager to Grafana dashboards takes considerable effort, and it is worth asking whether the resulting metrics are what the customer actually cares about and whether the alerts are being received correctly. Gatus lets the team monitor the entire service with a simple configuration, and real-time notifications can be set up in a matter of hours.

Docker Installation

The fastest way to install Gatus is with Docker. The example below uses Postgres for storage, but you can also use the lightweight SQLite database.

version: '3.9'

services:
  gatus:
    image: twinproduction/gatus:v3.6.0
    volumes:
      - ./config:/config
    restart: always
    stop_signal: SIGINT
    stop_grace_period: 10s
    ports: 
      - 8085:8080
    networks:
      - web

  postgres:
    image: postgres:12
    volumes:
      - /data/monitor/database:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=gatus
      - POSTGRES_USER=gatus
      - POSTGRES_PASSWORD=gatus
    networks:
      - web

networks:
  web:
    external: true

As the volume mount shows, you also need to create a config directory containing a new config.yaml file.

storage:
  type: postgres
  path: "postgres://gatus:gatus@postgres:5432/gatus?sslmode=disable"

endpoints:
  - name: TL API
    group: Transfer Learning
    url: "https://tl-api.xxxxxx/healthz"
    interval: 60s
    conditions:
      - "[STATUS] == 200"

Once the containers are started, open http://localhost:8085 in a browser to see the live page.
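
If Postgres is more than the setup needs, the storage section could point at SQLite instead. This is a minimal sketch, assuming a host directory is mounted into the Gatus container at /data. That path is chosen here for illustration and does not appear in the compose file above, so the gatus service would also need a volume entry such as ./data:/data.

```yaml
# config/config.yaml
storage:
  type: sqlite
  path: /data/data.db   # assumed location inside the container
```

With this storage type, the postgres service in the compose file is no longer needed.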

Gatus Settings File

Our team has many projects, each with its own site structure and services, so we use the group setting to keep the projects separate.

endpoints:
  # Monitor
  - name: Prometheus
    group: Monitor
    url: "http://pm.xxxxxx/-/healthy"
    interval: 10s
    conditions:
      - "[STATUS] == 200"
      - "[BODY] == Prometheus is Healthy."
    alerts:
      - type: email
        enabled: true
        description: "healthcheck failed"
        send-on-resolved: true

  - name: Grafana
    group: Monitor
    url: "http://gf.xxxxxx/healthz"
    interval: 10s
    conditions:
      - "[STATUS] == 200"
      - "[BODY] == Ok"
    alerts:
      - type: email
        enabled: true
        description: "healthcheck failed"
        send-on-resolved: true

  - name: Loki
    group: Monitor
    url: "http://loki.xxxxxx/ready"
    interval: 10s
    conditions:
      - "[STATUS] == 200"
      - "[BODY] == ready"
    alerts:
      - type: email
        enabled: true
        description: "healthcheck failed"
        send-on-resolved: true

  # Storage
  - name: Object
    group: Storage
    url: "http://object.xxxxxx/minio/health/live"
    interval: 10s
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: email
        enabled: true
        description: "healthcheck failed"
        send-on-resolved: true

As shown above, the Prometheus check verifies not only STATUS but also BODY, and it is quite simple to write. Alerts can be delivered in various ways, such as Email, Discord, or Slack. Take Email as an example:

alerting:
  email:
    from: "srv_it_eas1_tester@xxxxxx"
    host: "smtp.mediatek.inc"
    username: "srv_it_eas1_tester@xxxxxx"
    password: "xxxxxx"
    port: 25
    to: "GSS_Global_AIDE_PA@xxxxxx"

The Email notification clearly shows the result of every condition that was checked.

Email notification message
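
Other providers follow the same pattern: a provider block under alerting, plus an alerts entry on each endpoint. The sketch below uses Slack as an example. The webhook URL is a placeholder and the two threshold values are illustrative; the endpoint itself is the Prometheus check from earlier.

```yaml
alerting:
  slack:
    webhook-url: "https://hooks.slack.com/services/XXXX/XXXX/XXXX"  # placeholder

endpoints:
  - name: Prometheus
    group: Monitor
    url: "http://pm.xxxxxx/-/healthy"
    interval: 10s
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: slack
        enabled: true
        failure-threshold: 3     # consecutive failures before alerting
        success-threshold: 2     # consecutive successes before resolving
        send-on-resolved: true
        description: "healthcheck failed"
```

Raising failure-threshold is a simple way to avoid alerting on a single slow or dropped request.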

Since new services and tests are added all the time, the config file changes often. Gatus detects changes to the configuration file and adjusts the monitoring page dynamically. Note that for this to work, docker-compose must not mount config.yaml directly into the container; mount the config directory instead. I have submitted a PR to fix the example. With that change, CI/CD can push the config file to the server and have it take effect right away. Next, let’s see how to deploy through Drone in two steps.

kind: pipeline
type: docker
name: monitor-gatus

steps:
- name: upload config
  image: appleboy/drone-scp
  settings:
    host: mtkmattermost.mediatek.inc
    username: deploy
    key:
      from_secret: monitor_key
    port: 22
    target: /home/deploy/monitor-gatus
    source:
      - config
      - data
      - docker-compose.yml

- name: deploy script
  image: appleboy/drone-ssh
  settings:
    host: mtkmattermost.mediatek.inc
    username: deploy
    key:
      from_secret: monitor_key
    port: 22
    script:
      - cd monitor-gatus && docker-compose up -d

The directory structure is as follows. Each team member can then adjust the settings they own.

├── config
│   └── config.yaml
├── data
└── docker-compose.yml
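
With this layout, the point about not mounting config.yaml directly comes down to one line in docker-compose.yml. The fragment below restates the volume from the compose file shown earlier. The commented-out line is the single-file mount to avoid, since a bind-mounted file can keep pointing at the old file when a deploy replaces it.

```yaml
services:
  gatus:
    volumes:
      # Mount the whole directory so a replaced config.yaml is visible
      # inside the container and Gatus can pick up the change.
      - ./config:/config
      # Avoid mounting the file itself:
      # - ./config/config.yaml:/config/config.yaml
```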

Gatus notifications fall short

If you have used Gatus, you will know that each alert provider accepts only one set of settings. With Email, for example, you can configure only one To list and cannot vary it by group. This limitation was recorded in an issue last year (issues/96), and I have submitted a PR that adds this capability to the Email provider. If the PR is accepted, the next version will support the configuration below.

alerting:
  email:
    from: "from@example.com"
    username: "from@example.com"
    password: "hunter2"
    host: "mail.example.com"
    port: 587
    to: "recipient1@example.com,recipient2@example.com"
    overrides:
      - group: "core"
        to: "recipient3@example.com,recipient4@example.com"

Summary

I chose Gatus because it is simple to set up and easy to deploy. Beyond monitoring web services, a test team can also use it to write a large number of checks that cover every service and its performance, which alone can save the team a lot of testing time. Each service also shows its response time results.

response time
