March 27, 2025

Setting up Prometheus, Grafana, and Alertmanager on Ubuntu

These are my notes from setting up the full Prometheus monitoring stack on Ubuntu. I'll walk through installing and configuring Prometheus, Node Exporter, Pushgateway, Grafana, basic auth, and Alertmanager with Slack notifications. By the end you'll have a working monitoring pipeline that collects system metrics, visualizes them in dashboards, and sends alerts to Slack when something goes wrong.

Prerequisites

An Ubuntu server (20.04 or later) with sudo access
wget and tar installed (they usually are by default)
A Slack workspace where you can create apps and incoming webhooks
Basic familiarity with systemd and editing config files

Goals

Install Prometheus and configure it to scrape metrics from itself and other exporters
Set up Node Exporter to collect system-level metrics (CPU, memory, disk)
Install Grafana and connect it to Prometheus as a data source
Configure Pushgateway for short-lived jobs that can't be scraped directly
Enable basic auth on the Prometheus web UI
Set up Alertmanager with Slack notifications and a watchdog alert

Creating system users

For each service, create a dedicated system account. Two reasons: it limits blast radius if something goes wrong, and it makes it easier to track which processes and files belong to which service.

sudo useradd --system --no-create-home --shell /bin/false prometheus

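The --system flag creates a system account (a UID from the system range, with no password aging). The --no-create-home and --shell /bin/false flags make sure it has no home directory and can't log in, which is exactly what you want for a service account. Do the same for node_exporter, pushgateway, and alertmanager later.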
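Since the same command gets repeated for every service, a small helper can create the accounts in one pass. This is only a sketch: create_service_user is a name I made up, and the list of services in the usage comment assumes you install everything in this guide.

```shell
# Hypothetical helper: create a system account for a service unless it already exists.
create_service_user() {
  if id -u "$1" >/dev/null 2>&1; then
    echo "user $1 already exists, skipping"
  else
    sudo useradd --system --no-create-home --shell /bin/false "$1"
  fi
}

# To create every account used in this guide in one go:
#   for svc in prometheus node_exporter pushgateway alertmanager; do
#     create_service_user "$svc"
#   done
```

If you create the accounts up front like this, you can skip the individual useradd lines in the later sections.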

Prometheus

Install

Grab the latest version from the Prometheus downloads page. Replace the version number below with whatever is current:

wget https://github.com/prometheus/prometheus/releases/download/v2.54.0/prometheus-2.54.0.linux-amd64.tar.gz
tar -xvf prometheus-2.54.0.linux-amd64.tar.gz
cd prometheus-2.54.0.linux-amd64
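It's worth checking the tarball against the published checksums, ideally before extracting it. A minimal sketch, assuming the release publishes a sha256sums.txt file next to the tarballs (the Prometheus GitHub releases do at the time of writing). verify_release is a hypothetical helper name.

```shell
# Hypothetical helper: check files in the current directory against a checksum list.
# --ignore-missing skips entries for tarballs you did not download.
verify_release() {
  sha256sum --check --ignore-missing "${1:-sha256sums.txt}"
}

# Usage, from the directory that holds the tarball:
#   wget https://github.com/prometheus/prometheus/releases/download/v2.54.0/sha256sums.txt
#   verify_release sha256sums.txt
```

A non-zero exit status means the tarball doesn't match and shouldn't be installed.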

Now create the config and data directories, then move the binaries and config files to their proper locations:

sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo mv prometheus promtool /usr/local/bin/
sudo mv consoles/ console_libraries/ /etc/prometheus/
sudo mv prometheus.yml /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus/ /var/lib/prometheus/

promtool is bundled with Prometheus and is useful for validating config files and alerting rules before reloading. You'll use it a lot.

systemd service

Create /etc/systemd/system/prometheus.service:

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-lifecycle

[Install]
WantedBy=multi-user.target

A few things to note here. --storage.tsdb.path tells Prometheus where to store its time-series data on disk. --web.enable-lifecycle is important because it lets you reload config via an HTTP API call without restarting the service, which is very useful when you're iterating on scrape configs and alerting rules.

sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus

Access the web UI at http://<your-server-ip>:9090. Go to Status > Targets and you should see Prometheus scraping itself every 15 seconds.
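On a headless box you can do the same check from the shell. Prometheus exposes a /-/ready endpoint for this. The helper name prometheus_ready below is my own, and the default URL assumes you're running the check on the server itself.

```shell
# Hypothetical helper: succeed only if Prometheus answers its readiness endpoint.
# -f makes curl return a non-zero status on HTTP errors.
prometheus_ready() {
  curl -fsS "${1:-http://localhost:9090}/-/ready"
}

# Usage:
#   prometheus_ready && echo "Prometheus is up"
# The scrape targets are also available as JSON:
#   curl -s http://localhost:9090/api/v1/targets
```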

Node Exporter

Node Exporter collects Linux system metrics (CPU, disk, memory, network, etc.) and exposes them on an HTTP endpoint that Prometheus can scrape.

Install

sudo useradd --system --no-create-home --shell /bin/false node_exporter

wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
tar -xvf node_exporter-1.8.2.linux-amd64.tar.gz
sudo mv node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin/

systemd service

Create /etc/systemd/system/node_exporter.service following the same pattern as Prometheus. The key line in ExecStart is:

ExecStart=/usr/local/bin/node_exporter --collector.logind

The --collector.logind flag enables the logind collector, which is off by default and exposes counts of login sessions from systemd-logind. This is optional but useful for seeing how many sessions are active on the box.
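For reference, here is roughly what the full unit looks like when you apply the Prometheus pattern to Node Exporter. It's a sketch that mirrors the Prometheus unit above. Staging the file in the current directory first is just my habit, not a requirement.

```shell
# Stage the unit file in the current directory.
cat > node_exporter.service <<'EOF'
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/node_exporter --collector.logind

[Install]
WantedBy=multi-user.target
EOF

# Then copy it into place:
#   sudo install -m 644 node_exporter.service /etc/systemd/system/
```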

Enable and start it the same way:

sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter

Node Exporter runs on port 9100. You can verify it's working by visiting http://<your-server-ip>:9100/metrics in your browser.

Add to Prometheus config

In /etc/prometheus/prometheus.yml, add a new job under the existing scrape_configs key (don't add a second scrape_configs block) so Prometheus knows to collect metrics from Node Exporter:

scrape_configs:
  - job_name: "node_exporter"
    static_configs:
      - targets: ["localhost:9100"]

Validate the config and reload without downtime:

promtool check config /etc/prometheus/prometheus.yml
curl -X POST http://localhost:9090/-/reload

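Always run promtool check config before reloading. It catches syntax errors and invalid references up front. If you reload a broken config, Prometheus keeps running with the old one, and the only signs are an error response from the reload request and a line in the logs, both of which are easy to miss.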
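To make the check-then-reload habit hard to skip, the two commands can be wrapped together. A sketch, with safe_reload being a hypothetical name. It assumes Prometheus listens on localhost:9090 without auth, so once basic auth is enabled the curl line needs -u added.

```shell
# Hypothetical helper: reload Prometheus only if the config validates.
safe_reload() {
  local config="${1:-/etc/prometheus/prometheus.yml}"
  if promtool check config "$config"; then
    # -f turns an HTTP error from the reload endpoint into a non-zero exit status.
    curl -fsS -X POST http://localhost:9090/-/reload
  else
    echo "config check failed, not reloading" >&2
    return 1
  fi
}

# Usage:
#   safe_reload /etc/prometheus/prometheus.yml
```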

Grafana

Grafana is the visualization layer. It connects to Prometheus as a data source and lets you build dashboards with graphs, tables, and alerts.

Install

sudo apt-get install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install -y grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server

Access it at http://<your-server-ip>:3000. Default credentials are admin / admin. You'll be prompted to change the password on first login.

Add Prometheus data source

You can add the data source through the UI (Configuration > Data Sources > Add data source > Prometheus > URL: http://localhost:9090), but the code approach is better for reproducibility. Create /etc/grafana/provisioning/datasources/datasources.yaml:

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://localhost:9090
    isDefault: true

Restart Grafana to pick it up:

sudo systemctl restart grafana-server

Dashboards

For a quick test, create a new panel using the scrape_duration_seconds metric, set the legend to {{job}}, and the unit to seconds. This shows how long each scrape takes.

For a full Node Exporter dashboard, import dashboard ID 1860 from the Grafana dashboard library. It covers CPU, memory, network, disk, and more. Go to Dashboards > Import > enter 1860 > select your Prometheus data source.

Pushgateway

Some jobs can't be scraped directly because they're short-lived. Think Jenkins builds, cron jobs, batch scripts. Instead of waiting for Prometheus to come scrape them, these jobs push their metrics to Pushgateway, and Prometheus scrapes Pushgateway on its regular schedule.

Install

Same pattern as the other components: create a system user, download the binary from the Pushgateway releases page, extract, move to /usr/local/bin/, and create a systemd service.

Pushgateway runs on port 9091.
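Spelled out, the download part of that pattern might look like the sketch below. install_pushgateway is a hypothetical helper, and the version is deliberately an argument: take it from the releases page. The URL follows the same naming scheme as the other Prometheus project tarballs.

```shell
# Hypothetical helper: download, extract, and install the Pushgateway binary.
# $1 = version number as shown on the releases page, without the leading "v".
install_pushgateway() {
  local version="$1"
  [ -n "$version" ] || { echo "usage: install_pushgateway VERSION" >&2; return 1; }
  local name="pushgateway-${version}.linux-amd64"
  wget "https://github.com/prometheus/pushgateway/releases/download/v${version}/${name}.tar.gz" &&
    tar -xf "${name}.tar.gz" &&
    sudo mv "${name}/pushgateway" /usr/local/bin/
}

# The system user and systemd unit follow the Node Exporter pattern, with
# User=pushgateway and ExecStart=/usr/local/bin/pushgateway.
```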

Add to Prometheus

- job_name: "pushgateway"
  honor_labels: true
  static_configs:
    - targets: ["localhost:9091"]

The honor_labels: true setting tells Prometheus to keep the job and instance labels pushed by the client. Without it, Prometheus overwrites job with pushgateway (the scrape job name) and moves the pushed value to exported_job, so every metric pushed through Pushgateway shows up under the same job label and they're hard to tell apart.

Push metrics

Here's an example of pushing a metric from a shell script:

echo "jenkins_job_duration_seconds 15.7" | curl --data-binary @- http://localhost:9091/metrics/job/backup

The backup part in the URL is an arbitrary job label. It shows up in Prometheus under the job label, so you can use it to identify which job pushed the metric. Search for jenkins_job_duration_seconds in the Prometheus UI to find it.
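One thing to be aware of: Pushgateway keeps serving a pushed metric until you delete it, so stale values stick around after the job is gone. Here's a sketch of a push helper with a type hint plus a matching cleanup call. The function names and the PUSHGATEWAY_URL variable are my own.

```shell
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://localhost:9091}"

# Hypothetical helper: push one gauge under a job label.
# $1 = job label, $2 = metric name, $3 = value
push_gauge() {
  printf '# TYPE %s gauge\n%s %s\n' "$2" "$2" "$3" |
    curl -fsS --data-binary @- "$PUSHGATEWAY_URL/metrics/job/$1"
}

# Hypothetical helper: delete the metric group identified by just this job label.
delete_job_metrics() {
  curl -fsS -X DELETE "$PUSHGATEWAY_URL/metrics/job/$1"
}

# Usage:
#   push_gauge backup backup_duration_seconds 15.7
#   delete_job_metrics backup
```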

Basic Auth for Prometheus

Prometheus has built-in basic auth support, so there's no need to put an nginx reverse proxy in front of it just for authentication.

Generate password hash

pip install bcrypt
python3 -c "import bcrypt; print(bcrypt.hashpw(b'YOUR_PASSWORD', bcrypt.gensalt()).decode())"

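Replace YOUR_PASSWORD with your actual password and save the bcrypt hash that gets printed. Keep in mind that typing the password into the command like this leaves it in your shell history. On newer Ubuntu releases where pip refuses to install system-wide, sudo apt-get install python3-bcrypt provides the same module.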

Create web config

Create /etc/prometheus/web.yml:

basic_auth_users:
  admin: <paste-your-bcrypt-hash-here>
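Before pointing Prometheus at this file, I'd lock down its permissions and validate it. promtool has a check web-config subcommand for exactly this. The sketch below assumes the prometheus user and group from earlier, and secure_web_config is a hypothetical name.

```shell
# Hypothetical helper: restrict the web config to the prometheus user and validate it.
secure_web_config() {
  local file="${1:-/etc/prometheus/web.yml}"
  sudo chown prometheus:prometheus "$file" &&
    sudo chmod 640 "$file" &&
    promtool check web-config "$file"
}

# Usage:
#   secure_web_config /etc/prometheus/web.yml
```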

Update systemd service

Add the --web.config.file flag to the ExecStart line in /etc/systemd/system/prometheus.service:

ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-lifecycle \
  --web.config.file=/etc/prometheus/web.yml

sudo systemctl daemon-reload
sudo systemctl restart prometheus

Update scrape config for self-scraping

Now that the web UI requires auth, Prometheus also needs credentials to scrape itself:

scrape_configs:
  - job_name: "prometheus"
    basic_auth:
      username: admin
      password: YOUR_PASSWORD
    static_configs:
      - targets: ["localhost:9090"]

Note: Yes, the password is in plaintext in the config file. For production setups, consider a secrets manager or a file-based reference such as the password_file option of basic_auth, which reads the password from a separate file. For a personal monitoring stack this is fine.

Also update Grafana's datasource config to include basicAuth: true and the credentials, then restart Grafana.
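For the Grafana side, the provisioning file ends up looking something like this. It's a sketch based on the datasource file from earlier, with basicAuth, basicAuthUser, and secureJsonData.basicAuthPassword added. YOUR_PASSWORD is a placeholder, and I stage the file locally before copying it over.

```shell
# Stage the updated datasource file in the current directory.
cat > datasources.yaml <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://localhost:9090
    isDefault: true
    basicAuth: true
    basicAuthUser: admin
    secureJsonData:
      basicAuthPassword: YOUR_PASSWORD
EOF

# Then install it and restart Grafana:
#   sudo install -m 640 -o root -g grafana datasources.yaml /etc/grafana/provisioning/datasources/
#   sudo systemctl restart grafana-server
```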

From now on, reloading the config also requires credentials:

curl -X POST -u admin:YOUR_PASSWORD http://localhost:9090/-/reload

Alertmanager

Alertmanager handles deduplication, grouping, and routing of alerts from Prometheus. It can send notifications to email, PagerDuty, Slack, and many other channels. For HA setups you'd run multiple instances in a cluster, but one is fine for this setup.

Install

sudo useradd --system --no-create-home --shell /bin/false alertmanager

wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
tar -xvf alertmanager-0.27.0.linux-amd64.tar.gz
cd alertmanager-0.27.0.linux-amd64

sudo mv alertmanager /usr/local/bin/
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo mv alertmanager.yml /etc/alertmanager/
sudo chown -R alertmanager:alertmanager /etc/alertmanager/ /var/lib/alertmanager/

Create a systemd service following the same pattern. Alertmanager runs on port 9093.

Note: The /var/lib/alertmanager storage directory is mandatory. Alertmanager uses it to persist silences and notification state. If you wipe this directory, all your silences are gone.
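Here is a sketch of the Alertmanager unit, following the Prometheus one. The --config.file and --storage.path flags point it at the directories created above. As before, I stage the file locally first.

```shell
# Stage the unit file in the current directory.
cat > alertmanager.service <<'EOF'
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager/

[Install]
WantedBy=multi-user.target
EOF

# Then install and start it:
#   sudo install -m 644 alertmanager.service /etc/systemd/system/
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now alertmanager
```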

Always-on "Watchdog" alert

This is a good practice. The idea is to create an alert that is always firing. You then wire it to something like Dead Man's Snitch or a similar service that pages you if the alert stops firing. That way, if your entire monitoring pipeline goes down, you'll know about it.

Create the rules directory with sudo mkdir -p /etc/prometheus/rules, then create /etc/prometheus/rules/watchdog.yml:

groups:
  - name: watchdog
    rules:
      - alert: DeadMansSwitch
        expr: vector(1)
        labels:
          severity: none
        annotations:
          summary: "Alertmanager watchdog"

The vector(1) expression always returns 1, so this alert is permanently in a firing state. That's intentional.
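Rule files can be validated on their own with promtool check rules, which is handy before wiring them into the main config. A small sketch, where check_rules is a hypothetical wrapper and the default directory is the one used in this guide:

```shell
# Hypothetical helper: validate every .yml rule file in a directory.
check_rules() {
  promtool check rules "${1:-/etc/prometheus/rules}"/*.yml
}

# Usage:
#   check_rules /etc/prometheus/rules
```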

Slack integration

Create a #alerts channel in your Slack workspace
Go to api.slack.com/apps, create a new Slack app, enable incoming webhooks, and install it to your workspace
Copy the webhook URL

Update /etc/alertmanager/alertmanager.yml:

global:
  slack_api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'

route:
  receiver: 'slack-notifications'
  routes:
    - match:
        severity: warning
      receiver: 'slack-notifications'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
        send_resolved: true

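The send_resolved: true setting tells Alertmanager to also send a message when an alert resolves, not just when it fires. This is useful so you know when things go back to normal. Note that the top-level receiver is the catch-all: alerts that don't match severity: warning, including the watchdog, also end up in Slack with this config.

Restart Alertmanager to load the new config:

sudo systemctl restart alertmanager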
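Once Alertmanager is running with this config, you can test the Slack receiver without involving Prometheus by posting an alert straight to Alertmanager's v2 API. This is a sketch: send_test_alert and the ManualTest alert name are made up for illustration. With default grouping settings the Slack message can take around 30 seconds to show up.

```shell
# Hypothetical helper: post a hand-made alert to Alertmanager's /api/v2/alerts endpoint.
send_test_alert() {
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data '[{"labels":{"alertname":"ManualTest","severity":"warning"},"annotations":{"summary":"Manual test alert"}}]' \
    "${1:-http://localhost:9093}/api/v2/alerts"
}

# Usage:
#   send_test_alert
```

Because nothing keeps re-sending it, the alert resolves on its own after Alertmanager's resolve timeout.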

Wire Alertmanager into Prometheus

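Edit /etc/prometheus/prometheus.yml so these two sections look like the following. The default file already contains alerting and rule_files keys, so update those rather than adding duplicates: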

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]

rule_files:
  - "/etc/prometheus/rules/*.yml"

The rule_files directive tells Prometheus where to find alerting rules. The glob pattern *.yml picks up all YAML files in that directory, including the watchdog rule you created earlier.

Reload Prometheus:

promtool check config /etc/prometheus/prometheus.yml
curl -X POST -u admin:YOUR_PASSWORD http://localhost:9090/-/reload

Test the alert flow

To verify everything works end to end, create a test alert rule that fires when a metric exceeds a threshold, push a metric to Pushgateway that triggers it, and watch the Slack message come in. To resolve the alert, push a value below the threshold and wait for the next evaluation cycle.
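A test rule for this could look like the sketch below. Everything in it is made up for the test: the test_metric name, the threshold of 10, the TestMetricHigh alert name, and the alert_test job label. The severity: warning label matches the Slack route configured above.

```shell
# Stage a throwaway alert rule in the current directory.
cat > test-alert.yml <<'EOF'
groups:
  - name: test
    rules:
      - alert: TestMetricHigh
        expr: test_metric > 10
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "test_metric is above 10"
EOF

# Install and load it:
#   sudo install -m 644 -o prometheus -g prometheus test-alert.yml /etc/prometheus/rules/
#   promtool check rules /etc/prometheus/rules/test-alert.yml
#   curl -X POST -u admin:YOUR_PASSWORD http://localhost:9090/-/reload
# Trigger the alert, then resolve it:
#   echo "test_metric 20" | curl --data-binary @- http://localhost:9091/metrics/job/alert_test
#   echo "test_metric 5"  | curl --data-binary @- http://localhost:9091/metrics/job/alert_test
```

Remove the rule file and delete the pushed metric once you've seen both the firing and resolved messages.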

Quick reference

Service	Port	Config File
Prometheus	9090	`/etc/prometheus/prometheus.yml`
Node Exporter	9100	Flags only (no config file)
Pushgateway	9091	Flags only (no config file)
Alertmanager	9093	`/etc/alertmanager/alertmanager.yml`
Grafana	3000	`/etc/grafana/grafana.ini`

Conclusion

Here's what we set up: Prometheus as the central metrics store, Node Exporter feeding it system metrics, Pushgateway for short-lived jobs, Grafana for dashboards, basic auth to lock down the Prometheus UI, and Alertmanager routing alerts to Slack.

A few things to keep in mind going forward:

Always run promtool check config before reloading Prometheus config
After editing systemd unit files, always run systemctl daemon-reload before restarting
Alertmanager's storage directory (/var/lib/alertmanager) is mandatory. If you wipe it, all silences and notification state are gone
Once basic auth is enabled, reloading the config requires passing credentials in the curl request
If the Prometheus self-scrape target goes down right after enabling auth, the scrape config is most likely missing the basic_auth block
