
Setting up Prometheus, Grafana, and Alertmanager on Ubuntu
These are my notes from setting up the full Prometheus monitoring stack on Ubuntu. I'll walk through installing and configuring Prometheus, Node Exporter, Pushgateway, Grafana, basic auth, and Alertmanager with Slack notifications. By the end you'll have a working monitoring pipeline that collects system metrics, visualizes them in dashboards, and sends alerts to Slack when something goes wrong.
Prerequisites
- An Ubuntu server (20.04 or later) with sudo access
- wget and tar installed (they usually are by default)
- A Slack workspace where you can create apps and incoming webhooks
- Basic familiarity with systemd and editing config files
Goals
- Install Prometheus and configure it to scrape metrics from itself and other exporters
- Set up Node Exporter to collect system-level metrics (CPU, memory, disk)
- Install Grafana and connect it to Prometheus as a data source
- Configure Pushgateway for short-lived jobs that can't be scraped directly
- Enable basic auth on the Prometheus web UI
- Set up Alertmanager with Slack notifications and a watchdog alert
Creating system users
For each service, create a dedicated system account. Two reasons: it limits blast radius if something goes wrong, and it makes it easier to track which processes and files belong to which service.
sudo useradd --system --no-create-home --shell /bin/false prometheus
The --system flag creates a system account, --no-create-home skips the home directory, and --shell /bin/false disables interactive login, which is exactly what you want for a service account. Do the same for node_exporter and alertmanager later.
Prometheus
Install
Grab the latest version from the Prometheus downloads page. Replace the version number below with whatever is current:
wget https://github.com/prometheus/prometheus/releases/download/v2.54.0/prometheus-2.54.0.linux-amd64.tar.gz
tar -xvf prometheus-2.54.0.linux-amd64.tar.gz
cd prometheus-2.54.0.linux-amd64
Now move the binaries and config files to their proper locations:
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo mv prometheus promtool /usr/local/bin/
sudo mv consoles/ console_libraries/ /etc/prometheus/
sudo mv prometheus.yml /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus/ /var/lib/prometheus/
promtool is bundled with Prometheus and is useful for validating config files and alerting rules before reloading. You'll use it a lot.
systemd service
Create /etc/systemd/system/prometheus.service:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-lifecycle
[Install]
WantedBy=multi-user.target
A few things to note here. --storage.tsdb.path tells Prometheus where to store its time-series data on disk. --web.enable-lifecycle is important because it lets you reload config via an HTTP API call without restarting the service, which is very useful when you're iterating on scrape configs and alerting rules.
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
Access the web UI on port 9090 of your server. Go to Status > Targets and you should see Prometheus scraping itself every 15 seconds.
Node Exporter
Node Exporter collects Linux system metrics (CPU, disk, memory, network, etc.) and exposes them on an HTTP endpoint that Prometheus can scrape.
Install
sudo useradd --system --no-create-home --shell /bin/false node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
tar -xvf node_exporter-1.8.2.linux-amd64.tar.gz
sudo mv node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin/
systemd service
Create /etc/systemd/system/node_exporter.service following the same pattern as Prometheus. The key line in ExecStart is:
ExecStart=/usr/local/bin/node_exporter --collector.logind
The --collector.logind flag enables the logind collector, which reports session counts from systemd-logind. This is optional but useful for tracking how many sessions are open on the host.
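Since the full unit file isn't spelled out here, this is roughly what it could look like if you mirror the Prometheus unit. Treat it as a sketch: STAGE_DIR and the final install step are my own convention for reviewing the file before it lands in /etc/systemd/system, not something Node Exporter requires.

```shell
# Stage the unit file in a scratch directory so it can be reviewed first.
# STAGE_DIR is a hypothetical helper variable, not part of the original setup.
STAGE_DIR="${STAGE_DIR:-$(mktemp -d)}"

cat > "$STAGE_DIR/node_exporter.service" <<'EOF'
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/node_exporter --collector.logind

[Install]
WantedBy=multi-user.target
EOF

echo "Staged unit at $STAGE_DIR/node_exporter.service"
# After reviewing it, install it with:
#   sudo install -m 0644 "$STAGE_DIR/node_exporter.service" /etc/systemd/system/
```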
Enable and start it the same way:
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
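sudo systemctl start node_exporter
Node Exporter runs on port 9100. You can verify it's working by opening the /metrics path on port 9100 of your server in a browser, or by running curl http://localhost:9100/metrics on the server itself.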
Add to Prometheus config
In /etc/prometheus/prometheus.yml, add a new scrape job so Prometheus knows to collect metrics from Node Exporter:
scrape_configs:
  - job_name: "node_exporter"
    static_configs:
      - targets: ["localhost:9100"]
Validate the config and reload without downtime:
promtool check config /etc/prometheus/prometheus.yml
curl -X POST http://localhost:9090/-/reload
Always run promtool check config before reloading. It catches syntax errors and invalid references up front. Without the check, a bad config makes the reload fail while Prometheus keeps running the old config, which is easy to miss.
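To make the check-then-reload habit harder to skip, the two commands can be wrapped in a small function. This is only a sketch: safe_reload, CONFIG, and RELOAD_URL are names I made up, and the defaults are the paths used in these notes.

```shell
# Reload Prometheus only if the config passes validation.
# CONFIG and RELOAD_URL are optional overrides (hypothetical names).
safe_reload() {
  local config="${CONFIG:-/etc/prometheus/prometheus.yml}"
  local reload_url="${RELOAD_URL:-http://localhost:9090/-/reload}"

  if ! promtool check config "$config"; then
    echo "Config check failed; not reloading." >&2
    return 1
  fi

  # --fail makes curl exit non-zero when the server returns an HTTP error
  curl --fail --silent --show-error -X POST "$reload_url"
}
```

Run safe_reload after each config edit. Once basic auth is enabled, the curl call also needs the -u credentials.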
Grafana
Grafana is the visualization layer. It connects to Prometheus as a data source and lets you build dashboards with graphs, tables, and alerts.
Install
sudo apt-get install -y apt-transport-https software-properties-common
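sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list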
sudo apt-get update
sudo apt-get install -y grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
Access Grafana on port 3000 of your server. Default credentials are admin / admin. You'll be prompted to change the password on first login.
Add Prometheus data source
You can add the data source through the UI (Configuration > Data Sources > Add data source > Prometheus > URL: http://localhost:9090), but the code approach is better for reproducibility. Create /etc/grafana/provisioning/datasources/datasources.yaml:
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://localhost:9090
    isDefault: true
Restart Grafana to pick it up:
sudo systemctl restart grafana-server
Dashboards
For a quick test, create a new panel using the scrape_duration_seconds metric, set the legend to {{job}}, and the unit to seconds. This shows how long each scrape takes.
For a full Node Exporter dashboard, import dashboard ID 1860 from the Grafana dashboard library. It covers CPU, memory, network, disk, and more. Go to Dashboards > Import > enter 1860 > select your Prometheus data source.
Pushgateway
Some jobs can't be scraped directly because they're short-lived. Think Jenkins builds, cron jobs, batch scripts. Instead of waiting for Prometheus to come scrape them, these jobs push their metrics to Pushgateway, and Prometheus scrapes Pushgateway on its regular schedule.
Install
Same pattern as the other components: create a system user, download the binary from the Pushgateway releases page, extract, move to /usr/local/bin/, and create a systemd service.
Pushgateway runs on port 9091.
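Spelled out, the same pattern might look like the sketch below. The version number is just a placeholder I picked; check the Pushgateway releases page for the current one. install_pushgateway and STAGE_DIR are hypothetical helper names, and nothing is downloaded or installed until you call the function yourself.

```shell
# Assumption: 1.9.0 is only a placeholder; use the current release version.
PUSHGATEWAY_VERSION="${PUSHGATEWAY_VERSION:-1.9.0}"
archive="pushgateway-${PUSHGATEWAY_VERSION}.linux-amd64"
url="https://github.com/prometheus/pushgateway/releases/download/v${PUSHGATEWAY_VERSION}/${archive}.tar.gz"

# Same steps as for the other components: user, download, extract, move.
install_pushgateway() {
  sudo useradd --system --no-create-home --shell /bin/false pushgateway
  wget "$url"
  tar -xvf "${archive}.tar.gz"
  sudo mv "${archive}/pushgateway" /usr/local/bin/
}

# Stage the systemd unit for review (same layout as the other services).
STAGE_DIR="${STAGE_DIR:-$(mktemp -d)}"
cat > "$STAGE_DIR/pushgateway.service" <<'EOF'
[Unit]
Description=Pushgateway
Wants=network-online.target
After=network-online.target

[Service]
User=pushgateway
Group=pushgateway
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/pushgateway

[Install]
WantedBy=multi-user.target
EOF

echo "Download URL: $url"
# When the URL looks right, run: install_pushgateway
# Then: sudo install -m 0644 "$STAGE_DIR/pushgateway.service" /etc/systemd/system/
```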
Add to Prometheus
  - job_name: "pushgateway"
    honor_labels: true
    static_configs:
      - targets: ["localhost:9091"]
The honor_labels: true setting tells Prometheus to keep the original job labels pushed by the client, rather than overwriting them with pushgateway. Without this, every metric pushed through Pushgateway would show up with job set to pushgateway and the pushed value moved to an exported_job label, making it harder to tell them apart.
Push metrics
Here's an example of pushing a metric from a shell script:
echo "jenkins_job_duration_seconds 15.7" | curl --data-binary @- http://localhost:9091/metrics/job/backup
The backup part in the URL is an arbitrary job label. It shows up in Prometheus under the job label, so you can use it to identify which job pushed the metric. Search for jenkins_job_duration_seconds in the Prometheus UI to find it.
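A slightly fuller sketch of the same idea: push several metrics at once, add an instance grouping label in the URL path, and delete the group when it's no longer wanted. Pushed metrics stay in Pushgateway until they are deleted or overwritten. The metric names and function names below are made up for illustration.

```shell
# Build a small text-format payload.
# $1 = job duration in seconds, $2 = exit code of the job
build_payload() {
  cat <<EOF
# TYPE backup_duration_seconds gauge
backup_duration_seconds $1
# TYPE backup_last_exit_code gauge
backup_last_exit_code $2
EOF
}

# Push to the "backup" job, grouped additionally by instance (this host).
push_backup_metrics() {
  build_payload "$1" "$2" |
    curl --fail --silent --show-error --data-binary @- \
      "http://localhost:9091/metrics/job/backup/instance/$(hostname)"
}

# Remove everything that was pushed for this job/instance group.
delete_backup_metrics() {
  curl --fail --silent --show-error -X DELETE \
    "http://localhost:9091/metrics/job/backup/instance/$(hostname)"
}
```

At the end of a backup script this would be called as push_backup_metrics 15.7 0.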
Basic Auth for Prometheus
Prometheus has built-in basic auth support, so there's no need to put an nginx reverse proxy in front of it just for authentication.
Generate password hash
pip install bcrypt
python3 -c "import bcrypt; print(bcrypt.hashpw(b'YOUR_PASSWORD', bcrypt.gensalt()).decode())"
Replace YOUR_PASSWORD with your actual password. Save the bcrypt hash that gets printed.
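Before pasting the hash anywhere, a quick shape check catches copy-paste accidents such as a truncated hash or the plaintext password. This is a rough sketch: looks_like_bcrypt is a made-up helper, and it checks only the format (prefix, two-digit cost, 53 trailing characters), not whether the hash matches your password.

```shell
# Returns 0 if the argument has the general shape of a bcrypt hash:
# "$2a$", "$2b$" or "$2y$", a two-digit cost, then 53 salt/hash characters.
looks_like_bcrypt() {
  printf '%s\n' "$1" | grep -Eq '^\$2[aby]\$[0-9]{2}\$[./A-Za-z0-9]{53}$'
}
```

Usage would be something like looks_like_bcrypt "$HASH" || echo "that does not look like a bcrypt hash", with HASH holding the printed value in single quotes.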
Create web config
Create /etc/prometheus/web.yml:
basic_auth_users:
  admin: <paste-your-bcrypt-hash-here>
Update systemd service
Add the --web.config.file flag to the ExecStart line in /etc/systemd/system/prometheus.service:
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-lifecycle \
  --web.config.file=/etc/prometheus/web.yml
Then reload systemd and restart Prometheus:
sudo systemctl daemon-reload
sudo systemctl restart prometheus
Update scrape config for self-scraping
Now that the web UI requires auth, Prometheus also needs credentials to scrape itself:
scrape_configs:
  - job_name: "prometheus"
    basic_auth:
      username: admin
      password: YOUR_PASSWORD
    static_configs:
      - targets: ["localhost:9090"]
Note: Yes, the password is in plaintext in the config file. For production setups, consider using a secrets manager or file-based secret references. For a personal monitoring stack this is fine.
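For the file-based route, the scrape config's basic_auth block accepts password_file in place of password. Here's a minimal sketch of that variant. The file name self_scrape_password and the STAGE_DIR staging directory are my own choices, and the files still have to be copied into /etc/prometheus and made readable by the prometheus user.

```shell
STAGE_DIR="${STAGE_DIR:-$(mktemp -d)}"

# Keep the password in its own file, readable only by its owner.
# Replace YOUR_PASSWORD before running.
( umask 077; printf '%s' 'YOUR_PASSWORD' > "$STAGE_DIR/self_scrape_password" )

# Scrape job that reads the password from a file instead of inlining it.
cat > "$STAGE_DIR/self_scrape.yml" <<'EOF'
scrape_configs:
  - job_name: "prometheus"
    basic_auth:
      username: admin
      # Hypothetical path; the prometheus user must be able to read it.
      password_file: /etc/prometheus/self_scrape_password
    static_configs:
      - targets: ["localhost:9090"]
EOF

echo "Staged files in $STAGE_DIR"
```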
Also update Grafana's datasource config to include basicAuth: true and the credentials, then restart Grafana.
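As a sketch, the provisioned data source with credentials could look like this. The basicAuth, basicAuthUser, and secureJsonData.basicAuthPassword keys are Grafana's provisioning fields. The staging directory and the install command in the comments are just my way of reviewing the file first.

```shell
STAGE_DIR="${STAGE_DIR:-$(mktemp -d)}"

# Same data source as before, now with basic auth credentials.
# Replace YOUR_PASSWORD with the plaintext password, not the bcrypt hash.
cat > "$STAGE_DIR/datasources.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://localhost:9090
    isDefault: true
    basicAuth: true
    basicAuthUser: admin
    secureJsonData:
      basicAuthPassword: YOUR_PASSWORD
EOF

echo "Staged data source at $STAGE_DIR/datasources.yaml"
# After reviewing it:
#   sudo install -m 0640 -g grafana "$STAGE_DIR/datasources.yaml" /etc/grafana/provisioning/datasources/
#   sudo systemctl restart grafana-server
```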
From now on, reloading the config also requires credentials:
curl -X POST -u admin:YOUR_PASSWORD http://localhost:9090/-/reload
Alertmanager
Alertmanager handles deduplication, grouping, and routing of alerts from Prometheus. It can send notifications to email, PagerDuty, Slack, and many other channels. For HA setups you'd run multiple instances in a cluster, but one is fine for this setup.
Install
sudo useradd --system --no-create-home --shell /bin/false alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
tar -xvf alertmanager-0.27.0.linux-amd64.tar.gz
cd alertmanager-0.27.0.linux-amd64
sudo mv alertmanager /usr/local/bin/
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo mv alertmanager.yml /etc/alertmanager/
sudo chown -R alertmanager:alertmanager /etc/alertmanager/ /var/lib/alertmanager/
Create a systemd service following the same pattern. Alertmanager runs on port 9093.
Note: The /var/lib/alertmanager storage directory is mandatory. Alertmanager uses it to persist silences and notification state. If you wipe this directory, all your silences are gone.
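For reference, an Alertmanager unit following that pattern might look like the sketch below. The --config.file and --storage.path flags point at the directories created above. STAGE_DIR and the install step in the comment are my own convention.

```shell
STAGE_DIR="${STAGE_DIR:-$(mktemp -d)}"

# Stage the unit so it can be reviewed before installing it.
cat > "$STAGE_DIR/alertmanager.service" <<'EOF'
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager/

[Install]
WantedBy=multi-user.target
EOF

echo "Staged unit at $STAGE_DIR/alertmanager.service"
# After reviewing it:
#   sudo install -m 0644 "$STAGE_DIR/alertmanager.service" /etc/systemd/system/
#   sudo systemctl daemon-reload && sudo systemctl enable --now alertmanager
```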
Always-on "Watchdog" alert
This is a good practice. The idea is to create an alert that is always firing. You then wire it to something like Dead Man's Snitch or a similar service that pages you if the alert stops firing. That way, if your entire monitoring pipeline goes down, you'll know about it.
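Create the /etc/prometheus/rules directory if it doesn't exist yet (sudo mkdir -p /etc/prometheus/rules), then create /etc/prometheus/rules/watchdog.yml: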
groups:
  - name: watchdog
    rules:
      - alert: DeadMansSwitch
        expr: vector(1)
        labels:
          severity: none
        annotations:
          summary: "Alertmanager watchdog"
The vector(1) expression always returns 1, so this alert is permanently in a firing state. That's intentional.
Slack integration
- Create a #alerts channel in your Slack workspace
- Go to api.slack.com/apps, create a new Slack app, enable incoming webhooks, and install it to your workspace
- Copy the webhook URL
Update /etc/alertmanager/alertmanager.yml:
global:
  slack_api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
route:
  receiver: 'slack-notifications'
  routes:
    - match:
        severity: warning
      receiver: 'slack-notifications'
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
        send_resolved: true
The send_resolved: true setting tells Alertmanager to also send a message when an alert resolves, not just when it fires. This is useful so you know when things go back to normal.
Wire Alertmanager into Prometheus
Add the following to /etc/prometheus/prometheus.yml:
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
rule_files:
  - "/etc/prometheus/rules/*.yml"
The rule_files directive tells Prometheus where to find alerting rules. The glob pattern *.yml picks up all YAML files in that directory, including the watchdog rule you created earlier.
Reload Prometheus:
promtool check config /etc/prometheus/prometheus.yml
curl -X POST -u admin:YOUR_PASSWORD http://localhost:9090/-/reload
Test the alert flow
To verify everything works end to end, create a test alert rule that fires when a metric exceeds a threshold, push a metric to Pushgateway that triggers it, and watch the Slack message come in. To resolve the alert, push a value below the threshold and wait for the next evaluation cycle.
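One way to run that test, as a sketch: the metric name test_alert_value, the threshold of 10, the alert name, and the alert_test job are all made up for the example. The rule carries severity: warning so it matches the Slack route configured earlier.

```shell
STAGE_DIR="${STAGE_DIR:-$(mktemp -d)}"

# Test rule: fires while the pushed test metric is above 10.
cat > "$STAGE_DIR/test_alert.yml" <<'EOF'
groups:
  - name: test
    rules:
      - alert: TestValueHigh
        expr: test_alert_value > 10
        labels:
          severity: warning
        annotations:
          summary: "test_alert_value is above 10"
EOF

# Push a value for the test metric to Pushgateway.
push_test_value() {
  echo "test_alert_value $1" |
    curl --fail --silent --show-error --data-binary @- \
      http://localhost:9091/metrics/job/alert_test
}

echo "Staged rule at $STAGE_DIR/test_alert.yml"
# After copying the rule into /etc/prometheus/rules/ and reloading Prometheus:
#   push_test_value 42   # above the threshold, the alert should fire
#   push_test_value 1    # below the threshold, the alert should resolve
```

When you're done, delete the rule file and remove the pushed metric with curl -X DELETE http://localhost:9091/metrics/job/alert_test so the test series doesn't linger.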
Quick reference
| Service | Port | Config File |
|---|---|---|
| Prometheus | 9090 | /etc/prometheus/prometheus.yml |
| Node Exporter | 9100 | Flags only (no config file) |
| Pushgateway | 9091 | Flags only (no config file) |
| Alertmanager | 9093 | /etc/alertmanager/alertmanager.yml |
| Grafana | 3000 | /etc/grafana/grafana.ini |
Conclusion
Here's what we set up: Prometheus as the central metrics store, Node Exporter feeding it system metrics, Pushgateway for short-lived jobs, Grafana for dashboards, basic auth to lock down the Prometheus UI, and Alertmanager routing alerts to Slack.
A few things to keep in mind going forward:
- Always run promtool check config before reloading Prometheus config
- After editing systemd unit files, always run systemctl daemon-reload before restarting
- Alertmanager's storage directory (/var/lib/alertmanager) is mandatory. If you wipe it, all silences and notification state are gone
- Basic auth reload requires passing credentials in the curl request
- If the Prometheus self-scrape target goes down after enabling auth, the scrape config is missing the basic_auth block