May 22, 2024

SQL Joins for DevOps Engineers

As a DevOps engineer, SQL shows up more than you might expect. When you are on-call and a database-backed service is behaving strangely, you need to query RDS or Aurora directly. When you are building dashboards or alerts from relational data (CloudTrail logs, access logs stored in PostgreSQL, deployment records in a CI database), you need joins. When a developer asks why their migration is slow, you need to understand the query plan.

This is a practical reference for the six SQL join types, framed around the questions infrastructure engineers actually ask: which services are missing from a health check table, which deployments happened without a matching approval record, which IAM users exist in the database but not in the active directory export.

Primary Keys and Foreign Keys

Two concepts underpin everything in a relational database.

A primary key is the column (or set of columns) that uniquely identifies a row. Usually an id column. Every row has exactly one.
A foreign key is a column in one table that references the primary key of another table. It is how two tables express a relationship.

A real operational example: a deployments table with a deployment_id primary key, and an approvals table with a deployment_id foreign key. Joining on that column lets you answer: which deployments have an approval record, and which do not?

A join technically only requires two columns of compatible data types, one from each table. They do not have to be formally declared as primary or foreign keys.

The 6 Types of SQL Joins

1. INNER JOIN

Returns: Only rows where both tables have a matching value.

Operational use: Find all deployments that have a matching approval. Any deployment without an approval is excluded from the result.

SELECT deployments.id, deployments.service, approvals.approved_by
FROM deployments
INNER JOIN approvals ON deployments.id = approvals.deployment_id;

Use INNER JOIN when you only care about records that exist in both tables. If a row in one table has no match in the other, it disappears from the output. This is the most common join and the default when you write JOIN without a qualifier.

Watch for non-unique join columns. If approvals has multiple rows per deployment_id, the join will produce one output row per match, which can inflate your row count unexpectedly.

2. LEFT JOIN

Returns: All rows from the left table, plus matched rows from the right. Unmatched right-side columns are NULL.

Operational use: Find all services in the service registry, including those with no recent health check record.

SELECT services.name, health_checks.last_checked_at, health_checks.status
FROM services
LEFT JOIN health_checks ON services.id = health_checks.service_id;

Services with no entry in health_checks will appear with NULL in the last two columns. This makes it easy to spot gaps: services that exist but have never been checked, or that dropped out of monitoring.

LEFT JOIN is the second most common join and is useful any time you want to preserve the full left-side dataset and optionally enrich it with data that may or may not exist on the right.

3. RIGHT JOIN

Returns: All rows from the right table, plus matched rows from the left. Unmatched left-side columns are NULL.

Operational use: Find all alert rules, including those that have no matching service definition.

SELECT services.name, alert_rules.rule_name
FROM services
RIGHT JOIN alert_rules ON services.id = alert_rules.service_id;

In practice, RIGHT JOIN is rarely written. Any right join can be expressed as a left join by swapping the table order, and left joins are easier to read because the "main" table is always on the left. Most teams standardise on LEFT JOIN and avoid RIGHT JOIN entirely.

4. FULL OUTER JOIN

Returns: All rows from both tables. Rows with no match get NULL on the side that has no data.

Operational use: Compare two environment configuration tables (production and staging) to find settings that exist in one but not the other, or that exist in both but with different values.

SELECT
  prod.key,
  prod.value AS prod_value,
  staging.value AS staging_value
FROM prod_config prod
FULL OUTER JOIN staging_config staging ON prod.key = staging.key
WHERE prod.value IS DISTINCT FROM staging.value
   OR prod.key IS NULL
   OR staging.key IS NULL;

This is useful for drift detection: keys that are in production but missing from staging, keys that are in staging but not yet in production, and keys that exist in both but have different values. The WHERE clause filters to only the mismatches, which is typically what you want in an audit query.

5. UNION and UNION ALL

Returns: Rows from both queries stacked vertically. Columns must be compatible types.

Operational use: Combine access log records from two separate tables (for example, after a log table was partitioned or renamed) into a single result set for analysis.

-- Deduplicated: removes rows that appear in both queries
SELECT timestamp, user_id, action FROM access_logs_2024
UNION
SELECT timestamp, user_id, action FROM access_logs_2025;

-- All rows including duplicates: faster, use when duplicates are expected
SELECT timestamp, user_id, action FROM access_logs_2024
UNION ALL
SELECT timestamp, user_id, action FROM access_logs_2025;

UNION removes duplicates by running a sort and deduplication pass, which adds cost. Use UNION ALL when you know duplicates are not possible or acceptable, as it is significantly faster on large tables.

UNION is not technically a join because it does not combine columns side-by-side. It combines rows from two queries that have the same column structure.

6. CROSS JOIN

Returns: Every possible combination of rows from both tables (a Cartesian product).

Operational use: Generate a test matrix of all environment and region combinations to seed a test configuration table.

SELECT environments.name AS env, regions.name AS region
FROM environments
CROSS JOIN regions;

If environments has 3 rows (dev, staging, prod) and regions has 4 rows (us-east-1, eu-west-1, ap-southeast-1, sa-east-1), the result has 12 rows, one for every combination. Use with care: the result size is rows_in_table_A × rows_in_table_B, which grows fast on large tables.

Quick Reference

Join Type	Returns	Common use in operations
`INNER JOIN`	Matching rows only	Records that exist in both tables
`LEFT JOIN`	All left rows + matches	Find gaps: records missing from the right table
`RIGHT JOIN`	All right rows + matches	Inverse of LEFT JOIN (rarely used)
`FULL OUTER JOIN`	All rows from both	Drift detection, config comparison
`UNION`	Stacked rows (deduplicated)	Combine data from multiple tables with same schema
`UNION ALL`	Stacked rows (with duplicates)	Same as UNION but faster when deduplication is not needed
`CROSS JOIN`	Every row × every row	Test matrix generation, seeding combination tables

A Note on Performance

When working with large tables (audit logs, metrics tables, deployment histories), the choice of join type matters for query performance. A few things to keep in mind:

Always join on indexed columns. Joining on unindexed columns causes a full table scan on every match.
INNER JOIN generally performs best because it narrows the result set early.
FULL OUTER JOIN is the most expensive because it must process all rows from both tables.
UNION adds a deduplication pass on top of the two queries. If you do not need deduplication, always use UNION ALL.
On RDS or Aurora, use EXPLAIN ANALYZE before running expensive joins on production data to see the query plan and estimated cost.

EXPLAIN ANALYZE
SELECT deployments.id, approvals.approved_by
FROM deployments
LEFT JOIN approvals ON deployments.id = approvals.deployment_id
WHERE deployments.created_at > NOW() - INTERVAL '7 days';

This habit of checking query plans before running ad-hoc queries on a production database is worth building early.