# Zero-Downtime Database Migrations: Patterns for Production PostgreSQL in 2026

Database migrations are the scariest part of deployments. One bad `ALTER TABLE` can lock your production database for minutes, turning a routine release into an incident. Here's how to make schema changes without any downtime.

## The Core Problem

PostgreSQL acquires locks during DDL operations. A simple `ALTER TABLE users ADD COLUMN email_verified boolean NOT NULL DEFAULT false` on a table with millions of rows will:

- Acquire an ACCESS EXCLUSIVE lock on the table
- Rewrite the entire table (for NOT NULL with DEFAULT on PG < 11)
- Block ALL reads and writes until complete

On a table with 50M rows, this can take 30+ seconds. Every query hitting that table queues up, your connection pool saturates, and your app goes down.

## Pattern 1: Expand-Contract Migrations

Split every breaking change into three phases:
- Phase 1 (Expand): Add new column/table, keep old one working
- Phase 2 (Migrate): Backfill data, update app to write both
- Phase 3 (Contract): Remove old column/table after verification
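During Phase 2 the application has to write both the old and the new shape so rows stay consistent whichever column is read. A minimal sketch of that dual-write logic, using the column-rename example below (the `build_user_update` and `read_user_name` helpers are hypothetical, not tied to any ORM):

```python
# Hypothetical dual-write helpers for the migrate phase of expand-contract.
# Both the old column (name) and the new column (full_name) receive writes.

def build_user_update(new_name: str) -> dict:
    """Return the column values to write during Phase 2 (dual-write)."""
    return {
        "name": new_name,       # old column: still read by not-yet-deployed code
        "full_name": new_name,  # new column: read by the new code path
    }

def read_user_name(row: dict) -> str:
    """Prefer the new column, fall back to the old one mid-migration."""
    return row.get("full_name") or row["name"]
```

Once every instance runs this code, the backfill can complete and the contract phase becomes safe.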
### Example: Renaming a Column

Wrong (causes downtime):

```sql
ALTER TABLE users RENAME COLUMN name TO full_name;
```

Right (zero-downtime):
```sql
-- Migration 1: Expand
ALTER TABLE users ADD COLUMN full_name text;

-- Migration 2: Backfill (in batches)
UPDATE users SET full_name = name WHERE full_name IS NULL AND id BETWEEN 0 AND 10000;
UPDATE users SET full_name = name WHERE full_name IS NULL AND id BETWEEN 10001 AND 20000;
-- ... continue in batches

-- Migration 3: Add trigger to keep columns in sync
CREATE OR REPLACE FUNCTION sync_user_name() RETURNS trigger AS $$
BEGIN
  IF NEW.name IS DISTINCT FROM OLD.name THEN
    NEW.full_name := NEW.name;
  END IF;
  IF NEW.full_name IS DISTINCT FROM OLD.full_name THEN
    NEW.name := NEW.full_name;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_user_name_trigger
BEFORE UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION sync_user_name();

-- Deploy app code that reads/writes full_name instead of name
-- Verify for a few days

-- Migration 4: Contract (after app fully migrated)
DROP TRIGGER sync_user_name_trigger ON users;
DROP FUNCTION sync_user_name();
ALTER TABLE users DROP COLUMN name;
```
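The batch boundaries in Migration 2 follow a fixed stride, so deploy tooling can generate the statements instead of hand-writing them. A sketch (the inclusive-range convention is an assumption, matching the first statement above):

```python
# Sketch: generate batched backfill statements like the ones in Migration 2.
# Assumes a dense integer id column; BETWEEN bounds are inclusive.

def backfill_batches(max_id: int, batch_size: int = 10000):
    """Yield one UPDATE statement per id range until max_id is covered."""
    lo = 0
    while lo <= max_id:
        hi = lo + batch_size
        yield (
            "UPDATE users SET full_name = name "
            f"WHERE full_name IS NULL AND id BETWEEN {lo} AND {hi};"
        )
        lo = hi + 1
```

Each yielded statement runs in its own transaction, so no single statement holds locks for long.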
## Pattern 2: Safe Column Operations

### Adding a Column

```sql
-- Safe on PostgreSQL 11+: a non-volatile default doesn't rewrite the table
ALTER TABLE orders ADD COLUMN status text DEFAULT 'pending';

-- For NOT NULL, add in two steps:
ALTER TABLE orders ADD COLUMN priority integer;
-- Backfill in batches...
ALTER TABLE orders ALTER COLUMN priority SET NOT NULL;
-- The SET NOT NULL requires a full table scan but NOT a rewrite
```
### Removing a Column

Never drop columns in the same deploy as removing code references:

```sql
-- Step 1: Deploy app code that stops reading/writing the column
-- Step 2: Wait for all old app instances to drain
-- Step 3: Drop the column
ALTER TABLE users DROP COLUMN legacy_field;
```
### Changing a Column Type

```sql
-- Wrong: rewrites entire table
ALTER TABLE events ALTER COLUMN payload TYPE jsonb USING payload::jsonb;

-- Right: create new column, backfill, swap
ALTER TABLE events ADD COLUMN payload_jsonb jsonb;

-- Backfill in batches
DO $$
DECLARE
  batch_size integer := 5000;
  max_id bigint;
  current_id bigint := 0;
BEGIN
  SELECT max(id) INTO max_id FROM events;
  WHILE current_id <= max_id LOOP
    UPDATE events
    SET payload_jsonb = payload::jsonb
    WHERE id > current_id
      AND id <= current_id + batch_size
      AND payload_jsonb IS NULL;
    current_id := current_id + batch_size;
    COMMIT;
    PERFORM pg_sleep(0.1); -- Brief pause to reduce load
  END LOOP;
END $$;

-- Then swap in app code, then drop old column
```
## Pattern 3: Online Index Creation

Standard CREATE INDEX locks the table for writes. Always use CONCURRENTLY:

```sql
-- Wrong: blocks writes
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- Right: doesn't block writes
CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);
```

Key caveats with CONCURRENTLY:

- Cannot run inside a transaction
- Takes longer (two table scans instead of one)
- Can fail and leave an invalid index — always check:
```sql
-- Check for invalid indexes after CONCURRENTLY
SELECT indexrelid::regclass, indisvalid
FROM pg_index
WHERE NOT indisvalid;

-- If invalid, drop and retry
DROP INDEX CONCURRENTLY idx_orders_customer;
CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);
```
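Because a failed CONCURRENTLY build leaves that invalid index behind, deploy tooling often wraps index creation in a drop-and-retry loop. A sketch with the database call injected (the `execute` parameter is an assumption, standing in for a real cursor method):

```python
# Sketch: retry CREATE INDEX CONCURRENTLY, dropping the invalid leftover
# between attempts. `execute` stands in for a real DB call (e.g. cursor.execute).

def create_index_with_retry(execute, index_name: str, create_sql: str,
                            max_attempts: int = 3) -> int:
    """Return the number of attempts used; re-raise if all attempts fail."""
    for attempt in range(1, max_attempts + 1):
        try:
            execute(create_sql)
            return attempt
        except Exception:
            # A failed CONCURRENTLY build leaves an INVALID index behind;
            # drop it before retrying (IF EXISTS keeps this idempotent).
            execute(f"DROP INDEX CONCURRENTLY IF EXISTS {index_name}")
            if attempt == max_attempts:
                raise
    raise AssertionError("unreachable")
```

Injecting `execute` keeps the retry logic testable without a live database.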
### Replacing an Existing Index

```sql
-- Create new index concurrently
CREATE INDEX CONCURRENTLY idx_orders_customer_v2 ON orders (customer_id, created_at);

-- Swap (instant, just changes catalog)
-- Option A: Drop old, rename new
DROP INDEX CONCURRENTLY idx_orders_customer;
ALTER INDEX idx_orders_customer_v2 RENAME TO idx_orders_customer;

-- Option B: For unique constraints, use ALTER TABLE
ALTER TABLE orders
  DROP CONSTRAINT orders_email_key,
  ADD CONSTRAINT orders_email_key UNIQUE USING INDEX idx_orders_email_v2;
```
## Pattern 4: Safe Constraint Changes

### Adding NOT NULL

```sql
-- Wrong: full table scan while holding ACCESS EXCLUSIVE lock
ALTER TABLE users ALTER COLUMN email SET NOT NULL;

-- Better: add CHECK constraint first (validated in background)
ALTER TABLE users ADD CONSTRAINT users_email_not_null
  CHECK (email IS NOT NULL) NOT VALID;

-- Then validate (takes only a SHARE UPDATE EXCLUSIVE lock, doesn't block writes)
ALTER TABLE users VALIDATE CONSTRAINT users_email_not_null;

-- PostgreSQL 12+ can then cheaply promote to NOT NULL
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
ALTER TABLE users DROP CONSTRAINT users_email_not_null;
```
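This four-statement flow is easy to script so nobody skips a step. A sketch that emits the statements in order (`run` is an injected stand-in for a real cursor, and the constraint-naming scheme is an assumption):

```python
# Sketch: drive the CHECK-then-promote flow for adding NOT NULL.
# `run` stands in for a real DB call; statement order is what matters.

def add_not_null_safely(run, table: str, column: str) -> None:
    constraint = f"{table}_{column}_not_null"
    # 1. Add the constraint without scanning existing rows
    run(f"ALTER TABLE {table} ADD CONSTRAINT {constraint} "
        f"CHECK ({column} IS NOT NULL) NOT VALID")
    # 2. Validate in the background (doesn't block writes)
    run(f"ALTER TABLE {table} VALIDATE CONSTRAINT {constraint}")
    # 3. PostgreSQL 12+ promotes cheaply using the validated constraint
    run(f"ALTER TABLE {table} ALTER COLUMN {column} SET NOT NULL")
    # 4. The CHECK constraint is now redundant
    run(f"ALTER TABLE {table} DROP CONSTRAINT {constraint}")
```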
### Adding Foreign Keys

```sql
-- Wrong: validates all existing rows while holding lock
ALTER TABLE orders ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id) REFERENCES customers(id);

-- Right: add unvalidated, then validate separately
ALTER TABLE orders ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id) REFERENCES customers(id) NOT VALID;

ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_customer;
```
## Pattern 5: Batched Backfills

Large UPDATE operations can cause issues: WAL bloat, replication lag, and lock contention. Always backfill in batches:

```python
# backfill.py — Safe batched backfill
import time

import psycopg

BATCH_SIZE = 5000
SLEEP_BETWEEN_BATCHES = 0.1  # seconds


def backfill_column(conn_string: str, table: str, set_clause: str, where_clause: str):
    with psycopg.connect(conn_string) as conn:
        conn.autocommit = True  # Each batch is its own transaction

        # Get total count
        with conn.execute(f"SELECT count(*) FROM {table} WHERE {where_clause}") as cur:
            total = cur.fetchone()[0]

        processed = 0
        while True:
            with conn.execute(f"""
                WITH batch AS (
                    SELECT ctid FROM {table}
                    WHERE {where_clause}
                    LIMIT {BATCH_SIZE}
                    FOR UPDATE SKIP LOCKED
                )
                UPDATE {table} SET {set_clause}
                WHERE ctid IN (SELECT ctid FROM batch)
            """) as cur:
                affected = cur.rowcount
            if affected == 0:
                break
            processed += affected
            print(f"Progress: {processed}/{total} ({processed*100//total}%)")
            time.sleep(SLEEP_BETWEEN_BATCHES)

        print(f"Backfill complete: {processed} rows updated")


# Usage
backfill_column(
    "postgresql://localhost/mydb",
    "users",
    "email_normalized = lower(email)",
    "email_normalized IS NULL",
)
```
## Pattern 6: Migration Linting

Catch dangerous migrations before they reach production. Use squawk for PostgreSQL:

```yaml
# .github/workflows/migration-lint.yml
name: Migration Safety Check
on:
  pull_request:
    paths:
      - 'migrations/**'
jobs:
  lint-migrations:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Install squawk
        run: npm install -g squawk-cli
      - name: Lint new migrations
        run: |
          # Find new migration files
          MIGRATIONS=$(git diff --name-only origin/main...HEAD -- 'migrations/*.sql')
          if [ -n "$MIGRATIONS" ]; then
            echo "$MIGRATIONS" | xargs squawk
          fi
```
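A lint rule of this kind is simple to sketch. This hypothetical check (not squawk's actual implementation) flags index creation that skips CONCURRENTLY:

```python
# Sketch: one lint rule of the kind a migration linter applies.
# Hypothetical, not squawk's implementation: flag CREATE INDEX
# statements that are missing CONCURRENTLY.
import re

NON_CONCURRENT_INDEX = re.compile(
    r"\bCREATE\s+(UNIQUE\s+)?INDEX\b(?!\s+CONCURRENTLY)", re.IGNORECASE
)

def lint_migration(sql: str) -> list:
    """Return one warning per dangerous statement found."""
    warnings = []
    for match in NON_CONCURRENT_INDEX.finditer(sql):
        warnings.append(
            f"CREATE INDEX without CONCURRENTLY at offset {match.start()}"
        )
    return warnings
```

Real linters parse the SQL rather than pattern-match it, but the principle is the same: fail the build before the lock is ever taken.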
Squawk catches dangerous patterns like:

- ALTER TABLE ... ADD COLUMN ... DEFAULT (unsafe before PG 11)
- CREATE INDEX without CONCURRENTLY
- ALTER TABLE ... ADD CONSTRAINT ... FOREIGN KEY without NOT VALID
- Column type changes that rewrite tables
- NOT NULL additions without CHECK constraint pattern

## The Migration Checklist

Before every migration:

- Will it acquire an ACCESS EXCLUSIVE lock? If yes, restructure.
- Does it rewrite the table? Check with ALTER TABLE ... SET (fillfactor = 100) trick or check the PG docs.
- Are indexes created CONCURRENTLY?
- Are constraints added NOT VALID first?
- Is the backfill batched with sleep intervals?
- Can you roll back? Every expand migration should be reversible without data loss.
- Is the app compatible with both old and new schema? The expand phase MUST work with the current app code.

## Real-World Timing

On a 100M row table with PostgreSQL 15, the zero-downtime approach takes more deploys, but zero seconds of user-facing downtime.

## Tools That Help

- squawk: Migration linter for PostgreSQL
- pgroll: Automated zero-downtime migrations from Xata
- reshape: Schema migration tool with automatic expand-contract
- pg_repack: Repacks tables without ACCESS EXCLUSIVE locks
- Flyway/Liquibase: Migration runners (use with the patterns above)

Have you experienced a migration-related outage? What patterns does your team use for zero-downtime schema changes? Share in the comments — I'm collecting war stories for a follow-up post on migration disaster recovery.