Zero-Downtime Database Migrations: Patterns for Production PostgreSQL in 2026

Database migrations are the scariest part of deployments. One bad ALTER TABLE can lock your production database for minutes, turning a routine release into an incident. Here's how to make schema changes without any downtime.

The Core Problem

PostgreSQL acquires locks during DDL operations. A simple ALTER TABLE users ADD COLUMN email_verified boolean NOT NULL DEFAULT false on a table with millions of rows will:

- Acquire an ACCESS EXCLUSIVE lock on the table
- Rewrite the entire table (for NOT NULL with DEFAULT on PG < 11)
- Block ALL reads and writes until complete

On a table with 50M rows, this can take 30+ seconds. Every query hitting that table queues up, your connection pool saturates, and your app goes down.

Pattern 1: Expand-Contract Migrations

Split every breaking change into three phases:

- Phase 1 (Expand): Add new column/table, keep old one working
- Phase 2 (Migrate): Backfill data, update app to write both
- Phase 3 (Contract): Remove old column/table after verification

Example: Renaming a Column

Wrong (causes downtime):

```sql
ALTER TABLE users RENAME COLUMN name TO full_name;
```

Right (zero-downtime):

```sql
-- Migration 1: Expand
ALTER TABLE users ADD COLUMN full_name text;

-- Migration 2: Backfill (in batches)
UPDATE users SET full_name = name
WHERE full_name IS NULL AND id BETWEEN 0 AND 10000;
UPDATE users SET full_name = name
WHERE full_name IS NULL AND id BETWEEN 10001 AND 20000;
-- ... continue in batches

-- Migration 3: Add trigger to keep columns in sync
CREATE OR REPLACE FUNCTION sync_user_name() RETURNS trigger AS $$
BEGIN
  IF NEW.name IS DISTINCT FROM OLD.name THEN
    NEW.full_name := NEW.name;
  END IF;
  IF NEW.full_name IS DISTINCT FROM OLD.full_name THEN
    NEW.name := NEW.full_name;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_user_name_trigger
  BEFORE UPDATE ON users
  FOR EACH ROW EXECUTE FUNCTION sync_user_name();

-- Deploy app code that reads/writes full_name instead of name
-- Verify for a few days

-- Migration 4: Contract (after app fully migrated)
DROP TRIGGER sync_user_name_trigger ON users;
DROP FUNCTION sync_user_name();
ALTER TABLE users DROP COLUMN name;
```

Pattern 2: Safe Column Operations

Adding a Column

```sql
-- Safe on PostgreSQL 11+: a constant DEFAULT doesn't rewrite the table
ALTER TABLE orders ADD COLUMN status text DEFAULT 'pending';

-- For NOT NULL, add in two steps:
ALTER TABLE orders ADD COLUMN priority integer;
-- Backfill in batches...
ALTER TABLE orders ALTER COLUMN priority SET NOT NULL;
-- The SET NOT NULL requires a full table scan but NOT a rewrite
```

Removing a Column

Never drop columns in the same deploy as removing code references:

```sql
-- Step 1: Deploy app code that stops reading/writing the column
-- Step 2: Wait for all old app instances to drain
-- Step 3: Drop the column
ALTER TABLE users DROP COLUMN legacy_field;
```

Changing a Column Type

```sql
-- Wrong: rewrites entire table
ALTER TABLE events ALTER COLUMN payload TYPE jsonb USING payload::jsonb;

-- Right: create new column, backfill, swap
ALTER TABLE events ADD COLUMN payload_jsonb jsonb;

-- Backfill in batches (COMMIT inside DO requires PG 11+
-- and must not run inside an outer transaction)
DO $$
DECLARE
  batch_size integer := 5000;
  max_id bigint;
  current_id bigint := 0;
BEGIN
  SELECT max(id) INTO max_id FROM events;
  WHILE current_id <= max_id LOOP
    UPDATE events
    SET payload_jsonb = payload::jsonb
    WHERE id > current_id
      AND id <= current_id + batch_size
      AND payload_jsonb IS NULL;
    current_id := current_id + batch_size;
    COMMIT;
    PERFORM pg_sleep(0.1); -- Brief pause to reduce load
  END LOOP;
END $$;

-- Then swap in app code, then drop old column
```

Pattern 3: Online Index Creation

Standard CREATE INDEX locks the table for writes. Always use CONCURRENTLY:

```sql
-- Wrong: blocks writes
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- Right: doesn't block writes
CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);
```

Key caveats with CONCURRENTLY:

- Cannot run inside a transaction
- Takes longer (two table scans instead of one)
- Can fail and leave an invalid index — always check:

```sql
-- Check for invalid indexes after CONCURRENTLY
SELECT indexrelid::regclass, indisvalid
FROM pg_index
WHERE NOT indisvalid;

-- If invalid, drop and retry
DROP INDEX CONCURRENTLY idx_orders_customer;
CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);
```

Replacing an Existing Index

```sql
-- Create new index concurrently
CREATE INDEX CONCURRENTLY idx_orders_customer_v2
  ON orders (customer_id, created_at);

-- Swap (instant, just changes catalog)
-- Option A: Drop old, rename new
DROP INDEX CONCURRENTLY idx_orders_customer;
ALTER INDEX idx_orders_customer_v2 RENAME TO idx_orders_customer;

-- Option B: For unique constraints, use ALTER TABLE
ALTER TABLE orders
  DROP CONSTRAINT orders_email_key,
  ADD CONSTRAINT orders_email_key UNIQUE USING INDEX idx_orders_email_v2;
```

Pattern 4: Safe Constraint Changes

Adding NOT NULL

```sql
-- Wrong: full table scan while holding ACCESS EXCLUSIVE lock
ALTER TABLE users ALTER COLUMN email SET NOT NULL;

-- Better: add CHECK constraint first (validated in background)
ALTER TABLE users
  ADD CONSTRAINT users_email_not_null
  CHECK (email IS NOT NULL) NOT VALID;

-- Then validate (only takes a SHARE UPDATE EXCLUSIVE lock,
-- doesn't block writes)
ALTER TABLE users VALIDATE CONSTRAINT users_email_not_null;

-- PostgreSQL 12+ can then cheaply promote to NOT NULL
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
ALTER TABLE users DROP CONSTRAINT users_email_not_null;
```

Adding Foreign Keys

```sql
-- Wrong: validates all existing rows while holding lock
ALTER TABLE orders
  ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id) REFERENCES customers(id);

-- Right: add unvalidated, then validate separately
ALTER TABLE orders
  ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id) REFERENCES customers(id) NOT VALID;

ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_customer;
```

Pattern 5: Batched Backfills

Large UPDATE operations cause WAL bloat, replication lag, and lock contention. Always backfill in batches:

```python
# backfill.py — Safe batched backfill
import time

import psycopg

BATCH_SIZE = 5000
SLEEP_BETWEEN_BATCHES = 0.1  # seconds

def backfill_column(conn_string: str, table: str,
                    set_clause: str, where_clause: str):
    with psycopg.connect(conn_string) as conn:
        conn.autocommit = True  # Each batch is its own transaction

        # Get total count for progress reporting
        with conn.execute(
            f"SELECT count(*) FROM {table} WHERE {where_clause}"
        ) as cur:
            total = cur.fetchone()[0]

        processed = 0
        while True:
            with conn.execute(f"""
                WITH batch AS (
                    SELECT ctid FROM {table}
                    WHERE {where_clause}
                    LIMIT {BATCH_SIZE}
                    FOR UPDATE SKIP LOCKED
                )
                UPDATE {table} SET {set_clause}
                WHERE ctid IN (SELECT ctid FROM batch)
            """) as cur:
                affected = cur.rowcount
            if affected == 0:
                break
            processed += affected
            print(f"Progress: {processed}/{total} ({processed * 100 // total}%)")
            time.sleep(SLEEP_BETWEEN_BATCHES)

    print(f"Backfill complete: {processed} rows updated")

# Usage
backfill_column(
    "postgresql://localhost/mydb",
    "users",
    "email_normalized = lower(email)",
    "email_normalized IS NULL",
)
```

Pattern 6: Migration Linting

Catch dangerous migrations before they reach production. squawk, a migration linter for PostgreSQL, flags:

- ALTER TABLE ... ADD COLUMN ... DEFAULT (unsafe before PG 11)
- CREATE INDEX without CONCURRENTLY
- ALTER TABLE ... ADD CONSTRAINT ... FOREIGN KEY without NOT VALID
- Column type changes that rewrite tables
- NOT NULL additions without the CHECK constraint pattern

Run it in CI:

```yaml
# .github/workflows/migration-lint.yml
name: Migration Safety Check

on:
  pull_request:
    paths:
      - 'migrations/**'

jobs:
  lint-migrations:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install squawk
        run: npm install -g squawk-cli

      - name: Lint new migrations
        run: |
          # Find new migration files
          MIGRATIONS=$(git diff --name-only origin/main...HEAD -- 'migrations/*.sql')
          if [ -n "$MIGRATIONS" ]; then
            echo "$MIGRATIONS" | xargs squawk
          fi
```

The Migration Checklist

Before every migration:

- Will it acquire an ACCESS EXCLUSIVE lock? If yes, restructure.
- Does it rewrite the table? Check the PostgreSQL docs for your exact ALTER TABLE form.
- Are indexes created CONCURRENTLY?
- Are constraints added NOT VALID first?
- Is the backfill batched with sleep intervals?
- Can you roll back? Every expand migration should be reversible without data loss.
- Is the app compatible with both old and new schema? The expand phase MUST work with the current app code.

Real-World Timing

On a 100M row table with PostgreSQL 15, the zero-downtime approach takes more deploys but zero seconds of user-facing downtime.

Tools That Help

- squawk: Migration linter for PostgreSQL
- pgroll: Automated zero-downtime migrations from Xata
- reshape: Schema migration tool with automatic expand-contract
- pg_repack: Repacks tables without ACCESS EXCLUSIVE locks
- Flyway/Liquibase: Migration runners (use with the patterns above)

Have you experienced a migration-related outage? What patterns does your team use for zero-downtime schema changes? Share in the comments — I'm collecting war stories for a follow-up post on migration disaster recovery.
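Appendix: one habit that complements the checklist is running every DDL statement under a short lock_timeout and retrying with backoff, so an ALTER TABLE that can't get its lock fails fast instead of queueing every other query behind its lock request. The sketch below is a hypothetical helper (run_ddl_with_retry is not part of psycopg or any migration tool); the execute parameter is whatever callable runs SQL in your stack, such as a psycopg cursor's execute. lock_timeout itself is a standard PostgreSQL setting.

```python
import time

def run_ddl_with_retry(execute, ddl, max_attempts=5,
                       lock_timeout_ms=3000, base_delay_s=1.0):
    """Run one DDL statement under a short lock_timeout, retrying with
    exponential backoff, so the migration errors out quickly instead of
    blocking traffic while it waits for an ACCESS EXCLUSIVE lock."""
    for attempt in range(1, max_attempts + 1):
        try:
            # Give up on lock acquisition after lock_timeout_ms; a waiting
            # ALTER TABLE otherwise makes later queries queue behind it.
            execute(f"SET lock_timeout = '{lock_timeout_ms}ms'")
            execute(ddl)
            return attempt  # attempts used, handy for logging
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay_s * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```

Because execute is injected, the retry behavior is easy to unit-test with a fake callable and base_delay_s set to zero, no database required.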