Tools: Scaling Rails on Bare Metal - Horizontal Scaling, Connection Pooling, Read Replicas, Load Balancing (2026)


In this post:

- Start with a single healthy node
- Run multiple app servers behind Nginx
- Make app nodes consistent with systemd
- Watch your database connection budget
- Use Redis for the work Redis is good at
- Add a read replica when reads dominate
- Load test before you trust the architecture
- A practical scaling order
- What to remember

Scaling a Rails app on bare metal is mostly about removing bottlenecks one layer at a time. You do not need magic. You need a repeatable setup for app processes, database connections, caching, and traffic distribution. In this post, we'll scale a small Rails API from one server to two app nodes behind Nginx, then tighten the database and Redis setup so the app keeps behaving under load.

Start with a single healthy node

Before you scale out, make one node boring and stable. Your Rails production settings should already include connection pooling and caching; the Puma and database.yml configs are collected at the end of this post. If each Puma worker runs 5 threads and you run 2 workers, one app server can open up to 10 active database connections. That number matters once you add more machines.

Run multiple app servers behind Nginx

Assume we have three Ubuntu nodes: one load balancer and two app servers (addresses listed below). Install Nginx on the load balancer and create an upstream config pointing at both app nodes. least_conn is a good default for Rails because requests do not all take the same amount of time.

Make app nodes consistent with systemd

Each app server should run the same release and the same systemd service file. After installing the unit, reload systemd and enable the service. Now both nodes serve the same app, and Nginx distributes traffic between them.

Watch your database connection budget

Horizontal scaling usually hits PostgreSQL first. Worst case for this setup: 2 servers x 2 workers x 5 threads = 20 app connections. Add background jobs, console sessions, and maintenance tasks, and you can exhaust PostgreSQL fast. A simple rule is to calculate the total connection budget before adding nodes. If your database allows 100 connections, do not spend 95 of them on web traffic. Leave room for jobs, migrations, and admin access. On PostgreSQL, check current pressure with the pg_stat_activity query below. If connection churn becomes a problem, add PgBouncer between Rails and PostgreSQL. That is often the cleanest next step on bare metal.

Use Redis for the work Redis is good at

Do not make PostgreSQL do everything. Low-cost caching usually gives you a bigger win than adding more CPUs.

Add a read replica when reads dominate

If your app spends most of its time reading dashboards, feeds, or AI result history, move those reads away from the primary. Rails supports multiple database roles: configure the replica in database.yml, then route safe reads through the reading role. Be careful here. Replicas have lag.
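One way to soften that lag, assuming a reasonably recent Rails (6+) with the built-in database selector middleware, is to keep a user's session on the primary for a short window after each write. This is an illustrative sketch, not a config from the original post:

```ruby
# config/environments/production.rb -- illustrative sketch.
# After a write, the same browser session keeps reading from the primary
# for 2 seconds, which papers over short replication lag.
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver =
  ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context =
  ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
```

Even with this in place, treat replica reads as eventually consistent.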
Do not read from a replica immediately after a write if the user expects fresh data.

Load test before you trust the architecture

Do not guess. Generate traffic against a cheap endpoint and against an endpoint with database work, and watch Puma, Nginx, PostgreSQL, and Redis while the test runs. If one app node is idle while the database is overloaded, the bottleneck is not Rails. It is the data layer.

A practical scaling order

On bare metal, I'd scale in the order listed at the end of this post: tune the single node first, cache, then add nodes, then fix the database layer. That order keeps the system understandable.

What to remember

Scaling Rails on bare metal is not about collecting fancy components. It is about understanding pressure points. Next time, we'll zoom out and look at the full AI Rails stack, from web requests to background jobs to model calls and vector search.

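The connection-budget rule above can be sketched as a quick back-of-the-envelope check. The Sidekiq process count, its concurrency, and the reserved headroom below are illustrative assumptions, not numbers from this post:

```ruby
# Back-of-the-envelope connection budget for the two-node setup.
app_servers        = 2   # app-1 and app-2
puma_workers       = 2   # WEB_CONCURRENCY
threads_per_worker = 5   # RAILS_MAX_THREADS

web_connections = app_servers * puma_workers * threads_per_worker

# Assumed extras: one background-job process plus admin headroom.
job_connections = 1 * 10   # processes * concurrency (assumption)
reserved        = 10       # migrations, consoles, monitoring (assumption)

max_connections = 100      # PostgreSQL max_connections
budget_used     = web_connections + job_connections + reserved

puts "web=#{web_connections} total=#{budget_used}/#{max_connections}"
warn "over 80% of max_connections" if budget_used > max_connections * 0.8
```

Run this before adding a node: if the total creeps past roughly 80% of max_connections, reach for PgBouncer or lower the pool sizes first.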

Configs, commands, and checklists

The stack:

- Ubuntu VPS machines
- Nginx as the load balancer
- Puma for app processes
- PostgreSQL as the primary database
- Redis for caching

The nodes:

- 10.0.0.10 lb-1
- 10.0.0.11 app-1
- 10.0.0.12 app-2

Puma, tuned for 2 workers x 5 threads per app server:

```ruby
# config/puma.rb
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count
preload_app!

port ENV.fetch("PORT", 3000)
environment ENV.fetch("RAILS_ENV", "production")
pidfile ENV.fetch("PIDFILE", "tmp/pids/server.pid")
plugin :tmp_restart
```

Database pool sized to match the thread count:

```yaml
# config/database.yml
production:
  primary:
    adapter: postgresql
    encoding: unicode
    pool: <%= ENV.fetch("RAILS_MAX_THREADS", 5) %>
    url: <%= ENV.fetch("DATABASE_URL") %>
```

Install Nginx on the load balancer:

```bash
sudo apt update
sudo apt install -y nginx
```

The upstream config:

```nginx
# /etc/nginx/sites-available/myapp
upstream rails_app {
  least_conn;
  server 10.0.0.11:3000 max_fails=3 fail_timeout=30s;
  server 10.0.0.12:3000 max_fails=3 fail_timeout=30s;
  keepalive 32;
}

server {
  listen 80;
  server_name myapp.example.com;

  location / {
    proxy_pass http://rails_app;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header Connection "";
    proxy_read_timeout 60s;
  }
}
```

Enable the site and reload:

```bash
sudo ln -s /etc/nginx/sites-available/myapp /etc/nginx/sites-enabled/myapp
sudo nginx -t
sudo systemctl reload nginx
```

The systemd unit, identical on both app nodes:

```ini
# /etc/systemd/system/myapp.service
[Unit]
Description=My Rails App
After=network.target

[Service]
Type=simple
User=deploy
WorkingDirectory=/var/www/myapp/current
Environment=RAILS_ENV=production
Environment=PORT=3000
Environment=WEB_CONCURRENCY=2
Environment=RAILS_MAX_THREADS=5
Environment=DATABASE_URL=postgresql://myapp:[email protected]/myapp_production
ExecStart=/usr/local/bin/bundle exec puma -C config/puma.rb
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Reload and start:

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now myapp
sudo systemctl status myapp
```

The connection budget formula, with this post's numbers (2 app servers, 2 Puma workers each, 5 threads per worker):

```
total_connections = app_servers * workers * threads
```

Check current connection pressure on PostgreSQL:

```sql
SELECT datname, usename, state, count(*)
FROM pg_stat_activity
GROUP BY datname, usename, state
ORDER BY count(*) DESC;
```

Redis is a good fit for the cache store, rate limiting, background job queue metadata, and ephemeral counters:

```ruby
# config/environments/production.rb
config.cache_store = :redis_cache_store, {
  url: ENV.fetch("REDIS_URL"),
  namespace: "myapp-cache",
  expires_in: 12.hours
}
```

The replica setup:

```yaml
# config/database.yml
production:
  primary:
    url: <%= ENV.fetch("DATABASE_URL") %>
  primary_replica:
    url: <%= ENV.fetch("DATABASE_REPLICA_URL") %>
    replica: true
```

Routing safe reads:

```ruby
ActiveRecord::Base.connected_to(role: :reading) do
  @recent_messages = Message.order(created_at: :desc).limit(50)
end
```

Load testing with wrk, first a cheap endpoint, then one with database work:

```bash
wrk -t4 -c100 -d30s http://myapp.example.com/health
wrk -t4 -c50 -d30s http://myapp.example.com/posts
```

Watch these during the test:

- Puma CPU and memory
- Nginx upstream errors
- PostgreSQL active connections and slow queries
- Redis memory
- p95 and p99 latency

The scaling order:

- tune Puma and the database pool
- add Redis caching
- move to multiple app nodes behind Nginx
- add PgBouncer if connections get messy
- add a read replica for heavy reads
- split background jobs onto separate workers

What to remember:

- Nginx spreads traffic
- Puma converts CPU and memory into request handling
- PostgreSQL usually becomes the first real limit
- Redis removes repeated work
- replicas help read-heavy workloads
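If you reach the add-PgBouncer step, a minimal transaction-pooling setup might look like the sketch below. Every value here (host, ports, pool sizes, auth file path) is an illustrative assumption, not taken from this post:

```ini
; /etc/pgbouncer/pgbouncer.ini -- illustrative sketch
[databases]
myapp_production = host=127.0.0.1 port=5432 dbname=myapp_production

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20
max_client_conn = 200
```

Point DATABASE_URL at port 6432 instead of 5432, and set prepared_statements: false in database.yml; transaction pooling and prepared statements do not mix well.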