My Microservices Broke on OpenShift — And How a Hidden Kubernetes Quota Nearly Cost Me Days

Summary: Deploying four Spring Boot microservices to OpenShift Developer Sandbox, I hit two silent failures — a ReplicaSet quota exhaustion and a gateway routing to localhost. Here is the full debugging story.

If you're deploying microservices on OpenShift's free Developer Sandbox (or any resource-constrained Kubernetes cluster), this post might save you hours of debugging.

The Setup

I built a production-grade mobile application backed by a microservices architecture — four Spring Boot services deployed to Red Hat OpenShift Developer Sandbox via a fully automated GitHub Actions CI/CD pipeline. Everything was containerized, secrets-managed, health-probed, and CI/CD automated. It worked flawlessly on localhost. Then I deployed it. It broke. For two days.

The Architecture

Here is how the system is wired:

- Flutter mobile frontend (automated APK releases)
- API Gateway (Spring Cloud Gateway) — single entry point for all client requests
- Auth Service — handles registration, login, OTP verification, JWT tokens
- User Service — user profiles, preferences, settings
- Core Service — main business logic, AI features, data processing
- MongoDB Atlas — separate databases per service
- GitHub Container Registry (GHCR) — Docker image hosting
- OpenShift Developer Sandbox — free-tier Kubernetes hosting

The mobile app hits the API Gateway via an OpenShift Route (HTTPS). The gateway reads the URL path and forwards it to the correct internal microservice via Kubernetes Service DNS names (e.g., http://app-auth-svc:8080).

The CI/CD Pipeline

Every push to main triggers a GitHub Actions workflow that:

- Builds all 4 services with Maven
- Creates Docker images and pushes to GHCR
- Logs into OpenShift via CLI (oc login)
- Creates/updates Kubernetes secrets (MongoDB URIs, JWT secret, API keys)
- Applies all deployment manifests
- Runs oc rollout restart on each deployment
- Waits for health checks to pass
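
For concreteness, the deploy stage of that workflow boils down to roughly the following. This is a simplified sketch rather than the real workflow file; the k8s/ manifest path and the OPENSHIFT_* secret names are placeholders, and the deployment names match the examples used later in this post.

# Simplified sketch of the deploy stage (not the actual GitHub Actions file).
# OPENSHIFT_SERVER and OPENSHIFT_TOKEN are assumed to come from CI secrets.
oc login "$OPENSHIFT_SERVER" --token="$OPENSHIFT_TOKEN"

# Apply (or update) all deployment/service/route manifests
oc apply -f k8s/

# Restart each deployment so it pulls the freshly pushed images,
# and fail the pipeline if a rollout never becomes ready
for dep in app-auth app-user app-core app-gateway; do
  oc rollout restart "deployment/$dep"
  oc rollout status "deployment/$dep" --timeout=300s
done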

Sounds bulletproof, right? Here is where it fell apart.

The Symptom

After deploying, the app showed one message on every action — registration, login, anything: "Something went wrong. Please try again later." The classic generic error that tells you absolutely nothing.

Bug #1: The Silent Quota Killer

What I Saw

The CI/CD pipeline failed with this in the logs:

AuthServiceApplication - Started AuthServiceApplication in 36.106 seconds
...
Error from server (BadRequest): previous terminated container "app-auth" in pod "app-auth-xxxxx-xxxxx" not found
Error: Process completed with exit code 1.

Confusing, right? The auth service clearly started successfully (36 seconds, listening on port 8080). But the deployment was marked as failed.

Digging Deeper

Looking at the pod events, I found:

Readiness probe failed: Get "http://10.x.x.x:8080/actuator/health": connection refused

And buried further down, the real error:

replicasets.apps is forbidden: exceeded quota
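
If you want to surface these yourself, the pod description and the namespace event stream are where both messages live. A quick way to dig (the app=app-auth label just mirrors the examples in this post):

# Probe failures show up under Events in the pod description
oc get pods -l app=app-auth
oc describe pod -l app=app-auth

# The quota error is not in the pod logs at all -- it is a namespace event
oc get events --sort-by=.lastTimestamp | grep -i "exceeded quota"

# The deployment's events and status conditions often surface it too
oc describe deployment app-auth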

What Actually Happened

Here is what most people do not know about Kubernetes deployments: every time you run oc rollout restart (or kubectl rollout restart), Kubernetes does not just restart your pods. It creates an entirely new ReplicaSet while keeping the old ones around as rollback history. By default, Kubernetes keeps the last 10 ReplicaSets per deployment (controlled by revisionHistoryLimit, which defaults to 10).

Now multiply that by 4 microservices:

- 4 services × 10 ReplicaSets = 40 ReplicaSets

The OpenShift Developer Sandbox (free tier) has a strict quota on the total number of ReplicaSets allowed in your namespace. After just a few CI/CD runs, I silently hit that ceiling. When the quota is exceeded:

- Kubernetes cannot create new ReplicaSets for the rollout
- No new ReplicaSet = no new pods get scheduled
- No pods = readiness probe has nothing to connect to → connection refused
- Rollout waits... and eventually times out → context deadline exceeded
- Pipeline fails with exit code 1

The app code was perfectly fine. Kubernetes just silently refused to create pods.
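
Two quick checks make the pile-up visible, and they are worth running before a rollout ever fails; the exact quota names and limits depend on your cluster:

# How many ReplicaSets are sitting in the namespace right now?
oc get rs --sort-by=.metadata.creationTimestamp
oc get rs --no-headers | wc -l

# What is the namespace allowed to hold? Look for a replicasets.apps line
# (or a count/replicasets.apps object-count entry) in the output.
oc describe quota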

The Fix

There are two approaches, and I recommend using both:

Fix A: Set revisionHistoryLimit in Your Deployments (Best Practice)

Add revisionHistoryLimit: 1 to every deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  revisionHistoryLimit: 1   # Only keep 1 old ReplicaSet
  replicas: 1
  selector:
    matchLabels:
      app: my-service
  # ... rest of your spec

Why 1 and not 0? Setting it to 0 means Kubernetes keeps zero rollback history. If a bad deployment goes out, you cannot do oc rollout undo to instantly revert. Keeping 1 gives you exactly one rollback point — enough for safety without wasting quota. This is the best practice because if anything goes wrong with a new deployment, you still have an instant rollback option.

With 4 services at revisionHistoryLimit: 1:

- 4 services × (1 current + 1 old) = 8 ReplicaSets — well within any quota.
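
That one retained revision is exactly what makes a quick rollback possible; my-service below is the same placeholder name as in the manifest above.

# Revert to the single previous ReplicaSet kept by revisionHistoryLimit: 1
oc rollout undo deployment/my-service

# Watch the rollback finish
oc rollout status deployment/my-service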

Fix B: Add Cleanup to Your Deploy Script (Recovery Safety Net)

Add this before the rollout restart commands in your deployment script:

# Clean up old ReplicaSets to avoid quota issues on free-tier clusters
echo "Cleaning up old ReplicaSets..."
for dep in app-auth app-user app-core app-gateway; do
  # Get all ReplicaSets for this deployment, sorted oldest first,
  # and delete all except the most recent one
  OLD_RS=$(oc get rs -l "app=$dep" \
    --sort-by=.metadata.creationTimestamp \
    -o name 2>/dev/null | head -n -1)
  if [ -n "$OLD_RS" ]; then
    echo "$OLD_RS" | xargs oc delete
    echo "  Cleaned old ReplicaSets for $dep"
  fi
done

Fix B is useful for one-time recovery when you have already hit the quota, or as a safety net alongside Fix A. But Fix A is the real solution — it is declarative, permanent, and prevents the problem from ever occurring again.

Bug #2: The Gateway That Routed to Itself

Even after fixing the quota issue and getting all pods running, the app still did not work. Registration still failed.

The Clue

I hit the gateway health endpoint directly:

curl https://my-gateway-route.apps.openshiftapps.com/actuator/health
# → 200 OK ✅

Gateway was healthy. But hitting an actual API route:

curl https://my-gateway-route.apps.openshiftapps.com/api/auth/signup
# → 502 Bad Gateway ❌

The Problem

My API Gateway's application.yml had hardcoded localhost URLs for routing:

spring:
  cloud:
    gateway:
      routes:
        - id: auth-service
          uri: http://localhost:7071   # Works on my laptop
          predicates:
            - Path=/api/auth/**
        - id: user-service
          uri: http://localhost:7072   # Works on my laptop
          predicates:
            - Path=/api/users/**
        - id: core-service
          uri: http://localhost:7073   # Works on my laptop
          predicates:
            - Path=/api/core/**

On my machine, all 4 services run on the same host (localhost) on different ports. It works. On Kubernetes, each service runs in a separate pod with its own network namespace. localhost:7071 inside the gateway pod is just... the gateway pod itself. There is nothing listening on port 7071 there.
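
You can see the difference from inside the gateway pod itself. A quick sanity check, assuming curl is available in the container image (swap in wget or a similar tool if it is not):

# The Kubernetes Service DNS name resolves and answers from inside the pod
oc exec deploy/app-gateway -- curl -s http://app-auth-svc:8080/actuator/health

# localhost inside the gateway pod is the gateway pod itself,
# so nothing is listening on 7071 there
oc exec deploy/app-gateway -- curl -s http://localhost:7071/actuator/health
# -> connection refused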

The Irony

My deploy script already created the correct internal URLs as Kubernetes secrets:

oc create secret generic app-secrets \
  --from-literal=AUTH_SERVICE_URL="http://app-auth-svc:8080" \
  --from-literal=USER_SERVICE_URL="http://app-user-svc:8080" \
  --from-literal=CORE_SERVICE_URL="http://app-core-svc:8080"

And my other services correctly used them:

# core-service application.yml — Correct
app:
  user-service:
    base-url: ${USER_SERVICE_URL:http://localhost:7072}

Only the gateway was missed. The env vars were injected into the pod but never referenced in the routing config.
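
A quick way to tell which half of the wiring is broken is to check whether the variables are even present in the running container. In my case they were, which put the blame squarely on the routing config. (The oc set env line is only needed if they are missing; it assumes the secret is named app-secrets as above.)

# Are the secret-backed URLs present in the gateway pod's environment?
oc exec deploy/app-gateway -- env | grep _SERVICE_URL

# If they are not, attach the secret's keys to the deployment as env vars
oc set env deployment/app-gateway --from=secret/app-secrets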

The Fix

Replace hardcoded URLs with environment variable references (with localhost as the default for local development):

spring:
  cloud:
    gateway:
      routes:
        - id: auth-service
          uri: ${AUTH_SERVICE_URL:http://localhost:7071}
          predicates:
            - Path=/api/auth/**
        - id: user-service
          uri: ${USER_SERVICE_URL:http://localhost:7072}
          predicates:
            - Path=/api/users/**
        - id: core-service
          uri: ${CORE_SERVICE_URL:http://localhost:7073}
          predicates:
            - Path=/api/core/**

The ${ENV_VAR:default} syntax means:

- On Kubernetes: uses the injected secret value → http://app-auth-svc:8080
- On localhost: falls back to the default → http://localhost:7071

One config, works everywhere.
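
After shipping the updated application.yml, the quickest confirmation is to roll the gateway and repeat the check from The Clue:

# Redeploy the gateway with the new routing config and wait for it
oc rollout restart deployment/app-gateway
oc rollout status deployment/app-gateway

# Same request that returned 502 earlier
curl https://my-gateway-route.apps.openshiftapps.com/api/auth/signup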

The Complete Debugging Checklist

If your microservices work locally but fail on OpenShift/Kubernetes, run through the checks below.
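
Collected as shell commands, they look roughly like this. This is a sketch: the deployment names, the app= labels, and the Maven-style application.yml path mirror the examples above and will differ in your project.

# 1. Know the namespace limits before you hit them
oc describe quota

# 2. Count ReplicaSets -- CI/CD restarts pile them up fast on free tiers
oc get rs --no-headers | wc -l

# 3. Read the events, not just the pod logs -- quota errors hide there
oc get events --sort-by=.lastTimestamp | grep -iE "forbidden|exceeded quota"

# 4. Make sure every rollout actually finished
for dep in app-auth app-user app-core app-gateway; do
  oc rollout status "deployment/$dep" --timeout=60s
done

# 5. Hunt for hardcoded localhost in service-to-service config --
#    it should only appear as a fallback default, never as the live URL
grep -rn "localhost:" ./*/src/main/resources/application.yml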

Lessons Learned

1. "It works on my machine" extends to Kubernetes

2. Kubernetes fails silently in ways you do not expect

The ReplicaSet quota error did not crash my app. It did not log a warning. It just silently prevented new pods from being created, and the symptoms (readiness probe failure, connection refused) pointed me in the completely wrong direction.

3. Free-tier clusters have hidden constraints

OpenShift Developer Sandbox, Google Cloud free tier, Azure free tier — they all have resource quotas that do not exist in your local Minikube or Docker Desktop Kubernetes. Always run oc describe quota (or kubectl describe quota) in your namespace to know your limits. Do not wait until you hit the quota.

4. Set revisionHistoryLimit from day one

Add revisionHistoryLimit: 1 to every deployment manifest as a standard practice. It keeps your cluster clean, stays within quotas, and still gives you one rollback point for safety.

5. CI/CD amplifies configuration bugs

When you deploy manually, you might catch issues because you are watching the logs. When CI/CD deploys automatically on every push, a configuration bug silently breaks production while you are still writing code, thinking everything is fine.

If you are deploying microservices on a free-tier Kubernetes cluster and your deployments mysteriously stop working after a few CI/CD runs — check your ReplicaSet count. That silent quota limit is probably the culprit.

Have you hit weird Kubernetes issues on free-tier clusters? I would love to hear about them — connect with me on LinkedIn or check out more on anupamkushwaha.me. And here is the full blog Link.
