Tools: Rebooting a Production VM on Oracle Cloud: A Reference Guide (2026)

☁️ Pre-Flight Checklist

🌥️ Takeoff

⛅️ Cruising Altitude

🌤️ Landing & Taxi

Prerequisites

Part 1: Pre-Reboot Checklist

1.1 Verify no critical process is mid-flight

1.2 Validate your Compose configuration

1.3 Read your apt upgrade output

1.4 Understand the service restart messages

1.5 Check available disk space

Part 2: Running the Reboot

Part 3: Post-Reboot Verification

3.1 Check the Docker daemon

3.2 Check all containers are up

3.3 Watch the live logs

3.4 Check additional system services

Part 4: Measuring Time to Recovery (TTR)

4.1 Measure OS boot time

4.2 Find the bottleneck in the boot sequence

4.3 Find the exact moment containers started

4.4 Build your full TTR timeline

4.5 Use TTR to plan user communications

Quick Reference: All Commands

Troubleshooting Reference

Commands, explanations, and real output, for engineers who want to understand what's actually happening, not just copy-paste their way through it.

Before we taxi down the runway, the outline above is your flight plan. Keep it handy to navigate your flight path. Welcome aboard the cloud! ☁️ Enjoy your flight! ☁️

There's a specific kind of anxiety that comes with running sudo reboot on a server with real users on it. You know the system should come back, but "should" feels a lot less reassuring at the moment your SSH session freezes.

This guide removes the guesswork. It covers everything from reading your apt upgrade output intelligently, to verifying your stack is healthy after the reboot, to measuring your actual recovery time with real commands and real numbers, so that the next time you need to do this, it's a procedure, not a gamble.

Prerequisites

If restart: always (or a comparable policy such as unless-stopped) isn't set on your services, your containers will not come back after a reboot. Check this first.

restart: always tells Docker to relaunch the container whenever it exits and whenever the Docker daemon starts, whether after a crash or a full system reboot. The one exception to be deliberate about is one-shot containers such as database migrations: they're designed to run once and exit cleanly, so no restart policy is the right call for those.

Part 1: Pre-Reboot Checklist

Never reboot without completing this checklist. It takes under two minutes and prevents the most common post-reboot problems.

1.1 Verify no critical process is mid-flight

Run docker ps and read the STATUS column. If everything shows Up [days/weeks] (healthy), you are clear.

Why this matters: If a database migration container is mid-run, or a background job is processing a large task, a reboot will kill it mid-execution. You want to reboot during a quiet moment.

1.2 Validate your Compose configuration

Run docker compose config from your project directory.

Expected output: Your full resolved docker-compose.yml printed to the terminal, with no errors.

Why this matters: docker compose config resolves all environment variables and validates YAML syntax.
If there's a broken variable reference or a typo in your file, this command catches it now, not after the reboot when containers silently fail to start. A common mistake is editing a .env file or docker-compose.yml and not realising you've introduced a syntax error. This is your safety net.

1.3 Read your apt upgrade output

When you run sudo apt update && sudo apt upgrade -y before a reboot, the output tells you exactly what changed on your system. Don't skip past it. The list headed "The following packages will be upgraded:" names every package that changed; work through it package by package.

The rule of thumb: If the upgrade touches anything in the kernel, networking stack, or container runtime, reboot. If it's only application-level packages, a reboot is optional but never harmful.

1.4 Understand the service restart messages

After apt upgrade, Ubuntu's needrestart tool prints which services were restarted automatically and which were deferred.

"Restarting services": These were restarted immediately. Your SSH connection stayed alive because ssh.service restarts in place without dropping existing sessions.

"Service restarts being deferred": These require a full reboot to apply safely. systemd-logind manages user sessions; restarting it mid-session can cause issues, so Ubuntu defers it to the next clean boot.

"No containers need to be restarted.": This line also comes from needrestart. It means that needrestart found no running container that has to be restarted because of the upgrade. This is expected if you haven't rebuilt your application images.

1.5 Check available disk space

Run df -h /. You want at least 20% free on your root partition. Docker image pulls and accumulated log files are the two most common causes of a full disk, which can prevent containers from starting after a reboot.

Tip: Pruning unused Docker data, such as build cache layers, can reclaim a lot of space. In a real upgrade run, the output included Total reclaimed space: 4.165GB. That line is the summary printed by Docker's prune commands (for example docker system prune), so it most likely came from a prune step in the same session rather than from apt itself.

Part 2: Running the Reboot

Once the checklist is complete, run sudo reboot.

What happens next: the OS stops running processes, Docker stops its containers, the VM restarts, and your SSH session closes.

How long to wait: OCI ARM instances (Ampere A1) typically reboot in 45–90 seconds. Wait at least 60 seconds before trying to reconnect.

Part 3: Post-Reboot Verification

Run these checks in order.
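Each one builds on the last.

3.1 Check the Docker daemon

Run sudo systemctl status docker. If the daemon isn't running, enable and start it with sudo systemctl enable docker followed by sudo systemctl start docker.

3.2 Check all containers are up

Run docker ps again. If a container is missing or in a restart loop, run docker compose logs [service_name] --tail=50. This shows the last 50 log lines for that specific service, which will usually tell you exactly why it failed.

3.3 Watch the live logs

Run docker compose logs -f --tail=20. The -f flag follows the log stream in real time. --tail=20 shows the last 20 lines per service as a starting point.

What healthy output looks like: request lines with 200 responses, and messages such as Ready to accept connections.

What a transient (non-critical) error looks like: a redis.exceptions.ConnectionError from the worker, followed straight away by a Starting worker line and a Redis version line. This pattern, an error followed immediately by a successful connection message, is normal during cold starts. When all containers launch simultaneously, a dependent service (like a worker) may attempt its first connection before its dependency (like Redis) has finished initialising. The container retries and connects successfully on the next attempt. This is expected behavior.

What a critical error looks like: sqlalchemy.exc.OperationalError: connection refused, ending in [after 5 retries] giving up. A critical error is one that does not resolve on its own. If you see continuous errors without a recovery line following them, press Ctrl+C and investigate that service.

3.4 Check additional system services

If you run a CI/CD runner or similar agent alongside Docker, check it too, for example with sudo gitlab-runner status, and start it with sudo gitlab-runner start if it is not running.

Part 4: Measuring Time to Recovery (TTR)

TTR is the total time from sudo reboot to the moment your application is serving healthy responses. Measuring it gives you accurate data for maintenance window planning and user communications.

4.1 Measure OS boot time

Run systemd-analyze. It reports kernel and userspace startup time and their total.

4.2 Find the bottleneck in the boot sequence

Run systemd-analyze blame | head -20. This lists every service sorted by how long it took to start, slowest first.

In this case, Docker itself accounted for 12 of the 23 total seconds. This is normal: Docker has to read its state from disk, re-attach networks, and prepare to launch containers. Because services start in parallel, the per-service times can overlap and need not add up to the total.

Why this is useful: If your boot time is unexpectedly long, systemd-analyze blame tells you exactly which service is the bottleneck.

4.3 Find the exact moment containers started

Run docker inspect --format='{{.Name}}: {{.State.StartedAt}}' $(docker ps -q).

In the measured run, every container launched within the same second. This is because Docker starts all containers in parallel as soon as the daemon is ready.

Note: this timestamp reflects when Docker launched the container process, not when the application inside it was ready to serve traffic.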
A container may take a further 5–30 seconds to pass its health check after this point.

4.4 Build your full TTR timeline

Combine the data from the commands above into a single timeline, from sudo reboot to the first healthy response.

4.5 Use TTR to plan user communications

With a measured TTR, you can set honest expectations.

Internal / engineering team: "Maintenance reboot at [time]. Expected downtime: ~2 minutes." The 2-minute internal window gives a buffer above the measured ~60 seconds for anything unexpected.

External / end users: "Scheduled maintenance in progress. Services will be restored within 5 minutes." The 5-minute external window is deliberately conservative. If a container fails its first health check and requires a full restart cycle (up to 5 retries × 5 seconds = 25 extra seconds), you're still within your stated window. Under-promise, over-deliver.
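The buffer arithmetic in section 4.5 can be checked with a few lines of shell. This is a minimal sketch: worst_case_ttr and fits_window are hypothetical helper names, and the inputs (60 seconds measured, 5 retries at 5-second intervals, 120- and 300-second windows) are simply the figures quoted in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking a maintenance window.

# worst_case_ttr MEASURED_SECONDS RETRIES INTERVAL_SECONDS
# Adds one full health-check retry cycle on top of the measured TTR.
worst_case_ttr() {
  echo $(( $1 + $2 * $3 ))
}

# fits_window WORST_CASE_SECONDS WINDOW_SECONDS
# Prints "ok" if the worst case fits inside the announced window.
fits_window() {
  if [ "$1" -le "$2" ]; then echo "ok"; else echo "too tight"; fi
}

worst=$(worst_case_ttr 60 5 5)   # 60 + 5 * 5 = 85 seconds
echo "worst case: ${worst}s"
echo "internal window (120s): $(fits_window "$worst" 120)"
echo "external window (300s): $(fits_window "$worst" 300)"
```

With these numbers the worst case comes to 85 seconds, which fits inside both the 2-minute internal window and the 5-minute external one. Substitute your own measured TTR and health-check settings before relying on the result.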


Commands and real output, by section

Prerequisites: restart policy in docker-compose.yml

    services:
      backend:
        image: your-backend-image
        restart: always        # ✅ restarts automatically after reboot or crash

      migrations:
        image: your-migrations-image
        # no restart policy    # ✅ correct: this should run once and exit

1.1 Container states

    docker ps

STATUS values to look for:

    Up 2 days (healthy)
    Up 3 minutes
    Restarting (1)
    Up 2 hours (unhealthy)

1.2 Compose validation

    cd ~/your-project
    docker compose config

1.3 apt upgrade output

    sudo apt update && sudo apt upgrade -y

    The following packages will be upgraded:
      containerd.io coreutils docker-ce docker-ce-cli docker-ce-rootless-extras
      docker-compose-plugin docker-model-plugin gitlab-runner
      gitlab-runner-helper-images libnftables1 nftables python3-pyasn1

1.4 needrestart output

    Restarting services...
     systemctl restart irqbalance.service ssh.service rsyslog.service ...

    Service restarts being deferred:
     systemctl restart networkd-dispatcher.service
     systemctl restart systemd-logind.service

    No containers need to be restarted.
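As a rough illustration of the rule of thumb in section 1.3 (reboot if the upgrade touches the kernel, networking stack, or container runtime), the sketch below flags upgraded packages that usually justify a reboot. The function name needs_reboot and the name patterns are assumptions chosen for this example, not an official list; adjust them to your own stack.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "reboot" if any package name passed as an
# argument looks like kernel, networking-stack, or container-runtime code,
# and "optional" otherwise. The patterns are illustrative assumptions.
needs_reboot() {
  local pkg
  for pkg in "$@"; do
    case "$pkg" in
      linux-image-*|linux-modules-*)     echo "reboot"; return 0 ;;  # kernel
      nftables|libnftables*|iptables*)   echo "reboot"; return 0 ;;  # networking stack
      containerd*|docker-ce|docker-ce-*) echo "reboot"; return 0 ;;  # container runtime
    esac
  done
  echo "optional"
}
```

For example, needs_reboot containerd.io coreutils nftables prints reboot, while needs_reboot coreutils python3-pyasn1 prints optional. On Ubuntu, the presence of /var/run/reboot-required after an upgrade is another signal worth checking alongside a heuristic like this.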
1.5 Disk space

    df -h /

    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sda1        48G   12G   36G  23% /

    Total reclaimed space: 4.165GB

Part 2: Reboot and reconnect

    sudo reboot

    Connection to [ip] closed by remote host.

    ssh -i ~/.ssh/id_rsa ubuntu@YOUR_IP

What happens during the reboot:

- The OS sends SIGTERM to all running processes, giving them time to shut down cleanly.
- Docker receives the signal and stops all containers gracefully.
- The kernel shuts down and the VM restarts.
- Your SSH session prints Connection to [ip] closed by remote host. and terminates. This is normal.

3.1 Docker daemon

    sudo systemctl status docker

    ● docker.service - Docker Application Container Engine
         Loaded: loaded (/lib/systemd/system/docker.service; enabled)
         Active: active (running) since Mon 2026-03-30 15:55:51 UTC; 5min ago

- Active: active (running): the daemon is running ✅
- enabled: it is configured to auto-start on every future boot ✅

If the daemon isn't running:

    sudo systemctl enable docker   # ensure it starts on future reboots
    sudo systemctl start docker    # start it now

3.2 Containers

    docker ps

    CONTAINER ID   IMAGE          COMMAND         CREATED       STATUS                   PORTS      NAMES
    fc46f84c7bd5   app-backend    "uv run uvi…"   2 days ago    Up 5 minutes (healthy)   8000/tcp   app_backend
    a3e9a2eeb160   redis:alpine   "docker-ent…"   2 weeks ago   Up 5 minutes (healthy)   6379/tcp   app_redis
    f4afe2edb00c   caddy:alpine   "caddy run …"   4 weeks ago   Up 5 minutes (healthy)   80, 443    caddy_proxy

- Every service you expect should be present. If one is missing, it crashed on startup.
- STATUS should be Up or Up (healthy). (health: starting) is fine for the first 30 seconds after boot.
- The CREATED timestamp does not reset on reboot: it reflects when the container was first created with docker compose up. This is normal.

If a container is missing or in a restart loop:

    docker compose logs [service_name] --tail=50

3.3 Live logs

    cd ~/your-project
    docker compose logs -f --tail=20

Healthy output:

    app_gate     | 127.0.0.1 - - [30/Mar/2026:16:00:00 +0000] "GET / HTTP/1.1" 200 4140
    app_backend  | INFO: 127.0.0.1:58562 - "GET /health HTTP/1.1" 200 OK
    caddy_proxy  | {"level":"info","msg":"received request","uri":"/config/"}
    app_redis    | * Ready to accept connections tcp

Transient (non-critical) error:

    app_worker | redis.exceptions.ConnectionError: Error while reading from redis:6379 : (104, 'Connection reset by peer')
    app_worker | 15:56:15: Starting worker for 1 functions: process_message
    app_worker | 15:56:15: redis_version=8.6.1 mem_usage=1.38M clients_connected=1

Critical error:

    app_backend | sqlalchemy.exc.OperationalError: connection refused
    app_backend | [after 5 retries] giving up

3.4 Additional services

    sudo gitlab-runner status

    gitlab-runner: Service is running

    sudo gitlab-runner start

4.1 OS boot time

    systemd-analyze

    Startup finished in 3.617s (kernel) + 19.608s (userspace) = 23.225s
    graphical.target reached after 18.845s in userspace

4.2 Boot bottlenecks

    systemd-analyze blame | head -20

    12.186s docker.service
     4.821s cloud-init.service
     1.204s snapd.service
       38ms docker.socket

4.3 Container start times

    docker inspect --format='{{.Name}}: {{.State.StartedAt}}' $(docker ps -q)

    /app_ftp_bridge: 2026-03-30T15:55:57.766Z
    /app_worker: 2026-03-30T15:55:57.695Z
    /app_backend: 2026-03-30T15:55:57.646Z
    /app_gate: 2026-03-30T15:55:57.830Z
    /app_admin: 2026-03-30T15:55:57.742Z
    /app_redis: 2026-03-30T15:55:57.794Z
    /caddy_proxy: 2026-03-30T15:55:57.615Z

Quick Reference: All Commands

    # --- PRE-REBOOT ---
    docker ps                           # check container states
    docker compose config               # validate compose file syntax
    df -h /                             # check available disk space

    # --- REBOOT ---
    sudo reboot                         # initiate the reboot

    # --- POST-REBOOT ---
    sudo systemctl status docker        # confirm daemon is running
    docker ps                           # confirm containers are up
    docker compose logs -f --tail=20    # watch live logs
    sudo gitlab-runner status           # check runner (if applicable)

    # --- TTR MEASUREMENT ---
    systemd-analyze                     # total OS boot time
    systemd-analyze blame | head -20    # per-service boot time breakdown
    # exact container start timestamps
    docker inspect --format='{{.Name}}: {{.State.StartedAt}}' $(docker ps -q)

Troubleshooting Reference

    docker compose logs [service] --tail=50
    docker inspect [id]
    sudo systemctl enable docker && sudo systemctl start docker
    docker compose logs caddy --tail=50

Prerequisites

- Ubuntu 22.04 on an OCI Compute instance (ARM or x86)
- Docker + Docker Compose managing your services
- All long-running services configured with restart: always in your docker-compose.yml
- SSH access to the instance
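The STATUS rules from section 3.2 (Up or Up (healthy) is good, (health: starting) is fine shortly after boot, anything else needs a look) can be sketched as a small shell function. classify_status and its three verdict strings are hypothetical names chosen for this example; the live-host usage in the trailing comment assumes Docker is installed and uses only the standard Names and Status placeholders of docker ps --format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps one `docker ps` STATUS string to a verdict.
classify_status() {
  case "$1" in
    *"(unhealthy)"*)        echo "investigate" ;;  # health check is failing
    Restarting*)            echo "investigate" ;;  # restart loop
    *"(health: starting)"*) echo "wait" ;;         # fine shortly after boot
    Up*)                    echo "ok" ;;           # running, healthy or no health check
    *)                      echo "investigate" ;;  # exited, created, or anything else
  esac
}

# Example against a live host (requires Docker):
#   docker ps -a --format '{{.Names}}|{{.Status}}' |
#     while IFS='|' read -r name status; do
#       printf '%s: %s\n' "$name" "$(classify_status "$status")"
#     done
```

Using docker ps -a in the live example also surfaces containers that have exited, which a plain docker ps would silently omit.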