🐧 Essential Linux commands for DevOps engineers — must-know tools for real-world workflows

📋 In this post:

- 🔍 Navigating the System — Know Where You Are
- 📁 Where Logs Live — And How to Get Them Fast
- 💾 Disk Space Panic — Who Ate the GBs?
- ⚙️ Process Management — Who’s Running What?
- 🚀 Keeping Services Alive — Intro to systemd
- 🔐 File Permissions — Don’t Break Security
- 🤝 Switching Users — The Right Way
- 📦 Network Debugging — Is It Talking?
- 🧩 Pipes and Redirection — Chain Like a Pro
- 🟩 Final Thoughts
- ❓ Frequently Asked Questions (kill vs kill -9, netstat, finding files)
"The terminal isn’t magic — it’s just muscle memory you haven’t built yet."

I remember the time I spent two hours debugging a CI/CD pipeline because a script failed with “permission denied.” No stack trace, no logs — just red text and silence. Turned out, I’d forgotten `chmod +x` on deploy.sh. Yeah, I learned this the hard way.

Look — we all pretend we’ve got it together. Kubernetes manifests? Check. Terraform modules? Got ‘em. Helm charts? Polished. But then the alert hits at 1:47 AM. Pod crash-looping. And suddenly you’re typing `man ps` like you’ve never seen a process in your life.

Not gonna lie — I’ve been there. More times than I’d like to admit. That night, after the fifth `kubectl exec`, I went back. Not to docs. Not to YouTube. Back to the shell. The raw, unglamorous Linux terminal. Because here’s the thing: every cloud system, every container, every fancy orchestration tool — they’re all just running on a Linux box with a heartbeat and a grudge.

So I rebuilt my muscle memory. Started with 20 core commands — the essential Linux commands for DevOps engineers that show up 90% of the time when shit hits the fan. Spoiler: it changed everything. You don’t need to know 200 commands. You need the ones that make you dangerous. Fast. Confident.

Let’s go through them — not like a textbook, but like a senior dev after a chai and a long week. With scars. And opinions.

🔍 Navigating the System — Know Where You Are

You SSH in. Screen’s blank. No prompt. No clue where the app even lives. And you’re already behind. Linux treats everything like a file. That means if you can’t navigate, you’re blind. Period. Start with `pwd`, `ls -la`, and `cd`.

📁 Where Logs Live — And How to Get Them Fast

Need to find something fast? Say, logs from the last 24 hours? `find /var/log -name "*.log" -mtime -1` does it. Used this during an audit. Found a cron job dumping 2GB of debug logs every Sunday at 3 AM. Owner? A dev who “meant to clean it up.” (Spoiler: he didn’t.)

Logs are your crime scene. Treat ‘em like evidence. Know the usual spots under /var/log. But real-time monitoring? I caught a 502 avalanche within seconds during a deploy.
Turned out the upstream wasn’t ready. Rolled back before users even noticed. Magic? No. Just `tail -f /var/log/nginx/access.log`. So don’t just run it — live with it during deployments. Make it a ritual.

💾 Disk Space Panic — Who Ate the GBs?

Alert: “Disk usage >95%.” You panic. Logs? Nope. Not the logs. Run `df -h`. Fast. Human-readable. Shows you the big picture. But — and this matters — `df` won’t tell you where the bloat lives. For that: `du -sh /var/* | sort -hr | head -5` — the top 5 space hogs in /var. Once, it showed docker/overlay2 at 40GB. A dev had pulled every tag of a base image. Every tag. A quick `docker system prune -f` later — green alert. Weekend saved. Yeah, I’ve done this twice. (Third time, I added pruning to the CI.)

⚙️ Process Management — Who’s Running What?

CPU at 98%. No idea why. Servers aren’t haunted. But damn, sometimes they feel like it. `ps` gives you a snapshot. But `ps aux`? That’s the full inventory. Snapshots aren’t enough, though — you need live data. That’s `top`. Or — better — `htop`. Install it. Do it now. (Seriously. `sudo apt install htop`.) Sort by CPU. Memory. Runtime. See what’s spiking.

Kill a bad process? Sure. Use `kill PID` first — it sends SIGTERM. Graceful. But if it’s zombie-stubborn? `kill -9 PID`. Hard kill. No cleanup. Use sparingly. Like nuclear codes. I once killed PostgreSQL mid-write. Recovery took hours. Learned my lesson.

🚀 Keeping Services Alive — Intro to systemd

Most distros use systemd now. If you don’t know it, you’re flying blind. Is a service running? Failed? Masked? `systemctl status nginx` tells you. Want it back after reboot? That’s `sudo systemctl enable nginx`. And — this is critical — start ≠ enable. I learned this the hard way during a patching cycle. Restarted the staging server. Redis was down. Why? Because I’d started it, but never enabled it. Two hours of downtime. Boss wasn’t happy. So now I double-check with `systemctl is-enabled nginx`. Habit.

🔐 File Permissions — Don’t Break Security

“Permission denied.” Feels like a slap. But it’s not arbitrary — Linux is strict for a reason. And — this is big — never, ever `chmod 777`. It’s the “open the door and leave the keys” of the Linux world. I once saw a production API key leaked because a config file was 777. (Yes, it was on GitHub. No, we didn’t laugh.) So use 644 for config files.
755 for scripts. Be sane.

🤝 Switching Users — The Right Way

Need to run as postgres? Or jenkins? Use sudo. But don’t jump into their shell unless you have to. Prefer `sudo -u jenkins command` — one-off, clean. If you must: `sudo su - jenkins`. Full login. Environment and all. But — here’s the thing — always run `whoami` before doing anything destructive. I once deleted /tmp on prod — as root — thinking I was on a sandbox VM. Logs were messy. Recovery was worse. Now? `whoami` is ritual. Like checking mirrors before reversing. (And yes, that’s a bit paranoid. But I sleep better.)

📦 Network Debugging — Is It Talking?

Service not responding? Could be code. Could be config. Or — 60% of the time — it’s networking. Start with the basics: `ping google.com`. If that fails? DNS. Or routing. Or someone unplugged a cable. (Happens more than you think.)

`ss` is faster than netstat, and shows listening sockets with their owning PIDs and process names. Try `ss -tulnp | grep 80`. I used this once: Node app on port 3000? Not responding. `ss` showed nothing listening. Why? The app crashed on startup — a missing .env var. Found in 90 seconds. Logs confirmed it. So — always check if it’s even listening.

`curl -I http://localhost:8080` gets you headers only. No download. Fast. Useful for health checks. I’ve debugged TLS redirects, load balancer timeouts, even broken CORS with this. It’s small. But sharp.

🧩 Pipes and Redirection — Chain Like a Pro

Linux philosophy: small tools, big power — when chained. Pipes (|) pass output forward, like an assembly line. Redirection splits the streams: `ping google.com > success.log 2> error.log` logs success and failure separately. No noise. Nice for cron jobs. A junior I was mentoring asked me how I debugged a spammy script. This was my answer. He’s using it in production now.

Oh — and grep, awk, sed? They’re not optional. Need all 500 errors from Nginx? `grep " 500 " /var/log/nginx/access.log`. Now — who’s causing them? Pipe it through awk, sort, and uniq to rank the top IPs by error count. Found a misconfigured scraper last month. Blocked it fast.

And sed? I fixed 50 .conf files with one line: `sed -i 's/old-domain.com/new-domain.com/g' *.conf`. The -i edits in place. Scary? Yeah. But when you need it, you really need it.

🟩 Final Thoughts

You don’t need to be a Linux wizard. But you do need fluency. The essential Linux commands for DevOps engineers? They’re not tools. They’re reflexes. Like driving. You don’t think about the gears.
You just drive.

From what I’ve seen on real projects, the best engineers aren’t the ones with the most tools. They’re the ones who know a few deeply. They grep like poets. They ss like surgeons. That fluency buys time. Reduces panic. Builds trust.

So go break things. In a VM. Spin up an Ubuntu box. Break it. Fix it. Break it again. And when you finally solve it with a two-word command? Savor it. Because you’ve earned it.

❓ Frequently Asked Questions

What’s the difference between kill and kill -9?

kill sends a SIGTERM signal, asking the process to terminate gracefully. kill -9 sends SIGKILL, forcing immediate termination without cleanup. Always try SIGTERM first.

Is netstat obsolete?

Yes, mostly. netstat is deprecated in favor of ss, which is faster and more efficient. Use `ss -tulnp` as your go-to for port checks.

How do I search for a file by name?

Use `find /path -name "filename"`. For faster results, use locate — but run updatedb first to refresh the index.
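To make the find usage concrete, here is a sketch you can run anywhere. The temp directory and file names are invented for the demo, and the backdated timestamp assumes GNU touch (standard on most Linux distros):

```bash
# Demo: find by name vs. find by modification time, in a throwaway dir.
demo=$(mktemp -d)
touch "$demo/app.log" "$demo/app.conf"
touch -d '2 days ago' "$demo/old.log"   # backdate mtime (GNU touch)

# By name: matches app.log AND old.log, ignores app.conf.
find "$demo" -name "*.log"

# By name AND modified within the last 24 hours: only app.log.
find "$demo" -name "*.log" -mtime -1

rm -rf "$demo"
```

Same shape as the real-world version, just with `/var/log` swapped for a sandbox you can delete.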

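The redirection pattern from the pipes section is worth seeing in isolation. A minimal, runnable sketch (file names are illustrative), splitting stdout from stderr the same way the cron-job trick does:

```bash
# Demo: route normal output and errors to separate files.
out=$(mktemp)
err=$(mktemp)

# The braces group two commands; one writes to stdout, one to stderr (>&2).
{ echo "reachable"; echo "host down" >&2; } > "$out" 2> "$err"

cat "$out"   # prints: reachable
cat "$err"   # prints: host down

rm -f "$out" "$err"
```

Swap `>` for `>>` and the logs append across runs instead of being overwritten.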
🧰 Quick Reference — Commands From This Post
```bash
find /var/log -name "*.log" -mtime -1    # logs touched in the last 24 hours
tail -f /var/log/nginx/access.log        # live log stream during deploys
du -sh /var/* | sort -hr | head -5       # top 5 space hogs in /var
systemctl status nginx                   # running? failed? masked?
sudo systemctl start nginx               # start now
sudo systemctl enable nginx              # survive the next reboot
chmod +x deploy.sh                       # make it executable
ping google.com                          # basic reachability
ss -tulnp | grep 80                      # anything listening on port 80?
curl -I http://localhost:8080            # headers only, no download
ping google.com > success.log 2> error.log                # split stdout/stderr
grep " 500 " /var/log/nginx/access.log                    # all the 500s
grep " 500 " /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -nr   # top IPs by error count
sed -i 's/old-domain.com/new-domain.com/g' *.conf         # edit in place
```

A healthy script in `ls -la` output looks like this:

```
-rwxr-xr-- 1 ubuntu ubuntu 2048 May 10 10:30 deploy.sh
```

Navigation basics:

- pwd – Where the hell are you? Run it. Always. Like checking your GPS in a dark alley.
- ls – List files. But use ls -la. I missed a broken .env.production once because I didn’t see the dot-file. Cost me 40 minutes and a sprint review.
- cd – Obvious? Sure. But cd - is gold. It jumps back to the last dir — a lifesaver when you’re bouncing between /var/log and /opt/app.
The usual log spots:

- /var/log/syslog – Ubuntu/Debian. Everything dumps here.
- /var/log/messages – RHEL/CentOS. Same idea.
- /var/log/auth.log – Failed SSH attempts. If you’re seeing repeated tries from an IP in Russia — yeah, it’s a bot. Block it.

The ps aux flags:

- a – All processes
- u – User details
- x – Even the ones without a terminal

Permission bits:

- Read (r) – See contents
- Write (w) – Edit
- Execute (x) – Run as script

What ss adds over a bare port list:

- PID and process name

Redirection operators:

- `>` – Overwrite
- `>>` – Append
- `2>` – Errors only
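The top-IPs-by-error-count pipeline is easier to trust once you have watched it chew through toy data. A sketch with a made-up access log (the IPs, paths, and timestamps are invented):

```bash
# Demo: rank client IPs by number of 500 responses in a fake access log.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.5 - - [10/May/2025:10:00:01] "GET /api HTTP/1.1" 500 12
10.0.0.9 - - [10/May/2025:10:00:02] "GET / HTTP/1.1" 200 512
10.0.0.5 - - [10/May/2025:10:00:03] "GET /api HTTP/1.1" 500 12
10.0.0.7 - - [10/May/2025:10:00:04] "GET /api HTTP/1.1" 500 12
EOF

# grep keeps only the 500s, awk pulls the IP (field 1),
# sort+uniq -c counts duplicates, sort -nr ranks by count.
grep " 500 " "$log" | awk '{print $1}' | sort | uniq -c | sort -nr
# 10.0.0.5 tops the list with 2 hits; 10.0.0.7 follows with 1.

rm -f "$log"
```

Note the spaces around `" 500 "` — they stop the pattern from matching a 500 that happens to appear in a URL or a byte count.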