Tools: How to monitor cron jobs (and stop silent failures) (2026)

Cron jobs feel simple until they fail silently. This guide explains why that happens and how heartbeat monitoring helps you catch missed runs before they turn into real incidents.

The Problem

Cron jobs do not tell you when they fail. That is the core issue. Unlike a web server or API, there is no built-in alerting, no dashboard, and no visibility by default. If a job crashes, times out, or never runs, you often will not know unless:

- You manually check logs
- A user reports that something is broken
- Data starts looking wrong

Real-World Example

You have a nightly backup job:

    0 2 * * * /usr/local/bin/backup.sh

One day, the script starts failing because of a permission issue. Cron keeps triggering it, but nothing actually works. Two weeks later, you need a backup.

Why It Happens

Cron is intentionally simple. It just schedules commands. That is it. It does not:

- Track execution success
- Retry failed jobs
- Notify you on failure
- Verify that the job actually completed

Even worse, failures can happen in subtle ways:

- The script exits early with no useful error output
- Dependencies change, such as an API, database, or file system
- Network issues break external calls
- Environment variables differ from your interactive shell

Cron will still run the command on schedule, but success is your responsibility.

Why It Is Dangerous

Silent failures are the worst kind of failures. Here is what can go wrong:

1. Data Loss

Backups fail quietly. You do not notice until it is too late.

2. Broken Pipelines

ETL jobs stop syncing data, which leads to stale dashboards and bad decisions.

3. Missed Business Logic

Emails, billing tasks, or cleanup scripts stop running.

4. Harder Debugging

You do not know when things broke, only that they are broken now.

The longer a cron job runs without monitoring, the higher the risk.

How to Detect It

To monitor cron jobs effectively, you need external confirmation that they ran successfully. This is where the idea of a heartbeat comes in.

Heartbeat Monitoring

A heartbeat is a simple signal sent by your cron job when it completes. If the signal:

- Arrives on time -> the job is healthy
- Is missing or late -> something is wrong

Instead of checking logs manually, you flip the model: tell me when something does not happen. This is much more reliable.

Simple Solution

The simplest way to implement heartbeat monitoring is to send an HTTP request at the end of your cron job.

Example Using curl

Let us say you have a monitoring endpoint:

    https://example.com/heartbeat/backup-job

Modify your cron job like this:

    0 2 * * * /usr/local/bin/backup.sh && curl -fsS https://example.com/heartbeat/backup-job

This entry:

- Runs your script
- Sends the heartbeat only if the script succeeds because of &&
- Gives your monitoring system a reliable signal that the job completed

If the script fails, the heartbeat is never sent. Now your monitoring system can:

- Expect a signal every day around 2 AM
- Alert you if it does not arrive
- Optionally track failures too

You can also send a failure signal:

    0 2 * * * /usr/local/bin/backup.sh && curl -fsS https://example.com/heartbeat/backup-job/success || curl -fsS https://example.com/heartbeat/backup-job/failure

Keep the entry on a single line, because crontab does not support backslash line continuation. That gives you even more visibility.

Common Mistakes

Even when people try to monitor cron jobs, they often get it wrong.

1. Only Checking Logs

Logs are passive. If you are not actively looking, they do not help.

2. Not Handling Failures Explicitly

Using ; instead of && means the heartbeat fires even if the job fails.

Wrong:

    backup.sh ; curl ...

Right:

    backup.sh && curl ...

3. Ignoring Timeouts

If your job hangs, it may never send a signal. Add timeouts where possible.

4. Monitoring the Wrong Thing

Checking whether cron started is not enough. You need to know the job completed.

5. No Alerting

Sending a heartbeat is useless if nobody gets alerted when it goes missing.

Alternative Approaches

Heartbeat monitoring is simple and effective, but it is not the only option.

1. Log Monitoring

Tools like ELK or Loki can detect errors in logs.

- Good for debugging
- Works with existing systems
- Reactive, not proactive
- Easy to miss issues

2. Uptime Checks

You can expose an endpoint and have a service ping it.

- Works well for APIs
- Not ideal for background jobs
- Does not confirm job completion

3. Queue-Based Systems

If your jobs run through queues, such as workers, you can track success and failure there.

- More control
- Built-in retries
- Overkill for simple cron jobs

4. Custom Monitoring Scripts

You can build your own system to track execution timestamps.

- Fully customizable
- Time-consuming
- Reinventing the wheel

At this point, instead of building and maintaining your own heartbeat system, you can use a purpose-built tool. Tools like QuietPulse let you define expected intervals and alert you when a cron job misses its heartbeat without much setup.

How do I know if my cron job ran successfully?

The most reliable way is to send a heartbeat after successful execution. If the signal does not arrive, assume failure and alert.

Can cron send emails on failure?

Yes, cron can send output to email via MAILTO, but:

- It depends on system configuration
- It is often unreliable or ignored
- It does not detect silent failures

What is the best way to monitor cron jobs in production?

Heartbeat monitoring is usually the simplest and most effective approach:

- Add a request at the end of the job
- Track expected intervals
- Alert on missing signals

How often should I expect heartbeats?

It depends on your schedule:

- Hourly jobs -> expect hourly signals
- Daily jobs -> expect one signal per day

Set a buffer, or grace period, to avoid false alerts.

Conclusion

Cron jobs are deceptively simple, but dangerously invisible. If you do not actively monitor them, failures will go unnoticed until they hurt. The easiest way to fix this:

- Add a heartbeat signal to every job
- Track when it should arrive
- Alert when it does not

Once you start doing this, you stop guessing and start knowing.
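
The advice to add timeouts and to report failures explicitly can be combined in a small wrapper script. What follows is a minimal sketch, not a definitive implementation: the names HEARTBEAT_URL, JOB_TIMEOUT, ping_monitor, and run_with_heartbeat are hypothetical, the /success and /failure paths follow the example endpoint used in this article, and the script assumes the `timeout` command from GNU coreutils is installed.

```shell
#!/bin/sh
# Hypothetical wrapper: run a job under a time limit, then report the outcome.
# HEARTBEAT_URL and JOB_TIMEOUT are illustrative names, not part of any tool.

HEARTBEAT_URL="${HEARTBEAT_URL:-https://example.com/heartbeat/backup-job}"
JOB_TIMEOUT="${JOB_TIMEOUT:-3600}"   # seconds before the job is killed

# Send one signal to the monitoring endpoint ("success" or "failure").
# --max-time keeps a slow endpoint from hanging the wrapper itself.
ping_monitor() {
    curl -fsS --max-time 10 --retry 3 "$HEARTBEAT_URL/$1" >/dev/null
}

# Run the given command under `timeout`; a hang then counts as a failure
# instead of leaving the job running with no signal at all.
run_with_heartbeat() {
    if timeout "$JOB_TIMEOUT" "$@"; then
        ping_monitor success
    else
        job_status=$?            # exit status of the job (or of timeout)
        ping_monitor failure
        return "$job_status"
    fi
}

# When invoked with arguments, treat them as the job to run.
if [ "$#" -gt 0 ]; then
    run_with_heartbeat "$@"
fi
```

Saved as, say, /usr/local/bin/run-with-heartbeat.sh (a hypothetical path), the crontab entry becomes `0 2 * * * /usr/local/bin/run-with-heartbeat.sh /usr/local/bin/backup.sh`. Unlike the one-line `&& ... ||` form, the failure signal here is sent only when the job itself fails or times out, not when the success ping fails.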