Monitoring GitHub Actions scheduled workflows: a practical guide

- The basic setup
- Adding start/end pings for longer jobs
- The gotchas
  - GitHub delays scheduled workflows
  - Scheduled workflows stop on inactive repos
  - Test with workflow_dispatch before trusting the schedule
  - Secrets aren't available in forks
- Full production example

GitHub Actions is a surprisingly capable cron scheduler. Schedule a workflow, let it run nightly, forget about it. Until it stops running and you don't notice for two weeks.

Scheduled workflows in GitHub Actions are quietly unreliable. GitHub delays them, skips them during high load, and, most importantly, gives you no built-in alerting when they fail silently. Adding external monitoring takes about five minutes and saves you from that two-week discovery.

## The basic setup

Here's a minimal scheduled workflow with monitoring:

```yaml
name: Nightly export
on:
  schedule:
    - cron: '0 2 * * *'   # 2am UTC every day
  workflow_dispatch:       # allows manual triggering for testing
jobs:
  export:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run export
        run: python scripts/export.py
      - name: Ping DeadManCheck
        if: success()
        run: curl -fsS https://deadmancheck.io/ping/${{ secrets.DEADMANCHECK_TOKEN }} > /dev/null
```

The last step pings DeadManCheck only if all previous steps succeeded (`if: success()`). If the export script fails, the ping doesn't fire, and you get alerted after your configured grace period.

Set up the monitor with a 25-hour interval (a 1-hour buffer on the 24-hour schedule). Store your token in GitHub: Settings → Secrets and variables → Actions → New repository secret named `DEADMANCHECK_TOKEN`.

## Adding start/end pings for longer jobs

For jobs that run more than a few minutes, use the start/end pattern. This catches jobs that hang:

```yaml
steps:
  - uses: actions/checkout@v4
  - name: Ping start
    run: curl -fsS https://deadmancheck.io/ping/${{ secrets.DEADMANCHECK_TOKEN }}/start > /dev/null || true
  - name: Run ETL
    id: etl
    run: |
      python scripts/run_etl.py
      echo "rows=$(cat /tmp/etl_row_count.txt)" >> $GITHUB_OUTPUT
  - name: Ping done
    if: success()
    run: |
      curl -fsS \
        "https://deadmancheck.io/ping/${{ secrets.DEADMANCHECK_TOKEN }}?count=${{ steps.etl.outputs.rows }}" \
        > /dev/null || true
  - name: Ping fail
    if: failure()
    run: curl -fsS https://deadmancheck.io/ping/${{ secrets.DEADMANCHECK_TOKEN }}/fail > /dev/null || true
```

Your ETL script writes the row count to `/tmp/etl_row_count.txt`. The monitoring step picks it up and includes it in the ping, so your monitor can alert on zero-output runs, not just missed runs.

## The gotchas

### GitHub delays scheduled workflows

This is the big one. GitHub's docs admit that scheduled workflows may be delayed during periods of high load. A workflow scheduled for 2:00am UTC might run at 2:23am or 2:51am. During busy periods, delays of 30–60 minutes aren't unusual.

Don't set your DeadManCheck interval to exactly 24 hours. Set it to 25 hours. That buffer absorbs GitHub's scheduling jitter without letting real failures go undetected.

### Scheduled workflows stop on inactive repos

If a repository has no commits in 60 days, GitHub disables scheduled workflows. You'll get an email warning. If you miss it, the job silently stops running, and your external monitor will catch it where GitHub's notification didn't reach you.

### Test with workflow_dispatch before trusting the schedule

Always add `workflow_dispatch` as a trigger (it's in all the examples above). You can trigger the workflow manually from the Actions tab or via the CLI:

```shell
gh workflow run nightly-export.yml
```

Test your monitoring integration before the first scheduled run. Confirm the ping appears in your DeadManCheck dashboard with the correct count.

### Secrets aren't available in forks

If your repo is public and someone forks it, `secrets.DEADMANCHECK_TOKEN` will be empty in their fork, and the curl will fail silently. This is fine (you don't want random forks pinging your monitor), but be aware of it when debugging.

## Full production example

```yaml
name: Nightly database backup
on:
  schedule:
    - cron: '0 2 * * *'
  workflow_dispatch:
jobs:
  backup:
    runs-on: ubuntu-latest
    timeout-minutes: 30   # hard limit: prevent hung jobs accumulating
    steps:
      - uses: actions/checkout@v4
      - name: Ping start
        run: |
          curl -fsS \
            "https://deadmancheck.io/ping/${{ secrets.DEADMANCHECK_TOKEN }}/start" \
            > /dev/null || true   # don't fail if monitoring is down
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Run backup
        id: backup
        run: |
          python scripts/backup.py
          echo "rows=$(cat /tmp/backup_row_count.txt)" >> $GITHUB_OUTPUT
      - name: Upload to S3
        run: aws s3 cp /backups/latest.dump s3://my-backups/
      - name: Ping done
        if: success()
        run: |
          curl -fsS \
            "https://deadmancheck.io/ping/${{ secrets.DEADMANCHECK_TOKEN }}?count=${{ steps.backup.outputs.rows }}" \
            > /dev/null || true
      - name: Ping fail
        if: failure()
        run: |
          curl -fsS \
            "https://deadmancheck.io/ping/${{ secrets.DEADMANCHECK_TOKEN }}/fail" \
            > /dev/null || true
```

A few things worth noting:

- `timeout-minutes: 30` is a hard ceiling. Without it, a hung job can sit there for 6 hours consuming a runner.
- `|| true` on the monitoring pings means a DeadManCheck outage won't cause your backup job to report as failed.
- The row count flows from the backup step through `$GITHUB_OUTPUT` to the ping step.

Trigger the workflow manually and confirm:

- The workflow runs end-to-end without errors
- DeadManCheck shows a recent ping on your monitor dashboard
- The count looks correct for what the job processed

Then wait for the first scheduled run and verify again. Two successful data points before you trust it.

Scheduled workflows are one of those things that feel reliable until the day they aren't. External monitoring is the difference between finding out immediately and finding out when someone asks why the weekly report is missing.
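If the silent-fork behavior bothers you while debugging, you can make the skip explicit instead of letting curl fail. A sketch using the common workaround of mirroring the secret into job-level `env` so a step-level `if` can test it (the exact structure is an assumption; adapt it to your workflow):

```yaml
jobs:
  export:
    runs-on: ubuntu-latest
    env:
      # Mirror the secret into env so the step-level `if` can test it;
      # in a fork this resolves to an empty string.
      DEADMANCHECK_TOKEN: ${{ secrets.DEADMANCHECK_TOKEN }}
    steps:
      - name: Ping DeadManCheck
        if: success() && env.DEADMANCHECK_TOKEN != ''
        run: curl -fsS "https://deadmancheck.io/ping/$DEADMANCHECK_TOKEN" > /dev/null
```

With the guard, fork runs show the step as skipped in the Actions UI rather than a step that "succeeded" while doing nothing.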
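For repos that genuinely go quiet, a common community countermeasure to the 60-day disable is a scheduled keep-alive workflow that pushes an empty commit well inside the window. A sketch (workflow name and commit message are illustrative, and the keep-alive schedule is itself subject to the same disable rule, which is why it commits):

```yaml
name: Keep scheduled workflows alive
on:
  schedule:
    - cron: '0 3 1 * *'   # monthly, well inside the 60-day window
  workflow_dispatch:
jobs:
  keepalive:
    runs-on: ubuntu-latest
    permissions:
      contents: write      # needed to push the empty commit
    steps:
      - uses: actions/checkout@v4
      - name: Empty commit
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git commit --allow-empty -m "keepalive: prevent scheduled workflow disable"
          git push
```

Even with this in place, keep the external monitor: it is the only thing that tells you when the mechanism itself stops working.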