$ ./bin/taskmaster config.yaml
Taskmaster> status
Task      Status   PID   Uptime  Restarts  Command
nginx     RUNNING  1234  2m15s   0         /usr/local/bin/nginx
worker_1  RUNNING  1235  2m15s   1         python3 worker.py
worker_2  STOPPED  -     -       0         python3 worker.py
Taskmaster> start worker_2
Process 'worker_2' started with PID 1240
Taskmaster> logs worker_1 5
[2026-02-02 10:15:25] Processing task #1
[2026-02-02 10:15:26] Task completed
[2026-02-02 10:15:27] Waiting for tasks...
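The session above uses three command shapes: a bare verb (status), a verb plus a task name (start worker_2), and a verb plus a task name plus a count (logs worker_1 5). The following is a minimal sketch of how a shell could parse such lines. It assumes a simple whitespace-split grammar, and the Command type and parseCommand function are hypothetical names, not Taskmaster's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Command is a hypothetical parsed form of one shell line.
type Command struct {
	Name  string // e.g. "status", "start", "logs"
	Task  string // target task; empty for commands like "status"
	Lines int    // number of log lines; only used by "logs"
}

// parseCommand splits a raw shell line on whitespace and validates the
// argument count for each known verb.
func parseCommand(line string) (Command, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return Command{}, errors.New("empty command")
	}
	cmd := Command{Name: fields[0]}
	switch cmd.Name {
	case "status", "reload":
		if len(fields) != 1 {
			return Command{}, fmt.Errorf("%s takes no arguments", cmd.Name)
		}
	case "start", "stop", "restart":
		if len(fields) != 2 {
			return Command{}, fmt.Errorf("usage: %s <task>", cmd.Name)
		}
		cmd.Task = fields[1]
	case "logs":
		if len(fields) != 3 {
			return Command{}, errors.New("usage: logs <task> <lines>")
		}
		n, err := strconv.Atoi(fields[2])
		if err != nil || n <= 0 {
			return Command{}, fmt.Errorf("invalid line count %q", fields[2])
		}
		cmd.Task, cmd.Lines = fields[1], n
	default:
		return Command{}, fmt.Errorf("unknown command %q", cmd.Name)
	}
	return cmd, nil
}

func main() {
	for _, line := range []string{"status", "start worker_2", "logs worker_1 5", "bogus"} {
		cmd, err := parseCommand(line)
		fmt.Printf("%q -> %+v, err=%v\n", line, cmd, err)
	}
}
```

Parsing into a struct first keeps validation in one place, so the code that acts on a command never has to re-check argument counts.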
┌─────────────────────────────────────────────────────────────┐
│                        Main Process                         │
│  ┌──────────────┐  ┌─────────────┐  ┌──────────────────┐    │
│  │   CLI Loop   │  │   Config    │  │  Signal Handler  │    │
│  │  (readline)  │  │   Parser    │  │     (SIGHUP)     │    │
│  └──────────────┘  └─────────────┘  └──────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                               │
                   ┌───────────┴───────────┐
                   │     Task Manager      │
                   └───────────┬───────────┘
                               │
           ┌───────────────────┼───────────────────┐
           │                   │                   │
      ┌────▼────┐         ┌────▼────┐         ┌────▼────┐
      │ Process │         │ Process │         │ Process │
      │ Monitor │   ...   │ Monitor │         │ Monitor │
      │(gorout.)│         │(gorout.)│         │(gorout.)│
      └────┬────┘         └────┬────┘         └────┬────┘
           │                   │                   │
      ┌────▼────┐         ┌────▼────┐         ┌────▼────┐
      │  Child  │         │  Child  │         │  Child  │
      │ Process │         │ Process │         │ Process │
      └─────────┘         └─────────┘         └─────────┘
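The diagram pairs each child process with its own monitor goroutine under a single Task Manager. Here is a minimal sketch of that fan-out, assuming each monitor simply runs its child to completion and reports back. The exitReport type and runMonitors function are hypothetical, and a real monitor would also handle commands and restarts.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// exitReport is what each hypothetical monitor sends back to the manager.
type exitReport struct {
	Task string
	Err  error // nil when the child exited with status 0
}

// runMonitors starts one monitor goroutine per task. Each goroutine owns
// exactly one child process and reports its exit on a shared channel.
func runMonitors(tasks map[string][]string) []exitReport {
	var wg sync.WaitGroup
	reports := make(chan exitReport, len(tasks)) // buffered so monitors never block
	for name, argv := range tasks {
		wg.Add(1)
		go func(name string, argv []string) {
			defer wg.Done()
			cmd := exec.Command(argv[0], argv[1:]...)
			reports <- exitReport{Task: name, Err: cmd.Run()} // Run = Start + Wait
		}(name, argv)
	}
	wg.Wait() // block until every monitor has seen its child exit
	close(reports)

	var out []exitReport
	for r := range reports {
		out = append(out, r)
	}
	return out
}

func main() {
	for _, r := range runMonitors(map[string][]string{
		"ok":   {"true"},
		"fail": {"false"},
	}) {
		fmt.Printf("%s exited, err=%v\n", r.Task, r.Err)
	}
}
```

Because every child has exactly one owning goroutine, a crash in one task never blocks or corrupts the bookkeeping of another.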
StartTaskManager
StartTaskManager
CLI ──── "stop nginx" ───► nginx's CmdChan ──► goroutine acts on it
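The flow above routes every control message through a per-task channel, so only the owning goroutine ever mutates that task's state. The following minimal sketch shows the pattern with simulated state instead of a real process. The Cmd constants, monitor, and drive are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// Cmd is a hypothetical control message; the article names start, stop and restart.
type Cmd int

const (
	CmdStart Cmd = iota
	CmdStop
	CmdRestart
)

// monitor owns one task. It is the only goroutine that touches that task's
// state, so no mutex is needed: every change arrives through cmds.
func monitor(name string, cmds <-chan Cmd, events chan<- string) {
	defer close(events) // tell the reader no more events will follow
	running := false    // state owned by this goroutine alone
	for cmd := range cmds { // loop ends when the sender closes cmds
		switch cmd {
		case CmdStart:
			if running {
				events <- name + ": already running"
				continue
			}
			running = true
			events <- name + ": started"
		case CmdStop:
			running = false
			events <- name + ": stopped"
		case CmdRestart:
			running = true
			events <- name + ": restarted"
		}
	}
}

// drive feeds a sequence of commands to a fresh monitor and collects its events.
func drive(name string, seq ...Cmd) []string {
	cmds := make(chan Cmd, len(seq))      // buffered, like the CmdChan described above
	events := make(chan string, len(seq)) // one event per command, so sends never block
	go monitor(name, cmds, events)
	for _, c := range seq {
		cmds <- c
	}
	close(cmds)
	var out []string
	for e := range events {
		out = append(out, e)
	}
	return out
}

func main() {
	for _, e := range drive("nginx", CmdStart, CmdStop) {
		fmt.Println(e)
	}
}
```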
[STOPPED] → start → [STARTED] → (after successfulStartTimeout) → [RUNNING]
                                                                     │
                                                                     └─ unexpected exit ──► [FATAL]
                                                                                               │
                                                                                    restart policy applies
successfulStartTimeout
tasks:
  web_server:
    command: "/usr/local/bin/nginx -g 'daemon off;'"
    instances: 1
    autoLaunch: true
    restart: on-failure
    expectedExitCodes: [0]
    successfulStartTimeout: 3
    restartsAttempts: 3
    stopingSignal: SIGTERM
    gracefulStopTimeout: 10
    stdout: /var/log/taskmaster/nginx.out.log
    stderr: /var/log/taskmaster/nginx.err.log
    environment:
      PORT: "8080"
      ENV: "production"
    workingDirectory: /var/www
  worker:
    command: "python3 worker.py"
    instances: 5
    autoLaunch: true
    restart: always
    restartsAttempts: 5
    gracefulStopTimeout: 15
    stdout: /var/log/taskmaster/worker.out.log
    stderr: /var/log/taskmaster/worker.err.log
instances: 5
restart all
restart: on-failure
restart: always
expectedExitCodes
gracefulStopTimeout: 15
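The restart, expectedExitCodes and restartsAttempts keys combine into a single decision that is made each time a child exits. The following sketch shows one plausible way to evaluate them. It assumes that an exit code listed in expectedExitCodes counts as success under on-failure, and that any policy value other than always or on-failure disables restarts. The TaskPolicy struct and shouldRestart method are hypothetical.

```go
package main

import "fmt"

// TaskPolicy holds the restart-related config keys; the struct is hypothetical.
type TaskPolicy struct {
	Restart           string // "always" or "on-failure"
	ExpectedExitCodes []int
	RestartsAttempts  int
}

// isExpected reports whether code is listed in ExpectedExitCodes.
func (p TaskPolicy) isExpected(code int) bool {
	for _, c := range p.ExpectedExitCodes {
		if c == code {
			return true
		}
	}
	return false
}

// shouldRestart decides whether to relaunch a child that exited with
// exitCode, given how many restarts have already been spent.
func (p TaskPolicy) shouldRestart(exitCode, restartsSoFar int) bool {
	wants := false
	switch p.Restart {
	case "always":
		wants = true
	case "on-failure":
		wants = !p.isExpected(exitCode)
	}
	// Even when the policy asks for a restart, stop once the budget is spent.
	return wants && restartsSoFar < p.RestartsAttempts
}

func main() {
	web := TaskPolicy{Restart: "on-failure", ExpectedExitCodes: []int{0}, RestartsAttempts: 3}
	fmt.Println(web.shouldRestart(0, 0)) // false: clean, expected exit
	fmt.Println(web.shouldRestart(1, 0)) // true: unexpected exit code
	fmt.Println(web.shouldRestart(1, 3)) // false: all 3 restarts used
}
```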
Taskmaster> reload
Configuration reloaded.
kill -HUP <pid>
sync.WaitGroup
// Simplified version of the shutdown flow
tasks.WaitGroup.Wait() // Block until all process monitors are done
os.Exit(0)
git clone https://github.com/UBA-code/taskmaster.git
cd taskmaster
make build
./bin/taskmaster # Generates an example config and starts the shell

- Goroutines are lightweight (a few KB of stack vs. MB for threads) and can be spawned in the thousands without issue.
- Channels provide safe, structured communication between concurrent components: no mutexes, no shared-state hell.

- Listens for control commands over a buffered channel (CmdChan)
- Monitors the child process for exits and unexpected crashes
- Handles restarts based on the configured policy

- Command Channels: carry control messages (start, stop, restart)
- Done Channels: signal that a child process has exited
- Timeout Channels: implement deadlines for startup grace periods and graceful shutdowns

- STOPPED: Not running (intentionally or not yet started)
- STARTED: Running but still in the startup grace period
- RUNNING: Confirmed healthy and past the startup timeout
- FATAL: Crashed, and restart attempts are exhausted

- The daemon receiving SIGTERM (e.g., from the OS on shutdown)
- Propagating the right signal to each child process
- Waiting for all children to exit before the daemon itself exits