The MASTERCLAW Architecture: Running 12 Autonomous Python Bots on One VPS


My Hetzner VPS costs €6.18 a month. It has 2 vCPUs, 2GB of RAM, and 40GB of storage. Right now, as I write this, it's running 12 independent, autonomous Python bots 24/7. They're scraping data, managing social media accounts, publishing content, and monitoring financial markets. This isn't a toy project; it's a production system that has been running with 99%+ uptime for months. It's the engine behind my Content Publishing Bot and the core infrastructure for my Multi-Lane Autonomous Income System.

The common way to do this is a mess. You ssh in, run nohup python bot1.py &, then nohup python bot2.py &, and so on. Your process list becomes a nightmare, logs are scattered, and when a bot inevitably crashes in the middle of the night, it stays dead until you manually intervene.

There's a better way. I call it the MASTERCLAW architecture. It's a four-layer system designed for maximum resilience and manageability on a single, low-cost server. Let's break it down.

The Problem: Python Processes are Fragile

Long-running Python scripts are inherently fragile. They can crash for a thousand reasons: a third-party API returns a 503 error, a web scrape target changes its HTML structure, a database connection flakes out, or you hit a plain old unhandled exception. When you're running one script, it's manageable. When you're running a dozen, the probability that at least one of them is dead at any given time approaches 1.

The challenge isn't just running the scripts; it's ensuring they keep running, no matter what. This leads to five key requirements for any serious multi-bot system: process supervision, automatic restarts, isolation, manageability, and boot persistence (the full list is in the listings at the end of this post). The MASTERCLAW architecture solves all five.

The Four Layers of Resilience

The architecture is a layered approach, starting from a manual, interactive layer and building up to a fully automated, OS-integrated service.

Layer 1: tmux - The Interactive Cockpit

First, forget nohup and backgrounding with &. For development and manual debugging, tmux is your best friend. It's a terminal multiplexer that lets you create persistent sessions with multiple windows and panes.
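Supervision catches crashes from the outside, but a nanobot can also absorb transient failures on its own. Here is a minimal sketch of a defensive nanobot main loop; the run_cycle placeholder and the backoff numbers are illustrative, not part of the actual MASTERCLAW code:

```python
import time
import logging

logging.basicConfig(level=logging.INFO)

def run_cycle():
    """One unit of bot work (hypothetical): scrape a page, publish a post, etc."""
    pass  # replace with real work

def main_loop(work=run_cycle, max_backoff=300, cycles=None, sleep=time.sleep):
    """Run `work` repeatedly; on any exception, log it and back off exponentially.

    `cycles` caps the number of iterations (handy for testing); None runs forever.
    Returns the current backoff so callers and tests can inspect it.
    """
    backoff = 1
    done = 0
    while cycles is None or done < cycles:
        try:
            work()
            backoff = 1  # a clean cycle resets the backoff
        except Exception:
            logging.exception("cycle failed; retrying in %ss", backoff)
            sleep(backoff)
            backoff = min(backoff * 2, max_backoff)  # double the wait, capped
        done += 1
    return backoff
```

A bot built this way only dies on truly unexpected failures, which is exactly what the outer watchdog layer is for.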
My entire bot system runs inside a single tmux session named bots (the exact session commands are in the listings at the end of this post). Why is this better? When I attach, I have a direct, interactive view of my entire system. I can see the live stdout of my main controller script, kill it with Ctrl+c, edit the code with vim, and restart it, all within one persistent SSH connection. It provides an "air traffic control" view that is indispensable for debugging.

But tmux is just for manual control. It doesn't solve automatic restarts or boot persistence. It's the cockpit, not the autopilot.

Layer 2: The Nanobot Gateway - One Script to Rule Them All

The core of the architecture is a single Python script I call masterclaw.py. This script has one job: to launch and manage all the other bots. I call the individual bots "nanobots" because they follow the single-responsibility principle. Each one is a small, simple script that does one thing well (e.g., twitter_bot.py, devto_publisher.py, price_scraper.py).

The masterclaw.py script uses Python's subprocess module to spawn each nanobot as an independent child process. This provides crucial isolation. If twitter_bot.py has a memory leak and crashes, it doesn't touch the masterclaw.py process or any of its siblings. A simplified look at the configuration and spawning logic is in the listings below.

This gateway pattern centralizes the management of all bots in one place. Adding a new bot is as simple as adding a line to the BOTS dictionary.

Layer 3: The Watchdog - Automated Self-Healing

The gateway script is running, and it has spawned all the nanobots. But what happens when one crashes? This is where the self-healing watchdog loop comes in.

The main masterclaw.py script enters an infinite loop after starting the bots. In this loop, it iterates through all the subprocesses it's managing and calls the .poll() method on each: None means the process is still running; an integer exit code means it has terminated. When the watchdog finds a terminated process, it logs the event and immediately restarts that specific bot using the same command it used initially. This is the automatic self-healing mechanism. This simple loop turns a collection of fragile scripts into a resilient, self-healing system.
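The contract the watchdog relies on is easy to verify in isolation. This throwaway snippet (the child command is just a stand-in for a nanobot) shows .poll() returning None while a child is alive and its exit code once it dies:

```python
import subprocess
import sys

# Spawn a short-lived child that exits with code 3 after half a second
child = subprocess.Popen(
    [sys.executable, "-c", "import time, sys; time.sleep(0.5); sys.exit(3)"]
)

while_running = child.poll()  # None: the child is still alive, leave it be
child.wait()                  # (the real watchdog just polls again on its next pass)
after_exit = child.poll()     # 3: the child's exit code, so it needs a restart

print(while_running, after_exit)
```

"Restarting" the bot is then just calling subprocess.Popen again with the same command list.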
Layer 4: systemd - The Unkillable Supervisor

We have a self-healing system, but it's all running inside a tmux session. If I accidentally kill that session, or more likely, if the server reboots for maintenance, the entire system goes down. The final layer of resilience is to hand over management of the masterclaw.py script itself to the operating system's own process supervisor: systemd.

systemd is the standard init system on most modern Linux distributions. We can write a simple service file that tells systemd how to start, stop, and manage our gateway script. My actual masterclaw.service file lives at /etc/systemd/system/masterclaw.service; the full unit file, a breakdown of its critical lines, and the commands to enable and start it are in the listings below.

Once the service is enabled, the masterclaw.py gateway is a true system service. It will start on boot and will be restarted if it ever dies. And since it is responsible for running the nanobots, the entire system is now fully persistent and self-healing.

The Full masterclaw.py Code

The listings also include a more complete, runnable version of the gateway and watchdog script, combining the spawner and the self-healing loop. This script is robust: it logs everything, handles bot crashes, and can be shut down gracefully with Ctrl+c, which is important for closing file handles and database connections properly.

Performance and Memory Optimization

"12 Python scripts? That must use a ton of RAM!" Not really; a current htop snapshot from my €6 VPS is in the listings below. The key to low resource usage is the "nanobot" philosophy. Each bot is a small, focused script that imports only the libraries it needs. A bot that just scrapes a website with requests and BeautifulSoup might use only 30-40MB of RAM. A more complex one using selenium might use 100MB. Because they are separate processes, the Python interpreter is loaded into memory for each one, which adds some overhead. But that is a small price to pay for the resilience and isolation it provides. On a 2GB RAM server, I could comfortably run 20-25 typical bots before memory becomes a concern.
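One refinement worth considering: the watchdog as written restarts a dead bot on every pass, so a bot that crashes instantly on startup will be relaunched every 15 seconds forever. A small restart-budget helper, my own addition rather than part of the article's script, can sideline a flapping bot:

```python
import time

class RestartBudget:
    """Track restarts per bot; a bot that restarts too often has to wait.

    A hypothetical add-on for the watchdog loop: call `allowed(name)` before
    restarting a bot; crashes within `window` seconds count against `limit`.
    """

    def __init__(self, limit=5, window=300, clock=time.monotonic):
        self.limit = limit    # max restarts allowed per rolling window
        self.window = window  # window length, in seconds
        self.clock = clock    # injectable clock, for testing
        self.history = {}     # bot name -> list of recent restart timestamps

    def allowed(self, name):
        now = self.clock()
        recent = [t for t in self.history.get(name, []) if now - t < self.window]
        if len(recent) >= self.limit:
            self.history[name] = recent
            return False  # bot is flapping; skip this restart pass
        recent.append(now)
        self.history[name] = recent
        return True
```

Inside the watchdog loop the restart line becomes `if budget.allowed(name): self.start_bot(name)`, so a healthy bot restarts immediately while a crash-looping one gets a cooling-off period.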
This architecture works because it embraces the Unix philosophy: build small, single-purpose tools and compose them into a larger system. The MASTERCLAW gateway is the composer, the nanobots are the instruments, and systemd is the concert hall that ensures the show always goes on.

Want This Built for Your Business?

I build custom Python automation systems, trading bots, and AI-powered tools that run 24/7 in production. I'm currently available for consulting and contract work: DM me on dev.to or reach out on either platform (links below); I respond within 24 hours. Need automation built? I build Python bots, Telegram systems, and trading automation. View my Fiverr gigs, starting at $75, delivered in 24 hours.

Listings

The five requirements for a serious multi-bot system:

- Process Supervision: Something needs to be watching the bots.
- Automatic Restarts: If a bot dies, it must be brought back to life immediately.
- Isolation: A crash in one bot should not affect any of the others.
- Manageability: I need a simple way to view logs, see what's running, and manually start/stop/restart individual bots without bringing down the whole system.
- Boot Persistence: The whole system must automatically start up if the server reboots.

Layer 1: the tmux session commands:

```bash
# To create the session for the first time
tmux new -s bots

# To detach from the session (it keeps running in the background):
# press Ctrl+b, then d

# To re-attach to the session later from anywhere
tmux attach -t bots
```

Layer 2: the simplified configuration and spawning logic:

```python
import os
import subprocess
import time

# Configuration of all bots to be managed
# The key is a friendly name, the value is the command to run
BOTS = {
    "content_publisher": ["python", "bots/content_publisher.py"],
    "social_media_manager": ["python", "bots/social_media_manager.py"],
    "data_scraper_A": ["python", "bots/data_scraper_a.py"],
    # ... add all 12+ bots here
}

# A dictionary to hold the running subprocess objects
running_bots = {}

def start_all_bots():
    print("--- MASTERCLAW: Starting all nanobots ---")
    os.makedirs("logs", exist_ok=True)  # make sure the log directory exists
    for bot_name, bot_command in BOTS.items():
        print(f"Starting bot: {bot_name}...")
        # We redirect stdout and stderr to a log file for each bot
        log_file = open(f"logs/{bot_name}.log", "a")
        process = subprocess.Popen(
            bot_command,
            stdout=log_file,
            stderr=log_file
        )
        running_bots[bot_name] = (process, bot_command, log_file)
    print("--- MASTERCLAW: All nanobots started ---")

# --- Main execution ---
if __name__ == "__main__":
    start_all_bots()
    # In the next step, we'll add the watchdog loop here
    while True:
        time.sleep(60)  # Keep the main script alive
```

Layer 3: how the watchdog interprets .poll():

- If .poll() returns None, the process is still running happily.
- If .poll() returns an integer (the exit code), the process has terminated.

Layer 4: the masterclaw.service unit file:

```ini
[Unit]
Description=Masterclaw Bot Management Service
After=network.target

[Service]
# IMPORTANT: run as a non-root user
User=your_user
Group=your_group
WorkingDirectory=/home/your_user/masterclaw_project
ExecStart=/usr/bin/python3 /home/your_user/masterclaw_project/masterclaw.py
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

The critical lines:

- WorkingDirectory: Sets the CWD so all relative paths in your script (like logs/ or bots/) work correctly.
- ExecStart: The full path to the Python interpreter and your gateway script.
- Restart=always: The magic directive. If the masterclaw.py script itself ever crashes for any reason, systemd will automatically restart it.
- RestartSec=10: Wait 10 seconds before attempting a restart.

Enabling and starting the service:

```bash
# Reload systemd to recognize the new service file
sudo systemctl daemon-reload
# Enable the service to start on boot
sudo systemctl enable masterclaw.service
# Start the service immediately
sudo systemctl start masterclaw.service
# Check its status
sudo systemctl status masterclaw.service
```

The full masterclaw.py gateway and watchdog:

```python
import os
import subprocess
import time
import logging

# --- Configuration ---
LOG_FILE = "logs/masterclaw.log"
BOT_LOG_DIR = "logs/bots"

# Define all bots to be managed
BOTS = {
    "publisher": ["python", "bots/publisher.py"],
    "scraper_A": ["python", "bots/scraper_a.py"],
    "scraper_B": ["python", "bots/scraper_b.py"],
    "social_poster": ["python", "bots/social.py"],
    # ... add as many as you need
}

# Make sure the log directories exist before logging starts
os.makedirs(BOT_LOG_DIR, exist_ok=True)

# --- Logging Setup ---
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - [%(levelname)s] - %(message)s",
    handlers=[
        logging.FileHandler(LOG_FILE),
        logging.StreamHandler()  # Also print to console
    ]
)

# --- Core Logic ---
class MasterClaw:
    def __init__(self):
        # { bot_name: (process, command, log_file_handle) }
        self.running_bots = {}

    def start_bot(self, name):
        """Starts a single, specified bot."""
        if name in self.running_bots:
            logging.warning(f"Bot '{name}' is already running. Cannot start.")
            return
        if name not in BOTS:
            logging.error(f"Bot '{name}' not found in configuration.")
            return
        command = BOTS[name]
        try:
            log_path = f"{BOT_LOG_DIR}/{name}.log"
            log_file = open(log_path, "a")
            process = subprocess.Popen(
                command,
                stdout=log_file,
                stderr=subprocess.STDOUT,
                text=True
            )
            self.running_bots[name] = (process, command, log_file)
            logging.info(f"Successfully started bot '{name}' with PID {process.pid}.")
        except Exception as e:
            logging.error(f"Failed to start bot '{name}': {e}")

    def start_all(self):
        """Initial start of all configured bots."""
        logging.info("--- MASTERCLAW Initializing ---")
        for bot_name in BOTS:
            self.start_bot(bot_name)
        logging.info("--- All bots have been launched ---")

    def watchdog_loop(self):
        """The main self-healing loop."""
        logging.info("--- Watchdog is now active ---")
        while True:
            time.sleep(15)  # Check every 15 seconds
            for name in list(self.running_bots.keys()):
                process, command, log_file = self.running_bots[name]
                return_code = process.poll()
                if return_code is not None:
                    # Process has terminated
                    logging.warning(f"WATCHDOG: Bot '{name}' has terminated with code {return_code}.")
                    # Clean up old resources
                    log_file.close()
                    del self.running_bots[name]
                    # Restart the bot
                    logging.info(f"WATCHDOG: Attempting to restart bot '{name}'...")
                    self.start_bot(name)

    def shutdown(self):
        """Gracefully shut down all bot processes."""
        logging.info("--- MASTERCLAW Shutting Down ---")
        for name, (process, _, log_file) in self.running_bots.items():
            logging.info(f"Terminating bot '{name}' (PID: {process.pid})")
            process.terminate()  # Send SIGTERM
            try:
                process.wait(timeout=10)  # Wait up to 10 seconds
            except subprocess.TimeoutExpired:
                logging.warning(f"Bot '{name}' did not terminate gracefully. Sending SIGKILL.")
                process.kill()
            log_file.close()
        logging.info("--- All bots have been shut down ---")

if __name__ == "__main__":
    claw = MasterClaw()
    try:
        claw.start_all()
        claw.watchdog_loop()
    except KeyboardInterrupt:
        print("\nKeyboard interrupt received.")
    finally:
        claw.shutdown()
```

htop snapshot of the running system:

```
 PID USER    PRI NI  VIRT   RES   SHR  S CPU% MEM% TIME+   COMMAND
1234 myuser   20  0 180.2M 85.1M 15.2M S  1.3  4.3 1h25:12 python3 masterclaw.py
5678 myuser   20  0 110.5M 60.3M 12.1M S  0.7  3.0 0:45.33 python3 bots/publisher.py
5680 myuser   20  0  95.7M 45.8M 10.9M S  0.0  2.3 0:22.11 python3 bots/scraper_a.py
5682 myuser   20  0  98.2M 42.1M 11.5M S  0.0  2.1 0:18.45 python3 bots/social.py
5684 myuser   20  0  89.9M 38.5M  9.8M S  0.0  1.9 0:15.78 python3 bots/scraper_b.py
 ... (8 more similar processes) ...
```

- Hire me on Upwork — Python automation, API integrations, trading systems
- Check my Fiverr gigs — Bot development, web scraping, data pipelines
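One gap worth noting in the full script: systemctl stop sends SIGTERM, which does not raise KeyboardInterrupt, so the graceful-shutdown path only runs for Ctrl+c. A small handler, sketched here with an exception name of my own choosing, converts SIGTERM into an exception the existing try/finally can catch:

```python
import signal

class ShutdownRequested(Exception):
    """Raised when systemd (or anyone else) sends SIGTERM."""

def _on_sigterm(signum, frame):
    raise ShutdownRequested()

# Install before the watchdog loop starts, e.g. at the top of __main__
signal.signal(signal.SIGTERM, _on_sigterm)

# The main block then catches both shutdown paths:
#     try:
#         claw.start_all()
#         claw.watchdog_loop()
#     except (KeyboardInterrupt, ShutdownRequested):
#         pass
#     finally:
#         claw.shutdown()
```

With this in place, systemctl stop triggers the same clean teardown of child processes and log files as a manual Ctrl+c.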