Solved: I built an automated Talos + Proxmox + GitOps homelab starter (ArgoCD + Workflows + DR)
2026-01-01
## Executive Summary

TL;DR: This blog post solves the problem of manual, inconsistent, and fragile homelab setups by detailing an automated, resilient system. It integrates Talos Linux, Proxmox, and a GitOps approach using ArgoCD and Argo Workflows for infrastructure provisioning, application management, and strategic disaster recovery.

Building a robust, automated homelab or small-scale IT environment presents unique challenges. This post details how integrating Talos Linux, Proxmox, and a GitOps approach with ArgoCD, Argo Workflows, and strategic Disaster Recovery (DR) can transform a manual, fragile setup into a resilient, self-healing system.

## Key Takeaways

- Proxmox VE combined with Talos Linux forms a robust, API-driven infrastructure base for automated VM provisioning and a secure, immutable Kubernetes operating system.
- ArgoCD implements a GitOps strategy, ensuring continuous synchronization of Kubernetes cluster configurations and applications from a Git repository, preventing configuration drift and enabling automated deployments.
- Argo Workflows orchestrates complex operational tasks like automated backups (Proxmox VMs via PBS, Kubernetes apps via Velero) and disaster recovery testing, significantly enhancing homelab resilience and recovery capabilities.

## Symptoms: The Homelab Headache

Many IT professionals building or maintaining homelabs face a common set of frustrations that hinder scalability, reliability, and efficient management. These symptoms often stem from a lack of automation and a reactive approach to infrastructure.

### Manual VM and Kubernetes Provisioning

Provisioning new virtual machines on hypervisors like Proxmox, then manually installing an OS, configuring networking, and bootstrapping a Kubernetes cluster is incredibly time-consuming and prone to human error. Each node becomes a snowflake, making consistency impossible.

### Configuration Drift and Inconsistency

Environments quickly diverge from their intended state. Manual changes to VMs, Kubernetes manifests, or network configurations lead to inconsistencies across nodes, making troubleshooting difficult and deployments unreliable. The desired state is rarely codified and enforced.

### Lack of Automated Deployments and Updates

Deploying new applications, updating services, or even patching the underlying operating system often involves manual SSH sessions, script execution, or dashboard clicks. This process is slow, inefficient, and often leads to downtime or unexpected failures.

### Fragile Disaster Recovery (DR) Strategy

Without a clear, automated DR plan, a single hardware failure or misconfiguration can lead to significant data loss or extended service outages. Manual backups are often outdated, and the recovery process is untested, complex, and time-consuming.

### Operational Burden of Kubernetes

While powerful, managing Kubernetes itself adds overhead. Keeping the control plane healthy, nodes updated, and applications resilient requires constant vigilance. Without automation, the operational complexity can quickly overwhelm a homelab enthusiast.

## Solution 1: Proxmox + Talos for a Robust & Minimalist Infrastructure Base

The foundation of a reliable homelab begins with a solid, automated infrastructure layer. This solution combines Proxmox VE for virtualization with Talos Linux for a secure, minimal, and immutable Kubernetes operating system.

### Proxmox VE: The Virtualization Workhorse

Proxmox VE provides a powerful, open-source platform for managing virtual machines, containers, and storage. Its API-driven nature makes it an ideal candidate for infrastructure automation, allowing you to programmatically provision VMs rather than relying on manual GUI clicks.

### Example: Automating VM Provisioning (Conceptual)

While full Terraform configurations are extensive, the principle involves using Proxmox's API or tools like qm to create VMs based on templates. Imagine a script that defines your Kubernetes nodes:
```bash
# Example: Basic VM creation using qm (simplified for illustration)
# This would typically be wrapped in a script or Terraform module
# with dynamic parameters.
VMID="101"
VMNAME="talos-node-01"
MEM="4096"              # 4GB RAM
CPUS="2"
DISK_SIZE="32"          # disk size in GiB (qm expects a bare number when allocating)
ISO_STORAGE="local:iso" # Storage for ISO images
OS_TYPE="l26"           # Linux 2.6+ kernel
NET_BRIDGE="vmbr0"      # Network bridge

# Create the VM
qm create $VMID --name $VMNAME --memory $MEM --cores $CPUS --ostype $OS_TYPE

# Add a storage device (e.g., raw disk from local storage)
# This example assumes a pre-existing storage pool named 'local-lvm'
qm set $VMID --scsihw virtio-scsi-pci --scsi0 local-lvm:$DISK_SIZE

# Add a network device
qm set $VMID --net0 virtio,bridge=$NET_BRIDGE

# Mount a Cloud-Init CD-ROM for initial configuration (crucial for automation)
# The Cloud-Init content would contain the Talos installer command
qm set $VMID --ide2 local:cloudinit

# Configure boot order to boot from the Cloud-Init ISO first, then disk
qm set $VMID --boot order="ide2;scsi0"

# Start the VM (this is where Cloud-Init would kick in and install Talos)
qm start $VMID
```
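In practice, cloning a prepared VM template is usually faster and more repeatable than building each VM from scratch. Here is a minimal, hypothetical sketch of that pattern; the template VMID (9000) and the node count are assumptions, not values from a real setup:

```bash
# Clone three Talos nodes from a pre-built template (template VMID 9000 is an assumption)
TEMPLATE_ID="9000"
for i in 1 2 3; do
  VMID=$((100 + i))
  qm clone $TEMPLATE_ID $VMID --name "talos-node-0${i}" --full
  qm set $VMID --memory 4096 --cores 2
  qm start $VMID
done
```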
### Talos Linux: Kubernetes-Native OS

Talos Linux is a secure, minimal, and immutable operating system designed specifically for running Kubernetes. It eliminates unnecessary components, reducing the attack surface and operational overhead. Its API-driven management model aligns perfectly with a GitOps approach.

- **Minimal Footprint:** No shell, no package manager, no unnecessary services.
- **Immutability:** The OS never drifts; all changes are applied via atomic updates.
- **API-Driven:** All configuration and operations are performed via a gRPC API, making it ideal for automation (see the sketch after this list).
- **Enhanced Security:** Reduced attack surface and cryptographic integrity checks.
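To make the API-driven model concrete, here is a small sketch of day-two operations against a node. It assumes the node IPs used elsewhere in this post and the talosconfig generated in the next section; the installer image tag is illustrative, not a version recommendation:

```bash
# Inspect a node entirely over the Talos API -- there is no SSH or shell on the node
talosctl --talosconfig ./cluster-configs/talosconfig \
  --nodes 192.168.1.10 --endpoints 192.168.1.10 services

# Upgrades are atomic: the node boots into the new image or rolls back on failure
talosctl --talosconfig ./cluster-configs/talosconfig \
  --nodes 192.168.1.10 --endpoints 192.168.1.10 \
  upgrade --image ghcr.io/siderolabs/installer:v1.7.0
```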
### Example: Generating Talos Configuration

After provisioning your VMs, you generate a Talos configuration that bootstraps your Kubernetes cluster. This configuration defines your control plane and worker nodes, their IP addresses, and essential Kubernetes settings. You then provision this configuration to your VMs (e.g., via Cloud-Init or directly using talosctl).
```bash
# Generate an initial Talos configuration
# The endpoint is your cluster's Kubernetes API address -- replace with your actual IP
# --with-kubespan enables encryption for inter-node communication
talosctl gen config my-talos-cluster https://192.168.1.10:6443 \
  --output ./cluster-configs \
  --with-kubespan

# Example of applying the configuration (e.g., via Cloud-Init user data)
# The output `controlplane.yaml` and `worker.yaml` are the configurations
# that would be used to install Talos on the respective nodes
# (control plane: 192.168.1.10-12, workers: 192.168.1.13-14).
# You might `base64` encode this content for Cloud-Init.

# To install Talos after boot (e.g., from a live ISO or Cloud-Init):
# On a control plane node:
# talosctl apply-config --insecure --nodes 192.168.1.10 --file ./cluster-configs/controlplane.yaml
# On a worker node:
# talosctl apply-config --insecure --nodes 192.168.1.13 --file ./cluster-configs/worker.yaml
```
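After the configuration has been applied, the cluster still needs its etcd bootstrapped and a kubeconfig retrieved before anything can be deployed. A minimal sketch, assuming the IPs and output directory above:

```bash
# Bootstrap etcd on exactly ONE control plane node (run once per cluster)
talosctl --talosconfig ./cluster-configs/talosconfig \
  --nodes 192.168.1.10 --endpoints 192.168.1.10 bootstrap

# Fetch a kubeconfig for the new cluster and verify the nodes join
talosctl --talosconfig ./cluster-configs/talosconfig \
  --nodes 192.168.1.10 --endpoints 192.168.1.10 kubeconfig ./kubeconfig
kubectl --kubeconfig ./kubeconfig get nodes
```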
## Solution 2: GitOps with ArgoCD for Automated Configuration Management

Once your infrastructure is provisioned, GitOps takes over to manage the desired state of your Kubernetes cluster and applications. ArgoCD serves as the engine, continuously synchronizing your cluster with configurations stored in a Git repository, ensuring consistency and preventing configuration drift.

### GitOps Principles

- **Declarative:** The desired state of your infrastructure and applications is declared in Git (e.g., YAML manifests).
- **Version Controlled:** All changes are committed to Git, providing an auditable history and easy rollbacks.
- **Automated:** Changes in Git automatically trigger updates in the cluster (see the sketch after this list).
- **Reconciled:** A controller continuously observes the cluster's actual state and reconciles it with the desired state in Git.
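What this loop looks like day to day: you edit manifests in the repository, and the controller converges the cluster, with no kubectl apply in sight. A hypothetical sketch, using the nginx-hello-world app defined later in this section (the image tags are assumptions):

```bash
# The GitOps loop in practice: change Git, let ArgoCD reconcile the cluster
git clone https://github.com/your-org/my-homelab-gitops.git
cd my-homelab-gitops

# Bump an image tag declaratively (hypothetical versions)
sed -i 's|nginx:1.25|nginx:1.27|' applications/nginx-hello-world/deployment.yaml
git commit -am "chore: bump nginx to 1.27"
git push
# ArgoCD detects the new commit and syncs the Deployment -- no kubectl needed
```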
### ArgoCD: The GitOps Controller

ArgoCD is a declarative, GitOps continuous delivery tool for Kubernetes. It automatically synchronizes the state of applications from a Git repository to a Kubernetes cluster.

#### Key Features:

- **Automated Sync:** Keeps your cluster's applications in sync with your Git repo.
- **Rollback/Roll-forward:** Easy to revert to previous states or deploy new versions.
- **Health Monitoring:** Provides visibility into the health of your deployed applications.
- **Multi-cluster Support:** Manage applications across multiple Kubernetes clusters.

### Example: Deploying an Application with ArgoCD

First, you install ArgoCD into your Talos Kubernetes cluster (e.g., via a Helm chart or direct manifest application).
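A sketch of the direct-manifest route, using the upstream ArgoCD install manifests (pin a release tag instead of `stable` if you want reproducible installs):

```bash
# Install ArgoCD from the upstream manifests
kubectl create namespace argocd
kubectl apply -n argocd \
  -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Retrieve the auto-generated initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d
```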
Then, you define an ArgoCD Application resource in your Git repository. This resource points ArgoCD to where your application's Kubernetes manifests are stored.
```yaml
# Example: Git repository structure
# my-homelab-gitops/
# ├── infrastructure/
# │   └── talos/
# │       └── cluster-config-patches/
# ├── applications/
# │   ├── nginx-hello-world/
# │   │   ├── deployment.yaml
# │   │   └── service.yaml
# │   └── argocd/
# │       └── application-nginx.yaml
# └── argocd-apps/
#     ├── homelab-infra.yaml
#     └── homelab-apps.yaml

# applications/argocd/application-nginx.yaml
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: nginx-hello-world
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/my-homelab-gitops.git  # Your Git repository
    targetRevision: HEAD                    # Or a specific branch/tag like 'main'
    path: applications/nginx-hello-world    # Path within the repo to the manifests
  destination:
    server: https://kubernetes.default.svc  # The target cluster
    namespace: default                      # The target namespace for the application
  syncPolicy:
    automated:
      prune: true     # Delete resources that are no longer in Git
      selfHeal: true  # Revert any manual changes to match Git state
    syncOptions:
      - CreateNamespace=true  # Automatically create the namespace if it doesn't exist
```
Once this Application manifest is committed to your Git repository and ArgoCD is configured to sync from that repository, ArgoCD will automatically deploy and manage the nginx-hello-world application in your cluster. Any changes to deployment.yaml or service.yaml in Git will be automatically applied by ArgoCD.

## Solution 3: Argo Workflows & Integrated DR for Operational Automation & Resilience

Beyond declarative application deployments, operational tasks like automated backups, DR testing, and complex multi-step processes still require orchestration. Argo Workflows, combined with a robust DR strategy, ensures your homelab is not just automated but also resilient.

### Argo Workflows: The Workflow Engine

Argo Workflows is an open-source container-native workflow engine for orchestrating parallel jobs on Kubernetes. It's ideal for tasks that require sequential steps, conditional logic, or parallel execution, such as CI/CD pipelines, data processing, or, crucially, operational automation and DR.

#### Use Cases in a Homelab:

- **Automated Backups:** Triggering Proxmox VM backups, Kubernetes application backups (Velero).
- **DR Testing:** Periodically spinning up a test environment, restoring backups, and validating service functionality.
- **Infrastructure Provisioning:** Orchestrating the creation of new Talos nodes on Proxmox.
- **Application Release Pipelines:** Orchestrating complex deployments that involve pre-hooks, post-hooks, and external integrations.

### Example: Conceptual Backup Workflow

This workflow outlines a conceptual plan to back up both Proxmox VMs and Kubernetes applications.
```yaml
# Example: Argo Workflow for Homelab Backup Strategy
# This is a conceptual workflow; specific commands and client tools
# (e.g., proxmox-backup-client, velero CLI) would need to be in your container images.
---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: homelab-backup-
spec:
  entrypoint: backup-strategy
  templates:
    - name: backup-strategy
      dag:
        tasks:
          - name: backup-proxmox-vms
            template: backup-proxmox
          - name: backup-kubernetes-apps
            template: backup-velero
            # Run the K8s backup after the Proxmox backup completes
            # (drop the dependency to run both in parallel)
            dependencies:
              - backup-proxmox-vms
    - name: backup-proxmox
      container:
        image: your-custom-backup-image:latest  # Image with Proxmox API client or PBS client
        command: ["/bin/sh", "-c"]
        args:
          - |
            echo "Starting Proxmox VM backups..."
            # Example: Triggering a backup via Proxmox API or proxmox-backup-client
            # You'd need credentials/API tokens mounted as secrets
            proxmox-backup-client backup --vm 101 --repository my-pbs-repo
            proxmox-backup-client backup --vm 102 --repository my-pbs-repo
            echo "Proxmox VM backups complete."
    - name: backup-velero
      container:
        image: velero/velero:latest  # Official Velero image
        command: ["/bin/sh", "-c"]
        args:
          - |
            echo "Starting Kubernetes application backups with Velero..."
            # Assumes Velero is already installed in the cluster and configured with a backup location
            velero backup create k8s-apps-$(date +%Y%m%d%H%M%S) --include-namespaces '*' --default-volumes-to-restic
            echo "Kubernetes application backups complete."
```
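To run this, you would submit the manifest ad hoc or wrap it for scheduling. A sketch assuming the argo CLI, an `argo` namespace, and that the manifest above is saved as homelab-backup.yaml; the CronWorkflow additionally assumes the spec has been registered as a WorkflowTemplate named homelab-backup:

```bash
# Submit the backup workflow ad hoc and stream its progress
argo submit -n argo homelab-backup.yaml --watch

# Or schedule it nightly with a CronWorkflow
kubectl apply -n argo -f - <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: homelab-backup-nightly
spec:
  schedule: "0 3 * * *"        # every night at 03:00
  workflowSpec:
    workflowTemplateRef:
      name: homelab-backup     # assumes the Workflow above was saved as a WorkflowTemplate
EOF
```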
### Integrated Disaster Recovery (DR)

A true DR strategy for a GitOps-driven homelab integrates several components:

- **Infrastructure as Code:** Your entire Proxmox + Talos setup is defined in Git. In a disaster, you can rebuild your hypervisor and then redeploy Talos nodes from scratch.
- **ArgoCD for Applications:** ArgoCD ensures all your Kubernetes applications can be quickly restored by syncing their desired state from Git to a new or recovered cluster.
- **Proxmox Backup Server (PBS):** For hypervisor-level VM backups. Critical for stateful applications running directly on VMs or for restoring the base OS of Talos nodes if not using full infrastructure as code for OS deployment.
- **Velero:** For Kubernetes-native application backups, including persistent volumes and Kubernetes resource manifests (see the restore sketch after this list).
- **Argo Workflows for Orchestration:** Automating the recovery process, from provisioning VMs to restoring backups and verifying service health.
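The restore path is the part most worth rehearsing before you need it. A minimal Velero restore drill, assuming a backup created by the workflow above (the backup name is an assumption):

```bash
# List available backups, then restore one into the cluster
velero backup get
velero restore create --from-backup k8s-apps-20260101030000

# Check the outcome of the restore
velero restore get
```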
## Conclusion

By adopting a comprehensive strategy leveraging Proxmox for virtualization, Talos Linux for a minimalist Kubernetes OS, and a robust GitOps workflow driven by ArgoCD and Argo Workflows for automation and DR, you can transform your homelab. This approach minimizes manual intervention, ensures consistency, enhances security, and provides a clear, automated path to recovery from disaster. The initial investment in setting up these systems pays dividends in stability, scalability, and peace of mind, allowing you to focus on experimentation and innovation rather than constant firefighting.

Read the original article on TechResolve.blog