Why Your KServe InferenceService Won't Become Ready: Four Production Failures and Fixes


A practitioner's account of the errors the KServe getting-started documentation doesn't tell you about — with exact terminal output, root causes, and working Kustomize patches.

This article documents four production failures I encountered while deploying KServe on a local k3d cluster as part of building NeuroScale, a self-service AI inference platform. None of these failures appear in the official KServe getting-started documentation. If you are deploying KServe without Istio, this will save you several hours of debugging.

Stack: k3d (local Kubernetes) · KServe 0.12.1 · ArgoCD · Kourier (no Istio) · Knative Serving

What I Was Building

NeuroScale is a self-service AI inference platform on Kubernetes. The goal was simple: one InferenceService named sklearn-iris reaches Ready=True and responds to a prediction request. The install had to be GitOps-managed via ArgoCD — not "I ran some scripts." Getting there took two days and four distinct failures. Here is every one of them.

📝 Author's Note: This article was originally documented in the NeuroScale platform repository.

File: docs/REALITY_CHECK_MILESTONE_2_KSERVE_SERVING.md

Repo: github.com/sodiq-code/neuroscale-platform

Failure 1: KServe InferenceService Stuck Not Ready — Istio vs Kourier Ingress Mismatch Causes ReconcileError Loop

Symptom

After applying the KServe installation via ArgoCD (serving-stack app), the InferenceService was created but never became Ready:

```
$ kubectl -n default get inferenceservice sklearn-iris
NAME           URL   READY   PREV   LATEST   AGE
sklearn-iris         False          100      8m

# READY=False with no URL = KServe controller did not complete ingress setup.
# No Knative Route was created. No external URL was assigned.
```

Digging In

```
$ kubectl -n default describe inferenceservice sklearn-iris
...
Status:
  Conditions:
    Message:  Failed to reconcile ingress
    Reason:   ReconcileError
    Status:   False
    Type:     IngressReady

$ kubectl -n kserve logs deploy/kserve-controller-manager --tail=50
...
ERROR controller.inferenceservice Failed to reconcile ingress
{"error": "virtual service not found: sklearn-iris.default.svc.cluster.local"}
```

The error referenced a virtual service — an Istio concept. But we were running Kourier. The KServe controller was attempting to create an Istio VirtualService in a cluster that had no Istio control plane.

Root Cause: Default KServe Ingress Mode Assumes Istio

KServe's default inferenceservice-config ConfigMap expects Istio as the ingress provider. It sets ingressClassName: istio, and the key disableIstioVirtualHost defaults to false. When Istio is absent, the controller enters an error loop trying to create resources that can never exist. Setting disableIstioVirtualHost: true tells KServe to skip Istio and fall back to Knative Route objects that Kourier can handle.

Why Kourier instead of Istio: Istio adds roughly 1 GB of memory overhead. On a local k3d cluster shared with Docker Desktop, Backstage, and the KServe controller, that exhausts available RAM. Kourier's entire footprint is under 200 MB.

The Fix: ConfigMap Patch in serving-stack

```
# infrastructure/serving-stack/patches/inferenceservice-config-ingress.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config
  namespace: kserve
data:
  ingress: |-
    {
      "ingressGateway": "knative-serving/knative-ingress-gateway",
      "ingressDomain": "example.com",
      "ingressClassName": "istio",
      "urlScheme": "http",
      "disableIstioVirtualHost": true,
      "disableIngressCreation": false
    }
```

After this patch was applied and the KServe controller restarted:

```
$ kubectl -n default get inferenceservice sklearn-iris
NAME           URL                                       READY   AGE
sklearn-iris   http://sklearn-iris.default.example.com   True    2m
```

Business impact: This failure cost approximately 3 hours. The KServe documentation does not prominently state that the default configuration requires Istio. The error message "virtual service not found" is Istio-specific vocabulary that only makes sense if you already know Istio is the default — a classic undocumented assumption in infrastructure tooling.

Failure 2: ArgoCD Serving-Stack Sync Fails — Duplicate Knative CRD Exceeds 256 KB Annotation Size Limit

Time lost: ~30 minutes

Symptom

```
$ kubectl -n argocd get application serving-stack
NAME            SYNC STATUS   HEALTH STATUS
serving-stack   OutOfSync     Degraded

$ kubectl -n argocd describe application serving-stack
...
Message: CustomResourceDefinition "services.serving.knative.dev" is invalid:
metadata.annotations: Too long: may not be more than 262144 bytes
```

Root Cause

ArgoCD stores kubectl.kubernetes.io/last-applied-configuration as an annotation. For large CRDs, this annotation plus the apply payload exceeds Kubernetes' 256 KB annotation size limit; the Knative CRD is approximately 400 KB as a YAML object. A rendering overlap compounded the issue: the kserve.yaml bundle already includes its own version of the Knative Serving CRDs, and we were also referencing serving-core.yaml directly. This created two attempts to manage the same CRDs, causing comparison instability.

The Fix: Server-Side Apply Plus ignoreDifferences

```
# infrastructure/serving-stack/kustomization.yaml
# 1. Use server-side apply to bypass the annotation size limit
commonAnnotations:
  argocd.argoproj.io/sync-options: ServerSideApply=true

# 2. Ignore runtime-mutated fields on Knative CRDs
#    (In ArgoCD Application spec)
ignoreDifferences:
  - group: apiextensions.k8s.io
    kind: CustomResourceDefinition
    name: services.serving.knative.dev
    jsonPointers:
      - /spec/preserveUnknownFields
```

Business impact: ArgoCD's error says "Too long" but does not tell you which annotation or why it got too long. Debugging requires knowing ArgoCD's internal server-side apply mechanism.

Failure 3: kube-rbac-proxy ImagePullBackOff Blocks KServe Admission Webhook — gcr.io Access Restriction

Time lost: ~1 hour | Cluster-wide impact

Symptom

```
$ kubectl -n argocd describe application ai-model-alpha
...
Message: admission webhook "inferenceservice.kserve-webhook-server.validator.webhook"
denied the request: no endpoints available for service "kserve-webhook-server-service"

$ kubectl -n kserve get pods
NAME                            READY   STATUS
kserve-controller-manager-xxx   1/2     Running   # only 1 of 2 ready

$ kubectl -n kserve describe pod kserve-controller-manager-xxx
  kube-rbac-proxy:
    State:    Waiting
    Reason:   ImagePullBackOff
    Image:    gcr.io/kubebuilder/kube-rbac-proxy:v0.13.1
Events:
  Warning  Failed  kubelet  Failed to pull image: unexpected status code 403 Forbidden
```

Root Cause

KServe 0.12.1's kserve-controller-manager Deployment includes a kube-rbac-proxy sidecar pulled from gcr.io/kubebuilder/kube-rbac-proxy:v0.13.1, and Google Container Registry restricted access to the kubebuilder images in late 2025. The manager container itself was healthy (1 of 2 containers ready), but with the sidecar stuck in ImagePullBackOff the pod never reached Ready, so the webhook Service had no healthy endpoints and every InferenceService admission request was denied. The alternative registry.k8s.io/kube-rbac-proxy:v0.13.1 did not exist at the new location either.

Fix: Remove the Sidecar via Kustomize Strategic Merge Patch

```
# infrastructure/serving-stack/patches/
# kserve-controller-kube-rbac-proxy-image.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kserve-controller-manager
  namespace: kserve
spec:
  template:
    spec:
      containers:
        - name: kube-rbac-proxy
          $patch: delete
```

After this patch and a re-sync:

```
$ kubectl -n kserve get pods
NAME                            READY   STATUS
kserve-controller-manager-yyy   1/1     Running   # fixed

$ kubectl -n kserve get endpoints kserve-webhook-server-service
NAME                            ENDPOINTS         AGE
kserve-webhook-server-service   10.42.0.23:9443   45s
```

Known tradeoff: Removing kube-rbac-proxy disables the Prometheus metrics proxy endpoint for the KServe controller. In production, source a verified replacement image from an accessible registry before deploying.

Business impact: An external registry access change cascaded into a complete admission webhook outage. Any InferenceService creation or update was blocked cluster-wide while the sidecar was failing. This class of failure has no good solution without upstream monitoring of your image dependencies.

Failure 4: Inference Request Returns HTTP 405 — IngressDomain Placeholder Resolves to Public Internet

Symptom

```
$ kubectl -n default get inferenceservice sklearn-iris \
    -o jsonpath='{.status.url}'
http://sklearn-iris.default.example.com

$ curl -sS \
    -H "Content-Type: application/json" \
    -d '{"instances":[[5.1,3.5,1.4,0.2]]}' \
    http://sklearn-iris.default.example.com/v1/models/sklearn-iris:predict
<html><head><title>405 Not Allowed</title></head>...
# The request hit the public example.com server, not our Kourier gateway.
```

Root Cause

The ingressDomain in the KServe ConfigMap was set to example.com — a literal placeholder. The generated URL resolves publicly to Cloudflare/IANA servers, not the local cluster. Additionally, Kourier routes by Host header, not by IP: simply port-forwarding Kourier and hitting 127.0.0.1 does not work without the correct Host header.

Fix: Direct Predictor Pod Port-Forward

Bypass Knative routing and Kourier entirely for local verification:

```
# Step 1: Get the predictor pod name
kubectl -n default get pods \
  -l serving.knative.dev/revision=sklearn-iris-predictor-00001

# Step 2: Port-forward directly to the predictor container
kubectl -n default port-forward \
  pod/sklearn-iris-predictor-00001-deployment-<hash> 18080:8080

# Step 3: Predict (no Host header, no Kourier, no DNS needed)
curl -sS \
  -H "Content-Type: application/json" \
  -d '{"instances":[[5.1,3.5,1.4,0.2],[6.2,3.4,5.4,2.3]]}' \
  http://127.0.0.1:18080/v1/models/sklearn-iris:predict
{"predictions":[0,2]}
```

For the full Kourier routing path, always pass the Host header:

```
kubectl -n kourier-system port-forward svc/kourier 18080:80

curl -sS \
  -H 'Host: sklearn-iris-predictor.default.127.0.0.1.sslip.io' \
  -H "Content-Type: application/json" \
  -d '{"instances":[[5.1,3.5,1.4,0.2]]}' \
  http://127.0.0.1:18080/v1/models/sklearn-iris:predict
```

Business impact: False-negative inference verification. A healthy endpoint looked broken because the test URL resolved to the wrong server. Always verify the complete network path — DNS resolution, ingress routing, pod health — as separate steps rather than assuming a single curl test is conclusive.

What This Proves After the Failures

After working through the above failures, the inference baseline worked:

```
$ kubectl -n default get inferenceservice sklearn-iris
NAME           URL                                       READY   AGE
sklearn-iris   http://sklearn-iris.default.example.com   True    45m

$ curl -sS \
    -H "Content-Type: application/json" \
    -d '{"instances":[[5.1,3.5,1.4,0.2],[6.2,3.4,5.4,2.3]]}' \
    http://127.0.0.1:18080/v1/models/sklearn-iris:predict
{"predictions":[0,2]}
```

The Istio/Kourier mismatch is the canonical example of why "default configuration" is dangerous in complex systems. KServe's default assumes a specific network topology that is not disclosed in the getting-started docs. Recognizing this class of failure — configuration that works in the tool author's environment but not yours — is a senior platform engineering competency.

What This Setup Does NOT Solve (Known Tradeoffs)

- No Istio service mesh: no mTLS between services, no advanced traffic management. Acceptable for local dev; requires a replacement security layer in production.
- kube-rbac-proxy removed: Prometheus metrics from the KServe controller are unavailable. Re-add this sidecar from a working registry before any production deployment.
- Port-forward for inference: the Host-header workaround is local only. Cloud deployment requires a real ingress with DNS and TLS. On EKS, swap Kourier for an ALB and set ingressDomain to your real domain. See the Cloud Promotion Guide in the repository.

Debugging Commands Reference

Run these in order when an InferenceService will not become Ready.

1 — InferenceService Conditions

```
kubectl -n default describe inferenceservice sklearn-iris
kubectl -n kserve logs deploy/kserve-controller-manager --tail=50
kubectl -n kserve logs deploy/kserve-controller-manager -c manager --tail=50
```

2 — Webhook Endpoint Availability

```
kubectl -n kserve get endpoints kserve-webhook-server-service
kubectl -n kserve describe endpoints kserve-webhook-server-service
kubectl -n default get ksvc
kubectl -n default get route
```

3 — ConfigMap and Pod Status

```
kubectl -n kserve get configmap inferenceservice-config -o yaml
kubectl -n kserve get pods -o wide
kubectl -n kserve describe pod <pod-name>
```

The One Thing to Remember

KServe's default configuration assumes Istio is installed. This assumption is not prominently stated in the getting-started documentation. Every engineer running KServe on k3d, k3s, GKE Autopilot, or any non-Istio cluster will hit ReconcileError and see error messages referencing "virtual services" — an Istio concept — with no obvious resolution path. The fix is one ConfigMap patch. It takes 30 seconds to apply. Finding it took three hours.

The kube-rbac-proxy 403 from gcr.io is an external dependency failure that silently kills your admission webhook cluster-wide. The $patch: delete Kustomize strategy is the fastest recovery path when no alternative registry image is available.

See Also

Full platform source — all six Reality Check documents, Backstage Golden Path, Kyverno policy enforcement, cost attribution, and a Cloud Promotion Guide to EKS/GKE: Check out the full NeuroScale repo here.

- infrastructure/serving-stack/patches/inferenceservice-config-ingress.yaml — Kourier config patch
- infrastructure/serving-stack/patches/kserve-controller-kube-rbac-proxy-image.yaml — sidecar removal patch
- infrastructure/kserve/sklearn-runtime.yaml — ClusterServingRuntime definition
- docs/CLOUD_PROMOTION_GUIDE.md — how to replace Kourier with ALB/NGINX on EKS/GKE
- docs/REALITY_CHECK_MILESTONE_3_GOLDEN_PATH.md — nine Backstage failures documented at the same depth
- docs/REALITY_CHECK_MILESTONE_4_GUARDRAILS.md — how kyverno-cli exits 0 on violations and why $PIPESTATUS[0] matters

Jimoh Sodiq Bolaji | Platform Engineer | Technical Content Engineer | Abuja, Nigeria | NeuroScale Platform
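One practical footnote on Failure 2: because client-side apply stores the entire rendered object in the kubectl.kubernetes.io/last-applied-configuration annotation, you can check ahead of ArgoCD whether a manifest will trip the 262144-byte limit just by measuring its size. A minimal sketch, assuming only POSIX shell tools; the dummy file stands in for a rendered manifest (in practice you would render one yourself, e.g. with kustomize build):

```shell
#!/bin/sh
# Kubernetes rejects annotation payloads larger than 256 KiB (262144 bytes).
# Client-side apply writes the whole manifest into one annotation, so a
# rendered CRD bigger than this limit fails exactly like Failure 2.
LIMIT=262144

# Stand-in for a rendered manifest (~400 KB of filler, roughly the size of
# the Knative Service CRD). Replace with your own rendered output, e.g.:
#   kustomize build infrastructure/serving-stack > /tmp/rendered.yaml
head -c 400000 /dev/zero | tr '\0' 'x' > /tmp/rendered.yaml

SIZE=$(wc -c < /tmp/rendered.yaml | tr -d ' ')
if [ "$SIZE" -gt "$LIMIT" ]; then
  echo "manifest is ${SIZE} bytes: too big for client-side apply, use ServerSideApply=true"
else
  echo "manifest is ${SIZE} bytes: fits within the annotation limit"
fi
```

This is only a coarse pre-flight check (the stored annotation is compact JSON, not the YAML byte count), but it flags the problem before a sync fails.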