Tools: Migrating from Community ingress-nginx to F5 NGINX Ingress Controller Across 3 AKS Clusters (2026)

Earlier this month I migrated three production AKS clusters off the community ingress-nginx controller and onto the F5 NGINX Ingress Controller OSS (v2.5.1). The three workloads were a compliance API service, a real-time WebSocket trading server, and a charting frontend. Same controller name, completely different internals — and enough sharp edges to fill a post. This is the full account: what changed, what broke, and the patterns I standardised across all three.

Why Migrate

The community Helm chart (kubernetes/ingress-nginx) and the F5 chart (nginx-stable/nginx-ingress) both proxy traffic through NGINX, but they diverge at almost every other layer — Helm structure, annotation prefixes, config key names, metrics port, and label selectors. F5 NGINX IC is the upstream-maintained version aligned with NGINX OSS releases, and it gives tighter control over the NGINX config without relying on the community's annotation translation layer.

The practical trigger was a mix of factors: the community chart had accumulated workarounds for bugs we no longer needed, the annotation surface was getting hard to audit, and we wanted a single, consistent ingress stack across clusters.

What Stayed the Same

Before diving into the diffs, here is what did not change:

- IngressClass name remains nginx in every cluster (no application-level changes needed)
- Azure Load Balancer type (internal where it was internal, public where public)
- cert-manager ClusterIssuers (one field rename, covered below)
- Linkerd injection on controller pods

The Migration Playbook

Every cluster followed the same five-step pipeline:

```bash
# 1. Pull the F5 chart via OCI — no helm repo add needed
helm pull oci://ghcr.io/nginx/charts/nginx-ingress \
  --version 2.5.1 \
  --destination /tmp/charts/

# 2. Verify checksum before touching anything
echo "23c866c0531719586570435a4d9a57ac0fb9661fdafd572c8916208cb7b4f225  /tmp/charts/nginx-ingress-2.5.1.tgz" \
  | sha256sum --check

# 3. One-time IngressClass migration guard
CONTROLLER=$(kubectl get ingressclass nginx \
  -o jsonpath='{.spec.controller}' 2>/dev/null || true)
if [ "${CONTROLLER}" = "k8s.io/ingress-nginx" ]; then
  echo "Removing community IngressClass — allowing F5 takeover"
  kubectl delete ingressclass nginx
fi

# 4. Helm upgrade
helm upgrade --install nginx-ingress /tmp/charts/nginx-ingress-2.5.1.tgz \
  --namespace nginx-ingress \
  -f values.yaml \
  --wait --timeout 5m

# 5. Verify the right controller is running
kubectl get pods -l app.kubernetes.io/name=nginx-ingress -n nginx-ingress
```

Step 3 deserves its own section.

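If you want a sixth step, a minimal smoke test through the new controller catches gross breakage immediately. This is a sketch, not part of the original pipeline; the hostname is a placeholder:

```bash
# 6. (hypothetical) Smoke-test a known route through the new controller.
#    app.example.com stands in for a real ingress host.
STATUS=$(curl -s -o /dev/null -w '%{http_code}' https://app.example.com/)
if [ "${STATUS}" != "200" ]; then
  echo "smoke test failed: got HTTP ${STATUS}" >&2
  exit 1
fi
```
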
The IngressClass Immutability Trap

spec.controller on an IngressClass resource is immutable after creation. The community controller sets it to k8s.io/ingress-nginx; the F5 controller expects nginx.org/ingress-controller. If you just run helm upgrade, F5 will fail to adopt the existing IngressClass and create a conflicting one — or worse, silently ignore it and not process any Ingress resources.

The solution is to delete the IngressClass before the first F5 install. But a naive unconditional delete is dangerous in an idempotent pipeline — if someone reruns the pipeline after migration, they'd delete the already-correct F5-owned IngressClass mid-flight, causing a brief outage. The guard condition solves this:

```bash
if [ "${CONTROLLER}" = "k8s.io/ingress-nginx" ]; then
  kubectl delete ingressclass nginx
fi
```

After the first successful F5 install, spec.controller reads nginx.org/ingress-controller, so every subsequent pipeline run skips the delete. One-time, idempotent, safe.

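To make the takeover observable rather than implicit, a post-install assertion can sit right after the Helm step. A sketch, not from the original pipeline:

```bash
# Fail fast if the IngressClass is still (or again) community-owned.
WANT="nginx.org/ingress-controller"
GOT=$(kubectl get ingressclass nginx -o jsonpath='{.spec.controller}')
if [ "${GOT}" != "${WANT}" ]; then
  echo "IngressClass 'nginx' owned by '${GOT}', expected '${WANT}'" >&2
  exit 1
fi
```
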
Helm Values: Structural Differences

The community chart uses a flat controller.config map. F5 nests everything under controller.config.entries. Small diff, big gotcha if you copy-paste.

```yaml
controller:
  config:
    proxy-read-timeout: "600"
    load-balance: "ewma"
    use-gzip: "true"
```

A number of community config keys simply do not exist in F5 and are silently ignored if you leave them in. I audited every key against the F5 config documentation and removed: allow-snippet-annotations, allow-backend-server-header, block-user-agents, enable-vts-status, generate-request-id, limit-req-status-code, use-forwarded-headers, use-geoip, upstream-keepalive-*.

Other keys that F5 does support, but under different names or value formats:

```yaml
controller:
  config:
    entries:
      proxy-read-timeout: "600s"  # note: F5 expects the unit suffix
      lb-method: "ewma"           # key renamed
      # use-gzip has no equivalent — moved to http-snippets
```

The full base controller config across all three clusters:

```yaml
controller:
  kind: deployment
  enableCustomResources: false  # not using VirtualServer CRDs
  enableSnippets: true
  telemetryReporting:
    enable: false  # no outbound access to oss.edge.df.f5.com
  ingressClass:
    name: nginx
    create: true
    setAsDefaultIngress: false
  service:
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-health-probe-protocol: tcp
  metrics:
    enable: true
    port: 9113  # changed from community's default
    serviceMonitor:
      create: false
```

Two settings that tripped things up before I caught them:

- telemetryReporting.enable: false — F5 attempts to phone home to oss.edge.df.f5.com. In a cluster with no outbound internet on the node pool, this causes the controller pod to crash-loop on startup waiting for the connection to time out. Must be disabled explicitly.
- enableCustomResources: false — F5 ships its own CRDs (VirtualServer, TransportServer, Policy). If you leave this enabled and those CRDs aren't pre-installed, the controller crashes. Since all three clusters use standard Kubernetes Ingress resources, I disabled them entirely.

Azure LB health probe — the community controller serves /healthz on port 80. F5 does not. Azure's default HTTP probe on that path will mark all backends unhealthy. Switch to a TCP probe, which is exactly what the service annotation in the values above does.

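To make that key audit repeatable, a rough sketch like the one below prints every key currently set under controller.config so each can be checked by hand against the F5 reference. It assumes the community values.yaml uses two-space YAML indentation:

```bash
# List config keys in the community values file (assumed two-space indents).
awk '
  /^controller:/                 { in_controller = 1; next }
  in_controller && /^  config:/  { in_config = 1; next }
  in_config && /^    [a-z]/      { gsub(/^ +|:.*$/, ""); print; next }
  in_config && /^  [a-z]/        { in_config = 0 }   # left the config block
' values.yaml
```
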
Rate Limiting: From Annotations to NGINX Snippets

Community ingress-nginx ships first-class annotations for rate limiting:

```yaml
# community — applied as ingress annotations
nginx.ingress.kubernetes.io/limit-rpm: "120"
nginx.ingress.kubernetes.io/limit-connections: "60"
# the 429 response code comes from limit-req-status-code in the ConfigMap
```

F5 NGINX IC does not have equivalent annotation primitives. The correct F5 approach is to declare the rate-limit zones globally in http-snippets (controller values) and apply them per ingress via server-snippets.

Controller values — shared zones:

```yaml
controller:
  config:
    entries:
      http-snippets: |
        geo $app_limit_bypass {
          default 0;
          <office-cidr-1> 1;
          <office-cidr-2> 1;
        }
        map $app_limit_bypass $app_limit_key {
          0 $binary_remote_addr;
          1 "";
        }
        limit_req_zone $app_limit_key zone=app_rpm:10m rate=120r/m;
        limit_conn_zone $app_limit_key zone=app_conn:10m;
```

Ingress manifest — apply per route:

```yaml
annotations:
  nginx.org/server-snippets: |
    limit_req zone=app_rpm burst=80 nodelay;
    limit_req_status 429;
    limit_conn app_conn 60;
    limit_conn_status 429;
```

The geo+map pattern lets specific IP ranges (office networks, CI runners, load testing hosts) bypass rate limits by mapping to an empty key — which limit_req_zone treats as unlimited. This is cleaner than maintaining allow-lists in multiple annotation blocks across ingress manifests.

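A quick way to confirm the zones actually bite, against a placeholder route: with rate=120r/m and burst=80, sustained rapid-fire requests should flip from 200s to 429s partway through.

```bash
# Hammer one endpoint and count status codes; expect 200s then 429s.
for _ in $(seq 1 120); do
  curl -s -o /dev/null -w '%{http_code}\n' 'https://app.example.com/api/ping'
done | sort | uniq -c
```
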
WebSocket Service: Keepalive Surprises

One of the services is a Socket.io server behind WebSocket connections. Everything looked healthy post-migration — pods up, ingress adopted — but Socket.io clients started disconnecting every 30–60 seconds.

The root cause: F5's default keepalive-timeout is 0s (disabled), whereas the community chart defaults to 60s. WebSocket connections through NGINX depend on keepalive to stay alive during idle periods; with it disabled, NGINX was closing the connection server-side.

```yaml
controller:
  config:
    entries:
      keepalive-timeout: "60s"
      http2: "false"  # HTTP/2 and WebSocket upgrades conflict; disable explicitly
```

Also required: the F5 WebSocket annotation on the ingress manifest.

```yaml
annotations:
  nginx.org/websocket-services: "my-websocket-service"
```

Without this annotation, F5 does not set the necessary Upgrade and Connection proxy headers for WebSocket handshakes. The community controller handled this automatically; F5 requires you to be explicit.

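A handshake probe (hypothetical host) makes the annotation's effect visible: a correctly proxied upgrade answers 101 Switching Protocols, while a missing annotation usually surfaces as a plain HTTP status instead.

```bash
# --max-time bounds the call, since a successful upgrade never "completes".
# The Sec-WebSocket-Key is the RFC 6455 example value.
curl -s -o /dev/null -w '%{http_code}\n' --http1.1 --max-time 5 \
  -H 'Connection: Upgrade' \
  -H 'Upgrade: websocket' \
  -H 'Sec-WebSocket-Version: 13' \
  -H 'Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==' \
  'https://ws.example.com/socket.io/'
```
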
Zero-Downtime Service Selector Patch

One cluster runs a secondary Service that routes specific traffic, and its label selector was hardcoded to the community controller labels:

```yaml
selector:
  app.kubernetes.io/name: ingress-nginx
  app.kubernetes.io/component: controller
```

F5 uses app.kubernetes.io/name=nginx-ingress. After migration, the service selector matched nothing — endpoints went empty, traffic dropped. Re-applying the stored manifest would only reassert the stale selector, so I patched it as a pre-upgrade pipeline step:

```bash
kubectl patch service <legacy-service-name> \
  -n nginx-ingress \
  --type='merge' \
  -p '{"spec": {"selector": {"app.kubernetes.io/name": "nginx-ingress"}}}'
```

The --type='merge' strategy replaces only the specified keys, leaving the rest of the selector intact. Running this before helm upgrade means the service selector matches the new pods the moment they come up.

The broader lesson: grep for ingress-nginx in all Service selectors across your cluster before starting the migration. Any service with a hardcoded community label selector will silently drop traffic after cutover.

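That grep can be made precise, assuming jq is available wherever the audit runs:

```bash
# Every Service whose selector values still reference the community labels.
kubectl get svc -A -o json | jq -r '
  .items[]
  | select((.spec.selector // {}) | to_entries
           | any(.value | test("ingress-nginx")))
  | "\(.metadata.namespace)/\(.metadata.name)"'
```
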
cert-manager

One field rename in the ClusterIssuer template — class is deprecated in favour of ingressClassName:

```yaml
# before
solvers:
  - http01:
      ingress:
        class: nginx

# after
solvers:
  - http01:
      ingress:
        ingressClassName: nginx
```

Also removed: a cert-manager feature gate that was only needed to work around a community ingress-nginx bug (issue #11176) related to path type handling. F5 does not have the bug:

```yaml
# removed from cert-manager values
featureGates: "ACMEHTTP01IngressPathTypeExact=false"
```

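After the rename, a cheap follow-up is to confirm certificates still reconcile. A sketch using kubectl custom-columns:

```bash
# Every Certificate should report Ready=True once the HTTP-01 solver
# creates its ingresses with the new ingressClassName.
kubectl get certificate -A -o custom-columns=\
'NAMESPACE:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status'
```
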
Datadog Metrics

F5 exposes Prometheus metrics on port 9113 (the community controller used 8080). The existing Datadog auto-discovery config was pointing at the wrong port, so I added an OpenMetrics check:

```yaml
# datadog-agent values.yaml
confd:
  openmetrics.yaml: |-
    ad_identifiers:
      - nginx-ingress
    init_config:
    instances:
      - openmetrics_endpoint: "http://%%host%%:9113/metrics"
        namespace: nginx_ingress
        metrics:
          - nginx_connections_accepted
          - nginx_connections_active
          - nginx_connections_handled
          - nginx_http_requests_total
          - nginx_ingress_controller_ingress_resources_total
          - nginx_ingress_controller_nginx_reloads_total
          - nginx_ingress_controller_nginx_reload_errors_total
          - nginx_ingress_controller_nginx_last_reload_milliseconds
```

Two things to watch: the file must be named openmetrics.yaml (not nginx-ingress.yaml) for Datadog's catalog to recognise it, and ad_identifiers must match the container name nginx-ingress exactly.

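Before blaming the Datadog config, it is worth scraping the endpoint directly. The deployment name below is an assumption; check what the chart actually created:

```bash
# Port-forward to the controller and pull /metrics by hand.
kubectl -n nginx-ingress port-forward deploy/nginx-ingress-controller 9113:9113 &
PF_PID=$!
sleep 2
curl -s http://localhost:9113/metrics | grep '^nginx_' | head
kill "${PF_PID}"
```
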
Node Selector Key Update

The community chart uses the deprecated node label key:

```yaml
beta.kubernetes.io/os: linux
```

F5 values use the stable GA key:

```yaml
kubernetes.io/os: linux
```

Newer AKS node images no longer carry beta.kubernetes.io/os. If your node pool has dropped it, community controller pods won't schedule. Not migration-specific, but worth cleaning up in the same PR.

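A quick way to see which OS label keys the nodes actually carry:

```bash
# -L adds a column per label key; an empty cell means the key is gone.
kubectl get nodes -L beta.kubernetes.io/os -L kubernetes.io/os
```
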
Helm Upgrade Stability

On cold nodes (a newly scaled-up node pool), the F5 controller image pull can outlast the timeout the pipeline had been passing to Helm. --wait --timeout 5m prevents spurious pipeline failures that previously looked like deployment regressions:

```bash
helm upgrade --install nginx-ingress ./nginx-ingress-2.5.1.tgz \
  --namespace nginx-ingress \
  -f values.yaml \
  --wait --timeout 5m
```

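For a second signal beyond Helm's --wait, a rollout check is cheap. The deployment name is an assumption:

```bash
# Confirm the controller rollout independently of Helm's exit status.
kubectl rollout status deployment/nginx-ingress-controller \
  -n nginx-ingress --timeout=5m
```
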
Rollout Issues Timeline

The conditional IngressClass delete came last because the unconditional delete worked fine on the first run — the rerun risk only became apparent during a pipeline review afterward.

Key Differences Cheat Sheet

| | Community ingress-nginx | F5 NGINX IC |
|---|---|---|
| Helm chart | kubernetes/ingress-nginx | nginx-stable/nginx-ingress |
| IngressClass spec.controller | k8s.io/ingress-nginx | nginx.org/ingress-controller |
| Annotation prefix | nginx.ingress.kubernetes.io/ | nginx.org/ |
| Helm config location | controller.config (flat) | controller.config.entries |
| Pod label | app.kubernetes.io/name=ingress-nginx | app.kubernetes.io/name=nginx-ingress |
| Metrics port | 8080 | 9113 |
| WebSocket handling | automatic | nginx.org/websocket-services annotation |
| Rate limiting | ingress annotations | http-snippets + server-snippets |
| OS node label | beta.kubernetes.io/os | kubernetes.io/os |
| /healthz on port 80 | yes | no (use a TCP probe) |

What I Would Do Differently

- Audit every config key before migrating. F5 silently ignores unknown config keys. A pre-migration diff against the F5 config reference would have caught the upstream-keepalive-* and use-gzip removals before they hit production.
- Test WebSocket apps on a staging cluster first. The keepalive timeout issue was predictable — the default changed between controllers and I didn't check.
- Grep for ingress-nginx in all Service selectors before starting. Any hardcoded community label selector silently drops traffic after cutover. Add the selector patch to your playbook as a standard pre-upgrade step, not a reactive fix.

The migration is complete and stable across all three clusters. Ingress configurations are now easier to reason about — NGINX config is NGINX config, not a translation layer of annotations into nginx.conf directives you can't see. If you're running the community chart and considering the switch, the above should give you a realistic picture of what to budget for.
