# Why Lowering ndots Breaks Alpine Pods (But Not Debian) — A Deep Dive into glibc vs musl Resolvers
- The starting point: 5 queries for one domain
- The three config files that matter
- How a query travels (and where the retry loop lives)
- glibc vs musl: same input, different behavior
  - glibc — falls back gracefully
  - musl — stops on the first miss
- Why doesn't musl just add the fallback?
- Reproducing it with kind
  - Test: resolve kubernetes.default.svc (2 dots)
  - Result
  - CoreDNS logs confirm it
- Resolver behavior, side by side
- What to do about it
- Further reading

If you're running Alpine-based pods in Kubernetes and someone tells you to lower ndots for better DNS performance — don't. Or at least, read this first.

We had 5 DNS queries firing for every external domain lookup. The fix seemed obvious: drop ndots:5 to ndots:2. An AI reviewer warned me it might break internal service resolution. The reasoning didn't hold up when I read the resolver code, so I went ahead — and broke things in a way I didn't expect. The AI was right about the symptom but wrong about the cause. The breakage is real, but it lives in libc, not in the search algorithm.

## The starting point: 5 queries for one domain

Every external DNS lookup in our cluster was producing 4-5 queries. This is well-known behavior — it's caused by the default ndots:5 combined with Kubernetes' three-entry search list. The path of least resistance is to lower ndots. Most external lookups have dots ≥ 2 (www.google.com, api.example.com), so ndots:2 skips the search-list traversal for them entirely.

Before shipping the change, I asked an AI assistant to review it. It warned that internal service resolution might break, but its reasoning stopped short of the "then what": what actually happens after the as-is query fails? That was the question. I read the resolver source and concluded the AI was wrong: after the original query fails, the resolver should fall back to search-list traversal. The lookup should still succeed.

I was reading the wrong source.

## The three config files that matter

Before going deeper: the surface area of "Kubernetes DNS" lives in three files. Knowing which one controls what saves a lot of pain.

A Pod's DNS settings come from spec.dnsPolicy (default: ClusterFirst), which determines the resolv.conf the Pod gets. That's the file libc reads. And libc is the part that decides whether to walk the search list or skip it.

## How a query travels (and where the retry loop lives)

The key thing to notice in the flow: when a Pod's resolver gets NXDOMAIN, it retries with the next FQDN from the search list. That retry loop is where query amplification comes from. Lowering ndots is appealing because it skips this loop for high-dot names.

CoreDNS itself doesn't care about ndots.
It just answers whatever FQDN arrives. The retry decision happens entirely on the client side, inside libc.

## glibc vs musl: same input, different behavior

Here's the part the AI got right (in spirit) and I missed: the resolver isn't part of CoreDNS, Kubernetes, or even your app. It's the libc shipped in your container image. Different libcs implement search/ndots differently.

### glibc — falls back gracefully

glibc is what ships in Debian, Ubuntu, CentOS, RHEL, and Amazon Linux images. When dots ≥ ndots, glibc tries the original name first. If that returns NXDOMAIN, it walks the search list anyway as a fallback. One or two extra queries, but resolution succeeds. The fallback logic lives in __res_context_search(). The critical detail: the as-is attempt and the search loop are sequential, not exclusive. Failure of the first does not prevent the second.

### musl — stops on the first miss

musl is intentionally minimal. When dots ≥ ndots, name_from_dns_search() sets *search = 0 and never enters the search loop. Setting *search = 0 isn't a bug. It's deliberate. The next question is why.

## Why doesn't musl just add the fallback?

This has come up on the musl mailing list more than once. Maintainer Rich Felker rejects it consistently; the clearest example is Andrey Arapov's 2019 thread. If you're on musl and want to avoid this entirely: set ndots:1 and don't depend on short names.

This is a values disagreement, not a bug. Both libcs are doing what they intended. The mismatch only becomes a Kubernetes problem because Kubernetes hands every Pod a search list and assumes the resolver will use it.

## Reproducing it with kind

Four pods, two libcs, two ndots values: Alpine (musl) and Debian (glibc), each at ndots:5 and ndots:2, with CoreDNS patched to log every query. The full manifest for the four test pods is in the original Korean post.

### Test: resolve kubernetes.default.svc (2 dots)

This name has 2 dots. Under ndots:5, dots < ndots → search first. Under ndots:2, dots ≥ ndots → original first. The libc difference only surfaces in the ndots:2 case.

### Result

Same query. Same cluster. Same ndots:2. The only thing that changed is the libc.

### CoreDNS logs confirm it

alpine-ndots2 — only the original name arrives at CoreDNS.
No search-expanded queries showed up at all. debian-ndots2 — the original name first, then the entire search list, then success. This is exactly what the source code predicted. musl exits the search loop on the first iteration; glibc walks every entry.

## Resolver behavior, side by side

Under the default ndots:5, most names have fewer than 5 dots, so both libcs try search first and the difference doesn't surface. The moment you lower ndots, more names cross into dots ≥ ndots territory — and that's where musl's missing fallback turns into a real outage.

## What to do about it

If you want to lower ndots and you have any musl-based workloads, follow the advice musl's own maintainer gives: don't depend on short names. Use fully-qualified service names so the search list never matters, or keep the default ndots and absorb the extra queries.

## The bigger lesson

The thing I keep coming back to: the abstraction you're tuning (Kubernetes' ndots) and the layer where the behavior actually lives (libc resolver) can be miles apart. The Kubernetes docs talk about ndots. The Pod spec exposes ndots. CoreDNS configures things adjacent to ndots. And none of them are the layer that decides what happens when dots ≥ ndots.

The AI reviewer wasn't wrong to flag the risk. It just couldn't see one layer down. Neither could I, until the test pods told me. When something in a layered system behaves unexpectedly, "why" usually doesn't have a clean answer at the layer you're operating in. Tracing the call all the way down to the C source is, surprisingly often, faster than reading another blog post.
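And if you do decide to lower ndots, Kubernetes lets you scope the change to individual Pods through spec.dnsConfig instead of changing it cluster-wide. A sketch (the pod name and image are illustrative; dnsConfig.options is the real Pod API field):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: external-heavy          # illustrative name
spec:
  dnsPolicy: ClusterFirst       # keep the cluster search list
  dnsConfig:
    options:
      - name: ndots
        value: "2"              # only safe if this image is glibc-based,
                                # or never resolves short service names
  containers:
    - name: app
      image: debian:bookworm-slim   # glibc: search-list fallback exists
      command: ["sleep", "infinity"]
```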
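To see the divergence without spinning up a cluster, here's a small Python sketch (mine, not resolver source) of how each libc assembles its list of candidate FQDNs for one lookup. The search list mirrors a typical pod's resolv.conf in the "default" namespace with the "cluster.local" cluster domain; the function names and exact ordering are simplifications of the behavior described above, not the real call paths:

```python
# Typical Kubernetes pod search list (namespace "default",
# cluster domain "cluster.local") -- illustrative, not from our cluster.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def glibc_candidates(name: str, ndots: int) -> list[str]:
    """glibc-style: when dots >= ndots, try the name as-is first, but
    still fall back to the search list if the as-is query gets NXDOMAIN."""
    as_is = [name]
    searched = [f"{name}.{suffix}" for suffix in SEARCH]
    if name.count(".") >= ndots:
        return as_is + searched   # as-is first, search list as fallback
    return searched + as_is       # search first, as-is as last resort


def musl_candidates(name: str, ndots: int) -> list[str]:
    """musl-style: when dots >= ndots, query as-is only -- the search
    loop is never entered, so there is no fallback (simplified)."""
    if name.count(".") >= ndots:
        return [name]             # *search = 0: one query, then give up
    return [f"{name}.{suffix}" for suffix in SEARCH]


name = "kubernetes.default.svc"   # 2 dots
print(glibc_candidates(name, 2))  # as-is, then 3 search-expanded names
print(musl_candidates(name, 2))   # only the as-is name -> NXDOMAIN, done
```

With ndots:5 both functions return the same search-first list, which is why the difference stays invisible until you lower ndots.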
## Further reading

- DNS for Services and Pods (Kubernetes docs)
- Kubernetes DNS debugging guide — includes warnings for Alpine 3.17 and earlier
- musl wiki — Functional differences from glibc
- Pracucci — ndots:5 and application performance
- NodeLocal DNSCache

The fuller archaeology — every relevant GitHub issue across CoreDNS, kubernetes/dns, cert-manager, openwhisk, kind, and others — is in the original Korean post's appendix.