System Design in a Hurry: How to Recover When You Realize Your Design Is Wrong

Source: Dev.to

Halfway through the interview, it hits you.

You are walking through your system design, explaining components with steady confidence, when the interviewer adds a constraint almost in passing: “By the way, this needs to support ten times that traffic during peak events.”

The architecture you just described will not survive that load. The assumptions you made five minutes ago no longer hold. You see the flaw clearly, and you also see the clock.

Now comes the real dilemma: Do you acknowledge the problem and risk looking unprepared? Or do you push forward, hoping the interviewer does not dwell on it?

This moment feels like failure. In reality, it is one of the most important moments in a System Design interview. System Design interviews are not about never being wrong. They are about how effectively you recognize mistakes, course-correct under pressure, and communicate tradeoffs with clarity.

## Why this moment matters more than the initial design

Recovery, not correctness, is what is being evaluated. Interviewers do not expect perfect assumptions. They know that real systems are designed with incomplete information and that requirements evolve mid-stream. In fact, many interviewers deliberately introduce late constraints to see how candidates respond.

- Pushing forward despite a known flaw signals attachment to a solution rather than to the problem.
- Pausing, acknowledging the issue, and adapting signals engineering maturity.

What distinguishes strong candidates is not that they avoid mistakes, but that they do not become defensive or rigid when mistakes surface.

### Defensive vs adaptive behavior

Defensive behavior often looks like:

- Overexplaining why the original choice was “reasonable”
- Justifying assumptions instead of reassessing them
- Patching the design with unnecessary complexity

Adaptive reasoning looks like:

- Clearly naming the issue
- Focusing on impact
- Adjusting with minimal change
- Explaining tradeoffs calmly

The recovery moment reveals how you think when certainty disappears. That is far closer to real engineering work than any polished initial design.

## Common ways System Design answers break mid-interview

Most mid-interview failures fall into a few predictable categories. Recognizing them helps normalize the experience.

### Scale assumptions change

- You assume a few million users
- The interviewer increases traffic by an order of magnitude
- Early estimates are guesses, and interviewers expect them to change

### Latency constraints appear late

- The design relies on synchronous calls or cross-region reads
- Later, strict response-time requirements emerge
- Latency expectations are often implicit until surfaced

### Data growth and fan-out are underestimated

- Feeds, messaging systems, and notifications compound quickly
- Fan-out effects are easy to gloss over early
- Realizing this mid-discussion is a sign of honest reasoning

### Reliability requirements surface

- An interviewer asks about regional outages or partial failures
- Your design assumed a simpler failure model
- That assumption breaking is expected, not penalized

System Design problems are intentionally underspecified. Discovering gaps is part of the exercise.

## What interviewers want to hear when something goes wrong

When a flaw surfaces, interviewers listen less to what you change and more to how you talk about the change. They want to hear four things:

- Clear acknowledgment of the issue
  - Example: “Given this new requirement, the current approach becomes a bottleneck.”
- Explanation of impact
  - What breaks?
    - Latency
    - Throughput
    - Cost
    - Reliability
- A minimal adjustment
  - Change one component or assumption
  - Avoid restarting the entire design
- The new tradeoff
  - What improved?
  - What got worse?

### What they do not need

- Long justifications
- Apology-heavy language
- Overconfidence or panic

Calm acknowledgment signals control. Over-apologizing signals insecurity.

## A simple recovery framework for high-pressure moments

When time is short, recovery works best when it is structured but lightweight.

### Step 1: Acknowledge the broken assumption

- State it plainly
- Avoid framing it as a personal failure

### Step 2: Identify the bottleneck

- Be specific about where the design no longer holds

### Step 3: Change one thing

- Replace a component
- Adjust an assumption
- Re-scope a responsibility

### Step 4: Explain the tradeoff

- What did you gain?
- What did you give up?

This keeps the conversation moving forward and shows decisiveness.

## An illustrative example

Consider a messaging system.

### Initial design

- Clients send messages to a central service
- The service writes to a database
- Messages are pushed synchronously to recipients
- Works well at moderate scale

### New constraint

- Some users have millions of followers
- Messages must be delivered near real time

### A clean recovery response

“Under that fan-out, the synchronous push model becomes a bottleneck. The write path would slow down significantly.”

- Persist the message once
- Let downstream consumers fan out asynchronously
- Improved write latency and scalability
- Eventual delivery for some recipients, which is acceptable

### What did not happen

- No full redesign
- No laundry list of technologies

The focus stayed on reasoning and consequence. That is what interviewers are listening for.

## How recovery expectations differ by seniority

Recovery matters at every level, but expectations change with experience.

### Junior engineers

- Awareness matters most
- Recognize the issue
- Adjust without freezing or deflecting

### Mid-level engineers

- Clear tradeoffs and prioritization
- Explain why one adjustment is better than another

### Senior and staff engineers

- Fast, confident recovery
- Aggressive simplification
- Strong narrative authority

Across all levels, the common signal is composure.

## Final thoughts

Real systems are never designed correctly on the first pass. They evolve through feedback, failure, and correction. System Design interviews mirror that reality more closely than many candidates realize.

Preparation is not about memorizing perfect architectures. It is about:

- Adapting when assumptions change
- Communicating decisions under uncertainty
- Demonstrating judgment, not perfection

If you can recognize a flaw, explain its impact, and adjust with clarity, you are showing exactly what strong engineers are hired for. Trust that process. Trust your ability to reason in the moment.

And remember: realizing your design is wrong is not the end of the interview. It is often the point where it truly begins.
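
For readers who want to see the messaging example's single change in concrete terms, here is a minimal Python sketch. It is an illustration under stated assumptions, not something taken from the article: every name in it (`MessageStore`, `deliver`, `send_sync`, `send_async`, `fanout_worker`) is hypothetical, a list stands in for the database, and an in-process `queue.Queue` stands in for whatever message broker a real system would use.

```python
import queue
import threading


class MessageStore:
    """Hypothetical stand-in for the database: one row per message."""

    def __init__(self):
        self.rows = []

    def persist(self, sender, text):
        self.rows.append((sender, text))
        return len(self.rows) - 1  # index of the new row, used as message id


def deliver(recipient, message_id, inbox):
    """Hypothetical per-recipient push; a real one would do network I/O."""
    inbox.setdefault(recipient, []).append(message_id)


def send_sync(store, followers, inbox, sender, text):
    """Initial design: the write path pushes to every recipient before returning."""
    message_id = store.persist(sender, text)
    for recipient in followers[sender]:
        deliver(recipient, message_id, inbox)
    return message_id


def send_async(store, fanout_queue, sender, text):
    """Revised design: persist once, enqueue one fan-out job, return immediately."""
    message_id = store.persist(sender, text)
    fanout_queue.put((sender, message_id))
    return message_id


def fanout_worker(fanout_queue, followers, inbox):
    """Downstream consumer: expands each job into per-recipient deliveries."""
    while True:
        job = fanout_queue.get()
        if job is None:  # sentinel value: stop the worker
            fanout_queue.task_done()
            break
        sender, message_id = job
        for recipient in followers[sender]:
            deliver(recipient, message_id, inbox)
        fanout_queue.task_done()


if __name__ == "__main__":
    followers = {"celebrity": [f"user{i}" for i in range(100_000)]}
    store, inbox, jobs = MessageStore(), {}, queue.Queue()
    worker = threading.Thread(
        target=fanout_worker, args=(jobs, followers, inbox), daemon=True
    )
    worker.start()

    send_async(store, jobs, "celebrity", "hello")  # returns before delivery finishes
    jobs.join()     # wait for the fan-out to complete (eventual delivery)
    jobs.put(None)  # stop the worker
    worker.join()
    print(len(store.rows), "row persisted,", len(inbox), "recipients reached")
```

In `send_sync`, the caller's latency grows with the follower count. In `send_async`, it stays roughly constant, because the write path does one persist and one enqueue. The cost is that recipients see the message only after the worker gets to it. This matches the tradeoff named in the example: better write latency and scalability in exchange for eventual delivery.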