# How I Overcame the Optimization Wall by Implementing the Augmented Lagrangian Method (ALM) with AI as My Sounding Board

Source: Dev.to

Contents:

- 1. Stumbling at the Starting Line: Trying to Optimize with Only the LM Method
- 2. An Epiphany on X and Brainstorming with AI
- 3. Basic Knowledge: The Meaning of ∇ (Nabla)
- 4. Lagrange Multipliers and the KKT Conditions
  - What is the Method of Lagrange Multipliers?
  - Is it true that KKT conditions are an extension of the Lagrange method?
- 5. Implemented Math and Calculation Flow: The ALM × LM Hybrid
  - Step 1: Defining the Penalized Objective Function (Inner Loop Goal)
  - Step 2: Preventing Calculation Runaways with the smoothMax Function
  - Step 3: Updating the Multiplier λ (Outer Loop)
- 6. The "Underfoot" Tech Supporting ALM (Numerical Calculation Engine)
  - ① Finite Difference Method (The sensor to measure terrain slope)
  - ② Jacobian Pre-scaling (The safety net preventing calculation collapse)
  - ③ Broyden Quasi-Newton Update (The turbo boost for speed)
- 7. A Practical Example in co-opt (URL Sharing and Optimization in Action)
- Conclusion

I am currently developing "co-opt", a web-based optical design software that runs entirely in the browser. In this post, I want to share the insights I gained while building a custom "constrained optimization engine" for lens design from scratch.

Before we dive in, a quick disclaimer: the methods introduced in this article are likely well-known, standard approaches to experts in mathematical optimization. Also, to prioritize clarity, I will break the concepts down so that anyone who has touched basic calculus in high school or early college can follow along. This means the article contains expressions that are not strictly mathematically rigorous.

## 1. Stumbling at the Starting Line: Trying to Optimize with Only the LM Method

In lens design, we have specific goals, such as "I want the focal length to be exactly 50mm" or "I want to minimize optical aberration (blurring)." Initially, I tried to handle this optimization using the LM (Levenberg-Marquardt) algorithm alone.
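In LM terms, "optimizing" means stacking every design goal into a residual vector and minimizing the sum of its squares. A minimal sketch of that setup, with toy stand-in models (the `focalLength` and `blur` functions here are hypothetical, not co-opt's actual ray-traced code):

```typescript
// LM-style setup: every design goal becomes one entry of a residual vector,
// and the optimizer minimizes the sum of squared residuals.
type Design = { curvature: number; thickness: number };

const residuals = (d: Design): number[] => [
  focalLength(d) - 50, // goal: focal length exactly 50mm
  blur(d),             // goal: minimize aberration (blur)
];

const cost = (d: Design): number =>
  residuals(d).reduce((sum, r) => sum + r * r, 0);

// Hypothetical toy models, just so the sketch runs:
function focalLength(d: Design): number { return 100 * d.curvature; }
function blur(d: Design): number { return Math.abs(d.thickness - 2) * 0.1; }

console.log(cost({ curvature: 0.5, thickness: 2 })); // 0: both goals met
```

Notice that nothing in this formulation says anything about *forbidden* designs; the cost only rewards getting closer to the goals.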
However, I immediately hit a fatal roadblock. While the LM method is ferociously good at making the score (blur) as small as possible, it completely fails to respect constraints like "the center thickness of the lens must be at least 1mm." In its blind pursuit of a better score, the algorithm ran out of control, producing negative lens thicknesses and crashing the whole process.

## 2. An Epiphany on X and Brainstorming with AI

Just as I was agonizing over how to handle these constraints, an expert on X (formerly Twitter) gave me a crucial piece of advice: look into the KKT conditions (Karush-Kuhn-Tucker conditions). Looking back, I think I might have learned this in university, but I had completely forgotten it. Time is cruel, isn't it? I'd like to take this opportunity to deeply thank the person who gave me this hint. It meant the world to me.

That said, knowing the term "KKT conditions" and knowing how to translate it into code are two very different things. So I started using AI as a sounding board, repeating endless cycles of brainstorming and debugging via prompts. After much trial and error, I discovered that I could break through this wall by combining the Augmented Lagrangian Method (ALM) with the LM method.

## 3. Basic Knowledge: The Meaning of ∇ (Nabla)

Before we get into the math, let me explain just one symbol: ∇f(x) (read as "nabla f"). This represents the gradient (slope) of the function f. Imagine mountain climbing with multiple variables. ∇f(x) is simply an arrow (vector) that tells you "the direction of the steepest uphill slope from your current position, and how steep that slope is." In optimization (searching for the bottom of the valley), the basic rule is to move in the exact opposite direction of this arrow.

## 4. Lagrange Multipliers and the KKT Conditions

## What is the Method of Lagrange Multipliers?

It's a technique used to solve problems like: "Minimize the score f(x), but absolutely obey the equality rule (constraint) that g(x) = 0."
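For intuition, here is a concrete instance with made-up numbers: minimize f(x, y) = x² + y² subject to x + y − 2 = 0. The unconstrained best point (0, 0) is forbidden, so the optimum must sit somewhere on the constraint line. A brute-force walk along that line (a sketch for intuition only, not part of any engine) finds it:

```typescript
// Minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 2 = 0.
// Walk along the constraint line y = 2 - x and keep the lowest score.
const f = (x: number, y: number): number => x * x + y * y;

let best = { x: 0, y: 2, score: f(0, 2) };
for (let x = -5; x <= 5; x += 0.001) {
  const y = 2 - x; // stay exactly on the constraint
  const s = f(x, y);
  if (s < best.score) best = { x, y, score: s };
}

// The winner is (1, 1), where the gradients line up:
// ∇f = (2, 2) and ∇g = (1, 1) are parallel, so ∇f + λ∇g = 0 with λ = -2.
console.log(best.x.toFixed(2), best.y.toFixed(2)); // ≈ 1.00 1.00
```

The fact that the gradients are parallel at the winner is not a coincidence; it is exactly the condition the Lagrangian machinery below formalizes.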
To solve such a problem, we bundle the objective and the constraint into a single function, the Lagrangian:

L(x, λ) = f(x) + λ · g(x)

- x: The variables we can change (lens curvature, thickness, etc.)
- f(x): The objective function (the amount of blur we want to minimize)
- g(x): The equality constraint (e.g., Focal Length − 50mm = 0)
- λ: The Lagrange multiplier (the repulsive force pushing back from the constraint wall)

It's a beautiful theorem stating that if we find the point where the derivative of this L is zero (∇L = 0), we will find the point that minimizes the score while obeying the constraints. Expressed as an equation:

∇f(x) + λ · ∇g(x) = 0

This represents a state where the "force trying to walk down the slope of the objective function (∇f)" and the "repulsive force pushing back from the constraint wall (λ∇g)" are perfectly balanced and at a standstill.

## Is it true that KKT conditions are an extension of the Lagrange method?

We often hear that "the KKT conditions are an extension of the method of Lagrange multipliers," and to get straight to the point: yes, it's true. The method of Lagrange multipliers can only handle equality constraints (g(x) = 0). However, in actual design we absolutely need inequality constraints, such as "the lens thickness must be 1mm or more," which translates to c(x) ≤ 0 (the thickness constraint violation must be zero or less). The KKT (Karush-Kuhn-Tucker) conditions are exactly the theoretical extension of the Lagrange method that handles these inequality constraints. They are a collective set of mathematical rules that the optimal solution must satisfy, essentially stating: "It is impossible to improve the score any further without breaking the constraints."

## 5. Implemented Math and Calculation Flow: The ALM × LM Hybrid

So, we understand our target destination: the KKT conditions. But how do we write a program to find that destination? The defining feature of the Augmented Lagrangian Method (ALM) is that it runs a "double loop" (inner loop and outer loop) to gradually learn the repulsive force of the wall and inch closer to the optimal solution. Words alone can be confusing, so first, take a look at the overall calculation flowchart implemented in my software (optimizer-mvp.ts).

## Step 1: Defining the Penalized Objective Function (Inner Loop Goal)

By nature, the LM method is an algorithm that only minimizes the "sum of squared errors (residuals)."
Therefore, to force it to obey the inequality constraint c(x) ≤ 0, we calculate a total score (the Augmented Lagrangian function) that includes a "squared penalty," and pass this to the LM method:

Cost(x) = f(x) + Σᵢ [ λᵢ · smoothMax(cᵢ(x), β) + (μ/2) · smoothMax(cᵢ(x), β)² ]

- Cost: The comprehensive score the LM method will ultimately try to minimize.
- x: Design variables (lens curvature, etc.).
- f(x): The original objective function (amount of optical blur).
- c(x): The amount of constraint violation (how far we've crashed into the wall).
- μ: The parameter determining the strength of the penalty.
- λ: The Lagrange multiplier (the memory of the repulsive force from the wall).
- β: A parameter to smooth out the sharp corners of the constraint wall.

## Step 2: Preventing Calculation Runaways with the smoothMax Function

The smoothMax function that appears in the formula above is a critically important function I introduced through my brainstorming sessions with AI. Normally, to penalize exactly the amount violated, we would mathematically use max(0, x). However, the graph of max(0, x) has a sharp, jagged corner at x = 0, making it non-differentiable. For algorithms like LM that navigate using gradients (slopes), encountering a non-differentiable point is fatal: the calculation gets lost there. Therefore, I defined smoothMax, a function that rounds off the corner of max(0, x). The formula is as follows:

smoothMax(x, β) = (1/β) · ln(1 + e^(β·x))

(Note: In the AI/Deep Learning field, this is identical to what is known as the Softplus function.) Here is how I translated the formula into code (excerpt from optimizer-mvp.ts):

```typescript
// smoothMax: a smooth approximation of Math.max(0, x), differentiable everywhere
const smoothMax = (val: number, beta: number = 100) => {
  // Prevent overflow: for large val * beta, approximate linearly
  if (val * beta > 20) return val;
  return Math.log(1 + Math.exp(beta * val)) / beta;
};
```

I included the condition to return val directly when val * beta > 20. This guards against Math.exp blowing up to Infinity for large arguments; beyond β·x ≈ 20 the smooth curve is already numerically indistinguishable from the straight line y = x, so returning val is not just safe but accurate.

## Step 3: Updating the Multiplier λ (Outer Loop)

Once the inner loop (LM method) decides, "Given the current conditions, this is the rock bottom!", we move to the outer loop and update the Lagrange multiplier λ using the following formula:

λᵢ ← max(0, λᵢ + μ · cᵢ(x))

```typescript
// Actual code for the update
lambdaVec[i] = Math.max(0, lambdaVec[i] + mu * c[i]);
```

This is the ultimate magic of ALM. It's not just a blind penalty. Instead, the logic is: "If we hit the wall, remember the repulsive force of the wall (λ) in proportion to how deep we crashed into it (c(x)), and carry that memory over to the next inner loop." As this flow (bouncing between inner and outer loops) repeats, eventually the variables x and multipliers λ lock into place and stop moving. This converged, standstill state automatically satisfies the KKT conditions. Rather than brute-forcing the score down, the system is designed to be naturally drawn into the mathematically correct sweet spot.

## 6. The "Underfoot" Tech Supporting ALM (Numerical Calculation Engine)

From the discussion so far, we know that ALM acts as a navigation system that builds the correct terrain (objective function) to guide us to the KKT conditions. However, how to actually walk (compute) across that complex terrain is an entirely different problem. The browser (TypeScript) environment doesn't have convenient auto-differentiation libraries the way Python does. Because of this, I had to build the "underfoot" machinery (the numerical computation engine) from scratch to conquer the terrain ALM created. In my software, the optimization engine is driven by a combination of three main technologies.

## ① Finite Difference Method (The sensor to measure terrain slope)

This is the sensor used to figure out "which way, and how far, we should step from our current position to lower the score" (the Jacobian: the matrix of gradients). It slightly nudges each variable (e.g., by 10⁻⁸), calculates the change in the score, and deduces the slope. It's very primitive, but in lens design, where complex conditions like total internal reflection of light rays are involved, it is the most reliable approach.

## ② Jacobian Pre-scaling (The safety net preventing calculation collapse)

In lens design, the "scale" of the numbers we handle varies enormously. For example, a lens's curvature might be an ordinary number like 100, while the aspheric coefficients that determine advanced lens shapes can be infinitesimally small, like 10⁻¹². If we throw these into the same matrix for calculation, the floating-point math completely breaks down (an ill-conditioned matrix). To fix this, I implemented a safety mechanism that scales all variables to comparable orders of magnitude before calculation, and then scales them back once the calculation is done.

## ③ Broyden Quasi-Newton Update (The turbo boost for speed)

The finite difference method has a glaring weakness: it's slow, because it requires recalculating everything multiple times for every single variable. So, instead of measuring the slope from scratch every time, I introduced the Broyden update, a method that estimates the new slope from the previous slope and the step we just took, and reuses it. This drastically cuts down computation costs, allowing the optimization to run at practical speeds even inside a web browser.

A brilliant navigator (ALM) and robust, lightning-fast underfoot machinery: by getting these two wheels to mesh perfectly, I finally reached the ultimate goal of optimizing lenses at high speed while strictly obeying all constraints.

## 7. A Practical Example in co-opt (URL Sharing and Optimization in Action)

I have integrated this engine into "co-opt", running in the browser. In co-opt, I've implemented a feature where you can share the state of the optical system and the optimization settings directly via a URL.

▼ Example: co-opt Procedure & URL Sharing

- Click the link below to open the optical system: Link to pre-optimized optical system
- An overwrite confirmation window will appear; click "OK".
- Click the "Optimize" button on the screen.
- A pop-up window will open; click the "Run" button.
- Wait a few seconds. The optimization will start, and you will see the optical system updating.
- When "done" is displayed, the optimization is complete!

(Note: Depending on your browser settings, the pop-up window might be blocked and not displayed.)

Pre-optimized optical system (Triplet)

Post-optimized optical system (Focal length 50mm, F/4.5): Link to post-optimized optical system

## Conclusion

Without my powerful partner, AI, and the invaluable help from experts on X, this optimization engine would never have been completed. I hope these insights prove useful to someone out there.

- co-opt: https://yassan8.github.io/co-opt/
- source code: https://github.com/yassan8/co-opt
- X: https://x.com/yassan_8
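Appendix: to tie sections 5 and 6 together, here is a minimal, self-contained sketch of the ALM double loop on the toy problem "minimize (x − 3)² subject to x ≤ 1", i.e. c(x) = x − 1 ≤ 0. Plain finite-difference gradient descent stands in for the LM inner solver, and all names and parameter values are illustrative, not co-opt's actual code:

```typescript
// Toy ALM double loop: minimize f(x) = (x - 3)^2 subject to c(x) = x - 1 <= 0.
const f = (x: number): number => (x - 3) ** 2;
const c = (x: number): number => x - 1;

const smoothMax = (val: number, beta: number = 100): number =>
  val * beta > 20 ? val : Math.log(1 + Math.exp(beta * val)) / beta;

// Augmented Lagrangian cost for the current lambda and mu
const cost = (x: number, lambda: number, mu: number): number =>
  f(x) + lambda * smoothMax(c(x)) + (mu / 2) * smoothMax(c(x)) ** 2;

// Finite-difference slope: the "sensor" from section 6-①
const slope = (x: number, lambda: number, mu: number): number => {
  const h = 1e-6;
  return (cost(x + h, lambda, mu) - cost(x - h, lambda, mu)) / (2 * h);
};

let x = 3;       // start at the unconstrained optimum, which violates x <= 1
let lambda = 0;  // no memory of the wall yet
const mu = 100;  // penalty strength

for (let outer = 0; outer < 30; outer++) {
  // Inner loop: descend the penalized cost with lambda and mu frozen
  for (let inner = 0; inner < 1000; inner++) {
    x -= 0.002 * slope(x, lambda, mu);
  }
  // Outer loop: remember the wall's repulsive force
  lambda = Math.max(0, lambda + mu * c(x));
}

console.log(x.toFixed(3)); // x ends at ~1: pinned against the wall
```

Note how λ ratchets up over the outer iterations until the force pulling x toward 3 and the wall's remembered repulsion balance out, which is precisely the KKT stationarity picture from section 4.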