Tools: Why We Banned LLMs from Runtime — And What We Do Instead


Source: Dev.to

## The Problem with Runtime LLMs

Most AI backend tools use LLMs at runtime. Every API call triggers model inference. Every response is probabilistic. We made the opposite choice.

When an LLM processes your API request at runtime, you get:

- Non-determinism: Same request, different response. Every time.
- Latency: 800ms to 3 seconds per request, depending on model load.
- Cost: Per-request inference cost that scales linearly with traffic.
- Security: A prompt injection surface on every endpoint.
- Auditability: "Why did the API return this?" — "The model decided."

For prototypes, this is fine. For production backends handling payments, reservations, and user data? It's a structural risk.

## The Alternative: Compile Intelligence Away

Fascia uses AI exclusively at design time. When you describe your business in natural language, AI generates structured specifications - not code. These specs define:

- Entities: Business objects with fields, relationships, status machines, and invariants
- Tools: API endpoints with typed input/output, trigger types, and flow graphs
- Policies: Design-time rules that block unsafe patterns before deployment

At runtime, a deterministic executor (written in Go, ~50ms cold start on Cloud Run) reads the spec and follows it. No LLM inference. No variability.

## The 9-Step Execution Contract

Every API endpoint follows the same sequence:

1. Validate input against JSON Schema from the spec
2. Authorize - JWT verification, RBAC role check, row-level ownership
3. Check policies - design-time rules enforced deterministically
4. Start transaction - explicit boundary, no auto-commit
5. Execute flow graph - a DAG of typed nodes (Read, Write, Transform, If/Switch)
6. Enforce invariants - business rules checked before commit
7. Commit or rollback - all-or-nothing, no partial state
8. Write audit log - append-only, unconditional, every execution
9. Return typed response - matches the output schema from the spec

No shortcuts. No "well, this endpoint is special." The rigidity is the feature.

## Where AI Still Matters

AI isn't removed - it's relocated. The Safety Agent runs during the design phase:

- Multi-model cross-check (Claude + GPT-4, different model families)
- Static analysis of flow graphs for unsafe patterns
- Risk classification: Green (safe), Yellow (warning), Red (blocked)
- Test case generation from spec invariants

Red risk = cannot deploy. No override. Fix the design. Examples of Red patterns:

- Payment call inside a transaction boundary (if the tx rolls back, the payment can't be undone)
- UPDATE without WHERE clause
- Write without transaction boundary
- Hard delete instead of soft delete

## The Tradeoff

All intelligence must be captured in the spec at design time. The runtime cannot "think." This means:

- Complex conditional logic must be modeled as flow graph branches
- Custom business rules use a restricted Value DSL (no arbitrary code)
- External API calls are explicit nodes with retry/timeout configuration

This is a real constraint. We think it's the right one. Production backends should be provable, not probabilistic.

## Results

We're building Fascia in public. Pre-launch, solo founder, 150+ PRs deep.

Next in this series: The Risk Engine - How We Classify Green, Yellow, and Red.
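To make the execution contract concrete, here is a minimal Go sketch of a spec-driven executor that walks the nine steps in order for every request. All names (`Spec`, `Request`, `Execute`, the field names) are hypothetical illustrations, not Fascia's actual API: validation is reduced to a required-field check standing in for JSON Schema, the transaction is modeled as a scratch copy of state, and policy checks are elided.

```go
package main

import (
	"errors"
	"fmt"
)

// Request and Spec are illustrative stand-ins for the generated specification,
// not Fascia's real types.
type Request struct {
	Input map[string]any
	Role  string
}

type Spec struct {
	RequiredFields []string                           // stand-in for JSON Schema validation
	AllowedRoles   map[string]bool                    // stand-in for RBAC
	Flow           []func(state map[string]any) error // flow graph, flattened to ordered nodes
	Invariant      func(state map[string]any) error   // business rule checked before commit
}

type AuditEntry struct {
	Tool    string
	Outcome string
}

var auditLog []AuditEntry // append-only

// Execute walks the nine steps in a fixed order for every endpoint — no special cases.
func Execute(tool string, spec Spec, req Request) (map[string]any, error) {
	outcome := "rollback"
	defer func() { // Step 8: write audit log — unconditional, runs on every path
		auditLog = append(auditLog, AuditEntry{Tool: tool, Outcome: outcome})
	}()

	// Step 1: validate input against the spec.
	for _, f := range spec.RequiredFields {
		if _, ok := req.Input[f]; !ok {
			return nil, fmt.Errorf("validation: missing field %q", f)
		}
	}
	// Step 2: authorize (role check only in this sketch).
	if !spec.AllowedRoles[req.Role] {
		return nil, errors.New("authorization: role not permitted")
	}
	// Step 3: check policies — design-time rules enforced deterministically (elided).

	// Step 4: start transaction — modeled here as a scratch copy of state.
	tx := map[string]any{}
	for k, v := range req.Input {
		tx[k] = v
	}
	// Step 5: execute flow graph nodes in order.
	for _, node := range spec.Flow {
		if err := node(tx); err != nil {
			return nil, fmt.Errorf("flow: %w", err) // Step 7: rollback — tx is discarded
		}
	}
	// Step 6: enforce invariants before commit.
	if err := spec.Invariant(tx); err != nil {
		return nil, fmt.Errorf("invariant: %w", err)
	}
	// Step 7: commit — all-or-nothing.
	outcome = "commit"
	// Step 9: return typed response (here, the committed state).
	return tx, nil
}
```

The `defer` mirrors the "append-only, unconditional" audit requirement: the log entry is written whether the request commits or rolls back, so every execution leaves a trace.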
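The Red-pattern checks lend themselves to simple static analysis over the flow graph. The following Go sketch classifies a flow as Green, Yellow, or Red using a few of the rules listed above; the `Node` shape and field names are assumptions for illustration, not Fascia's actual spec schema, and a real analyzer would walk a DAG rather than a flat slice.

```go
package main

// Node is a simplified flow-graph node; the fields are illustrative only.
type Node struct {
	Kind     string // "read", "write", "update", "payment", "external", ...
	InTx     bool   // does the node run inside a transaction boundary?
	HasWhere bool   // for update nodes: is there a WHERE clause?
}

// Classify applies a subset of the Red rules from the article.
// Red returns immediately (cannot deploy); external calls downgrade
// an otherwise-Green flow to Yellow for review.
func Classify(flow []Node) string {
	risk := "Green"
	for _, n := range flow {
		switch {
		case n.Kind == "payment" && n.InTx:
			return "Red" // if the tx rolls back, the payment can't be undone
		case n.Kind == "write" && !n.InTx:
			return "Red" // write without transaction boundary
		case n.Kind == "update" && !n.HasWhere:
			return "Red" // UPDATE without WHERE clause
		case n.Kind == "external":
			risk = "Yellow" // external call: check retry/timeout configuration
		}
	}
	return risk
}
```

Because the rules run over a declarative spec rather than arbitrary code, the classification is decidable: there is no path the analyzer cannot see.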