Tools: RExSyn Nexus 0.6.1 - Stop Hallucinating Proteins: How We Built a 7D Reasoning Engine with AlphaFold3

Source: Dev.to

## The problem nobody likes to admit: “Plausible Trash”

Before we talk about “BioAI agents,” we need to admit the real failure mode: LLMs are great at producing hypotheses that sound scientific and quietly violate physics.

In this post, I’ll show:

- why “plausible trash” happens in biomedical reasoning,
- how RExSyn Nexus v0.6.1 adds Structure as a first-class reasoning dimension,
- and how to run it (API + Python), with deterministic CI testing.

In autonomous biomedical research, LLMs can write hypotheses that sound like science.

“Attach a 50kDa PEG chain to the binding pocket of Protein X to improve solubility.”

Semantically? Great. Structurally? Often impossible.

And this is the failure mode I care about:

## When a system is “logically convincing” but physically wrong.

So my question became: Can we make the pipeline refuse a hypothesis, and explain exactly why, using physics-derived signals?

That’s what RExSyn Nexus v0.6.1 is about: adding the 7th dimension to our reasoning model. We moved from M-A-D-I-F-P to M-A-D-I-F-P-S, where S = Structure and it’s computed from AlphaFold3 confidence outputs.

## What M-A-D-I-F-P-S actually means

M-A-D-I-F-P-S is not a “magic acronym.” It’s a reasoning checklist: seven lenses the engine uses to decide whether a hypothesis deserves to survive.

- M (Methodic): Did we follow a disciplined procedure (inputs, constraints, reproducible steps), or are we hand-waving?
- A (Abductive): What is the best explanation that fits the evidence we have right now? (plausible hypothesis generation)
- D (Deductive): If the hypothesis is true, what must be true next? (logical consequences; consistency checks)
- I (Inductive): Does it generalize from prior cases / datasets / known patterns, or is it a one-off story?
- F (Falsification): What would disprove this quickly? (designing refutation tests; “how can this fail?”)
- P (Paradigm): Is it compatible with established domain constraints and assumptions (biology, chemistry, protocols), or does it violate the frame?
- S (Structure, new in v0.6.1): Even if it’s semantically convincing, is it physically viable? In v0.6.1, S is computed from AlphaFold3 confidence signals (e.g., clash/disorder/PAE-style uncertainty) and can veto a hypothesis.

In short: 6D (M-A-D-I-F-P) prevents “logical nonsense.” 7D (+S) prevents “physically impossible but linguistically plausible” ideas.

## What RExSyn Nexus is (in one minute)

RExSyn Nexus is a pipeline that combines:

- a LOGOS reasoning core (IRF-Calc + AATS),
- semantic anchoring (embeddings),
- structural confidence (AlphaFold3 confidence schema),
- and scientific validators (PoseBusters / DockQ / SAXS), exposed as an API + job workflow.

LOGOS itself is defined as IRF-Calc’s 6D framework + AATS v2.1, plus bridge components like drift control and calibration gates.

## Why v0.6.1 matters (the “why”, not the “what”)

The pattern we did not want:

- Generate hypothesis
- Add citations
- Ship “confidence” as a vibe

We wanted this instead:

- Generate hypothesis
- Ask physics if it’s plausible
- If physics says “no,” reject, and log the reason

That’s why v0.6.1 is not a “feature party.” It’s a structural hardening release: determinism, traceability, schema rigor, and compliance baked into runtime.

This release also formalizes the Sovereign Adapter Pattern with three constraints:

- Drift-free execution (same input → same output in deterministic modes)
- License sovereignty (explicit acknowledgment gates for restricted terms)
- Schema rigor (no fuzzy JSON)

## The mental model: 6D vs 7D

Think of it like this:

- 6D reasoning answers: “Is this coherent?”
- 7D reasoning adds: “Would atoms allow it?”

So the engine can finally say: “Your argument is deductively strong… but your structure score is low (clash/disorder). Therefore: reject.”

## How it works (end-to-end)

## Step 1) Mirror AF3 confidence outputs into a strict schema

Why? Because ad-hoc dicts are where “silent drift” hides. We mirror AlphaFold3 confidence outputs with strict types (including NumPy arrays for heavy matrices).

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np
import numpy.typing as npt


@dataclass(frozen=True)
class AF3PredictionResult:
    """
    Schema-aligned mirror of AlphaFold3 confidence outputs.
    Strict typing prevents downstream drift.
    """
    ptm_score: float
    iptm_score: Optional[float]
    ranking_score: float
    fraction_disordered: float
    has_clash: bool
    chain_pair_pae_mean: npt.NDArray[np.float64]  # [samples, chains, chains]
    pae_ichain: npt.NDArray[np.float64]
    pae_xchain: npt.NDArray[np.float64]
```

The key detail: we enforce npt.NDArray[np.float64] because loose typing can create “slop”: ambiguous pipelines that look correct but hide drift and cost.

## Step 2) Convert confidence metrics into a belief score (0..1)

Raw AF3 outputs are not “reasoning scores.” They’re confidence/error signals. So we normalize them into Structure (S): a belief score the reasoning engine can weigh against semantic arguments.

Here’s a minimal, explainable scorer (weights are heuristics; we’ll tune them later via validation feedback):

```python
import numpy as np


class StructureScorer:
    """
    Convert AF3 confidence metrics into a normalized [0,1] structure belief score.
    Heuristic weights for now; later calibrated with validators (PoseBusters/DockQ/SAXS).
    """

    def __init__(self, max_fraction_disordered: float = 0.30):
        self.max_fraction_disordered = max_fraction_disordered

    @staticmethod
    def _normalize_ranking(ranking_score: float) -> float:
        """
        Conservative normalization example:
        Map an approximate (-100..1) range to [0,1].
        """
        x = (ranking_score + 100.0) / 101.0
        return float(np.clip(x, 0.0, 1.0))

    def score(self, af3: "AF3PredictionResult") -> float:
        # Base confidence from AF3 ranking_score
        ranking_component = 0.55 * self._normalize_ranking(af3.ranking_score)

        # Disorder penalty (too much disorder weakens structural viability)
        disorder_penalty = 0.0
        if af3.fraction_disordered > self.max_fraction_disordered:
            excess = af3.fraction_disordered - self.max_fraction_disordered
            disorder_penalty = 0.25 * float(np.clip(excess / 0.70, 0.0, 1.0))

        # Clash penalty (“reality veto”)
        clash_penalty = 0.40 if af3.has_clash else 0.0

        final_score = float(
            np.clip(ranking_component + 0.35 - disorder_penalty - clash_penalty, 0.0, 1.0)
        )
        return final_score
```

The “aha” is not the formula. The “aha” is: Structure becomes a first-class reasoning dimension, not a post-hoc chart.

## Step 3) The Guard: License Compliance as Code

When you integrate restricted research assets, “compliance” can’t be a PDF someone forgets. So we enforce an explicit acknowledgment gate at runtime:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AF3LicenseConfig:
    ack_cc_by_nc_sa: bool = False
    ack_non_commercial: bool = False
    ack_prohibited_uses: bool = False

    def is_valid(self) -> bool:
        return self.ack_cc_by_nc_sa and self.ack_non_commercial and self.ack_prohibited_uses


class LicenseGuard:
    def __init__(self, config: AF3LicenseConfig):
        if not config.is_valid():
            raise ValueError(
                "HALTING: License acknowledgement incomplete.\n"
                "Set ack_cc_by_nc_sa=True, ack_non_commercial=True, ack_prohibited_uses=True."
            )
```

This is “compliance by design.” It prevents silent legal drift in real teams.

## Step 4) Test it without GPUs (deterministic mock)

You can’t run structural inference on every CI run. But random mocks create flaky tests. So we use a deterministic mock adapter: same input sequence → same output artifact, every time.

```python
import hashlib
import random

import numpy as np


class DeterministicMockAF3Adapter:
    """
    Hash-seeded mock: same sequence -> same AF3PredictionResult.
    Designed for CI/CD reproducibility.
    """

    def __init__(self, salt: str = "rexsyn-af3-mock-v0.6.1"):
        self.salt = salt

    def predict(self, sequence: str) -> AF3PredictionResult:
        h = hashlib.sha256((self.salt + sequence).encode()).hexdigest()
        seed = int(h[:16], 16)
        rng = random.Random(seed)

        # Minimal stable shapes for schema fidelity
        chain_pair_pae_mean = np.zeros((1, 1, 1), dtype=np.float64)
        pae_ichain = np.zeros((1, 1), dtype=np.float64)
        pae_xchain = np.zeros((1, 1), dtype=np.float64)

        return AF3PredictionResult(
            ptm_score=rng.uniform(0.2, 0.95),
            iptm_score=rng.uniform(0.2, 0.95),
            ranking_score=rng.uniform(-80.0, 0.9),
            fraction_disordered=rng.uniform(0.0, 0.6),
            has_clash=(rng.random() < 0.08),
            chain_pair_pae_mean=chain_pair_pae_mean,
            pae_ichain=pae_ichain,
            pae_xchain=pae_xchain,
        )
```

## “Okay, but how do I use it?”

Here are two practical entry points.

## Option A) Use the API (job workflow)

Endpoints:

- POST /api/v1/predict
- GET /api/v1/jobs/{job_id}
- GET /api/v1/jobs/{job_id}/result

Workflow:

- submit a prediction job
- poll status
- retrieve result + scores + artifacts

Here’s the fastest way to feel the pipeline:

```bash
curl -s -X POST https://YOUR_DOMAIN/api/v1/predict \
  -H "Content-Type: application/json" \
  -d '{"sequence":"MKFLK...","method":"alphafold3","irf_7d_enabled":true}' | jq
# -> {"job_id":"job_01HXYZ...","status":"queued"}

curl -s https://YOUR_DOMAIN/api/v1/jobs/job_01HXYZ... | jq
# -> {"job_id":"job_01HXYZ...","status":"running"}

curl -s https://YOUR_DOMAIN/api/v1/jobs/job_01HXYZ.../result | jq
# -> {"final_decision":"REJECT","scores":{"irf7":0.73,"structure_s":0.18},
#     "signals":{"has_clash":true,"fraction_disordered":0.41,"ranking_score":-62.3}}
```

If you see REJECT with has_clash=true or high disorder, that’s the point: semantic plausibility didn’t pass the laws of physics.

## Option B) Use the LOGOS service in Python (reasoning workflow)

```python
from src.logos.rexsyn_service import RExSynLOGOSService

logos = RExSynLOGOSService(
    enable_irf=True,
    enable_aats=True,
    enable_drift_control=True
)

structure_result = logos.reason_about_structure(
    query="Evaluate this prediction for drug discovery",
    scores={"dockq_score": 0.79, "saxs_chi2": 1.85, "druggability_score": 0.82},
    meta={"sequence": "MKFLK.", "target_class": "kinase", "method": "alphafold2"}
)

if structure_result["final_decision"] == "PASS":
    validation = logos.validate_hypothesis(
        hypothesis="This structure is suitable for structure-based drug design",
        evidence=structure_result,
        domain="biomedical"
    )
    print(validation["validation_status"], validation["consensus"])
```

The point: you’re not just getting a structure score; you’re getting a reasoned decision object.

## What makes this “special” (not just another pipeline)

## 1) Structure is not decoration; it’s a reasoning axis

S is computed, normalized, and used for accept/reject decisions.

## 2) Schema is treated like governance

Strict types prevent “silent drift” and “slop” patterns.

## 3) Trust claims are tied to verification artifacts

Instead of “trust me,” we ship reproducibility hooks (schema parity + deterministic mocks + validation surfaces).

(If you hate symbolic scoring: same. The point isn’t the symbol. The point is: verification is a first-class shipping artifact.)

## What I’m improving next (what to watch)

- Adaptive weighting: replace heuristic constants with calibration from PoseBusters/DockQ/SAXS feedback loops
- Cascade reasoning: early-exit 7D when 6D already fails
- Multi-chain interface focus: score binding-interface regions, not only global matrices

## Closing

Most systems can generate biomedical text. Few systems can say: “This is coherent, but physically invalid, and here’s why.” RExSyn Nexus v0.6.1 is my attempt to build that kind of refusal: auditable, deterministic, and grounded.
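
To make the Step 2 weights concrete, here is a minimal pure-Python sketch that restates the scorer's arithmetic as a standalone function and evaluates it at a few points. The function name `structure_score` and the sample inputs are illustrative assumptions, not part of RExSyn Nexus; the constants are copied from the `StructureScorer` listing.

```python
def _clip01(x: float) -> float:
    """Clamp a value into [0, 1] (stands in for np.clip)."""
    return max(0.0, min(1.0, x))


def structure_score(
    ranking_score: float,
    fraction_disordered: float,
    has_clash: bool,
    max_fraction_disordered: float = 0.30,
) -> float:
    """Hypothetical standalone restatement of the StructureScorer formula."""
    # Map the assumed (-100..1) ranking range onto [0, 1], weighted by 0.55.
    ranking_component = 0.55 * _clip01((ranking_score + 100.0) / 101.0)
    # Only disorder above the threshold is penalized, up to 0.25.
    excess = max(0.0, fraction_disordered - max_fraction_disordered)
    disorder_penalty = 0.25 * _clip01(excess / 0.70)
    # A detected clash subtracts a flat 0.40.
    clash_penalty = 0.40 if has_clash else 0.0
    return _clip01(ranking_component + 0.35 - disorder_penalty - clash_penalty)


clean = structure_score(0.8, 0.10, False)
clashing = structure_score(0.8, 0.10, True)
print(round(clean, 4))     # 0.8989
print(round(clashing, 4))  # 0.4989
```

Two properties follow directly from these constants. With a clash, S can never exceed 0.50 (0.55 + 0.35 - 0.40). Without a clash and with disorder at or below the threshold, S never drops below 0.35, even at the bottom of the assumed ranking range. Any accept/reject threshold applied to S would need to be chosen with those bounds in mind.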
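
The determinism claim in Step 4 is easy to pin down as a test. The sketch below uses a stripped-down stand-in for the mock adapter: the function `mock_signals` and its dictionary output are hypothetical, chosen so the block runs without NumPy or the Step 1 dataclass. It checks that repeated calls with the same sequence agree.

```python
import hashlib
import random


def mock_signals(sequence: str, salt: str = "rexsyn-af3-mock-v0.6.1") -> dict:
    """Hypothetical reduced mock: derive a seed from sha256(salt + sequence)."""
    digest = hashlib.sha256((salt + sequence).encode()).hexdigest()
    # A private Random instance keeps the mock independent of global RNG state.
    rng = random.Random(int(digest[:16], 16))
    return {
        "ranking_score": rng.uniform(-80.0, 0.9),
        "fraction_disordered": rng.uniform(0.0, 0.6),
        "has_clash": rng.random() < 0.08,
    }


first = mock_signals("MKFLK")
second = mock_signals("MKFLK")

# Same input sequence must yield an identical artifact on every call.
assert first == second
assert 0.0 <= first["fraction_disordered"] <= 0.6
print("deterministic:", first == second)  # deterministic: True
```

Seeding from a SHA-256 digest rather than Python's built-in `hash()` matters here: string hashes are randomized per process unless `PYTHONHASHSEED` is fixed, so they would not give the same seed from one CI run to the next.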
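
For Option A, the submit / poll / retrieve workflow can be wrapped in a small polling loop. This is a sketch under assumptions: the article only shows the `queued` and `running` statuses, so the terminal status used in the demo (`completed`) is hypothetical, as are the function name `wait_for_job` and the interval and retry defaults. The status lookup is injected as a callable so the loop can be exercised without a live server.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    get_status: Callable[[], Dict],
    interval_s: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the job leaves the pending statuses, or give up."""
    pending = {"queued", "running"}  # the two statuses shown in the article
    for _ in range(max_polls):
        job = get_status()
        if job["status"] not in pending:
            return job  # anything else is treated as terminal
        sleep(interval_s)
    raise TimeoutError(f"job still pending after {max_polls} polls")


# Offline demo: canned responses stand in for GET /api/v1/jobs/{job_id}.
responses = iter(
    [{"status": "queued"}, {"status": "running"}, {"status": "completed"}]
)
final = wait_for_job(lambda: next(responses), sleep=lambda _s: None)
print(final["status"])  # completed
```

In real use, `get_status` would issue the GET request against the jobs endpoint and return the parsed JSON; once the loop returns, the result endpoint can be fetched for the decision and scores.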