Tools: Ultimate Guide: AI-Orchestrated 3D Asset Pipeline: From JPEG to Game-Ready GLB Without Touching Blender

2026-05-27

AI-Orchestrated 3D Asset Pipeline: From JPEG to Game-Ready GLB Without Touching Blender

The Setup

Why This Approach

Pattern 1: One Action, One Verification

Pattern 2: Structured Prompts for the Vision Model

Pattern 3: Clean Scene Between Models

Pattern 4: Auto-Weights Will Fail on Complex Geometry

Pattern 5: Blender → Godot Translation Gotchas

Rotation mode

Rest pose must be identity

Scale on bones is unreliable

Constraints don't export

Forward axis mismatch

Animation speed

Pattern 6: The AI Agent Has Limits

One task per call

Context mode matters

The AI will get stuck

Pattern 7: Post-Solution Patterns (PSP)

Pattern 8: Assert Vision — Tests for 3D

The Complete Workflow for One Model

Results

Evolution: Unified Vision+Coding Model

Key Takeaways

What's Next

TL;DR: I built a pipeline where an AI agent operates Blender through MCP (Model Context Protocol), while a vision model validates every step by looking at screenshots. I never opened Blender's GUI for modeling. Here's what worked, what broke, and the patterns that emerged after rigging 6+ animated models for a Godot 4 project.

The Setup

I needed animated 3D fish for a virtual aquarium in Godot 4. I don't know Blender. Instead of learning it, I built a pipeline where AI does the work and I supervise.

The human speaks problems. The AI translates them into Blender Python. The vision model confirms whether the result looks correct. Nobody clicks anything in Blender.

Why This Approach

Traditional 3D pipeline: learn Blender (weeks), model manually (hours per asset), rig by hand (more hours), debug in Godot (pain).

AI-orchestrated pipeline: describe what you want, AI executes, vision model validates, iterate until correct.

First model takes a couple of hours of prompt debugging. By the tenth model, you're done in 10 minutes.

The key insight: you don't automate Blender by writing a perfect script once. You automate it by teaching an AI agent to handle failures through a vision feedback loop.

Pattern 1: One Action, One Verification

This is the most important pattern. Everything else depends on it.

Why not batch operations? If the AI executes 6 bone extrusions in sequence and something breaks at step 2, neither the AI nor you can tell where it went wrong. One action per cycle means deterministic rollback.

Why vision validation? Blender's Python API doesn't always tell you the truth about visual results. A bone might report correct coordinates but visually overlap with another bone. Weights might be "assigned" but produce garbage deformation. The viewport screenshot is ground truth.

Anti-stuck rule: if the same approach fails 3 times in a row, the AI must switch strategy. Extrude not working? Try moving the bone directly. Auto-weights failing? Switch to manual Gaussian assignment.
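The cycle and the anti-stuck rule can be sketched as a small control loop. This is a minimal sketch under stated assumptions, not the pipeline's actual code: `Strategy`, `run_step`, and `MAX_FAILS_PER_STRATEGY` are hypothetical names, and the `check` callable stands in for "take a screenshot and ask the vision model".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MAX_FAILS_PER_STRATEGY = 3  # the "3 times in a row" threshold from the text


@dataclass
class Strategy:
    """One way of performing a step (hypothetical container)."""
    name: str
    execute: Callable[[], None]  # performs exactly ONE Blender operation
    undo: Callable[[], None]     # rolls that single operation back


def run_step(strategies: List[Strategy],
             check: Callable[[], bool],
             log: Optional[List[str]] = None) -> Optional[str]:
    """Try each strategy up to MAX_FAILS_PER_STRATEGY times.

    Returns the name of the strategy whose result passed `check`,
    or None if every strategy was exhausted (time to ask the human).
    """
    if log is None:
        log = []
    for strategy in strategies:
        for attempt in range(1, MAX_FAILS_PER_STRATEGY + 1):
            strategy.execute()        # one action
            if check():               # one verification
                return strategy.name
            strategy.undo()           # roll back before retrying
            log.append(f"{strategy.name}: attempt {attempt} failed")
    return None
```

Because each attempt is a single action followed by a single check, a failed attempt is undone before the next one runs, and the log shows exactly which attempt of which strategy went wrong.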
Pattern 2: Structured Prompts for the Vision Model

A naive prompt to a vision model produces naive answers. "Look at this Blender screenshot" gets you "I see some orange lines." You need structured, domain-specific prompts.

Three prompt templates that cover 90% of validation:

Pattern 3: Clean Scene Between Models

Blender retains actions, armature data, and mesh data even after deleting objects from the scene. If you rig Fish A, then import Fish B without cleaning, Fish A's bone animations leak into Fish B's export.

Real incident: Koi bone names appeared in Pterophyllum's GLB export, causing "Animation target not found" warnings in Godot.

Mandatory cleanup script before each new model:

Rule: one model at a time. Import → rig → weight → test → export → clean. Only then start the next one.

Pattern 4: Auto-Weights Will Fail on Complex Geometry

Blender's ARMATURE_AUTO weight assignment (bone heat weighting) estimates each bone's influence from its proximity to each vertex. This works for simple meshes. For thin geometry (fins, veils, tails), all bones appear "close" to all vertices, and the algorithm produces garbage.

What works instead: manual Gaussian weight assignment.

Follow with normalization and smoothing (vertex_group_smooth(factor=0.3, repeat=1)). Then validate with the vision model.

Another common trap: neutral_bone or Root eating all weights. If a bone sits at the origin with use_deform=True, auto-weights assign it to everything. Fix: bone.use_deform = False for utility bones, then re-bind.

Pattern 5: Blender → Godot Translation Gotchas

Many things that work in Blender break silently in Godot. These cost the most debugging time.

Rotation mode

Blender defaults to Quaternion for armatures after GLB import. If your AI writes bone.rotation_euler.x = -0.5, nothing happens. The bone ignores Euler values while in Quaternion mode. Fix: always set bone.rotation_mode = 'XYZ' before animating with Euler, or work in Quaternion throughout.

Rest pose must be identity

If a bone's rest pose isn't aligned to world axes, Godot applies animation offsets relative to a non-identity transform. Result: the jaw nods the entire head instead of opening the mouth. Fix: in Edit Mode, align all bones strictly along X/Y/Z axes. Set roll = 0 for every bone. After posing, clear all transforms: the mesh should not move. If it moves, the rest pose is wrong.

Scale on bones is unreliable

Godot 4.x sometimes ignores bone scale if the rest pose doesn't match the skeleton rest. Gill breathing animated via scale.x on a bone worked in Blender but did nothing in Godot. Fix: use Shape Keys (blend shapes) instead of bone scale for facial/gill animation. Shape Keys work deterministically in both Blender and Godot. Bone animation is only for rotation-based movement (swimming, tail wagging).

Constraints don't export

Godot doesn't understand Blender constraints (Copy Rotation, etc.). They must be baked before export.

Forward axis mismatch

In this pipeline the body axis is X in Blender, while Godot's forward axis is -Z. All models need a 90° rotation on import. Apply transforms before export: bpy.ops.object.transform_apply(location=True, rotation=True, scale=True).

Animation speed

An animation authored at 30 FPS in Blender played at half the intended speed in the Godot project (60 FPS physics). glTF stores keyframe times in seconds, so the scene frame rate at export determines the clip's duration. Set AnimationPlayer.speed_scale = 2.0, or author and bake at 60 FPS from the start.

Pattern 6: The AI Agent Has Limits

One task per call

The coding AI cannot handle multi-step instructions reliably. "Animate Tail1, Tail2, Tail3 and both pectoral fins" produces bpy.ops.pose.select_all and breaks everything. Fix: one bone per call. Animate Tail1 → vision check → animate Tail2 → vision check → ... → bake all together at the end.

Context mode matters

Blender's API is context-sensitive. Most bpy.ops calls fail with "poll() failed, context is incorrect" if you're in the wrong mode. Rules the AI must follow:

The AI will get stuck

After 3 failed attempts with the same approach, force a strategy change. This must be an explicit rule in the agent's instructions, not a hope.

Pattern 7: Post-Solution Patterns (PSP)

After each model, document what broke and how you fixed it. This creates a growing knowledge base that makes each subsequent model faster.

Examples from real production:

After ~10 models, PSP becomes your real pipeline. The AI reads it before starting each new model and avoids known pitfalls. First model: 3 hours. Tenth model: 20 minutes.

Pattern 8: Assert Vision — Tests for 3D

The most powerful pattern that emerged: using the vision model as a test framework. This is CI/CD for 3D.
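One caveat when turning VLM replies into assertions: a plain substring test such as `expected.lower() in result.lower()` can pass by accident, because a single letter like "A" or a short word like "NO" occurs inside many unrelated replies. Below is a minimal sketch of a stricter comparison that looks only at the first token of the reply. The helper names `first_token` and `answer_matches` are hypothetical, and the approach assumes the prompt tells the model to lead with its verdict.

```python
import re


def first_token(answer: str) -> str:
    """Return the first alphanumeric token of a VLM reply, upper-cased."""
    match = re.search(r"[A-Za-z0-9]+", answer)
    return match.group(0).upper() if match else ""


def answer_matches(result: str, expected: str) -> bool:
    """True only if the reply *starts* with the expected verdict."""
    return first_token(result) == expected.strip().upper()


# The substring test accepts a wrong multiple-choice answer,
# because "whole tail" happens to contain the letter "a":
substring_pass = "a" in "B) Whole tail".lower()

# The first-token test rejects it:
strict_pass = answer_matches("B) Whole tail", "A")
```

The strict version fails closed: a reply like "Answer: A" is rejected as well, which is a reason to end each prompt with an instruction such as "Answer with one letter."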
If you change weights tomorrow, run the assert suite. If anything breaks, you know immediately.

The Complete Workflow for One Model

Between steps 7-10, expect 2-5 iterations per bone. This is normal. The feedback loop (AI executes → vision validates → AI adjusts) converges quickly once PSP covers common failure modes.

Results

The bottleneck shifted from "learning Blender" to "debugging AI prompts." When the AI makes a mistake, 90% of the time it's because the vision model gave bad feedback. Fix one line in the VLM prompt and the entire system gets smarter.

Evolution: Unified Vision+Coding Model

An important optimization emerged during the project. The initial architecture used a small local vision model (Qwen3VL-4B) purely for validation, while a separate coding AI generated the Blender Python. This meant two models, two contexts, two sets of prompts, and a manual bridge between them.

Later, I switched to a larger Qwen model accessed through MCP that could both see the viewport and write code. One model that understands what it's looking at AND knows how to fix it. The feedback loop collapsed from "AI writes code → screenshot → VLM checks → human relays feedback → AI adjusts" to "AI writes code → looks at result → adjusts itself."

This cut iteration time significantly. The patterns in this article still apply (one action per check, structured prompts, PSP), but the architecture becomes simpler when vision and coding live in the same model.

Key Takeaways

- One action, one check. Never let the AI chain operations blindly. Deterministic rollback requires deterministic steps.
- Vision validation is non-negotiable. Code can report success while the viewport shows garbage. The screenshot is ground truth.
- Auto-weights fail on thin geometry. Plan for manual Gaussian assignment on fins, veils, and facial features.
- Blender and Godot speak different languages. Rest pose identity, quaternion rotation, Shape Keys over bone scale, baked constraints: learn these once, document them in PSP, never debug them again.
- PSP is the real product. The pipeline isn't the code. It's the accumulated knowledge of what breaks and how to fix it. Each model teaches the system.
- The human role is supervisor, not operator. You describe problems in natural language. The AI translates to code. The VLM validates visually. You make decisions when the system gets stuck.

What's Next

The same architecture (AI agent + MCP tool + vision validation) applies beyond Blender. Any GUI-heavy professional tool that exposes an API can be orchestrated this way. The patterns (one action/one check, structured VLM prompts, PSP accumulation) are universal.

The agents aren't replacing 3D artists. They're making 3D accessible to people who have ideas but not the specialized skills to execute them. The quality ceiling is still set by human judgment, but the floor has risen dramatically.

Tested on: Linux Mint 22.3, Blender 4.0+, Godot 4.x, NVIDIA RTX 5060 Ti (eGPU via Thunderbolt 4)

MCP Server: BlenderMCP 1.27.1
Vision Models: Qwen3VL-4B (local, llama.cpp) → later Qwen (larger, unified vision+coding via MCP)
Author: Aleksandr Kossarev, Jõgeva, Estonia

Project: Arche Iscrin

This article is based on 2300+ lines of production notes from rigging 6 animated fish models for a Godot virtual aquarium, using an AI-orchestrated pipeline without manual Blender operation.

The Setup: pipeline flow

```text
Human (instructions)
  → AI Agent (generates bpy code)
  → MCP Protocol (JSON-RPC over stdio)
  → Blender Addon (socket :9876, executes Python)
  → Viewport Screenshot
  → Vision Model (validates result)
  → AI Agent (adjusts or proceeds)
  → Export GLB → Godot
```

Pattern 1: the verification cycle

```text
1. AI executes ONE Blender operation
2. Take a screenshot of the viewport
3. Vision model checks the result
4. If OK → next step. If FAIL → undo → try different approach.
```

Pattern 2: naive prompt

```text
"Check the skeleton"
```

Pattern 2: structured prompt

```text
"You are a rigging tech lead. Count the bones in the armature.
Check: 1) All bone heads connect to previous bone tails?
2) Last bone reaches the end of the mesh?
Answer strictly: bones=N|chain_ok=true/false|tail_reach=true/false"
```

Pattern 3: cleanup script

```python
import bpy

# Delete all scene objects
for obj in list(bpy.context.scene.objects):
    bpy.data.objects.remove(obj, do_unlink=True)

# Purge all orphan data blocks
bpy.ops.outliner.orphans_purge(
    do_local_ids=True,
    do_linked_ids=False,
    do_recursive=True
)

# Verify: everything should be zero
print(f"Objects: {len(bpy.data.objects)}, "
      f"Actions: {len(bpy.data.actions)}, "
      f"Armatures: {len(bpy.data.armatures)}, "
      f"Meshes: {len(bpy.data.meshes)}")
```

Pattern 4: manual Gaussian weight assignment

```python
import math

sigma = 0.03  # adjust per bone size
for v in mesh.data.vertices:
    v_local = arm.matrix_world.inverted() @ mesh.matrix_world @ v.co
    d = (v_local - bone_head).length
    if d < sigma * 3:
        w = math.exp(-d*d / (2*sigma*sigma))
        if w > 0.05:
            group.add([v.index], w, 'REPLACE')
```

Pattern 5: baking constraints before export

```python
bpy.ops.nla.bake(
    frame_start=1,
    frame_end=60,
    visual_keying=True,      # bake constraint results
    clear_constraints=True,  # remove constraints from export
    bake_types={'POSE'}
)
```

Pattern 7: PSP entry template

```text
Symptom: [what you observed]
Cause: [root cause]
Fix: [code or procedure]
Applies to: [which model types]
```

Pattern 8: assert_vision

```python
def assert_vision(question, expected_answer):
    result = vlm_ask(screenshot(), question)
    if expected_answer.lower() not in result.lower():
        raise AssertionError(
            f"Vision assert failed: expected '{expected_answer}', got '{result}'"
        )
```

Pattern 8: example assertions

```python
# After rigging
assert_vision("Tail3 rotated 45°. What bent? A) Only tip B) Whole tail C) Entire body", "A")

# After weight painting
assert_vision("Head changed position?", "NO")

# After animation bake
assert_vision("Frame 1 and frame 60. Same pose?", "YES")

# After export and Godot import
assert_vision("Skeleton visible? Tail bends?", "YES")
```

The Complete Workflow for One Model: steps

```text
1. Clean Blender scene (purge orphans)
2. Import GLB from Meshy.ai
3. Orient body along X axis (rotate Z -90°, apply transforms)
4. Decimate to target polycount (ratio 0.15-0.3)
5. Create armature: spine chain + fins + jaw
6. Parent mesh to armature with empty vertex groups
7. Assign weights: Gaussian for each bone, normalize, smooth
8. Vision check: rotate each bone → "only target deforms?"
9. Selective zero: remove weight leaks from body to face bones
10. Vision check: jaw/gills move independently?
11. Create swim animation: sin wave on spine chain, 60 frames
12. Vision check: frame 1 = frame 60? Natural motion?
13. Bake action: visual_keying=True, clear_constraints=True
14. Export GLB with animations and Shape Keys
15. Import in Godot, verify animation plays correctly
16. Clean Blender scene for next model
```

The Setup: components

- CLI: my entry point, natural language instructions
- AI coding agent (via MCP): writes and executes Blender Python code
- Blender MCP addon: exposes Blender operations as MCP tools over a local socket
- Vision model (VLM): looks at viewport screenshots and validates results
- Meshy.ai: converts reference photos to 3D models with textures
- Godot 4: final destination for rigged, animated GLB files

Pattern 2: rules for VLM prompts

- Never ask the VLM to count precisely. It hallucinates numbers on complex scenes. Instead, ask it to compare: "Are there MORE, FEWER, or SAME number of bones as the reference (7)?"
- Use multiple-choice format: "What bent? A) Only the tip B) Whole tail C) Entire body. Answer with one letter." Comparisons work better than open-ended questions.
- Force the viewport angle before taking screenshots. Side view for spine/tail, front view for gills. The AI must set the camera programmatically before each screenshot.
- Force a redraw before screenshotting: bpy.ops.wm.redraw_timer(type='DRAW_WIN_SWAP', iterations=1). Without this, the screenshot captures a stale frame.

Pattern 4: symptoms of failed auto-weights

- "No solution found for one or more bones"
- Root bone influences 100% of vertices
- Entire body deforms when you rotate one fin

Pattern 6: mode-switching rules the AI must follow

- Before mode_set(mode='POSE') → set active = armature
- Before mode_set(mode='WEIGHT_PAINT') → set active = mesh
- Before mode_set(mode='EDIT') for armature → first go to OBJECT, then set active, then EDIT
- select_all(action='DESELECT') only works in OBJECT mode
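The mode-switching rules listed above (go to OBJECT first, set the active object, then enter the target mode) could be wrapped in one helper so the agent never has to remember the order. This is a sketch under stated assumptions: `enter_mode` is a hypothetical name, and the `bpy` module is passed in as a parameter so the helper can be exercised outside Blender with a stand-in object. Inside Blender you would pass the real `bpy` module.

```python
def enter_mode(bpy, obj, mode):
    """Make `obj` the active object, then switch Blender to `mode`.

    `mode` is a value accepted by bpy.ops.object.mode_set,
    e.g. 'OBJECT', 'EDIT', 'POSE' or 'WEIGHT_PAINT'.
    """
    # Rule: first go to OBJECT mode (using whatever object is active now).
    if bpy.context.mode != 'OBJECT':
        bpy.ops.object.mode_set(mode='OBJECT')

    # Rule: select_all(action='DESELECT') is only called in OBJECT mode.
    bpy.ops.object.select_all(action='DESELECT')

    # Rule: set the active object BEFORE entering the target mode
    # (armature for POSE/EDIT, mesh for WEIGHT_PAINT).
    obj.select_set(True)
    bpy.context.view_layer.objects.active = obj

    if mode != 'OBJECT':
        bpy.ops.object.mode_set(mode=mode)
```

A call such as `enter_mode(bpy, armature, 'POSE')` or `enter_mode(bpy, mesh, 'WEIGHT_PAINT')` then replaces the three separate steps, which keeps each agent call to a single, checkable action.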
