AI-Orchestrated 3D Asset Pipeline: From JPEG to Game-Ready GLB Without Touching Blender
The Setup
Why This Approach
Pattern 1: One Action, One Verification
Pattern 2: Structured Prompts for the Vision Model
Pattern 3: Clean Scene Between Models
Pattern 4: Auto-Weights Will Fail on Complex Geometry
Pattern 5: Blender → Godot Translation Gotchas
Rotation mode
Rest pose must be identity
Scale on bones is unreliable
Constraints don't export
Forward axis mismatch
Animation speed
Pattern 6: The AI Agent Has Limits
One task per call
Context mode matters
The AI will get stuck
Pattern 7: Post-Solution Patterns (PSP)
Pattern 8: Assert Vision — Tests for 3D
The Complete Workflow for One Model
Results
Evolution: Unified Vision+Coding Model
Key Takeaways
What's Next

TL;DR: I built a pipeline where an AI agent operates Blender through MCP (Model Context Protocol), while a vision model validates every step by looking at screenshots. I never opened Blender's GUI for modeling. Here's what worked, what broke, and the patterns that emerged after rigging 6+ animated models for a Godot 4 project.

The Setup

I needed animated 3D fish for a virtual aquarium in Godot 4. I don't know Blender. Instead of learning it, I built a pipeline where AI does the work and I supervise.

The human speaks problems. The AI translates them into Blender Python. The vision model confirms whether the result looks correct. Nobody clicks anything in Blender.

Why This Approach

Traditional 3D pipeline: learn Blender (weeks), model manually (hours per asset), rig by hand (more hours), debug in Godot (pain).

AI-orchestrated pipeline: describe what you want, AI executes, vision model validates, iterate until correct. First model takes a couple of hours of prompt debugging. By the tenth model, you're done in 10 minutes.

The key insight: you don't automate Blender by writing a perfect script once. You automate it by teaching an AI agent to handle failures through a vision feedback loop.

Pattern 1: One Action, One Verification

This is the most important pattern. Everything else depends on it.

Why not batch operations? If the AI executes 6 bone extrusions in sequence and something breaks at step 2, neither the AI nor you can tell where it went wrong. One action per cycle means deterministic rollback.

Why vision validation? Blender's Python API doesn't always tell you the truth about visual results. A bone might report correct coordinates but visually overlap with another bone. Weights might be "assigned" but produce garbage deformation. The viewport screenshot is ground truth.

Anti-stuck rule: if the same approach fails 3 times in a row, the AI must switch strategy. Extrude not working? Try moving the bone directly. Auto-weights failing? Switch to manual Gaussian assignment.
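The loop above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline's actual code: `run_step`, `run_pipeline`, and the strategy and verifier callables are hypothetical stand-ins for "ask the coding AI to perform one operation through MCP" and "ask the vision model whether the screenshot looks right". It shows the two rules of this pattern: exactly one action per verification, and a forced strategy switch after 3 consecutive failures of the same approach.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not a real MCP or Blender API):
#   Action:   performs ONE operation for a step, given the last vision feedback
#   Verifier: inspects the result (e.g. a viewport screenshot) -> (ok, feedback)
Action = Callable[[str, str], None]
Verifier = Callable[[str], Tuple[bool, str]]

MAX_SAME_STRATEGY_FAILURES = 3  # anti-stuck threshold from the article


@dataclass
class AttemptLog:
    # (strategy name, passed?, feedback) for every single action taken
    attempts: List[Tuple[str, bool, str]] = field(default_factory=list)


def run_step(step: str, strategies: List[Tuple[str, Action]],
             verify: Verifier, log: AttemptLog) -> bool:
    """One action, one verification; switch strategy after 3 straight failures."""
    for name, action in strategies:
        feedback = ""
        for _ in range(MAX_SAME_STRATEGY_FAILURES):
            action(step, feedback)       # exactly one operation per cycle
            ok, feedback = verify(step)  # check before anything else happens
            log.attempts.append((name, ok, feedback))
            if ok:
                return True
        # Same approach failed 3 times in a row: fall through to the next one.
    return False  # all strategies exhausted: hand the problem to the human


def run_pipeline(steps: List[str], strategies: List[Tuple[str, Action]],
                 verify: Verifier) -> Tuple[Optional[int], AttemptLog]:
    """Run steps in order; return the index of the first step that failed."""
    log = AttemptLog()
    for index, step in enumerate(steps):
        if not run_step(step, strategies, verify, log):
            return index, log  # known rollback point
    return None, log
```

Because `run_pipeline` stops at the first step that cannot be verified, the returned index says exactly where things broke, which is the deterministic-rollback property that batching gives up.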
Pattern 2: Structured Prompts for the Vision Model

A naive prompt to a vision model produces naive answers. "Look at this Blender screenshot" gets you "I see some orange lines." You need structured, domain-specific prompts.

Three prompt templates that cover 90% of validation:

Pattern 3: Clean Scene Between Models

Blender retains actions, armature data, and mesh data even after deleting objects from the scene. If you rig Fish A, then import Fish B without cleaning, Fish A's bone animations leak into Fish B's export. Real incident: Koi bone names appeared in Pterophyllum's GLB export, causing "Animation target not found" warnings in Godot.

Mandatory cleanup script before each new model:

Rule: one model at a time. Import → rig → weight → test → export → clean. Only then start the next one.

Pattern 4: Auto-Weights Will Fail on Complex Geometry

Blender's ARMATURE_AUTO weight assignment calculates distance from each bone to each vertex. This works for simple meshes. For thin geometry (fins, veils, tails), all bones appear "close" to all vertices, and the algorithm produces garbage.

What works instead: manual Gaussian weight assignment. Follow with normalization and smoothing (vertex_group_smooth(factor=0.3, repeat=1)). Then validate with the vision model.

Another common trap: neutral_bone or Root eating all weights. If a bone sits at the origin with use_deform=True, auto-weights assign it to everything. Fix: set bone.use_deform = False on utility bones, then re-bind.

Pattern 5: Blender → Godot Translation Gotchas

Many things that work in Blender break silently in Godot. These cost the most debugging time.

Rotation mode

Pose bones in an armature imported from GLB default to Quaternion rotation mode. If your AI writes bone.rotation_euler.x = -0.5, nothing happens: the bone ignores Euler values while in Quaternion mode. Fix: always set bone.rotation_mode = 'XYZ' before animating with Euler, or work in Quaternion throughout.

Rest pose must be identity

If a bone's rest pose isn't aligned to world axes, Godot applies animation offsets relative to a non-identity transform. Result: the jaw nods the entire head instead of opening the mouth. Fix: in Edit Mode, align all bones strictly along the X/Y/Z axes. Set roll = 0 for every bone.
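The axis-alignment rule is easy to check numerically before anything is animated. The sketch below is a hedged illustration, not part of the original pipeline: `misaligned_bones` and its tolerance defaults are assumptions. It takes each bone's head, tail, and roll as plain tuples so it runs outside Blender; the comment at the end shows how the input could be gathered from an armature's edit bones.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Sequence[float]


def misaligned_bones(bones: Dict[str, Tuple[Vec3, Vec3, float]],
                     angle_tol_deg: float = 0.5,
                     roll_tol: float = 1e-4) -> List[str]:
    """Return names of bones that are not along a world axis or have non-zero roll.

    `bones` maps bone name -> (head, tail, roll). Tolerances are assumptions.
    """
    cos_tol = math.cos(math.radians(angle_tol_deg))
    bad = []
    for name, (head, tail, roll) in bones.items():
        direction = [t - h for h, t in zip(head, tail)]
        length = math.sqrt(sum(c * c for c in direction))
        if length == 0.0:
            bad.append(name)  # degenerate bone: flag it rather than divide by zero
            continue
        # A bone lies along X, Y or Z when one component carries almost all of
        # its length, i.e. |cos(angle to the nearest axis)| is close to 1.
        nearest_axis_cos = max(abs(c) for c in direction) / length
        if nearest_axis_cos < cos_tol or abs(roll) > roll_tol:
            bad.append(name)
    return bad


# Inside Blender, in Edit Mode on an armature object `arm` (hypothetical name),
# the input could be collected like this:
#   bones = {b.name: (b.head[:], b.tail[:], b.roll) for b in arm.data.edit_bones}
```

Any bone this returns is a candidate for the "jaw nods the whole head" failure, so it is cheaper to fix it before keying animation than after export.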
After posing, clear all transforms: the mesh should not move. If it moves, the rest pose is wrong.

Scale on bones is unreliable

Godot 4.x sometimes ignores bone scale if the rest pose doesn't match the skeleton rest. Gill breathing animated via scale.x on a bone worked in Blender but did nothing in Godot. Fix: use Shape Keys (blend shapes) instead of bone scale for facial/gill animation. Shape Keys work deterministically in both Blender and Godot. Reserve bone animation for rotation-based movement (swimming, tail wagging).

Constraints don't export

Godot doesn't understand Blender constraints (Copy Rotation, etc.). They must be baked before export.

Forward axis mismatch

The body axis is X in Blender but -Z in Godot. All models need a 90° rotation on import. Apply transforms before export: bpy.ops.object.transform_apply(location=True, rotation=True, scale=True).

Animation speed

In this project, animation authored at 30 FPS in Blender played at half speed in Godot, which ran at 60 FPS. Fix: set AnimationPlayer.speed_scale = 2.0, or bake at 60 FPS from the start.

Pattern 6: The AI Agent Has Limits

One task per call

The coding AI cannot handle multi-step instructions reliably. "Animate Tail1, Tail2, Tail3 and both pectoral fins" produces a bpy.ops.pose.select_all call and breaks everything. Fix: one bone per call. Animate Tail1 → vision check → animate Tail2 → vision check → ... → bake all together at the end.

Context mode matters

Blender's API is context-sensitive. Most bpy.ops calls fail with "poll() failed, context is incorrect" if you're in the wrong mode. Rules the AI must follow:

The AI will get stuck

After 3 failed attempts with the same approach, force a strategy change. This must be an explicit rule in the agent's instructions, not a hope.

Pattern 7: Post-Solution Patterns (PSP)

After each model, document what broke and how you fixed it. This creates a growing knowledge base that makes each subsequent model faster.

Examples from real production:

After ~10 models, PSP becomes your real pipeline. The AI reads it before starting each new model and avoids known pitfalls. First model: 3 hours. Tenth model: 20 minutes.

Pattern 8: Assert Vision — Tests for 3D

The most powerful pattern that emerged: using the vision model as a test framework. This is CI/CD for 3D.
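One way to structure such an assert suite is sketched below. It is an illustration under assumptions, not the author's implementation: `VisualAssert`, `run_suite`, the prompt wording, and the two example asserts are hypothetical, and the `capture`/`ask` callables stand in for "pose the rig and grab a viewport screenshot" and "send image plus prompt to the vision model". Forcing YES/NO as the first word applies the structured-prompt idea from Pattern 2: it makes the VLM's answer machine-checkable.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class VisualAssert:
    name: str
    setup: str        # how to pose/frame the rig before the screenshot (hypothetical)
    question: str     # a yes/no question about that screenshot
    expect_yes: bool = True


# Structured prompt: a fixed answer format the runner can parse.
PROMPT = (
    "You are reviewing a Blender viewport screenshot of a rigged fish.\n"
    "Question: {question}\n"
    "Answer YES or NO as the first word, then one sentence explaining why."
)

# Hypothetical examples modeled on failures described in this article.
EXAMPLE_SUITE = [
    VisualAssert("jaw_opens_mouth", "pose Jaw -20deg",
                 "Does the mouth open while the rest of the head stays still?"),
    VisualAssert("tail_bends_alone", "pose Tail1 30deg",
                 "Do only the tail vertices move, with the head unchanged?"),
]


def parse_verdict(reply: str) -> bool:
    """Read YES/NO from the first word of the VLM reply."""
    words = reply.split()
    token = words[0].strip(".,:;!").upper() if words else ""
    if token in ("YES", "NO"):
        return token == "YES"
    raise ValueError(f"unparseable VLM reply: {reply!r}")


def run_suite(suite: List[VisualAssert],
              capture: Callable[[str], Any],
              ask: Callable[[Any, str], str]) -> List[str]:
    """Return failure messages; an empty list means every assert passed."""
    failures = []
    for case in suite:
        image = capture(case.setup)
        reply = ask(image, PROMPT.format(question=case.question))
        try:
            passed = parse_verdict(reply) == case.expect_yes
        except ValueError as err:
            failures.append(f"{case.name}: {err}")  # malformed answer counts as a failure
            continue
        if not passed:
            failures.append(f"{case.name}: {reply.strip()}")
    return failures
```

A non-empty result names which visual behavior regressed, with the vision model's one-sentence reason attached.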
If you change weights tomorrow, run the assert suite. If anything breaks, you know immediately.

The Complete Workflow for One Model

Between steps 7-10, expect 2-5 iterations per bone. This is normal. The feedback loop (AI executes → vision validates → AI adjusts) converges quickly once PSP covers common failure modes.

Results

The bottleneck shifted from "learning Blender" to "debugging AI prompts." When the AI makes a mistake, 90% of the time it's because the vision model gave bad feedback. Fix one line in the VLM prompt, and the entire system gets smarter.

Evolution: Unified Vision+Coding Model

An important optimization emerged during the project. The initial architecture used a small local vision model (Qwen3VL-4B) purely for validation, while a separate coding AI generated the Blender Python. This meant two models, two contexts, two sets of prompts, and a manual bridge between them.

Later, I switched to a larger Qwen model accessed through MCP that could both see the viewport and write code: one model that understands what it's looking at AND knows how to fix it. The feedback loop collapsed from "AI writes code → screenshot → VLM checks → human relays feedback → AI adjusts" to "AI writes code → looks at result → adjusts itself." This cut iteration time significantly.

The patterns in this article still apply (one action per check, structured prompts, PSP), but the architecture becomes simpler when vision and coding live in the same model.

Key Takeaways

One action, one check. Never let the AI chain operations blindly. Deterministic rollback requires deterministic steps.

Vision validation is non-negotiable. Code can report success while the viewport shows garbage. The screenshot is ground truth.

Auto-weights fail on thin geometry. Plan for manual Gaussian assignment on fins, veils, and facial features.

Blender and Godot speak different languages. Rest pose identity, quaternion rotation, Shape Keys over bone scale, baked constraints: learn these once, document them in PSP, and never debug them again.

PSP is the real product. The pipeline isn't the code. It's the accumulated knowledge of what breaks and how to fix it. Each model teaches the system.

The human role is supervisor, not operator. You describe problems in natural language. The AI translates them to code. The VLM validates visually. You make decisions when the system gets stuck.

What's Next

The same architecture (AI agent + MCP tool + vision validation) applies beyond Blender. Any GUI-heavy professional tool that exposes an API can be orchestrated this way. The patterns (one action/one check, structured VLM prompts, PSP accumulation) are universal.

The agents aren't replacing 3D artists. They're making 3D accessible to people who have ideas but not the specialized skills to execute them. The quality ceiling is still set by human judgment, but the floor has risen dramatically.

Tested on: Linux Mint 22.3, Blender 4.0+, Godot 4.x, NVIDIA RTX 5060 Ti (eGPU via Thunderbolt 4)
MCP Server: BlenderMCP 1.27.1
Vision Models: Qwen3VL-4B (local, llama.cpp) → later Qwen (larger, unified vision+coding via MCP)

Author: Aleksandr Kossarev, Jõgeva, Estonia
Project: Arche Iscrin

This article is based on 2300+ lines of production notes from rigging 6 animated fish models for a Godot virtual aquarium, using an AI-orchestrated pipeline without manual Blender operation.