```
WSL2 Host
├─ vLLM on 0.0.0.0:8000
├─ Docker bridge: 172.18.0.1 (yours will differ)
└─ openshell-cluster container (172.18.0.2)
   └─ k3s cluster
      └─ Pod
         ├─ main namespace (10.200.0.1)
         └─ Sandbox namespace (10.200.0.2)  <-- you are here
```
```shell
sudo iptables -I DOCKER-USER 1 \
  -i br-<your-bridge> -p tcp --dport 8000 -j ACCEPT
sudo iptables -I FORWARD 1 \
  -i br-<your-bridge> -o eth0 -p tcp --dport 8000 -j ACCEPT
```
```yaml
nvidia_inference:
  endpoints:
    - { host: integrate.api.nvidia.com, port: 443 }
    - { host: 10.200.0.1, port: 8000 }
    - { host: 172.18.0.1, port: 8000 }
```
```python
# relay.py — runs in the pod's main namespace
server.bind(("10.200.0.1", 8000))
backend.connect(("172.18.0.1", 8000))  # -> host vLLM
```
```shell
SANDBOX_PID=$(docker exec openshell-cluster-nemoclaw \
  kubectl exec master-impala -n openshell -- \
  cat /var/run/sandbox.pid)

docker exec openshell-cluster-nemoclaw \
  kubectl exec master-impala -n openshell -- \
  nsenter -t $SANDBOX_PID -n \
  iptables -I OUTPUT 1 -d 10.200.0.1 -p tcp --dport 8000 -j ACCEPT
```
```
<TOOLCALL>[{"name":"read_file","arguments":{"path":"app.py"}}]</TOOLCALL>
```
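For illustration, the payload between the tags is a plain JSON array of call objects, so it can be pulled out with nothing but the standard library (the regex and variable names here are my own, not the plugin's actual code):

```python
import json
import re

# Raw model output: a JSON array of tool calls wrapped in <TOOLCALL> tags.
raw = '<TOOLCALL>[{"name":"read_file","arguments":{"path":"app.py"}}]</TOOLCALL>'

match = re.search(r"<TOOLCALL>(.*?)</TOOLCALL>", raw, re.DOTALL)
calls = json.loads(match.group(1))

print(calls[0]["name"])               # read_file
print(calls[0]["arguments"]["path"])  # app.py
```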
```
opencode -> Gateway (:8000) -> vLLM (:8100)
```
```shell
# Inside the sandbox
~/ask "Explain PagedAttention in 3 sentences"
# -> hits the local RTX 5090

opencode
# -> AI coding agent with tool execution, powered by the local GPU
```
- With a `tools` parameter: When a client sends a `tools` parameter in the API request, vLLM can use a custom tool parser plugin to convert the text. I wrote a parser registered via `@ToolParserManager.register_module(name="nemotron_toolcall")` that extracts `<TOOLCALL>` blocks and returns structured tool call objects. This works for direct API calls (e.g. `curl` with `tools` in the request body).
- Without a `tools` parameter: opencode doesn't send `tools` as an API parameter; it embeds tool definitions in the system prompt instead. This means vLLM's parser never activates, and the `<TOOLCALL>` text comes back as plain content.

So the gateway rewrites each response itself:

- Strips the `<TOOLCALL>` text from the content
- Parses the JSON inside
- Injects structured `tool_calls` into the final SSE response

With that in place, opencode can:

- Read and write files via tool calls
- Execute shell commands
- Iterate on code with multi-turn tool use
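The strip/parse/inject steps can be sketched as a single rewrite function. This is a simplified, non-streaming version: the real gateway applies the same idea across SSE chunks, and the function and ID names here are my own.

```python
import json
import re
import uuid

TOOLCALL_RE = re.compile(r"<TOOLCALL>(.*?)</TOOLCALL>", re.DOTALL)

def rewrite_message(content: str) -> dict:
    """Strip <TOOLCALL> blocks from raw model text and return an
    OpenAI-style assistant message with structured tool_calls."""
    tool_calls = []
    for m in TOOLCALL_RE.finditer(content):
        for call in json.loads(m.group(1)):
            tool_calls.append({
                "id": "call_" + uuid.uuid4().hex[:8],
                "type": "function",
                "function": {
                    "name": call["name"],
                    # OpenAI-style clients expect arguments as a JSON string
                    "arguments": json.dumps(call["arguments"]),
                },
            })
    return {
        "role": "assistant",
        "content": TOOLCALL_RE.sub("", content).strip() or None,
        "tool_calls": tool_calls or None,
    }
```

After this rewrite, opencode sees a normal tool-calling response even though vLLM returned the call as plain text.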