First, connect to your server over SSH:

```bash
ssh root@your_server_ip
```

On the first connection you may be asked to confirm the host key:

```
Output
The authenticity of host 'your_server_ip (your_server_ip)' can't be established.
...
Are you sure you want to continue connecting (yes/no/[fingerprint])?
```

Install Python and pip:

```bash
sudo apt install python3 python3-pip
```

Install vLLM:

```bash
pip install vllm
```

Download the Nemotron reasoning parser plugin from the model's Hugging Face repository:

```bash
wget https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/resolve/main/nano_v3_reasoning_parser.py
```

Start the vLLM server:

```bash
vllm serve --model nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 \
  --max-num-seqs 8 \
  --tensor-parallel-size 1 \
  --max-model-len 262144 \
  --port 8000 \
  --trust-remote-code \
  --reasoning-parser-plugin nano_v3_reasoning_parser.py \
  --reasoning-parser nano_v3
```

Once the server is running, send a test request from another machine or terminal. Because the request body uses the chat `messages` format, it should target the `/v1/chat/completions` endpoint:

```python
import requests

url = "http://your_server_ip:8000/v1/chat/completions"
data = {
    "model": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "max_tokens": 1000,
}

response = requests.post(url, json=data)
message = response.json()["choices"][0]["message"]["content"]
print(message)
```

```
Output
The capital of France is **Paris**.
```
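With the reasoning parser enabled, vLLM can return the model's reasoning trace in a separate `reasoning_content` field alongside `content` in the chat message. A minimal sketch of splitting the two; the helper name and the mock response values are illustrative, not part of the original tutorial:

```python
def split_reasoning(response_json):
    """Return (reasoning trace, final answer) from a vLLM
    chat-completion response dict. The trace may be None if the
    reasoning parser is disabled or thinking was turned off."""
    msg = response_json["choices"][0]["message"]
    return msg.get("reasoning_content"), msg["content"]


# Mock response shaped like the server's output (values are made up):
mock = {
    "choices": [{
        "message": {
            "reasoning_content": "The user asks for the capital of France...",
            "content": "The capital of France is **Paris**.",
        }
    }]
}

thinking, answer = split_reasoning(mock)
print(answer)  # The capital of France is **Paris**.
```

Keeping the trace separate makes it easy to log or discard the model's intermediate reasoning while showing users only the final answer.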
To disable the model's reasoning trace, pass `enable_thinking: False` through `chat_template_kwargs` in the request body:

```python
data = {
    "model": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "max_tokens": 1000,
    "chat_template_kwargs": {"enable_thinking": False},
}
```

- NVIDIA has announced Nemotron 3, a new addition to its Nemotron model lineup. Nemotron 3 consists of three new models: Nano (30B), Super (49B), and Ultra (253B).
- As of January 2026, the smallest model, Nano, is the only one currently available for use. Super and Ultra are scheduled for release later in 2026.
- All of the models are open-weight, allowing open access for commercial use and modification. The models' architectures employ novel efficiency improvements to increase throughput.

- Mistral 3 Models on DigitalOcean
- How to Build Parallel Agentic Workflows with Python
- Run gpt-oss 120B on vLLM with an AMD Instinct MI300X GPU Droplet