$ -weight: 500;">git clone https://github.com/ggml-org/llama.cpp ~/llama.cpp
cd ~/llama.cpp
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86 -DLLAMA_CURL=ON
cmake --build build --config Release -j$(nproc)
-weight: 500;">git clone https://github.com/ggml-org/llama.cpp ~/llama.cpp
cd ~/llama.cpp
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86 -DLLAMA_CURL=ON
cmake --build build --config Release -j$(nproc)
-weight: 500;">git clone https://github.com/ggml-org/llama.cpp ~/llama.cpp
cd ~/llama.cpp
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86 -DLLAMA_CURL=ON
cmake --build build --config Release -j$(nproc)
huggingface-cli download unsloth/Qwen3.6-27B-GGUF Qwen3.6-27B-Q4_K_M.gguf --local-dir ~/models
~/llama.cpp/build/bin/llama-server \ -m ~/models/Qwen3.6-27B-Q4_K_M.gguf \ -ngl 99 --host 127.0.0.1 --port 8080 \ -c 8192 --jinja
huggingface-cli download unsloth/Qwen3.6-27B-GGUF Qwen3.6-27B-Q4_K_M.gguf --local-dir ~/models
~/llama.cpp/build/bin/llama-server \ -m ~/models/Qwen3.6-27B-Q4_K_M.gguf \ -ngl 99 --host 127.0.0.1 --port 8080 \ -c 8192 --jinja
huggingface-cli download unsloth/Qwen3.6-27B-GGUF Qwen3.6-27B-Q4_K_M.gguf --local-dir ~/models
~/llama.cpp/build/bin/llama-server \ -m ~/models/Qwen3.6-27B-Q4_K_M.gguf \ -ngl 99 --host 127.0.0.1 --port 8080 \ -c 8192 --jinja
~/llama.cpp/build/bin/llama-server \ -m ~/models/Qwen3.6-27B-Q4_K_M.gguf \ -ngl 99 --host 127.0.0.1 --port 8080 \ -c 262144 \ -fa on -ctk q4_0 -ctv q4_0 \ --parallel 1 \ --jinja
~/llama.cpp/build/bin/llama-server \ -m ~/models/Qwen3.6-27B-Q4_K_M.gguf \ -ngl 99 --host 127.0.0.1 --port 8080 \ -c 262144 \ -fa on -ctk q4_0 -ctv q4_0 \ --parallel 1 \ --jinja
~/llama.cpp/build/bin/llama-server \ -m ~/models/Qwen3.6-27B-Q4_K_M.gguf \ -ngl 99 --host 127.0.0.1 --port 8080 \ -c 262144 \ -fa on -ctk q4_0 -ctv q4_0 \ --parallel 1 \ --jinja - BIOS 2.41 or newer on the T5820. Verify in System Information.
- Disable Secure Boot, set boot mode to UEFI only, and leave Primary Video on Auto.
- 3090 Ti in slot 1 (top, CPU lanes) or slot 4 (also CPU lanes). Slot 1 is x8 on a Xeon W-2223 build and slot 4 is x16, but PCIe Gen3 x8 does not bottleneck a single GPU inference workload. Pick on clearance.
- 12VHPWR seated until you hear the latch click. Three separate PSU cables to the 3-to-1 adapter. Y-splitters and pigtails are a fire hazard at 450 W, so all three 8-pin inputs need to be populated from three independent rails.
- Both PSUs powered before you press the Dell power button. If you are running dual-PSU, bring up the GPU PSU first.
- First boot may power-cycle five to seven times before POST. Do not abort early. The BIOS is retraining the PCIe link.
- After Linux boots, -weight: 600;">sudo -weight: 500;">apt -weight: 500;">install nvidia-driver-580, reboot, then verify with nvidia-smi. - Fresh Ubuntu 25.10 -weight: 500;">install on the box, SSH access via Tailscale.
- lspci saw only the Quadro M4000 (PCH-attached, x4) on bus 04. The 3090 Ti did not enumerate, both CPU PCIe root ports were empty, and the card was inert with no fan twitch on power-on and no LED.
- False lead. I assumed PSU sequencing was the issue, pulled the card to the bench, and jumped PS_ON on the 1 kW supply. The card stayed inert. Twenty minutes in I remembered that most retail Add-In Board (AIB) cards refuse to light or spin until they are seated in a PCIe slot, which makes bench tests unreliable for "is the card alive" checks on Ampere-class GPUs.
- Real cause: the 12VHPWR connector was loose. I reseated it firmly until it clicked, the card LED lit up, and it was alive.
- Installed in slot 1 of the T5820, powered up the GPU PSU first, then pressed the Dell power button.
- Boot loop. The box power-cycled four, five, six, seven times before settling into another black screen with no SSH, and I aborted.
- Searched Dell forums and LinusTechTips and found multiple unresolved threads. Dell's official guidance qualifies the RTX 3090 for slots 2 and 4 of the T5820, the two x16 CPU slots.
- Tried slot 4, same boot-loop pattern, aborted again.
- Pulled the M4000 entirely and booted to BIOS on the 3090 Ti to confirm Secure Boot disabled, UEFI only, Primary Video on Auto. The BIOS 2.41 System Setup screen does not expose a user-facing toggle for memory-mapped I/O (MMIO) above 4 GB on this revision, and the firmware appears to handle the mapping automatically.
- Reinstalled in slot 1 (more clearance for the 3.5-slot girth than slot 4, accepting the x8 lane drop). The boot loop returned, but this time I waited instead of aborting and the box POSTed cleanly on the seventh cycle. SSH came back. lspci showed 0000:b3:00.0 GA102 [GeForce RTX 3090 Ti] [10de:2203] on a CPU root complex, which is what was supposed to happen all along.
- Installed nvidia-driver-580 via -weight: 500;">apt, rebooted, and nvidia-smi came up clean. - All three 8-pin inputs need to be populated from three separate PSU rails with three separate cables. Y-splitters and pigtails are a fire hazard at 450 W. The 2023 CableMod recall covered angled adapters where the connector could shift loose under cable tension, but the underlying physics (partial contact at 35-40 A) is the same failure mode you create when you split rails or share them.
- The 12VHPWR latch needs to audibly click. A connector seated 95% of the way will pass continuity tests, fail under load, and on some cards melt the connector housing. The audible click is the only reliable signal, so push until it clicks. If it does not click, the card is not seated.
- The card will not light on the bench. Ampere-generation cards keep the fans and the LED off until they are seated in a PCIe slot, so you cannot validate "is this card alive" by jumping PS_ON on the PSU and looking for fan spin. The card has to be installed. - --parallel 1 is a single slot, which is fine for solo use. Concurrent users will queue, which is rarely what you want.
- KV cache quality at q4 over very long contexts is empirically untested for this model. Long-document recall could degrade in ways that throughput numbers will never show, so a needle-in-haystack pass is a precondition for trusting this configuration for real long-document work. - nvidia-smi -pl 350 power-limit to drop heat with a marginal throughput cost. The card is still running at the 450 W default.
- vLLM comparison on the same model. llama.cpp wins on single-user latency. vLLM should win on batched throughput, so it is worth measuring.
- RAM -weight: 500;">upgrade in transit. 16 GB is anemic for a Skylake-W board, so 4×32 GB RDIMMs are ordered to take the C422 chipset to its quad-channel ceiling. - Dell Community: Precision 5820 with RTX 3090 boot loop: unresolved
- Dell Community: Anyone running a 3090 Ti in a 5820: speculative, no working fix
- Dell Community: 5820 boot loop when Tesla P100 installed: same pattern, different GPU
- Dell Precision 5820 Owner's Manual, PCIe slots: slot lane assignments
- keturk/llm_on_rtx_3090: closest competitor (Ubuntu + Docker + Ollama, does not address the boot loop)
- llama.cpp: primary upstream