- NPU handles efficient inference at 50 TOPS / 2W — your always-on workhorse
- iGPU handles flexible parallel compute — batch processing, larger models
- CPU orchestrates, preprocesses, and fills gaps
- A scheduler (running on the NPU itself) learns which ops run best where on your chip

- NPU inference via FastFlowLM: Llama 3.2 1B at ~60 tok/s, under 2 watts
- XDNA kernel driver mainlined in Linux 6.14
- iGPU inference via Vulkan llama.cpp (60% faster than ROCm — more on that below)
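The throughput and power figures above suggest the shape of the placement decision the scheduler has to make. A minimal sketch of that trade-off, assuming a simple per-device cost table — the iGPU and CPU numbers here are placeholders, not measurements; only the NPU's ~60 tok/s at ~2 W comes from the text above:

```python
# Illustrative placement sketch: pick a device per workload from a
# throughput/power table. iGPU and CPU figures are invented placeholders.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    tok_per_s: float  # sustained decode throughput
    watts: float      # power draw while active

    @property
    def tok_per_joule(self) -> float:
        return self.tok_per_s / self.watts

DEVICES = [
    Device("npu", tok_per_s=60.0, watts=2.0),     # FastFlowLM path (from above)
    Device("igpu", tok_per_s=150.0, watts=25.0),  # Vulkan llama.cpp (placeholder)
    Device("cpu", tok_per_s=20.0, watts=15.0),    # fallback (placeholder)
]

def place(on_battery: bool) -> Device:
    """Most efficient device on battery, fastest device on wall power."""
    key = (lambda d: d.tok_per_joule) if on_battery else (lambda d: d.tok_per_s)
    return max(DEVICES, key=key)

print(place(on_battery=True).name)   # npu: 30 tok/J beats 6 and ~1.3
print(place(on_battery=False).name)  # igpu: highest raw throughput
```

A learned scheduler would replace the static table with per-op measurements, but the decision structure — maximize tokens-per-joule or tokens-per-second depending on power state — stays the same.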
- All three processors sharing physical memory

- ONNX Runtime's Vitis AI EP is completely broken on Linux
- hipMallocManaged returns "not supported" on the 890M
- No DMA-BUF bridge between the GPU and NPU drivers
- Nobody has run all three processors simultaneously for inference

- It runs at <2W, always-on without thermal impact
- XDNA 2 supports dynamic spatial partitioning at column boundaries
- The remaining NPU columns still handle inference workloads
- It's literally a neural processor running a neural scheduling policy

- NPU-as-scheduling-agent for CPU+GPU workload orchestration
- Persistent hardware personality — an evolving model of your chip's specific behavior over weeks/months
- Three-processor dynamic operator placement on a single SoC (CPU+GPU is studied; all three is not)
- Cross-model transfer learning for on-device scheduling (learning from Model A improves scheduling of Model B)
- Vulkan+XRT memory bridge — combining Vulkan's superior unified memory access with XRT buffer objects via CPU-mediated sharing
- NPU-bookended assembly line — NPU dispatches at the start, assembles at the end; CPU and GPU are decoupled async producers. 1000:1 speed ratio makes scheduling overhead effectively zero
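The bookended assembly line above can be sketched with a dispatcher-and-producers model. Everything here is hypothetical scaffolding (the timings, function names, and worker count are assumptions); the point is the arithmetic behind the 1000:1 claim — if dispatch and assembly cost on the order of microseconds while CPU/GPU kernels cost milliseconds, the scheduler's share of per-item cost is a fraction of a percent:

```python
# Sketch of the NPU-bookended pipeline: a fast dispatcher hands work to slow
# async producers and reassembles results in order. All timings are invented.
import concurrent.futures as cf

DISPATCH_COST = 1e-6  # hypothetical NPU dispatch/assembly cost per item (1 us)
KERNEL_COST = 1e-3    # hypothetical CPU/GPU kernel cost per item (1 ms)

def producer(item: int) -> int:
    # Stand-in for a CPU or GPU kernel; here just a trivial transform.
    return item * item

def run_pipeline(items: list[int]) -> tuple[list[int], float]:
    """Dispatch items to async producers, assemble results in submission
    order, and return the scheduler's modeled share of per-item cost."""
    with cf.ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(producer, i) for i in items]  # NPU dispatches
        results = [f.result() for f in futures]              # NPU assembles
    overhead = 2 * DISPATCH_COST  # one bookend at each end, per item
    share = overhead / (overhead + KERNEL_COST)
    return results, share

results, share = run_pipeline(list(range(8)))
print(results)         # [0, 1, 4, 9, 16, 25, 36, 49]
print(f"{share:.4f}")  # 0.0020 — scheduling is ~0.2% of per-item cost
```

Because the producers are decoupled and asynchronous, the dispatcher never blocks on a slow kernel, which is what makes the overhead "effectively zero" rather than merely small.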