Ollama Out-of-Bounds Read Vulnerability Allows Remote Process Memory Leak
Cybersecurity researchers have disclosed a critical security vulnerability in Ollama that, if successfully exploited, could allow a remote, unauthenticated attacker to leak its entire process memory.

The out-of-bounds read flaw, which likely impacts over 300,000 servers globally, is tracked as CVE-2026-7482 (CVSS score: 9.1). It has been codenamed Bleeding Llama by Cyera.

Ollama is a popular open-source framework that allows large language models (LLMs) to be run locally instead of in the cloud. On GitHub, the project has more than 171,000 stars and has been forked over 16,100 times.

"Ollama before 0.17.1 contains a heap out-of-bounds read vulnerability in the GGUF model loader," according to the flaw's description on CVE.org. "The /api/create endpoint accepts an attacker-supplied GGUF file in which the declared tensor offset and size exceed the file's actual length; during quantization in fs/ggml/gguf.go and server/quantization.go (WriteTo()), the server reads past the allocated heap buffer."

GGUF, short for GPT-Generated Unified Format, is a file format used to store large language models so that they can be easily loaded and executed locally.

The problem, at its core, stems from Ollama's use of the unsafe package when creating a model from a GGUF file, specifically in a function named "WriteTo()," making it possible to execute operations that bypass the memory safety guarantees of the Go programming language.

In a hypothetical attack scenario, a bad actor could send a specially crafted GGUF file to an exposed Ollama server, with the tensor's shape set to a very large number, to trigger the out-of-bounds heap read during model creation via the /api/create endpoint.

Successful exploitation of the vulnerability could leak sensitive data from the Ollama process memory, including environment variables, API keys, system prompts, and concurrent users' conversation data. This data can then be exfiltrated by uploading the resulting model artifact.
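To make the bug class concrete, here is a minimal Go sketch, illustrative only and not Ollama's actual code (all names are hypothetical), of a loader that trusts a tensor descriptor's declared offset and size, alongside the bounds check a patched loader would perform before touching the buffer. Safe Go slicing would panic on an out-of-range access; the point above is that the unsafe package bypasses exactly those checks, turning missing validation into a silent out-of-bounds read.

```go
// Minimal sketch of the bug class (not Ollama's code; names are illustrative).
package main

import (
	"errors"
	"fmt"
)

// tensorDesc mimics the attacker-controlled fields parsed from a GGUF header.
type tensorDesc struct {
	offset uint64 // where the tensor data claims to start
	size   uint64 // how many bytes the tensor claims to occupy
}

// vulnerableRead trusts the declared bounds with no validation against
// len(data). In safe Go this panics at runtime; equivalent unsafe-pointer
// arithmetic instead reads past the allocated heap buffer.
func vulnerableRead(data []byte, d tensorDesc) []byte {
	return data[d.offset : d.offset+d.size]
}

// safeRead validates the declared bounds against the actual buffer length,
// checking for uint64 overflow first, before slicing.
func safeRead(data []byte, d tensorDesc) ([]byte, error) {
	end := d.offset + d.size
	if end < d.offset || end > uint64(len(data)) {
		return nil, errors.New("tensor bounds exceed file length")
	}
	return data[d.offset:end], nil
}

func main() {
	file := make([]byte, 64) // a tiny stand-in for an uploaded "GGUF file"
	evil := tensorDesc{offset: 16, size: 1 << 20}

	if _, err := safeRead(file, evil); err != nil {
		fmt.Println("rejected:", err) // rejected: tensor bounds exceed file length
	}
}
```

The overflow check (end < d.offset) matters because offset + size on attacker-controlled 64-bit values can wrap around and defeat a naive comparison against the buffer length.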
Source: The Hacker News