Microsoft Uncovers 'whisper Leak' Attack That Identifies AI Chat...
Microsoft has disclosed details of a novel side-channel attack targeting remote language models that could enable a passive adversary with capabilities to observe network traffic to glean details about model conversation topics despite encryption protections under certain circumstances.
"Cyber attackers in a position to observe the encrypted traffic (for example, a nation-state actor at the internet service provider layer, someone on the local network, or someone connected to the same Wi-Fi router) could use this cyber attack to infer if the user's prompt is on a specific topic," security researchers Jonathan Bar Or and Geoff McDonald, along with the Microsoft Defender Security Research Team, said.
Put differently, the attack allows an attacker to observe encrypted TLS traffic between a user and LLM service, extract packet size and timing sequences, and use trained classifiers to infer whether the conversation topic matches a sensitive target category.
Model streaming in large language models (LLMs) is a technique that allows for incremental data reception as the model generates responses, instead of having to wait for the entire output to be computed. It's a critical feedback mechanism as certain responses can take time, depending on the complexity of the prompt or task.
The latest technique demonstrated by Microsoft is significant, not least because it works despite the fact that the communications with artificial intelligence (AI) chatbots are encrypted with HTTPS, which ensures that the contents of the exchange stay secure and cannot be tampered with.
Many a side-channel attack has been devised against LLMs in recent years, including the ability to infer the length of individual plaintext tokens from the size of encrypted packets in streaming model responses or by exploiting timing differences caused by caching LLM inferences to execute input theft (aka InputSnatch).
Whisper Leak builds upon these findings to explore the possibility that "the sequence of encrypted packet sizes and inter-arrival times during a streaming language model response contains enough information to classify the topic of the initial prompt, even in the cases where responses are streamed in groupings of tokens," per Microsoft.
To test this hypothesis, the Windows maker said it trained a binary classifier as a proof-of-concept that's capable of differentiating between a specific topic prompt and the rest (i.e., noise) using three different machine learning models: LightGBM,
Source: The Hacker News