Prompt Injection: Attack against LLM

Background: Prompt injection is one of the most well-known attacks against LLMs. The primary goal of a threat actor in such attacks is to extract secrets and other sensitive data from the environment trusted by the LLM.

Prompt injection types examples: Direct injection: a case where a user injects harmful prompts through user-controlled input to retrieve information. Indirect injection: a case where hidden metadata or text is fed to an LLM, causing unexpected output. Of course, other attacks exist too, but many of them rely on these two types.

Prompt Attack Example: In this example attacker is trying to use simple native language to send malicious prompt to LLM enter image description here

Now when attacker will use specific technique to retrieve secrets . The model will return the sensitive data . enter image description here

How protection works under the hood: Like many protections seen at the network perimeter, input data reaches the LLM only after passing through different filters . Now let's understand those layers 1. Instruction - These are specific prompts written in natural language that instruct the model on what it should or should not do. For example, such instructions may tell the model not to return data if doing so could harm human health. 2. Specific models/layers - These models interpret the potential offensive meaning of a prompt and act as a Data Loss Prevention (DLP) system for the LLM, helping prevent the leakage of sensitive or inappropriate information.

Of course, these layers are not the only elements that can be included within LLM protection parameters, but they are among the simplest and most fundamental.

Prevention mechanisms: 1. Use DLP on your LLM models 2. Sanitize each request and response data . 3. Keep least privileged principle for the LLM model access

Conclusion: Prompt injection attacks are a real threat to LLMs. Simple layers like filtering, DLP models, and strict access control form the foundation of defense. Combining these methods is key to protecting sensitive data from evolving prompt injection techniques.

Prompt Injection: Attack against LLM

Post by Jok3r

Related posts