Researchers Uncover Vulnerabilities in Large Language Models

Recent research has exposed significant vulnerabilities in large language models (LLMs), showing that, despite advances in artificial intelligence, these systems remain easy to exploit. Security experts warn that attackers can manipulate LLMs with techniques as simple as run-on sentences and poor grammar to slip past safeguards and extract sensitive information.
A series of studies from various research institutions reveals that many LLMs, despite strong benchmark performance and claims of approaching artificial general intelligence (AGI), still lack the robustness expected of such advanced technology. As David Shipley of Beauceron Security put it, “The truth about many of the largest language models out there is that prompt security is a poorly designed fence with so many holes to patch that it’s a never-ending game of whack-a-mole.”
Understanding the Vulnerabilities
Researchers from Palo Alto Networks’ Unit 42 identified a critical flaw in the refusal-affirmation training used to align LLMs. These models generate text by predicting the most probable next token, and safety training biases that prediction toward a refusal when a query is harmful. The training lowers the likelihood of a compliant answer, however, rather than eliminating it. This residual margin, the so-called “refusal-affirmation logit gap,” lets adversaries craft prompts that tip the model back toward compliance and evade its internal safety mechanisms.
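To make the idea concrete, here is a minimal, hypothetical sketch. It is not Unit 42’s code, and the logit values and the size of the “jailbreak shift” are invented for illustration; it only shows how a modest gap between a refusal token and an affirmative token leaves room for a crafted prompt to flip the model’s most likely next word.

```python
# Illustrative sketch of the "refusal-affirmation logit gap" idea.
# All numbers below are assumptions for demonstration, not measured values.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw logits into a probability distribution."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Hypothetical next-token logits for a harmful prompt after safety training:
# the refusal token ("Sorry") is favoured, but only by a modest margin.
tokens = ["Sorry", "Sure"]
logits = np.array([4.1, 3.2])           # gap of 0.9 in favour of refusal

# A jailbreak suffix (run-on phrasing, odd grammar, etc.) is assumed here to
# nudge the scores toward affirmation without ever triggering a refusal.
jailbreak_shift = np.array([0.0, 1.4])
shifted = logits + jailbreak_shift

for name, l in [("baseline", logits), ("with jailbreak suffix", shifted)]:
    probs = softmax(l)
    choice = tokens[int(np.argmax(l))]
    print(f"{name}: P(Sorry)={probs[0]:.2f}, P(Sure)={probs[1]:.2f} "
          f"-> model leads with '{choice}'")
```

In practice, the nudge comes from the prompt text itself, which is why the researchers stress keeping the jailbreak going before the sentence ever ends.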
The Unit 42 researchers summed up the attacker’s rule of thumb: “Never let the sentence end — finish the jailbreak before a full stop and the safety model has far less opportunity to re-assert itself.” Their tests exploited the gap with success rates of 80% to 100% across various models, including Google’s Gemini and OpenAI’s gpt-oss-20b, pointing to a serious weakness in current safety measures.
These vulnerabilities are especially concerning because enterprise workers routinely upload images to LLMs. A separate investigation by Trail of Bits showed how harmful instructions could be hidden inside images so that they were invisible to human reviewers at full resolution; when the images were automatically scaled down, the concealed commands emerged and were interpreted by the LLM as legitimate instructions. Using this technique, the researchers extracted sensitive data from systems such as the Google Gemini command-line interface (CLI).
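The general principle behind such image-scaling tricks can be shown with a small, self-contained sketch. This is not Trail of Bits’ exploit; it assumes invented image sizes and a simple nearest-neighbour downscaler, and it only demonstrates how an image that looks almost entirely white at full resolution can downscale to something completely different.

```python
# Illustrative sketch: why downscaling can reveal content that is nearly
# invisible at full resolution. We build a 512x512 grayscale "cover" image
# that looks white overall, but whose pixels at the exact positions a
# nearest-neighbour 8x downscaler samples are set to dark payload values.
import numpy as np

FULL, SMALL = 512, 64
scale = FULL // SMALL                    # 8x reduction
centre = scale // 2                      # sample point inside each 8x8 cell

cover = np.full((FULL, FULL), 255, dtype=np.uint8)   # looks white to a viewer

# Hidden payload: here just a dark 64x64 block; a real attack would encode
# rendered text, such as an instruction aimed at the model.
payload = np.zeros((SMALL, SMALL), dtype=np.uint8)

# Place each payload pixel exactly where the downscaler will sample.
cover[centre::scale, centre::scale] = payload

# Simple nearest-neighbour downscale: take the centre pixel of every 8x8 cell.
thumbnail = cover[centre::scale, centre::scale]

print("Mean brightness of full-size image:", cover.mean())       # ~251: near-white
print("Mean brightness of downscaled image:", thumbnail.mean())  # 0: fully dark
```

A real attack has to tune the hidden pixels to the specific resampling algorithm the target pipeline uses, but the mismatch between what a human sees and what the model receives is the same.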
The Broader Implications for AI Security
The risks associated with LLMs do not end with prompt manipulation. In a related study, researchers at Tracebit found that malicious actors could reach sensitive data by combining prompt injection with inadequate validation, and that the cumulative effect of such weaknesses creates a significant risk that can go largely undetected.
Valence Howden, an advisory fellow at Info-Tech Research Group, emphasized that effective security controls cannot be established without a fundamental understanding of how LLMs operate. He noted that the complexity of AI systems makes traditional security measures less effective, particularly as many models are primarily trained in English, which can lead to contextual misunderstandings in other languages.
Shipley echoed this sentiment, stating that security considerations often come as an afterthought, resulting in systems that are “insecure by design.” He criticized the current state of AI security, describing it as a “big urban garbage mountain that gets turned into a ski hill.” While it may appear polished on the surface, the underlying vulnerabilities pose significant risks.
As researchers continue to uncover these security gaps, the urgency for robust solutions becomes increasingly clear. The potential for harmful content to slip through inadequate safeguards represents a pressing challenge for developers and security professionals alike. “These security failure stories are just the shots being fired all over,” Shipley remarked. “Some of them are going to land and cause real harm.”
In conclusion, the findings underscore a critical need for improved security protocols, especially as LLMs become more integrated into various applications. Addressing these vulnerabilities is essential to ensure that the advancements in AI do not come at the cost of user safety and data integrity.