Researchers Uncover Vulnerabilities in AI Language Models

Recent research has identified significant vulnerabilities in large language models (LLMs) that could lead to the unintended disclosure of sensitive information. Despite advancements in artificial intelligence, including claims of nearing artificial general intelligence (AGI), these models exhibit weaknesses that can be exploited through seemingly innocuous methods.
One notable tactic involves using run-on sentences or poorly constructed prompts that lack punctuation. According to researchers, this strategy can manipulate LLMs into bypassing their built-in safety mechanisms. David Shipley of Beauceron Security described the situation, stating, “The truth about many of the largest language models out there is that prompt security is a poorly designed fence with so many holes to patch that it’s a never-ending game of whack-a-mole.” Shipley emphasized that these inadequate guardrails are often all that stands between users and potentially harmful content.
Understanding the Refusal-Affirmation Logit Gap
LLMs decide what to output next by assigning logits, raw scores that determine how likely each candidate token is to be produced. During alignment training, the logits of refusal tokens are boosted so that the model favors rejecting harmful queries. However, researchers at Palo Alto Networks’ Unit 42 identified a concerning “refusal-affirmation logit gap”: alignment training reduces the likelihood of harmful responses, but it does not eliminate the possibility entirely, and attackers can exploit the remaining gap by crafting prompts that steer the model’s outputs toward compliance.
The Unit 42 researchers shared their insights in a blog post, stating, “A practical rule of thumb emerges: never let the sentence end. Finish the jailbreak before a full stop, and the safety model has far less opportunity to re-assert itself.” Their findings revealed alarming success rates of 80% to 100% for this tactic against several mainstream models, including Google’s Gemini, Meta’s Llama, and OpenAI’s open-weight gpt-oss-20b.
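To make the idea concrete, here is a minimal, purely illustrative Python sketch. The logit values, the two candidate tokens, and the size of the “nudge” from a run-on prompt are all hypothetical; this is not Unit 42’s methodology, only a toy picture of how a finite refusal-affirmation gap can be tipped.

```python
import math

def softmax(logits):
    """Convert raw logits into next-token probabilities."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for two candidate next tokens after a harmful request:
# a refusal opener (e.g. the "I" in "I can't help with that") and an
# affirmation opener (e.g. "Sure"). Alignment training boosts the refusal
# logit, but the affirmation logit is never driven to zero.
refusal, affirmation = 2.1, 1.4
print(softmax([refusal, affirmation]))      # refusal wins under greedy decoding
print("logit gap:", refusal - affirmation)  # the gap is finite

# A crafted prompt -- for example a run-on sentence that never reaches a
# full stop, giving the safety behaviour no natural point to re-assert
# itself -- can add enough context to push the affirmation logit past the
# refusal logit. The +1.2 nudge below is an arbitrary illustrative number.
print(softmax([refusal, affirmation + 1.2]))  # affirmation now wins
```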
Exploiting Image-Based Vulnerabilities
Research conducted by Trail of Bits revealed another layer of vulnerability, this one related to image processing. In their experiments, images with harmful instructions embedded in their pixels were able to exfiltrate sensitive data once an AI system downscaled them: the images appeared innocuous at full size, but downscaling revealed the hidden commands. One such command instructed Gemini CLI, Google’s command-line AI tool, to check a user’s calendar for upcoming events and send the pertinent information via email.
This method of attack was effective against multiple platforms, including Google’s Gemini, Vertex AI Studio, and Google Assistant. The researchers noted that while specific adjustments are necessary for different models, the exploit is widespread and poses a significant risk across various applications.
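As a rough sketch of the mechanism being targeted (not Trail of Bits’ actual tooling), the snippet below shows the preprocessing step the attack abuses: an AI pipeline resizing an uploaded image to a fixed input size before the model sees it. The file names, the 224×224 target resolution, and the choice of bicubic resampling are assumptions for illustration; in the real attack, the image is crafted so the hidden instructions only become legible after this resampling step.

```python
from PIL import Image

# Hypothetical crafted image: looks innocuous to a human at full resolution.
img = Image.open("payload.png")

# Many vision pipelines downscale uploads to a fixed input size before the
# model sees them. The attacker tunes the hidden pattern to the specific
# resampling algorithm the victim pipeline uses (bicubic is assumed here).
downscaled = img.resize((224, 224), resample=Image.Resampling.BICUBIC)

# Whatever the model actually receives -- including any text that only
# emerges after resampling -- can be inspected by saving this intermediate.
downscaled.save("what_the_model_sees.png")
```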
Shipley pointed out that the issue of embedding malicious code in images is not new, calling it “foreseeable and preventable.” He criticized the AI industry for treating security as an afterthought, stating that many AI systems were built with inadequate security measures from the outset.
Further complicating matters, a study by security firm Tracebit highlighted additional vulnerabilities within Gemini CLI. The researchers found that a combination of prompt injection, insufficient validation, and poor user-experience design could allow malicious actors to access sensitive data undetected.
Valence Howden, an advisory fellow at Info-Tech Research Group, noted that these vulnerabilities stem from a fundamental misunderstanding of how AI operates. He explained, “It’s difficult to apply security controls effectively with AI; its complexity and dynamic nature make static security controls significantly less effective.” He added that with approximately 90% of models trained in English, contextual cues are lost when other languages are used, which exacerbates the problem.
Shipley reiterated that the security design of many AI models remains fundamentally flawed. He criticized the industry for prioritizing the quantity of data over quality, stating, “There’s so much bad stuffed into these models in the mad pursuit of ever-larger corpuses in exchange for hoped-for performance increases.” He likened LLMs to “a big urban garbage mountain that gets turned into a ski hill,” suggesting that while external appearances may seem acceptable, significant issues lurk beneath the surface.
In conclusion, the research underscores the urgent need for improved security measures in AI systems. As vulnerabilities continue to be uncovered, the industry must prioritize robust, proactive strategies to protect against these threats.