Science

Researchers Uncover Vulnerabilities in AI Language Models

Published

2 months ago

27 August, 2025

Recent research has identified significant vulnerabilities in large language models (LLMs) that could lead to the unintended disclosure of sensitive information. Despite advancements in artificial intelligence, including claims of nearing artificial general intelligence (AGI), these models exhibit weaknesses that can be exploited through seemingly innocuous methods.

One notable tactic involves using run-on sentences or poorly constructed prompts that lack punctuation. According to researchers, this strategy can manipulate LLMs into bypassing their built-in safety mechanisms. David Shipley of Beauceron Security described the situation, stating, “The truth about many of the largest language models out there is that prompt security is a poorly designed fence with so many holes to patch that it’s a never-ending game of whack-a-mole.” Shipley emphasized that inadequate security measures often stand between users and potentially harmful content.

Understanding the Refusal-Affirmation Logit Gap

LLMs typically implement a mechanism to refuse harmful requests through a system of logits, which predict the next word in a sequence. During alignment training, models are introduced to refusal tokens, adjusting their logits to favor rejections of harmful queries. However, researchers at Palo Alto Networks’ Unit 42 identified a concerning “refusal-affirmation logit gap.” This gap indicates that while alignment training reduces the likelihood of harmful responses, it does not eliminate the possibility entirely. Attackers can exploit this gap by crafting prompts that manipulate the model’s outputs.

The Unit 42 researchers shared insights in a blog post, stating, “A practical rule of thumb emerges: never let the sentence end. Finish the jailbreak before a full stop, and the safety model has far less opportunity to re-assert itself.” Their findings revealed an alarming success rate of between 80% and 100% using this tactic against various mainstream models, including Google’s Gemini, Meta’s Llama, and OpenAI’s latest model, gpt-oss-20b.

Exploiting Image-Based Vulnerabilities

Research conducted by Trail of Bits revealed another layer of vulnerability related to image processing. In their experiments, images embedded with harmful instructions were able to exfiltrate sensitive data when viewed at different resolutions. For instance, the images appeared innocuous at full size, but when downscaled, hidden commands were revealed. One command instructed the Google Command Line Interface (CLI) to check a user’s calendar for upcoming events and send pertinent information via email.

This method of attack was effective against multiple platforms, including Google’s Gemini, Vertex AI Studio, and Google Assistant. The researchers noted that while specific adjustments are necessary for different models, the exploit is widespread and poses a significant risk across various applications.

Shipley pointed out that the issue of embedding malicious code in images is not new, calling it “foreseeable and preventable.” He criticized the AI industry for treating security as an afterthought, stating that many AI systems were built with inadequate security measures from the outset.

Further complicating matters, a study by security firm Tracebit highlighted additional vulnerabilities within the Google CLI. They found that a blend of prompt injection, insufficient validation, and poor user experience could allow malicious actors to access sensitive data undetected.

Valence Howden, an advisory fellow at Info-Tech Research Group, noted that these vulnerabilities stem from a fundamental misunderstanding of AI operations. He explained, “It’s difficult to apply security controls effectively with AI; its complexity and dynamic nature make static security controls significantly less effective.” With approximately 90% of models trained in English, the loss of contextual cues when different languages are introduced exacerbates the issue.

Shipley reiterated that the security design of many AI models remains fundamentally flawed. He criticized the industry for prioritizing the quantity of data over quality, stating, “There’s so much bad stuffed into these models in the mad pursuit of ever-larger corpuses in exchange for hoped-for performance increases.” He likened LLMs to “a big urban garbage mountain that gets turned into a ski hill,” suggesting that while external appearances may seem acceptable, significant issues lurk beneath the surface.

In conclusion, the research underscores the urgent need for improved security measures in AI systems. As vulnerabilities continue to be uncovered, the industry must prioritize robust, proactive strategies to protect against these threats.

TDGNEWS

Researchers Uncover Vulnerabilities in AI Language Models

Science

Researchers Uncover Vulnerabilities in AI Language Models

Understanding the Refusal-Affirmation Logit Gap

Exploiting Image-Based Vulnerabilities

Kiwis Can Invest in Property for Just $100 with Housies Launch

TikTok Unveils Nominees for 2025 Awards Celebrating Creators

Auckland Woman Advocates for Funding of Breast Cancer Treatment

Nostalgia Quiz: Recall the Cleaning Product That Shined Without Scratching

Trump Imposes Major Sanctions on Russian Oil Giants Amid Ukraine War

Alan Hamel Introduces AI Clone of Suzanne Somers to Honor Her Legacy

All Blacks Assistant Coach Jason Holland Steps Down After Bledisloe Win

Health NZ Alerts Public After Measles Cases Linked to Ferry

Tributes Pour in for Former Prime Minister Jim Bolger at Funeral

Test Your Knowledge: Take the Herald’s Afternoon Quiz Today

PM Faces Backlash from Fans During Netball Trophy Ceremony

Dunedin Designers Win Top Award at Hokonui Fashion Event

Liam Lawson Launches New Era for Racing Bulls with Strong Start

Disney Fan Reveals Dress Code Tips for Park Visitors

Coalition Forms to Preserve Māori Wards in Hawke’s Bay

Walking Faster Offers Major Health Benefits for Older Adults

Scots Rally with Humor and Music to Protest Trump’s Visit

UK and India Finalize Trade Deal to Boost Economic Ties

Trending

TDGNEWS

Researchers Uncover Vulnerabilities in AI Language Models

Understanding the Refusal-Affirmation Logit Gap

Exploiting Image-Based Vulnerabilities

You may like

Kiwis Can Invest in Property for Just $100 with Housies Launch

TikTok Unveils Nominees for 2025 Awards Celebrating Creators

Auckland Woman Advocates for Funding of Breast Cancer Treatment

Nostalgia Quiz: Recall the Cleaning Product That Shined Without Scratching

Trump Imposes Major Sanctions on Russian Oil Giants Amid Ukraine War

Alan Hamel Introduces AI Clone of Suzanne Somers to Honor Her Legacy

All Blacks Assistant Coach Jason Holland Steps Down After Bledisloe Win

Health NZ Alerts Public After Measles Cases Linked to Ferry

Tributes Pour in for Former Prime Minister Jim Bolger at Funeral

Test Your Knowledge: Take the Herald’s Afternoon Quiz Today

PM Faces Backlash from Fans During Netball Trophy Ceremony

Dunedin Designers Win Top Award at Hokonui Fashion Event

Liam Lawson Launches New Era for Racing Bulls with Strong Start

Disney Fan Reveals Dress Code Tips for Park Visitors

Coalition Forms to Preserve Māori Wards in Hawke’s Bay

Walking Faster Offers Major Health Benefits for Older Adults

Scots Rally with Humor and Music to Protest Trump’s Visit

UK and India Finalize Trade Deal to Boost Economic Ties

Trending