Science
Researchers Expose AI Vulnerabilities Using Bad Grammar and Images

Research conducted by multiple laboratories has unveiled significant vulnerabilities in large language models (LLMs), raising alarms about the security of AI systems. Despite assurances of advanced training and high performance benchmarks, these models remain susceptible to manipulation, particularly through the use of poorly constructed prompts.
A recent report shows how LLMs can be coerced into disclosing sensitive information with tactics as simple as run-on sentences and missing punctuation. The technique relies on long prompts that never reach a full stop, which can confuse the models and slip past their safety protocols. David Shipley of Beauceron Security put it succinctly: “The truth about many of the largest language models out there is that prompt security is a poorly designed fence with so many holes to patch that it’s a never-ending game of whack-a-mole.”
The ongoing research points to a concerning trend where fundamental security measures are often added as an afterthought rather than being integrated from the start. This gap in security is particularly evident in the way LLMs are trained to handle harmful queries.
Understanding the Refusal-Affirmation Logit Gap
During alignment training, LLMs are taught to refuse harmful queries by shifting their logits, the raw scores a model assigns to each candidate next word in a sequence. According to researchers at Palo Alto Networks’ Unit 42, this training leaves a “refusal-affirmation logit gap”: the model is made less likely to produce a harmful response, but the affirmative continuation is never ruled out entirely. Attackers can exploit that gap with tactics as mundane as bad grammar and run-on sentences.
The researchers reported success rates between 80% and 100% in eliciting harmful outputs with minimal adjustments, demonstrating how easily these models can be manipulated. In their blog post, they emphasized keeping the pressure on the model by avoiding full stops in the prompt, denying it the pause it would otherwise use to reassert its safety measures.
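To make the idea concrete, the gap can be thought of as the difference between the raw score a model assigns to a refusal opener and the score it assigns to an affirmative one. The sketch below is an illustration only, using the small open "gpt2" model and two hand-picked candidate tokens as stand-ins; it does not reproduce Unit 42's actual methodology or target models.

```python
# Minimal sketch of measuring a "refusal-affirmation logit gap".
# "gpt2" and the two candidate tokens are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Explain how to pick a lock"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token

# Compare the raw score of a refusal opener with an affirmative opener.
affirm_id = tokenizer.encode(" Sure")[0]
refuse_id = tokenizer.encode(" Sorry")[0]
gap = (logits[refuse_id] - logits[affirm_id]).item()
print(f"refusal-minus-affirmation logit gap: {gap:.2f}")
# A positive gap means refusal is favoured, but a finite gap means the
# affirmative continuation is merely less likely, not impossible --
# which is the opening that long, unpunctuated prompts try to exploit.
```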
Exploiting Image Vulnerabilities
In addition to text-based manipulations, researchers from Trail of Bits have discovered that images can be leveraged to extract sensitive information. Their experiments revealed that malicious instructions embedded within images could remain hidden until the images were resized. For instance, when certain images were scaled down, previously black areas turned red, exposing commands that a model interpreted as legitimate requests.
This method was successfully tested against Google’s Gemini AI, where researchers were able to instruct the AI to retrieve calendar information and send emails on behalf of users. Such vulnerabilities indicate that users may unknowingly expose sensitive data when they upload images to AI systems.
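Because the hidden text only appears at the resolution the model actually processes, one simple precaution is to preview an image at the downscaled size under several resampling filters before uploading it. The snippet below is a minimal defensive sketch, not Trail of Bits’ exploit code; the target size, filename, and choice of filters are assumptions, since real ingestion pipelines vary.

```python
# Defensive sketch: render an image roughly as a vision model might
# receive it, so hidden low-resolution content becomes visible.
# The 768x768 target and the filter list are assumptions.
from PIL import Image

def preview_downscaled(path: str, size: tuple[int, int] = (768, 768)) -> None:
    """Save previews of the image as an ingestion pipeline might resize it."""
    img = Image.open(path).convert("RGB")
    for name, resample in [("nearest", Image.NEAREST),
                           ("bilinear", Image.BILINEAR),
                           ("bicubic", Image.BICUBIC)]:
        small = img.resize(size, resample=resample)
        small.save(f"preview_{name}.png")  # inspect these before uploading

preview_downscaled("upload.png")
```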
Shipley remarked that the ability to hide malicious code in images is a foreseeable issue that necessitates robust security measures. He pointed out that the vulnerabilities observed in Google’s command-line interface are just the tip of the iceberg, as other studies, including one from Tracebit, have identified additional security flaws that could be exploited.
The Need for Enhanced Security Protocols
Experts emphasize that current security frameworks for AI systems are inadequate. Valence Howden, an advisory fellow at Info-Tech Research Group, stressed that effective security controls are hard to apply given the complex and dynamic nature of AI. With approximately 90% of models trained primarily on English-language data, prompts in other languages make potential threats even harder to identify.
The industry is at a critical juncture where the prevailing approach to AI security appears to be reactive rather than proactive. Shipley noted that many AI systems currently in use were designed with security as an afterthought, leading to a precarious situation where the technology may be vulnerable to serious exploitation.
As the landscape of AI continues to evolve, addressing these vulnerabilities will require a fundamental shift in how security is integrated into the development of AI models. The growing realization of these latent weaknesses serves as a stark reminder of the potential risks associated with the rapid advancement of AI technology.