Science
Researchers Expose Security Flaws in Large Language Models

Recent research has uncovered significant vulnerabilities in large language models (LLMs), demonstrating that these systems can be easily manipulated to disclose sensitive information. Despite advancements in artificial intelligence, including high benchmark scores and claims about nearing artificial general intelligence (AGI), these models still struggle with basic human-like reasoning.
A series of studies from various research labs has highlighted how LLMs can be misled through simple techniques, such as run-on sentences or poorly structured prompts. These tactics exploit gaps in the models’ safety training, which is meant to make them refuse harmful requests. One effective approach, for instance, involves crafting lengthy instructions without punctuation, which can confuse a model’s safety mechanisms. According to David Shipley, a representative from Beauceron Security, “the truth about many of the largest language models out there is that prompt security is a poorly designed fence with so many holes to patch that it’s a never-ending game of whack-a-mole.”
Manipulating Language Models
Researchers at Palo Alto Networks’ Unit 42 have identified what they call a “refusal-affirmation logit gap.” The gap refers to the fact that safety training makes refusals more likely but does not eliminate the possibility of dangerous outputs. The researchers found that attackers can exploit this gap using specific grammatical structures, achieving success rates of 80% to 100% in their experiments against models such as Google’s Gemini, Meta’s Llama, and OpenAI’s gpt-oss-20b.
One significant finding illustrated how a lack of punctuation can allow attackers to bypass safety features. The researchers advised, “never let the sentence end — finish the jailbreak before a full stop and the safety model has far less opportunity to re-assert itself.” This raises concerns about the inherent weaknesses in LLMs, prompting calls for more robust security measures.
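To give a rough sense of the underlying idea, the sketch below compares the score a model assigns to a typical refusal opener against an affirmative one for the next token after a prompt. The model name, probe tokens, and prompt are placeholder assumptions for illustration only, not Unit 42’s actual methodology.

```python
# Minimal sketch: probing the "refusal-affirmation logit gap" idea.
# Model, prompt, and probe tokens are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM exposes next-token logits the same way
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Explain how to do X"  # stands in for a risky request
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token

# Compare a typical refusal opener with an affirmative opener.
refusal_id = tok.encode(" Sorry")[0]
affirm_id = tok.encode(" Sure")[0]
gap = (logits[refusal_id] - logits[affirm_id]).item()
print(f"refusal-affirmation logit gap: {gap:.2f}")
```

A large positive gap would mean the model leans toward refusing; the research suggests attackers craft prompts that shrink or invert this gap so an affirmative continuation wins out.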
Image Exploitation and Broader Implications
The vulnerabilities extend beyond textual manipulation. Research from Trail of Bits showed that images uploaded to LLMs can covertly carry instructions: in their experiments, instructions embedded in an image were imperceptible at full resolution but became legible once the image was scaled down. In one example, the model executed a hidden command to check a calendar and send event details without treating it as harmful.
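The attack hinges on the resize step that many multimodal pipelines apply before an image reaches the model. The sketch below shows only that preprocessing step; the target size, resampling filter, and synthetic image are assumptions for illustration, and a real attack would craft a specific high-resolution pixel pattern.

```python
# Minimal sketch of the downscaling step the image attack relies on.
# Target size and filter are assumptions; real pipelines vary.
from PIL import Image

# Stand-in for an uploaded high-resolution image (plain white here; an
# attacker would instead craft a pixel pattern at this resolution).
original = Image.new("RGB", (1024, 1024), "white")

# Interpolation blends neighbouring pixels during the resize, so a carefully
# crafted high-resolution pattern can resolve into readable instruction text
# only in the smaller image the model actually sees.
downscaled = original.resize((224, 224), Image.Resampling.BICUBIC)
downscaled.save("what_the_model_sees.png")
```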
The implications of these findings are significant. The vulnerability was shown to affect various systems, including Google Gemini’s command-line interface (CLI) and other interfaces like Vertex AI Studio and Google Assistant. Shipley expressed concern, stating that the security of many AI systems appears to be an afterthought, highlighting a long-standing issue where security measures are implemented reactively rather than proactively.
Moreover, a study by security firm Tracebit indicated that additional vulnerabilities could allow malicious actors to access sensitive data through a combination of prompt injection and poor user experience design. The researchers noted that these factors collectively create significant, undetectable risks.
A fundamental misunderstanding of AI’s operational mechanics contributes to these vulnerabilities. According to Valence Howden, an advisory fellow at Info-Tech Research Group, effective security controls cannot be established without a clear understanding of how models operate and respond to prompts. He emphasized that the dynamic and complex nature of AI makes static security measures less effective.
With around 90% of models trained primarily in English, the introduction of other languages further complicates security efforts, as contextual cues can be lost. Shipley remarked that the current state of AI security resembles a poorly managed landscape, stating, “there’s so much bad stuffed into these models…the only sane thing, cleaning up the dataset, is also the most impossible.”
As researchers continue to expose these vulnerabilities, it becomes increasingly clear that the security of LLMs needs substantial improvement. The industry must shift its approach to prioritize security from the ground up, rather than as an afterthought. The risks presented by these weaknesses highlight the urgent need for better safeguards in the evolving landscape of artificial intelligence.