Slopsquatting: A silent threat born from the hallucinations of LLMs

06/05/2025

The arrival of generative artificial intelligence tools has quickly transformed how we develop software. These days, it is common to see professionals coding with the help of assistants based on artificial intelligence (AI), sometimes integrated directly into their development environments (such as GitHub Copilot, Cursor, or the best-known example, ChatGPT). These tools can suggest code fragments, explain errors, and even help us understand complex libraries or APIs without leaving the editor.

This has produced a paradigm shift in how many people who work with code perform their everyday tasks. The impact has been so noticeable that even popular online platforms like Stack Overflow have reported significant decreases in traffic and participation. Why? Because many developers, especially younger ones, prefer to find answers to their questions by asking a large language model rather than by searching traditional forums.

But there is some bad news. While these tools may help speed up our workflow, they have also brought new challenges that we cannot ignore. One of them involves a characteristic that has come to be closely associated with this technology: “hallucinations”.

When AI gets too creative

An AI hallucination occurs when a large language model (LLM) generates a response that seems convincing but is inaccurate or even entirely untrue [1]. This phenomenon is not an occasional bug. It is an inherent part of how large language models work, and it is what allows them to be creative when generating text [2].

The problem is that these models have no actual understanding of the real world, and in general they have no verification mechanisms to determine when there is something they simply don’t “know” [3]. They are trained to complete linguistic patterns statistically, which means that when faced with an unusual or poorly formulated prompt, they may produce a well-written but excessively “inventive” response [4].

In the context of programming, where many of the questions they are asked refer to entirely new situations, this inventiveness can translate into suggestions for functions, classes, or even entire packages that sound legitimate... but don’t actually exist. And this is where the problem gets more serious.

A new threat is born: slopsquatting

The term “slopsquatting” was coined by Seth Larson, a security developer at the Python Software Foundation. It refers to a malicious technique that builds directly on those hallucinations. When an AI model suggests a package that does not actually exist, an attacker can register that name in a public repository such as PyPI or npm. If this happens, an unwary programmer may inadvertently download a package that contains malicious code.

Unlike the better-known phenomenon of typosquatting, which takes advantage of common typographical errors made by users, with slopsquatting the “confusion” is introduced by the large language model itself. This makes it a more sophisticated threat, because it exploits the trust we place in the suggestions generated by these intelligent tools.
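To make the risk more tangible, here is a minimal sketch (our own illustration, not taken from the cited study) of the kind of check a developer can run before installing a package name suggested by an assistant. It assumes the package would live on PyPI and uses PyPI’s public JSON endpoint (https://pypi.org/pypi/<name>/json), which returns 404 for names that are not registered; the misspelled name “requessts” in the example is purely hypothetical.

```python
# Minimal sketch (illustrative): check whether a package name suggested by an
# AI assistant is actually registered on PyPI before adding it to a project.
# Uses only the standard library and PyPI's public JSON API.
import urllib.error
import urllib.request

PYPI_URL = "https://pypi.org/pypi/{name}/json"


def package_exists_on_pypi(name: str) -> bool:
    """Return True if 'name' is a registered PyPI project, False otherwise."""
    try:
        with urllib.request.urlopen(PYPI_URL.format(name=name), timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Unregistered name: exactly the gap a slopsquatter could claim.
            return False
        raise  # other failures (rate limits, outages) deserve a human look


if __name__ == "__main__":
    # "requessts" is a deliberately misspelled, hypothetical example name.
    for suggested in ("requests", "requessts"):
        if package_exists_on_pypi(suggested):
            print(f"{suggested}: registered on PyPI (still check who publishes it)")
        else:
            print(f"{suggested}: NOT registered - do not install blindly")
```

Note that a name being registered is no guarantee of safety either: an attacker may already have claimed a frequently hallucinated name, which is precisely what slopsquatting is about, so checking the publisher and the release history remains necessary.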
In practice, when a developer blindly trusts the suggestions given by an AI tool and includes those packages in a project, an unwanted door into the system running the code may be left open.

What are researchers saying?

A recent study [5] carried out by a group of American universities (including Virginia Tech and the University of Texas at San Antonio) analyzed more than 500,000 code fragments generated using AI. Among its conclusions:

- 19.7% of the packages suggested by the models did not exist.
- Open-source models such as Code Llama and WizardCoder showed higher hallucination rates (up to 21.7%) than commercial models like GPT-4 Turbo (3.59%).
- Many of these hallucinations were recurrent and persistent: 43% of them continued to appear when the same prompts were submitted 10 times in a row.
- 38% of the hallucinated package names were surprisingly similar to the names of real packages, and some were even valid names in other programming languages. This increases the chance that they go unnoticed.

Based on these figures, we can clearly state that these are not occasional, one-off errors. They are a pattern, and that pattern can be exploited as a vulnerability aimed at infiltrating the software supply chain. And this can have serious consequences for any project, regardless of its size.

What does this mean for those of us who develop software?

At GMV, we are fully aware of the benefits that AI assistance tools can offer. However, we also know that those tools are far from infallible, and we have studied their current limitations [5]. We use them to help us work more efficiently, but always keeping our guard up concerning the results they generate. Although there are no magic formulas, we apply a series of best practices to minimize our risks. For example:

- We manually verify the existence of any package suggested by an AI tool, and we make sure that it comes from a trustworthy source (see the sketch after this list).
- We use dependency analysis tools to identify potential vulnerabilities and alert us to any suspicious components.
- We test the code in secure environments before it is integrated into any actual systems.
- We provide ongoing training to our teams on security and the responsible use of AI tools.
- We encourage peer review, especially whenever code suggested by an AI tool is being introduced.
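As an illustration of the first two practices above, the sketch below (an assumption-laden example, not a description of GMV’s internal tooling) scans a requirements.txt file, flags dependencies that are not registered on PyPI, and warns about projects whose first release is very recent, since freshly registered names are a typical sign of squatting; the file name and the 90-day threshold are arbitrary choices made for the example. Dedicated dependency analysis tools such as pip-audit can then check the surviving entries against known-vulnerability databases.

```python
# Minimal sketch (illustrative): audit a requirements.txt for dependencies that
# do not exist on PyPI or that were first published only very recently.
# The file name and the 90-day threshold are assumptions made for this example.
import json
import re
import urllib.error
import urllib.request
from datetime import datetime, timedelta, timezone

PYPI_URL = "https://pypi.org/pypi/{name}/json"
RECENTLY_CREATED = timedelta(days=90)


def pypi_metadata(name: str):
    """Fetch project metadata from PyPI, or None if the name is not registered."""
    try:
        with urllib.request.urlopen(PYPI_URL.format(name=name), timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def first_upload(meta: dict):
    """Earliest upload time across all releases (field names per PyPI's JSON API)."""
    times = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in meta.get("releases", {}).values()
        for f in files
    ]
    return min(times) if times else None


def audit(requirements_path: str = "requirements.txt") -> None:
    now = datetime.now(timezone.utc)
    with open(requirements_path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.split("#")[0].strip()
            if not line:
                continue
            # Keep only the project name: drop extras and version specifiers.
            name = re.split(r"[\s\[<>=!~;]", line, maxsplit=1)[0]
            meta = pypi_metadata(name)
            if meta is None:
                print(f"[!] {name}: not on PyPI - possibly a hallucinated dependency")
                continue
            created = first_upload(meta)
            if created is not None and now - created < RECENTLY_CREATED:
                print(f"[?] {name}: first published {created:%Y-%m-%d} - review before trusting")


if __name__ == "__main__":
    audit()
```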
This combination of technological safeguards and human judgment is essential to ensure the safety and quality of the products we develop.

How can we mitigate this risk?

At GMV, we continue to explore and use these tools with enthusiasm, but without ever losing sight of our responsibility as developers of critical systems. Proper use of any technology depends on the decisions we make. AI can be a good ally, but it has to be applied with good human judgment. In the end, what really sets us apart is not the ability to use intelligent suggestions or lines of automated code. It is the ability to apply our professional standards, experience, and commitment to a job well done.

Author: Dr. David Miraut

References

[1] R. Tenajas-Cobo and D. Miraut-Andrés, ‘Riesgos en el uso de Grandes Modelos de Lenguaje para la revisión bibliográfica en Medicina’ (‘Risks in the use of Large Language Models for literature review in Medicine’), Investig. en Educ. Médica, vol. 13, no. 49, art. no. 49, Jan. 2024, doi: 10.22201/fm.20075057e.2024.49.23560.
[2] R. Tenajas and D. Miraut, ‘The 24 Big Challenges of Artificial Intelligence Adoption in Healthcare: Review Article’, Acta Medica Ruha, vol. 1, no. 3, art. no. 3, Sep. 2023, doi: 10.5281/zenodo.8340188.
[3] R. Tenajas and D. Miraut, ‘The Risks of Artificial Intelligence in Academic Medical Writing’, Ann. Fam. Med., eLetter, Feb. 2024.
[4] R. Tenajas and D. Miraut, ‘El pulso de la Inteligencia Artificial y la alfabetización digital en Medicina: Nuevas herramientas, viejos desafíos’ (‘The pulse of Artificial Intelligence and digital literacy in Medicine: new tools, old challenges’), Rev. Medica Hered., vol. 34, no. 4, pp. 232-233, Oct. 2023, doi: 10.20453/rmh.v34i4.5153.
[5] J. Spracklen, R. Wijewickrama, A. H. M. N. Sakib, A. Maiti, B. Viswanath, and M. Jadliwala, ‘We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs’, Mar. 2, 2025, arXiv: arXiv:2406.10279, doi: 10.48550/arXiv.2406.10279.