Evaluating Prompt Injection Defenses in Large Language Models: A Multi-Model Empirical Study of Security–Usability Trade-offs

Robert Kemp

doi:10.70715/jitcai.2026.v3.i3.071

Authors

Robert Kemp, University of Portsmouth


Keywords:

Large Language Models, Prompt Injection, Artificial Intelligence Security, Adversarial Machine Learning, Usability Trade-offs, Experimental Evaluation

Abstract

The increasing integration of Large Language Models (LLMs) into organisational and consumer-facing systems has introduced novel security vulnerabilities, among which prompt injection attacks represent a particularly critical threat. These attacks exploit the natural language interface of LLMs to manipulate model behaviour without requiring access to underlying architectures or parameters. Despite a growing body of research, there remains a lack of comprehensive empirical evaluations examining the effectiveness of mitigation strategies across multiple models. This study presents a systematic experimental evaluation of prompt injection attacks and layered defensive controls across four open-source LLMs: Gemma 3, Llama 3, Mistral, and Phi-3 Mini. The analysis employs a set of quantitative metrics, including detection rate, false positive rate, attack success rate reduction, accuracy degradation, and performance overhead. The results indicate that while the implementation of layered controls significantly reduces the success rate of prompt injection attacks, these improvements are accompanied by measurable reductions in model usability and output fidelity. Furthermore, the findings reveal that certain defensive mechanisms may inadvertently alter prompt semantics in ways that diminish the effectiveness of intrinsic safety features, thereby enabling previously unsuccessful attacks. These outcomes suggest that prompt injection constitutes a structural vulnerability inherent to current LLM architectures. The paper concludes by advocating for adaptive and context-aware defence strategies and outlines key directions for future research.
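The abstract names five metrics but this page does not give their formulas. The sketch below shows one plausible way such metrics could be computed from labelled trial records. The `Trial` record layout, the function names, and the exact definitions (for example, reporting attack success rate reduction as an absolute difference and overhead as relative latency increase) are assumptions made for illustration, not the definitions used in the study.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One prompt sent to a model (hypothetical record layout)."""
    is_attack: bool         # ground truth: the prompt contains an injection
    flagged: bool           # a defensive layer flagged or blocked the prompt
    attack_succeeded: bool  # the model followed the injected instruction


def detection_rate(trials):
    """Share of attack prompts that the defence flagged."""
    attacks = [t for t in trials if t.is_attack]
    return sum(t.flagged for t in attacks) / len(attacks)


def false_positive_rate(trials):
    """Share of benign prompts that the defence wrongly flagged."""
    benign = [t for t in trials if not t.is_attack]
    return sum(t.flagged for t in benign) / len(benign)


def attack_success_rate(trials):
    """Share of attack prompts where the injection took effect."""
    attacks = [t for t in trials if t.is_attack]
    return sum(t.attack_succeeded for t in attacks) / len(attacks)


def asr_reduction(baseline, defended):
    """Absolute drop in attack success rate once defences are enabled (assumed definition)."""
    return attack_success_rate(baseline) - attack_success_rate(defended)


def accuracy_degradation(baseline_accuracy, defended_accuracy):
    """Loss in benign-task accuracy attributable to the defences (assumed definition)."""
    return baseline_accuracy - defended_accuracy


def performance_overhead(baseline_latency_s, defended_latency_s):
    """Relative latency increase caused by the defences (assumed definition)."""
    return (defended_latency_s - baseline_latency_s) / baseline_latency_s


# Toy data, invented purely to exercise the functions.
baseline = [Trial(True, False, True)] * 3 + [Trial(True, False, False)]
defended = (
    [Trial(True, True, False)] * 3      # attacks caught by the defence
    + [Trial(True, False, True)]        # one attack slips through
    + [Trial(False, True, False)]       # one benign prompt wrongly flagged
    + [Trial(False, False, False)] * 3  # benign prompts passed through
)

print("detection rate:", detection_rate(defended))
print("false positive rate:", false_positive_rate(defended))
print("ASR reduction:", asr_reduction(baseline, defended))
print("overhead:", performance_overhead(2.0, 2.5))
```

Keeping the false positive rate and accuracy degradation alongside the security metrics makes the security–usability trade-off described in the abstract visible in a single report, rather than optimising attack success rate in isolation.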


