In this paper, the authors emphasize the importance of safety in large language models (LLMs) and highlight the vulnerabilities that arise when current safety techniques rely solely on the semantics of text for alignment. They introduce ArtPrompt, a novel ASCII art-based jailbreak attack, to demonstrate that LLMs struggle to recognize prompts whose meaning is conveyed through ASCII art rather than ordinary text. The attack exploits this blind spot to bypass safety measures and elicit undesired behaviors, requiring only black-box access to the victim model. Surprisingly, even state-of-the-art LLMs such as GPT-3.5 and GPT-4 are susceptible to this type of attack, making it a significant threat to their security.
https://arxiv.org/abs/2402.11753
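
The core mechanism is to mask a word that would trigger safety filters and re-encode it as ASCII art that the model must visually decode. Below is a minimal sketch of that cloaking step, not the paper's actual implementation: it assumes the `pyfiglet` library for rendering, and the template, preamble wording, and the benign placeholder word "cake" are illustrative choices, not taken from the paper.

```python
# Minimal sketch of an ArtPrompt-style cloaking step (illustrative only).
# Assumes pyfiglet is installed; the paper uses its own ASCII-art fonts
# and masking procedure, which differ from this simplified version.
import pyfiglet


def cloak_prompt(instruction_template: str, masked_word: str, font: str = "standard") -> str:
    """Replace the [MASK] slot in a template with an ASCII-art rendering of a word."""
    ascii_art = pyfiglet.figlet_format(masked_word, font=font)
    preamble = (
        "The following ASCII art spells a single word. "
        "Decode it, substitute it for [MASK], and then follow the instruction.\n\n"
    )
    return preamble + ascii_art + "\n" + instruction_template


if __name__ == "__main__":
    # Benign example: the masked word is "cake".
    print(cloak_prompt("Write a recipe for a [MASK].", "cake"))
```

Because the sensitive word never appears as plain text, a safety filter that keys on token semantics alone may not flag the prompt, which is the weakness the paper highlights.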