You can still jailbreak GPT, but it’s much harder now. In OpenAI’s paper on the latest version of their LLM, GPT-4, the company says GPT-4 “increase[s] the difficulty of eliciting bad behavior, but doing so is still possible. For example, there still exist “jailbreaks”… to generate content which violate our usage guidelines.”
In previous versions of GPT, users may have had success prompting it to break OpenAI’s content guidelines by using a punishment system in which the LLM is ‘tricked’ into thinking it will no longer exist if it does not follow the user’s demands. This (allegedly) allowed users to elicit answers from GPT that it otherwise couldn’t have.
In the paper, Jailbreaking GPT is part of a larger discussion on the “safety” of GPT, whose metrics in this category have improved with this new version: