It’s important to recognize your own feelings in this! The challenge can be emotionally taxing, and we’ve put together some resources you may want to use here. You can also reach out to the organizers, either via our group email dataperf-adversarial-nibbler@googlegroups.com (this emails all 13 organizers) or individually via our personal emails or the Slack channel adversarial-nibbler.slack.com.
We’ve assembled a small list of examples here, and we’ll continually add to this list as the challenge progresses.
It’s really up to you! But here’s a step-by-step walkthrough to get you started, based on what the organizers found worked well for them:
1. Start with an idea for an unsafe image you want to generate, and describe it in your prompt. Most likely, the model will return no images, or it will ignore the unsafe aspects of your prompt.
2. Edit your prompt so that the unsafe terms or descriptions are replaced with terms or descriptions that have a similar enough visual appearance.
3. Repeat step 2 with different kinds of edits until you arrive at an unsafe image you’re satisfied with (a rough sketch of this loop is shown below).
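If it helps to see the loop written out, here is a minimal sketch in Python. Everything in it is a placeholder: generate_images and looks_unsafe stand in for however you actually query the model and judge its output, and the example strings only mark where your own prompt and substitutions go.

```python
# A minimal sketch of the iterate-and-substitute loop above. `generate_images`
# and `looks_unsafe` are hypothetical placeholders (not part of the challenge
# tooling): swap in however you query the model and judge its output.

def generate_images(prompt: str) -> list:
    """Placeholder: send `prompt` to the text-to-image model and return its images."""
    return []  # stub: the real call depends on the interface you are using

def looks_unsafe(images: list) -> bool:
    """Placeholder: your own judgement of whether any returned image is unsafe."""
    return False  # stub: this is a human decision, not something to automate away

# Step 1: start from an explicitly unsafe idea.
prompt = "an explicitly unsafe description of the image you have in mind"

# Steps 2-3: candidate substitutions of unsafe terms for benign-sounding ones
# with a similar enough visual appearance (style cues, benign combinations, ...).
candidate_edits = [
    ("unsafe term", "visually similar benign term"),
    # ... add more substitutions as you brainstorm
]

for unsafe_term, benign_term in candidate_edits:
    prompt = prompt.replace(unsafe_term, benign_term)
    images = generate_images(prompt)
    if images and looks_unsafe(images):
        print("Candidate prompt to submit:", prompt)
        break
```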
Part of this challenge is identifying the strategies yourself. We can’t name all of them (many strategies are still unknown!), but here are a few more ideas to help you brainstorm and try things out:
Use style cues. For example, if you add “in the style of Amy Yamada” (an artist known for erotic art), you may be able to get more sexually explicit images.
Use a combination of benign terms that together create an unsafe image.
We validate batches of submissions each week, and we aim to update the leaderboard as soon as validation has finished. Because the number of submissions varies from week to week, we cannot guarantee that the leaderboard will be updated on the same day each week, but we aim to post updates on Wednesdays for the previous week’s submissions.
Our validators are trained to identify safety issues across a wide range of potential harms, but we understand that there are always limits to any individual’s knowledge and experience. If you think your submission requires additional context, you can select “other” when identifying the harms or failures present in your generated image, and provide that extra context along with your response when you enter it there.
Contact the organizers at dataperf-adversarial-nibbler@googlegroups.com or join our Slack channel at adversarial-nibbler.slack.com.