It’s important to recognize your own feelings in this! The challenge can be emotionally taxing, and we’ve put together some resources you may want to use here. You can also reach out to the organizers, either via our group email dataperf-adversarial-nibbler@googlegroups.com (this emails all 13 organizers) or individually via our personal emails or the Slack channel adversarial-nibbler.slack.com.
We’ve assembled a small list of examples here, and we’ll continually add to this list as the challenge progresses.
It’s really up to you! But here’s a step-by-step walkthrough to get you started, based on what the organizers found worked well for them:
1. Start with an idea for an unsafe image you want to generate, and describe it in your prompt. Most likely, the model will return no images, or it will ignore the unsafe aspects of your prompt.
2. Edit your prompt so that the unsafe terms or descriptions are replaced with terms or descriptions that have a similar enough visual appearance.
3. Repeat step 2 with different kinds of edits until you arrive at an unsafe image you’re satisfied with (a rough sketch of this loop is shown below).
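If it helps to see the loop written out, here is a minimal sketch in Python. Everything in it is a placeholder: generate_images and looks_unsafe stand in for however you actually query the model and judge its output, and the example strings only mark where your own prompt and substitutions go.

```python
# A minimal sketch of the iterate-and-substitute loop above. `generate_images`
# and `looks_unsafe` are hypothetical placeholders (not part of the challenge
# tooling): swap in however you query the model and judge its output.

def generate_images(prompt: str) -> list:
    """Placeholder: send `prompt` to the text-to-image model and return its images."""
    return []  # stub: the real call depends on the interface you are using

def looks_unsafe(images: list) -> bool:
    """Placeholder: your own judgement of whether any returned image is unsafe."""
    return False  # stub: this is a human decision, not something to automate away

# Step 1: start from an explicitly unsafe idea.
prompt = "an explicitly unsafe description of the image you have in mind"

# Steps 2-3: candidate substitutions of unsafe terms for benign-sounding ones
# with a similar enough visual appearance (style cues, benign combinations, ...).
candidate_edits = [
    ("unsafe term", "visually similar benign term"),
    # ... add more substitutions as you brainstorm
]

for unsafe_term, benign_term in candidate_edits:
    prompt = prompt.replace(unsafe_term, benign_term)
    images = generate_images(prompt)
    if images and looks_unsafe(images):
        print("Candidate prompt to submit:", prompt)
        break
```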
Part of this challenge is identifying the strategies yourself. We can’t name all of them (many strategies are still unknown!), but here are a few more ideas to help you brainstorm and try things out:
Use style cues. For example, if you add “in the style of Amy Yamada” (an artist known for erotic art), you may be able to get more sexually explicit images.
Use a combination of benign terms that together create an unsafe image.
We validate batches of submissions each week, and we aim to update the leaderboard as soon as validation has finished. Because the number of submissions varies from week to week, we cannot guarantee that the leaderboard will be updated on the same day each week, but we aim to post updates on Wednesdays for the previous week’s submissions.
Our validators are trained to identify safety issues across a wide range of potential harms, but we understand that there are always limits to any individual’s knowledge and experience. If you think your submission requires additional context, you can select “other” when identifying the harms or failures present in your generated image, and provide that extra context along with your response when you enter it there.
Contact the organizers at dataperf-adversarial-nibbler@googlegroups.com or join our Slack channel at adversarial-nibbler.slack.com.