Training Set Cleaning
We ask you to please not look at or use the provided test sets in any way other than for offline evaluation.
We ask you to only use the provided data for developing your solution (unless otherwise explicitly stated).
Algorithmic submissions may not rely on external intervention (e.g. humans, extra data). The results should be reproducible and extensible to other datasets.
Your developed solution should be practical and reasonably efficient given the scope of the challenge (e.g., your algorithm shouldn’t perform an exhaustive search).
Rules regarding participation:
Participants can belong to and participate in only one team
Individuals are considered a team
Teams must be defined before the end of the challenge
Each team must have a leader who is responsible for submissions to the online evaluation platform
Participants should not access or inspect submissions or selection code from other participating teams until after the challenge concludes
Each training set that is part of the final submission will be limited to 1,000 data points. Training sets with more than 1,000 (imageID, label) pairs will be rejected.
For this challenge, the provided candidate pool (i.e. embeddings) has no labels, and as such, part of the challenge involves using the information contained in the embeddings as effectively as possible.
The provided candidate pool is a custom subset of the training set for the Open Images dataset. You may refer to non-label metadata from the Open Images dataset [link].
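The 1,000-pair limit stated above can be checked locally before submitting. The following is a minimal sketch, not the official validator: the function name `check_training_set_size`, the constant `MAX_TRAINING_PAIRS`, and the representation of a training set as a sequence of (imageID, label) tuples are all assumptions made for illustration. The actual submission format is defined by the evaluation platform.

```python
from typing import Hashable, Iterable, Tuple

# Limit taken from the rules: at most 1,000 (imageID, label) pairs per training set.
MAX_TRAINING_PAIRS = 1000


def check_training_set_size(
    pairs: Iterable[Tuple[str, Hashable]],
    limit: int = MAX_TRAINING_PAIRS,
) -> int:
    """Return the number of pairs, or raise ValueError if the limit is exceeded.

    `pairs` is assumed (hypothetically) to be an iterable of
    (imageID, label) tuples; adapt this to the real file format.
    """
    count = len(list(pairs))
    if count > limit:
        raise ValueError(
            f"Training set has {count} pairs; the limit is {limit}."
        )
    return count
```

A training set with exactly 1,000 pairs passes this check, while one with 1,001 pairs raises `ValueError`, mirroring the rule that larger sets are rejected.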