Hakuna Ma-data: Identify Wildlife on the Serengeti with AI for Earth

Can you predict which animals are present in camera trap images? Leverage millions of images of animals on the Serengeti to build a classifier that distinguishes between gazelles, lions, and more! #climate

$20,000 in prizes
Jan 2020
807 joined


This was a brand new kind of DrivenData competition! Using novel infrastructure to execute code submissions in the cloud and evaluate them on a large holdout set never provided to participants, this challenge pushed the boundary of innovation and generalization for wildlife conservation.

Why

Camera traps are an invaluable tool in conservation research, but the sheer amount of data they generate presents a huge barrier to using them effectively.

There are two immediate challenges where efforts like this competition are needed. 1) Camera traps can't automatically label the animals they observe, creating an immense burden on humans to determine which wildlife are present and where. 2) Even where automated animal-tagging models exist, they don't generalize well across time and locations, severely limiting their usefulness with new data.

The Solution

Microsoft and DrivenData put together a machine learning competition to build the best computer vision models for tagging species from a new trove of camera trap imagery. To power more accurate and generalizable models, the challenge featured more data from the Snapshot Serengeti project for training models and a more realistic holdout set for testing them. Then, to move the solutions one step closer to applied impact, participants had to package everything needed to do inference and submit it for containerized execution on Azure.

The Results

The winning algorithm beat out more than 500 other containerized entries. On the full test set, this model identified blank images with 97% accuracy (blanks represent ~70% of all images) and predicted the correct species in non-blank image sequences with 86% accuracy (i.e., the species with the highest predicted probability was indeed the one in the image).
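To make the two figures above concrete, here is a minimal Python sketch of how a blank accuracy and a top-1 species accuracy could be computed from predicted probabilities. It is an illustration under stated assumptions, not the competition's actual scoring code: the function names, the `"empty"` label for blanks, the one-label-per-sequence setup, and the toy numbers are all hypothetical, and "blank accuracy" is read here as the share of truly blank sequences whose top prediction is blank.

```python
def top1(probabilities, classes):
    """Return the class with the highest predicted probability."""
    best_index = max(range(len(classes)), key=lambda i: probabilities[i])
    return classes[best_index]


def evaluate(predictions, true_labels, classes, blank_label="empty"):
    """Compute blank accuracy and top-1 species accuracy.

    Assumes exactly one true label per sequence (a simplification).
    """
    blank_hits = blank_total = 0
    species_hits = species_total = 0
    for probabilities, truth in zip(predictions, true_labels):
        guess = top1(probabilities, classes)
        if truth == blank_label:
            # Truly blank sequence: correct only if the top guess is blank.
            blank_total += 1
            blank_hits += guess == blank_label
        else:
            # Non-blank sequence: correct only if the top guess is the species.
            species_total += 1
            species_hits += guess == truth
    return {
        "blank_accuracy": blank_hits / blank_total if blank_total else None,
        "species_accuracy": species_hits / species_total if species_total else None,
    }


# Toy example with made-up probabilities (not competition data).
classes = ["empty", "gazelle", "lion"]
predictions = [
    [0.90, 0.05, 0.05],  # top guess: empty
    [0.20, 0.70, 0.10],  # top guess: gazelle
    [0.60, 0.30, 0.10],  # top guess: empty
    [0.10, 0.20, 0.70],  # top guess: lion
]
true_labels = ["empty", "gazelle", "empty", "gazelle"]

results = evaluate(predictions, true_labels, classes)
print(results)
```

In this toy data, both blank sequences are recognized as blank, while only one of the two gazelle sequences gets gazelle as its top guess. Reporting the two numbers separately matters because blanks make up most of the images, so a single overall accuracy would be dominated by them.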

This and all the other prize-winning solutions were shared openly for continued learning and development. For more, check out the links below!


RESULTS ANNOUNCEMENT + MEET THE WINNERS

WINNING MODELS ON GITHUB

SNAPSHOT SERENGETI DATASET