Train a person detection model to run on a microcontroller (part one)

Data preparation

IOT · MACHINE LEARNING

3/10/2023 · 3 min read

With the rapid advances in deep learning, AI-based person detection has gained popularity across many industries, particularly in security and surveillance. However, accurate person detection requires a well-trained deep learning model, and the quality of that model depends heavily on the dataset used to train it. In this blog post, we will start from scratch by collecting a dataset that will serve as the foundation for training a lightweight deep learning model. In the following articles, we will train the model and measure its accuracy, with the ultimate goal of deploying it onto a microcontroller, specifically the ESP32.

Data Collection

Given the nature of the task, it is pretty obvious that we will need to collect two kinds of images:

  • Images with one or more people

  • Images with no people

There are several public datasets we can use for this. One is the COCO dataset, a large-scale collection of 328,000 images of everyday objects and people; it contains much more than just images and can be used for complex tasks such as segmentation or multiclass classification. We can also use the IMDB-Face dataset for additional images of people, as well as something simpler such as the Unsplash API. I used a combination of IMDB-Face and COCO for images with people, and a combination of COCO and Unsplash for images without people.

Collecting data from COCO

pycocotools is a Python package that makes downloading data from COCO pretty easy. It works on top of a JSON annotation file, so that file has to be downloaded first. With the package, it is straightforward to get images by specifying the category we are interested in, for instance the category 'person':
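The original snippet is not reproduced here, so what follows is only a minimal sketch of the idea using real pycocotools calls (`getCatIds`, `getImgIds`, `loadImgs`) and the `coco_url`/`file_name` fields of each image record. The helper names (`person_image_ids`, `download_images`, `fetch_person_images`) are hypothetical, and downloading with `urllib` is an assumption rather than the author's method.

```python
import os
import urllib.request


def person_image_ids(coco, limit=None):
    """Return sorted IDs of COCO images that have at least one 'person' annotation."""
    # getCatIds(catNms=...) maps category names to numeric IDs ('person' is 1).
    cat_ids = coco.getCatIds(catNms=["person"])
    # getImgIds(catIds=...) returns the images annotated with those categories.
    img_ids = sorted(coco.getImgIds(catIds=cat_ids))
    return img_ids if limit is None else img_ids[:limit]


def download_images(coco, img_ids, out_dir):
    """Download each image through the 'coco_url' field of its image record."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for img in coco.loadImgs(img_ids):
        path = os.path.join(out_dir, img["file_name"])
        if not os.path.exists(path):  # skip files fetched on a previous run
            urllib.request.urlretrieve(img["coco_url"], path)
        saved.append(path)
    return saved


def fetch_person_images(annotation_file, out_dir, limit=None):
    """Hypothetical entry point: index the annotations, then download person images."""
    # Imported lazily so the helpers above can be exercised without pycocotools.
    from pycocotools.coco import COCO

    coco = COCO(annotation_file)  # loads and indexes the JSON annotation file
    return download_images(coco, person_image_ids(coco, limit), out_dir)
```

A call might look like `fetch_person_images("annotations/instances_train2017.json", "data/person", limit=1000)`, where the path and limit are placeholders. pycocotools also ships `COCO.download(tarDir, imgIds)`, but it downloads every image in the annotation file when `imgIds` is empty, so an explicit loop over a bounded ID list is a little safer.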

To get images in categories other than person:
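One detail worth knowing here: when `getImgIds` receives several category IDs, it returns the images that contain *all* of them (an intersection). A sketch that collects images with *any* non-person category therefore queries one category at a time and takes the union. The function name is hypothetical, and this is not necessarily how the original notebook does it.

```python
def non_person_candidate_ids(coco):
    """Collect IDs of images annotated with at least one non-person category."""
    person_ids = set(coco.getCatIds(catNms=["person"]))
    # getCatIds() with no arguments returns every category ID in the file.
    other_cats = [c for c in coco.getCatIds() if c not in person_ids]
    candidates = set()
    for cat_id in other_cats:
        # Query per category: passing all IDs at once would intersect them.
        candidates.update(coco.getImgIds(catIds=[cat_id]))
    return sorted(candidates)
```

To restrict the candidates to a supercategory, `coco.getCatIds(supNms=["animal"])` can replace the unfiltered `getCatIds()` call. These candidates may still contain people, which is why they need the extra check described next.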

While working with the COCO dataset, I noticed that even when an image is returned for a non-person category, such as an animal, it can still contain people as well. To ensure that our non-person set contains only images without people, we need to exclude such images.
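A minimal sketch of that exclusion, assuming the standard COCO instance annotations where `category_id` 1 is 'person'. It uses the real `getAnnIds`/`loadAnns` calls; the helper names are hypothetical.

```python
PERSON_CAT_ID = 1  # 'person' in the standard COCO instance annotations


def contains_person(coco, img_id, person_cat_id=PERSON_CAT_ID):
    """True if any annotation on the image, crowd regions included, is a person."""
    # iscrowd=None keeps both regular and crowd (iscrowd=1) annotations.
    ann_ids = coco.getAnnIds(imgIds=[img_id], iscrowd=None)
    return any(ann["category_id"] == person_cat_id for ann in coco.loadAnns(ann_ids))


def filter_person_free(coco, img_ids, person_cat_id=PERSON_CAT_ID):
    """Keep only the image IDs whose annotations never mention a person."""
    return [i for i in img_ids if not contains_person(coco, i, person_cat_id)]
```

An equivalent and faster variant is a set difference against `coco.getImgIds(catIds=[1])`; the per-annotation loop is kept here because it mirrors the check described in the text.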

To handle this, the filtering step examines every annotation for a given image ID and writes the image to disk only if none of the annotations has a category_id of 1 (person). As mentioned earlier, the COCO dataset is commonly used for tasks such as segmentation; to support them, every image comes with annotations that describe the objects and regions it contains. Because of this extra filtering, a significant number of candidate images end up being discarded. As a result, I decided to use the Unsplash API to supplement the dataset with images that meet our requirements.

Using Unsplash API

Using the Unsplash API is pretty easy. This is the function I used to download images that should not contain people:
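The author's function is not reproduced here. The following is only a sketch built on Unsplash's public `/search/photos` endpoint with `Client-ID` authorization. The function names, the choice of the `small` image size, and the idea of passing people-free search terms (for example "landscape") are assumptions, and a search term alone does not guarantee that no person appears, so the results are still worth a quick visual check.

```python
import json
import os
import urllib.parse
import urllib.request

API_ROOT = "https://api.unsplash.com"


def search_url(query, page=1, per_page=30):
    """Build a /search/photos request URL (Unsplash caps per_page at 30)."""
    params = urllib.parse.urlencode(
        {"query": query, "page": page, "per_page": min(per_page, 30)}
    )
    return f"{API_ROOT}/search/photos?{params}"


def photo_urls(response, size="small"):
    """Extract (id, url) pairs from a decoded /search/photos response."""
    return [(p["id"], p["urls"][size]) for p in response.get("results", [])]


def download_unsplash(query, n_images, out_dir, access_key):
    """Page through search results until n_images photos have been saved."""
    os.makedirs(out_dir, exist_ok=True)
    headers = {"Authorization": f"Client-ID {access_key}", "Accept-Version": "v1"}
    saved, page = [], 1
    while len(saved) < n_images:
        req = urllib.request.Request(search_url(query, page), headers=headers)
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        results = photo_urls(data)
        if not results:  # past the last page for this query
            break
        for photo_id, url in results[: n_images - len(saved)]:
            path = os.path.join(out_dir, f"unsplash_{photo_id}.jpg")
            urllib.request.urlretrieve(url, path)
            saved.append(path)
        page += 1
    return saved
```

Demo applications on Unsplash are rate-limited per hour, so larger downloads may need to be spread across several queries or runs.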

Clone the repo and run the entire image-fetching process

To get started, clone this repo. It contains get_dataset.ipynb, the Jupyter Notebook that builds the dataset. The first step is to install the required packages listed in requirements.txt. Inside the notebook, set the DATA_ROOT_FOLDER variable:
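As an illustration only: the value below is a placeholder, and the helpers are a hypothetical sketch of the 'data/person' and 'data/notperson' layout the notebook builds under DATA_ROOT_FOLDER, not code taken from the repo.

```python
from pathlib import Path

# Placeholder value: any writable location on your machine should work.
DATA_ROOT_FOLDER = "./datasets"


def dataset_dirs(root):
    """Return the expected class folders under <root>/data without creating them."""
    data_dir = Path(root) / "data"
    return {label: data_dir / label for label in ("person", "notperson")}


def make_dataset_dirs(root):
    """Create the class folders (safe to call repeatedly) and return their paths."""
    dirs = dataset_dirs(root)
    for d in dirs.values():
        d.mkdir(parents=True, exist_ok=True)
    return dirs
```

One folder per class is also the layout that common image-folder loaders use to infer labels, which keeps the next step (training) simple.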

The process will create, under DATA_ROOT_FOLDER, a new directory named 'data', and under 'data', two subdirectories: 'person' and 'notperson'. A few more variables then need to be set:

  • N_PERSON_TOTAL is the number of images featuring one or more people that the process will download.

  • N_PERSON_COCO is the number of those person images that will be downloaded from the COCO dataset.

  • N_PERSON_IMDB is the number of person images that will be downloaded from the IMDB-Face dataset.

  • N_NOT_PERSON_TOTAL is the number of images without any person in them that the process will download.
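A sketch of how these variables might be filled in and sanity-checked before a long download run. The numbers are placeholders rather than the notebook's defaults, and the rule that the COCO and IMDB-Face counts add up to N_PERSON_TOTAL is an assumption inferred from the descriptions above.

```python
# Placeholder budget: the notebook's actual default values are not reproduced here.
N_PERSON_TOTAL = 2000
N_PERSON_COCO = 1200
N_PERSON_IMDB = 800
N_NOT_PERSON_TOTAL = 2000


def check_counts(n_person_total, n_person_coco, n_person_imdb, n_not_person_total):
    """Validate the image budget and return the share of 'person' images."""
    counts = (n_person_total, n_person_coco, n_person_imdb, n_not_person_total)
    if any(n < 0 for n in counts):
        raise ValueError("image counts must be non-negative")
    # Assumption: person images come only from COCO and IMDB-Face,
    # so the two per-source counts should add up to the total.
    if n_person_coco + n_person_imdb != n_person_total:
        raise ValueError("N_PERSON_COCO + N_PERSON_IMDB must equal N_PERSON_TOTAL")
    total = n_person_total + n_not_person_total
    if total == 0:
        raise ValueError("nothing to download")
    return n_person_total / total


person_share = check_counts(N_PERSON_TOTAL, N_PERSON_COCO, N_PERSON_IMDB, N_NOT_PERSON_TOTAL)
```

Returning the class share makes it easy to spot an unbalanced dataset (a value far from 0.5) before any training happens.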

There are no other variables to set. As mentioned earlier, images that COCO files under non-person categories may still feature people, and those images are filtered out. To make sure all N_NOT_PERSON_TOTAL images are obtained, the process falls back on the Unsplash API whenever COCO alone cannot supply that many.
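The fallback boils down to a simple split of the non-person budget. This is a hypothetical helper sketching that arithmetic, not code from the notebook; `n_coco_available` stands for however many COCO images survived the person filter.

```python
def plan_not_person_sources(n_not_person_total, n_coco_available):
    """Return (from_coco, from_unsplash) for the people-free image budget."""
    if n_not_person_total < 0 or n_coco_available < 0:
        raise ValueError("counts must be non-negative")
    # Take as many filtered COCO images as the budget allows...
    from_coco = min(n_not_person_total, n_coco_available)
    # ...and request whatever is still missing from Unsplash.
    return from_coco, n_not_person_total - from_coco
```

For example, a budget of 2,000 with only 1,500 person-free COCO images left would request the remaining 500 from Unsplash.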

This concludes the first part. In the next article, we will train a lightweight deep learning model that strives for a balanced trade-off between quality and model size. It's essential to keep in mind that the model will be deployed in a highly resource-constrained environment, which is a typical scenario in edge-based machine learning.