With the rapid advances in deep learning, AI-based person detection has become popular across many industries, particularly security and surveillance. Accurate person detection, however, requires a well-trained model, and that in turn depends heavily on the quality of the training dataset. In this blog post, we start from scratch by collecting a dataset that will serve as the foundation for training a lightweight deep learning model. In the next articles, we will train the model and measure its accuracy, with the ultimate goal of deploying it on a microcontroller, specifically the ESP32.
Data Collection
Given the nature of the task, we will need to collect two kinds of images:
Images with one or more people
Images with no people
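
One simple way to keep the two kinds of images apart while collecting is to store each in its own folder and check the class balance as the dataset grows. The snippet below is a minimal sketch of that idea, not the layout this series necessarily uses: the folder names person and no_person, the count_images helper, and the list of accepted file extensions are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your camera or scraper produces.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(dataset_root, class_names=("person", "no_person")):
    """Create one folder per class (if missing) and count the images in each.

    The class folder names are hypothetical; any two names work as long as
    they are used consistently.
    """
    root = Path(dataset_root)
    counts = {}
    for name in class_names:
        class_dir = root / name
        # Make sure the folder exists so collection scripts can write into it.
        class_dir.mkdir(parents=True, exist_ok=True)
        counts[name] = sum(
            1
            for path in class_dir.iterdir()
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


if __name__ == "__main__":
    # "dataset" is a placeholder path for the collection directory.
    for class_name, total in count_images("dataset").items():
        print(f"{class_name}: {total} images")
```

Running this periodically during collection makes it easy to spot when one class is far larger than the other, which is worth correcting before training.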



