Data is gold

The importance of recordings and labelling for AI-infused vision development

A recent publication by Andrew Ng* (one of the most renowned machine learning and education pioneers) highlights the importance of data for progress in AI.

As he explains: “Unlike traditional software, which is powered by code, AI systems are built using both code (including models and algorithms) and data:

AI systems = Code (model/algorithm) + Data”

While historical approaches typically tried to improve the Code (either the model architecture or the algorithm), now we know that “for many practical applications, it’s more effective instead to focus on improving the Data”.

Generating bigger and better datasets is often the most straightforward way to boost AI results. So-called “data-centric AI development” is gaining ground.

For those who, like us at Sadako Technologies, are devoted to building neural networks for vision applications, a key activity is building high-quality datasets in a repeatable, systematic way, so that data flows consistently through all stages of a project.

Our data generation process has two main steps: image acquisition and image labelling.

Let’s take our ongoing European project HR-Recycler as an example. In this project, SADAKO develops the vision systems that need to recognize WEEE (Waste from Electrical and Electronic Equipment) objects and their components, as well as human motion and gestures.

For image acquisition, we prepared and carried out several recording campaigns at the premises of two of the end users participating in the project: ECORESET and INDUMETAL. The following pictures were taken during those campaigns; the first ones correspond to object detection and the last ones to gesture detection.

Figure 6: Sample images from the June 2021 recordings at Indumetal’s premises.

Special attention was paid to the choice of hardware and to replicating environmental conditions (background, lighting) as closely as possible to those found in operation. For the human motion detection datasets, special attention was also given to possible gender or age bias in the data collection, which could harm the neural network’s operational performance.
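As an illustration of what such a bias check might look like, here is a minimal Python sketch. It assumes each recorded clip carries simple metadata about the person filmed; the field names (`gender`, `age_band`), the sample entries and the 30% threshold are all hypothetical and chosen only for this example, not taken from the HR-Recycler recordings or from our internal tools.

```python
from collections import Counter


def group_shares(records, attribute):
    """Return the share of recordings for each value of a metadata attribute."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def underrepresented(shares, min_share):
    """List the groups whose share falls below the chosen threshold."""
    return sorted(value for value, share in shares.items() if share < min_share)


# Hypothetical metadata: one entry per recorded clip.
recordings = [
    {"clip": "a", "gender": "female", "age_band": "18-35"},
    {"clip": "b", "gender": "male", "age_band": "18-35"},
    {"clip": "c", "gender": "male", "age_band": "36-55"},
    {"clip": "d", "gender": "male", "age_band": "18-35"},
    {"clip": "e", "gender": "male", "age_band": "56+"},
]

for attribute in ("gender", "age_band"):
    shares = group_shares(recordings, attribute)
    # The 30% threshold is an arbitrary value for this illustration.
    print(attribute, underrepresented(shares, min_share=0.3))
```

A report like this can be run after each recording campaign, so that under-represented groups are spotted while there is still time to schedule additional recordings.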

On the labelling side, our internal labelling team, with the help of our own proprietary labelling tools, has generated homogeneous, high-quality annotations for the different categories established for WEEE objects and for human motion and gestures. Our labelling team is one of the most skilled and experienced image labelling teams in the world in the waste domain, and the whole technology relies on its work.

Accurate recordings and excellent labelling enable smooth algorithm production and are critical for the system to work properly.

 

* https://www.deeplearning.ai/the-batch/issue-84/

This article was originally published on the HR-RECYCLER blog in June 2021.