How do Deep Learning and neural networks reliably find the right results? In collaboration with our AI expert Wolfgang Andris, we take a closer look at how the algorithms work, how they are adapted and how they are trained.
Preparing neural networks for use often presents users with a problem: for the algorithms to recognize images reliably, they first have to be trained for their task. This requires a sufficiently large and clearly identifiable training set on which the neural networks can work and improve. Only once they have completed this training successfully can the actual task be solved satisfactorily.
But how exactly does the process of deep learning for machine vision actually work? We'd like to take a closer look at that today.
Pre-trained networks help to use Deep Learning economically
Images contain vast amounts of information. At the development level, even the image from a one-megapixel camera is a million numbers. Fortunately, today's Deep Learning algorithms can easily handle this mass of data: classification, object recognition and segmentation are among the standard applications for machine vision here. For all of these areas there are well-functioning models and libraries as well as pre-trained networks. The latter help to shorten the training time of your own neural network.
The expert community also places great value on exchange and improvement. For classification tasks, for example, there is the annual ILSVRC – the ImageNet Large Scale Visual Recognition Challenge. In this international competition and its successors on the data science community Kaggle, newly developed neural networks compete against each other. They often originate from very large companies or university working groups, as the development and (pre-)training of the networks require immense effort.
The ILSVRC and comparable competitions help teams decide which neural network and which approaches to base their own further development or product development on. Smaller AI teams in particular reduce their risk in this way, since they can rely on a proven foundation. At the same time, they increase their development speed. However, this by no means implies that no further development is needed – as is well known, the devil is in the details. Challenge winners must also be adapted to the task-specific conditions and, if necessary, expanded.
No training without a data set
Before that, a dedicated data set is needed on which the future deployment is to be based. For this, the individual classes for the neural network must first be defined. In the simplest case, these classes reflect the inspection result IO/NIO (Alright/Not Alright). For object detection or in the case of multiple defect options, it is possible to assign additional NIO classes, such as NIO/Scratch or NIO/Smudge. For this purpose, the AI "splits" an image capture into several subsections. These can then be uniquely classified.
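The splitting into subsections can be sketched as follows. The article does not say how the split is done, so the fixed grid, the function names and the rule "a part is IO only if every subsection is IO" are assumptions made purely for illustration.

```python
# Hypothetical class list, using the class names mentioned in the text.
CLASSES = ["IO", "NIO/Scratch", "NIO/Smudge"]


def split_into_tiles(height, width, tile_size):
    """Return (top, left, bottom, right) boxes that cover the whole image.

    Tiles at the right and bottom edge are cropped to the image border.
    """
    tiles = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            bottom = min(top + tile_size, height)
            right = min(left + tile_size, width)
            tiles.append((top, left, bottom, right))
    return tiles


def overall_verdict(tile_labels):
    """Assumed rule: the part is IO only if every tile was classified as IO."""
    return "IO" if all(label == "IO" for label in tile_labels) else "NIO"


# A 100 x 150 pixel image cut into 50 x 50 tiles gives 2 rows x 3 columns.
tiles = split_into_tiles(height=100, width=150, tile_size=50)
```

Each tile would then be classified on its own, and a single tile labeled NIO/Scratch is enough to mark the whole part as NIO under this rule.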
As the term network suggests, all information is interconnected via countless internal parameters (also called weights). The properties of the training images are hidden in the values of these weights. This is why task-related training is so important: only in this way do the weights also fit new data. What exactly a specific weight stands for, however, cannot be read from a neural network.
But how does the AI know that it is arriving at correct results? So-called labels, which have to be assigned manually for a training set, help here. These are the actual classes of the captured images, which is why they are also called ground truth. In a well-trained neural network, all predicted classes match the labels. If this is not yet the case, the neural network is adjusted further.
This adjustment is done by changing the internal parameters of the neural network: if an image is incorrectly classified, a backpropagation algorithm is used to calculate which weights are "responsible" for the incorrect classification. These are then adjusted so that the same image is classified better in the next run. In this way, the network is optimized for the existing data set as training progresses.
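Comparing predictions with the ground truth can be sketched in a few lines. The function names and the example labels below are invented for illustration and do not come from a specific library.

```python
def accuracy(predicted, ground_truth):
    """Fraction of samples whose predicted class equals the manual label."""
    if not ground_truth or len(predicted) != len(ground_truth):
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(p == t for p, t in zip(predicted, ground_truth))
    return matches / len(ground_truth)


def misclassified(predicted, ground_truth):
    """Indices of the samples the network still gets wrong."""
    return [i for i, (p, t) in enumerate(zip(predicted, ground_truth)) if p != t]


# Hypothetical labels for four captured images and the network's predictions.
labels = ["IO", "NIO", "IO", "IO"]
predictions = ["IO", "NIO", "NIO", "IO"]

print(accuracy(predictions, labels))       # 0.75
print(misclassified(predictions, labels))  # [2]
```

The samples returned by `misclassified` are the ones that drive the next adjustment of the network.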
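The idea of adjusting the "responsible" weights can be illustrated with a deliberately tiny model: a single artificial neuron with a sigmoid output, trained by gradient descent. This is a simplified sketch, not a full backpropagation through many layers; the feature values, the learning rate and the encoding 0 = IO, 1 = NIO are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Predicted probability that the sample belongs to class 1 (NIO)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def training_step(weights, bias, features, label, learning_rate=0.5):
    """One gradient-descent update for a single labeled sample.

    For a sigmoid output with cross-entropy loss, the gradient with respect
    to the pre-activation is (prediction - label). Each weight is changed in
    proportion to its input, so weights fed by larger inputs are held more
    "responsible" for the error.
    """
    error = predict(weights, bias, features) - label
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias


# Hypothetical sample: two feature values, true class NIO (1).
weights, bias = [0.0, 0.0], 0.0
features, label = [1.0, 2.0], 1

before = predict(weights, bias, features)  # 0.5, the network is undecided
weights, bias = training_step(weights, bias, features, label)
after = predict(weights, bias, features)   # closer to 1 than before
```

In a real deep network, the same principle is applied layer by layer via the chain rule, and frameworks compute these gradients automatically.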
How can it be ensured that the model works reliably in practice?
What initially sounds like a successful method for optimizing results leads to problems in practice every now and then. The reason is that the neural network has only a limited data set available for training. Through repeated runs, it can adapt to this set almost perfectly. The system then quickly overshoots its target, and the AI achieves worse results on new data than during training. This phenomenon is called overfitting. Developers counter it with so-called regularization methods and, in addition, by splitting the available data set:
- 70 % for training the neural network.
- 15 % for validating the model after each epoch and checking whether overfitting has already occurred.
- 15 % for the final test on completely unknown data. Accordingly, this test should be performed only once per training.
In this way, it can be ensured that a neural network works as well as possible on unknown data.
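The 70/15/15 split and the check after each epoch could look roughly like this. The function names, the fixed random seed and the example loss values are assumptions; treating a rising validation loss as the sign of overfitting is a common convention, not something the article prescribes.

```python
import random


def split_dataset(samples, train_fraction=0.70, validation_fraction=0.15, seed=0):
    """Shuffle the samples and split them into training, validation and test sets.

    Whatever remains after the training and validation share is the test set.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_fraction)
    n_validation = round(len(shuffled) * validation_fraction)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return train, validation, test


def best_epoch(validation_losses):
    """Index of the epoch with the lowest validation loss.

    If the loss only rises after this epoch while the training loss keeps
    falling, the network has started to overfit.
    """
    return min(range(len(validation_losses)), key=validation_losses.__getitem__)


train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))  # 70 15 15

# Hypothetical validation losses measured after each of five epochs.
print(best_epoch([0.9, 0.6, 0.5, 0.55, 0.7]))  # 2
```

The test set is touched only once, after the epoch to keep has been chosen on the basis of the validation set.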
How many test parts does Deep Learning need?
This question cannot be answered definitively, since many factors such as the complexity of the recognition tasks, deviations, recording quality, etc. play a role. But with two rules of thumb you can estimate the effort required for a Deep Learning application:
- Approximately 25 samples for a successful proof of concept.
- 100 samples to implement an industrial use on the shop floor.
What sounds like a lot at first accumulates quickly when the recording of this sample data is embedded in the existing inspection process. For example, employees performing visual inspection at test stands can easily hold parts in front of a recording device and mark them as IO/NIO via software. To locate a defect precisely, a bounding box can also be defined. Alternatively, a robot can pick up listed parts and present them to the camera. This results in a valid data set in a very short time.
Once the network has been trained, it can be launched into industrial use. Over time, the neural network may still miss one or two workpieces. These (or their recordings) can be used at a later time for so-called post-training. The procedure is the same as for the previous training, but with one advantage: the network already knows its way around and, thanks to the additional information, achieves an even higher hit rate in the future.