Vincent NOBLET is an industrial vision project manager at Psycle. At the end of the morning, he launches the analysis of a batch of images from a production line. Here, AI must automatically detect any defects present on small chocolate-filled cookies before packaging. Because nobody wants to risk spoiling the children’s snack time. So, to find out whether the algorithm is delivering on its promises, Vincent opens a table: the famous confusion matrix. This simple table, organized as “predictions vs. reality,” delivers a verdict on the reliability of Psycle’s model and suggests adjustments where necessary. In short, it is a concrete way of turning a visual stream into usable data.
What is the purpose of the confusion matrix?
The confusion matrix is a fundamental tool for evaluating a supervised artificial intelligence model. It compares the AI’s predictions with human annotations. Each prediction can then be classified as correct or incorrect, which allows four cases to be quantified: true positives, false positives, true negatives, and false negatives.
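The tallying described above can be sketched in a few lines. This is a minimal illustration with hypothetical data, not Psycle’s actual pipeline: `True` stands for “defect,” and the two lists play the roles of the AI’s verdicts and the operator’s annotations.

```python
def confusion_counts(predictions, labels):
    """Count the four outcomes of a binary defect classifier (True = defect)."""
    tp = sum(p and y for p, y in zip(predictions, labels))          # defect predicted, defect confirmed
    fp = sum(p and not y for p, y in zip(predictions, labels))      # defect predicted, part is good
    tn = sum(not p and not y for p, y in zip(predictions, labels))  # no defect predicted, part is good
    fn = sum(not p and y for p, y in zip(predictions, labels))      # defect missed
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

preds = [True, False, True, False, True]   # hypothetical AI verdicts
truth = [True, False, False, False, True]  # hypothetical operator annotations
print(confusion_counts(preds, truth))      # {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 0}
```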

Let’s take a concrete example from a quality control project. Here, a matrix may reveal that out of 10,000 parts inspected, 9,991 are correctly identified: the diagnosis is 99.9% accurate. But that also means 9 false negatives slip through, a crucial risk for the manufacturer. Thanks to this diagnosis, Psycle can adjust its algorithms before deployment in order to achieve near-zero escaped defects.
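The arithmetic behind that figure is simple; the numbers below are the hypothetical ones from the example above:

```python
# 10,000 parts inspected, 9,991 correctly identified (hypothetical batch).
total = 10_000
correct = 9_991
accuracy = correct / total
print(f"accuracy = {accuracy:.2%}")   # accuracy = 99.91%
print(f"errors   = {total - correct}")  # errors   = 9
```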
Understanding how it works: precision and recall
The matrix is organized as a table with as many rows and columns as there are classes to distinguish (each defect type, plus “no defect”). In the binary case:
True Positive (TP): the AI detects a defect, and the operator confirms it.
False Positive (FP): the AI reports a defect, but the part is good.
True Negative (TN): the AI does not identify a defect, the part is good.
False Negative (FN): the AI does not detect a real defect.
These values are used to calculate precision (the proportion of alerts that are correct) and recall (the ability to catch all real defects).
These metrics help determine whether a model is suitable for industrial use. For example, for 100% quality control, recall takes priority: it is better to generate a few false alerts than to miss a defect.
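The two metrics follow directly from the four counts defined above. A minimal sketch, with hypothetical figures for an inspection run:

```python
def precision_recall(tp, fp, fn):
    """Precision = share of alerts that are real; recall = share of defects caught."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical run: 120 real defects caught, 5 false alerts, 3 defects missed.
p, r = precision_recall(tp=120, fp=5, fn=3)
print(f"precision = {p:.3f}, recall = {r:.3f}")  # precision = 0.960, recall = 0.976
```

Note that true negatives appear in neither formula: on a line where good parts vastly outnumber defective ones, that is precisely why precision and recall are more informative than raw accuracy.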

Thresholds, confidence, and production: when the matrix alone is not enough
The confusion matrix is constructed at a given confidence threshold. Depending on that threshold, the same raw predictions may be counted as positive or discarded. A threshold of 90% yields few false positives but may miss defects: false negatives increase. Conversely, a low threshold (e.g., 1%) reduces false negatives but inflates false alarms.
Thus, the matrix may seem pessimistic compared to actual performance observed in production, or, conversely, overly optimistic if the thresholds are poorly chosen. The real test remains actual use in production lines, with real volumes and conditions.

The confusion matrix is much more than a table; it is an indispensable tool for evaluating and refining machine vision solutions. But it is not magic. It must be interpreted with care: no single metric tells the whole story, and the choice of threshold shapes the numbers. That is why it is essential to combine precision, recall, and field experience, and of course to test models in real-world conditions. At Psycle, it is this rigor that enables us to transform images into confidence and data into industrial performance.