[Preprint] Panoptic Quality: not always a good metric

Adrien Foucart, PhD in biomedical engineering.

Get sparse and irregular email updates by subscribing to https://adfoucart.substack.com. This website is guaranteed 100% human-written, ad-free and tracker-free.

This is the last preprint to come out of the work done during my thesis: “Why Panoptic Quality should be avoided as a metric for assessing cell nuclei segmentation and classification in digital pathology,” co-authored by my PhD advisors Christine Decaestecker and Olivier Debeir. The preprint is available on ResearchSquare, or on this website. The code to replicate the results and figures is available on GitHub.

This completes a set of studies, largely but not exclusively based on the results of the MoNuSAC nuclei instance segmentation and classification challenge. In the first (Foucart, Debeir, and Decaestecker 2022a) (PDF), we discovered errors in the challenge’s evaluation code, which I previously discussed on this blog. In the second (Foucart, Debeir, and Decaestecker 2022b) (PDF), we examined the problem of using “entangled” metrics (i.e. metrics that combine multiple independent subtasks into a single score) - such as Panoptic Quality - for ranking challenge participants. In this one, finally, we round up our analysis and demonstrate why this metric is inadequate - and should be abandoned - for the task of nuclei instance segmentation and classification, where it is becoming “standard.”

In addition to the previously mentioned problem of combining multiple metrics (in this case, the segmentation IoU with the detection F1-score), we show that the problem the metric was designed to evaluate, “Panoptic Segmentation” (Kirillov et al. 2019), differs fundamentally from the problem of “instance segmentation and classification” that we are trying to evaluate here. We also demonstrate that the IoU, used for the segmentation part, is inadequate for small objects with fuzzy, uncertain boundaries, such as nuclei.
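To make the entanglement concrete: as defined by Kirillov et al. (2019), Panoptic Quality is the product of a segmentation quality (SQ, the mean IoU over matched ground-truth/prediction pairs) and a recognition quality (RQ, which is exactly the detection F1-score). A minimal sketch of the computation, assuming matching has already been done with the usual IoU > 0.5 criterion (function name and example values are illustrative):

```python
def panoptic_quality(matched_ious, n_fp, n_fn):
    """Compute PQ from matched pairs (TP) and unmatched counts.

    matched_ious: IoU of each ground-truth/prediction pair matched
                  with IoU > 0.5 (the true positives).
    n_fp: unmatched predictions (false positives).
    n_fn: unmatched ground-truth instances (false negatives).
    Returns (PQ, SQ, RQ).
    """
    n_tp = len(matched_ious)
    if n_tp == 0:
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / n_tp                 # mean IoU over matched pairs
    rq = n_tp / (n_tp + 0.5 * n_fp + 0.5 * n_fn)  # detection F1-score
    return sq * rq, sq, rq                        # PQ = SQ * RQ


# Hypothetical image: 3 matched nuclei, 1 false positive, 1 missed nucleus
pq, sq, rq = panoptic_quality([0.8, 0.7, 0.9], n_fp=1, n_fn=1)
print(sq, rq, pq)  # 0.8, 0.75, 0.6
```

The single PQ value of 0.6 here mixes a segmentation effect (SQ = 0.8) and a detection effect (RQ = 0.75): a method with worse boundaries but better detection could obtain the same score, which is precisely the entanglement problem discussed above.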

These three problems (entanglement, task mismatch, and an inadequate segmentation score) together should disqualify Panoptic Quality as a metric for ranking algorithms on the task of nuclei instance segmentation and classification.

References

Foucart, Adrien, Olivier Debeir, and Christine Decaestecker. 2022a. “Comments on MoNuSAC2020: A Multi-Organ Nuclei Segmentation and Classification Challenge.” IEEE Transactions on Medical Imaging 41 (4): 997–99. https://doi.org/10.1109/TMI.2022.3156023.
———. 2022b. “Evaluating Participating Methods in Image Analysis Challenges: Lessons from MoNuSAC 2020.” https://doi.org/10.13140/RG.2.2.11627.00801.
Kirillov, Alexander, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollar. 2019. “Panoptic Segmentation.” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2019-June: 9396–9405. https://doi.org/10.1109/CVPR.2019.00963.