Interpretability is essential when deep neural networks are applied to critical scenarios such as medical image processing. Current gradient-based [Selvaraju et al., 2017] and counterfactual image-based [Bass et al., 2020] interpretability approaches can only indicate where the evidence is; we also want to know what the evidence is. In this master thesis project, we will build an inherently interpretable classification method. This classifier can learn disentangled features that are semantically meaningful and, in future work, correspond to related clinical concepts.
This project is based on the previously proposed visual feature attribution method of [Baumgartner et al., 2018], which generates a class-relevant attribution map for a given diseased input image. We will extend this method to also generate class-relevant shape variations and design an inherently interpretable classifier that uses only the disentangled features (class-relevant intensity variation and shape variation). The method can be further extended by disentangling additional semantically meaningful and causally independent features such as texture, shape, and background, as in [Sauer and Geiger, 2021].
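The core idea behind visual feature attribution with an additive map can be sketched as follows: a generator M produces an additive map such that x + M(x) looks like a healthy image, and M(x) itself then localizes and describes the class-relevant change. The toy numpy example below illustrates this structure on synthetic data; it uses the ideal map directly instead of a trained GAN, and all variable names are illustrative, not from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "healthy" image and a localized disease effect (a bright blob).
healthy = rng.normal(0.0, 0.05, size=(32, 32))
effect = np.zeros((32, 32))
effect[10:16, 10:16] = 1.0           # disease-specific intensity change
diseased = healthy + effect

# In the visual feature attribution setting, a generator M is trained so that
# x + M(x) is classified as healthy; here we substitute the ideal map.
def attribution_map(x, clean):
    """Additive map turning the diseased image into its healthy counterfactual."""
    return clean - x                 # ideal M(x); a trained generator approximates this

M = attribution_map(diseased, healthy)
counterfactual = diseased + M        # matches the healthy image by construction

# The magnitude of M localizes *where* the evidence is ...
where = np.abs(M) > 0.5
# ... and its sign and values describe *what* the change is (an intensity
# increase of ~1.0 in the 6x6 disease region here).
print(where.sum())                   # -> 36 pixels flagged
```

The planned extension replaces the single intensity map with disentangled components (intensity variation and shape variation), so that the classifier's decision can be read off from each interpretable component separately.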
Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
Cher Bass, Mariana da Silva, Carole Sudre, Petru-Daniel Tudosiu, Stephen M. Smith, and Emma C. Robinson. ICAM: Interpretable classification via disentangled representations and feature attribution mapping. arXiv preprint arXiv:2006.08287, 2020.
Christian F. Baumgartner, Lisa M. Koch, Kerem Can Tezcan, Jia Xi Ang, and Ender Konukoglu. Visual feature attribution using Wasserstein GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8309–8319, 2018.
 Axel Sauer and Andreas Geiger. Counterfactual generative networks. arXiv preprint arXiv:2101.06046, 2021.